3A
  • Portfolio Top
  • Categories
  • Tags
  • Archives

Zuker Algorithm

Free energy minimization of RNA seconary structure.¶

  • In cells, RNAs are likely to form energetically stable secondary structure.
  • The correctness of secondary structure should be evaluated based on the free energy, rather than the number of base-pairs (Nussinov Algorithm)
  • We first describe a way to calculate the free energy of a particular RNA secondary structure.
  • We describe the Zuker algorithm which calculates the RNA secondary structure with the minimum free energy (MFE).

Free energy of RNA secondary structure¶

  • The free energy of a secondary structure is approximated as the sum of the free energy of "loops".
$$E = \sum_iE_i$$
  • The free energy of individual loop is given by experimental data. (ex. $\mathrm{C-G: }-3.4\mathrm{kcal/mol}$, $\mathrm{U-A: }-0.9\mathrm{kcal/mol}$)

Five types of "loops"¶

hairpin loop stacking bulge loop internal loop multi-loop
F1(i,j) F2(i,j,h,l) Fm=a+bk+cu
  • a,b,c: constant
  • k: the number of base-pairs in a multi-loop
  • u: the number of single stranded nucleotides in a multi-loop

Algorithm¶

variable meaning
$W(i,j)$ the minimum free energy of subsequence from $i$ to $j$.
$V(i,j)$ the minimum free energy of subsequence from $i$ to $j$ when $i$ to $j$ forms a base-pair.
$M(i,j)$ the minimum free energy of subsequence closed by two or more base pairs.
$M1(i,j)$ the minimum free energy of subsequence closed by one or more base pairs.

Recursion¶

$$ \begin{aligned} W(i,j) &= \min \begin{cases} W(i+1,j)\\W(i,j-1)\\V(i,j)\\\min_{i\leq k<j}\left\{W(i,k) + W(k+1,j)\right\} \end{cases}\\ V(i,j) &= \min \begin{cases} F_1(i,j)\\\min_{i<h<l<j}F_2(i,j,h,l) + V(h,l)\\M(i+1,j-1)+a+b \end{cases}\\ M(i,j) &= \min_{i\leq<k<j}\left\{M1(i,k) + M1(k+1,j)\right\}\\ M1(i,j) &= \min \begin{cases} M(i,j)\\V(i,j)+b\\M1(i+1,j)+c\\M1(i,j-1)+c \end{cases}\\ \end{aligned} $$

Initialization¶

In [1]:
from kerasy.Bio.structure import Zuker
from kerasy.utils.bio import readMonoSeq
In [2]:
model = Zuker()
In [3]:
# 以下の形のパラメタファイルを受け取ります。
! cat params.json
{
    "inf": 1e3,
    
    "type": "RNA",
    "Watson_Crick": true,
    "Wobble": true,
    
    "hairpin":  [Infinity, Infinity, 4.4, 4.6, 4.7, 4.4, 5.0, 4.5, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 5.9, 6.0, 6.1, 6.1, 6.2, 6.2, 6.3, 6.3, 6.4, 6.4, 6.5, 6.5, 6.5, 6.6, 6.6, 6.7, Infinity],
    "internal": [Infinity, 1.0, 1.0, 1.1, 2.0, 2.0, 2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, Infinity],
    "buldge":   [1.0, 1.0, 1.0, 1.1, 2.0, 2.0, 2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, Infinity],
    "a": 6.0,
    "b": -1.0,
    "c": 0.1,
    "stacking_cols": ["CG", "AU", "GU"],
    "stacking_score": [[-3.0, -2.0, -2.0],
                       [-2.0, -0.5, -0.5],
                       [-2.0, -0.5, -0.5]]
}
In [4]:
model.load_params("params.json")
In [5]:
model.params()
| parameter  |value |
---------------------
|a           |   6.0|
|b           |  -1.0|
|c           |   0.1|
|inf         |1000.0|
|type        |   RNA|
|Watson_Crick|     1|
|Wobble      |     1|
In [6]:
model.stacking_cols
Out[6]:
array(['CG', 'AU', 'GU'], dtype='<U2')
In [7]:
model.stacking_score
Out[7]:
array([[-3.e+00, -2.e+00, -2.e+00,  1.e+03],
       [-2.e+00, -5.e-01, -5.e-01,  1.e+03],
       [-2.e+00, -5.e-01, -5.e-01,  1.e+03],
       [ 1.e+03,  1.e+03,  1.e+03,  1.e+03]])
In [8]:
# サンプル配列
sequence = readMonoSeq(path="sample_seq4.fasta")
In [9]:
print(sequence)
['CCACCUCGCCGCGAAGAUUGGAUGUAAUGUUCACUUAAGCCUAGGCGUUCGCAAAGAAAUACCUAUGGUAAUUUGACUAGCGGUAACAAUGAAAGAAAAGGUGUAGGCGAGGUGGG']
In [10]:
# 最大ペアの配列
model.predict(sequence[0])
 Zuker Algorithm 
Score: -37.5

=================================================================
seq: CCACCUCGCCGCGAAGAUUGGAUGUAAUGUUCACUUAAGCCUAGGCGUUCGCAAAGAAAU
   : ((((((((((((((((..((((((...)))))))))..((((....((((((..((((((

seq: ACCUAUGGUAAUUUGACUAGCGGUAACAAUGAAAGAAAAGGUGUAGGCGAGGUGGG
   : (((...))))))))..)).))))..))...........)))))).)))))))))).
=================================================================
In [11]:
model.W
Out[11]:
array([[  0. ,   0. ,   0. , ..., -34.5, -37.5, -37.5],
       [  0. ,   0. ,   0. , ..., -34.5, -34.5, -34.5],
       [  0. ,   0. ,   0. , ..., -32.5, -32.5, -32.5],
       ...,
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ]])
In [12]:
model.M
Out[12]:
array([[1000. ,  999. ,  999. , ...,  -35.4,  -38.5,  -38.4],
       [1000. , 1000. ,  999. , ...,  -35.5,  -35.4,  -35.3],
       [1000. , 1000. , 1000. , ...,  -33.4,  -33.3,  -33.2],
       ...,
       [1000. , 1000. , 1000. , ..., 1000. ,  999. ,  999. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. ,  999. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ]])
In [13]:
model.V
Out[13]:
array([[1000. , 1000. , 1000. , ...,  -31.5,  -37.5,  -34.5],
       [1000. , 1000. , 1000. , ...,  -34.5,  -31.5,  -31.5],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ],
       ...,
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ]])

  • « 生物物理学 第3回
  • 分子生命科学Ⅲ 第3回 »
hidden
Table of Contents
Published
Oct 10, 2019
Last Updated
Oct 10, 2019
Category
情報基礎実験(浅井)
Tags
  • 3A 127
  • 情報基礎実験(浅井) 13
Contact
Other contents
  • Home
  • Blog
  • Front-End
  • Kerasy
  • Python-Charmers
  • Translation-Gummy
    • 3A - Shuto's Notes
    • MIT
    • Powered by Pelican. Theme: Elegant