Free energy minimization of RNA seconary structure.¶

In cells, RNAs are likely to form energetically stable secondary structure.
The correctness of secondary structure should be evaluated based on the free energy, rather than the number of base-pairs (Nussinov Algorithm)
We first describe a way to calculate the free energy of a particular RNA secondary structure.
We describe the Zuker algorithm which calculates the RNA secondary structure with the minimum free energy (MFE).

Free energy of RNA secondary structure¶

The free energy of a secondary structure is approximated as the sum of the free energy of "loops".

$$E = \sum_iE_i$$

The free energy of individual loop is given by experimental data. (ex. $\mathrm{C-G: }-3.4\mathrm{kcal/mol}$, $\mathrm{U-A: }-0.9\mathrm{kcal/mol}$)

Five types of "loops"¶

hairpin loop	stacking	bulge loop	internal loop	multi-loop

F1(i,j)	F2(i,j,h,l)			Fm=a+bk+cu

a,b,c: constant
k: the number of base-pairs in a multi-loop
u: the number of single stranded nucleotides in a multi-loop

Algorithm¶

variable	meaning
$W(i,j)$	the minimum free energy of subsequence from $i$ to $j$.
$V(i,j)$	the minimum free energy of subsequence from $i$ to $j$ when $i$ to $j$ forms a base-pair.
$M(i,j)$	the minimum free energy of subsequence closed by two or more base pairs.
$M1(i,j)$	the minimum free energy of subsequence closed by one or more base pairs.

Recursion¶

$$ \begin{aligned} W(i,j) &= \min \begin{cases} W(i+1,j)\\W(i,j-1)\\V(i,j)\\\min_{i\leq k<j}\left\{W(i,k) + W(k+1,j)\right\} \end{cases}\\ V(i,j) &= \min \begin{cases} F_1(i,j)\\\min_{i<h<l<j}F_2(i,j,h,l) + V(h,l)\\M(i+1,j-1)+a+b \end{cases}\\ M(i,j) &= \min_{i\leq<k<j}\left\{M1(i,k) + M1(k+1,j)\right\}\\ M1(i,j) &= \min \begin{cases} M(i,j)\\V(i,j)+b\\M1(i+1,j)+c\\M1(i,j-1)+c \end{cases}\\ \end{aligned} $$

Initialization¶

In [1]:

from kerasy.Bio.structure import Zuker
from kerasy.utils.bio import readMonoSeq

In [2]:

model = Zuker()

In [3]:

# 以下の形のパラメタファイルを受け取ります。
! cat params.json

{
    "inf": 1e3,
    
    "type": "RNA",
    "Watson_Crick": true,
    "Wobble": true,
    
    "hairpin":  [Infinity, Infinity, 4.4, 4.6, 4.7, 4.4, 5.0, 4.5, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 5.9, 6.0, 6.1, 6.1, 6.2, 6.2, 6.3, 6.3, 6.4, 6.4, 6.5, 6.5, 6.5, 6.6, 6.6, 6.7, Infinity],
    "internal": [Infinity, 1.0, 1.0, 1.1, 2.0, 2.0, 2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, Infinity],
    "buldge":   [1.0, 1.0, 1.0, 1.1, 2.0, 2.0, 2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, Infinity],
    "a": 6.0,
    "b": -1.0,
    "c": 0.1,
    "stacking_cols": ["CG", "AU", "GU"],
    "stacking_score": [[-3.0, -2.0, -2.0],
                       [-2.0, -0.5, -0.5],
                       [-2.0, -0.5, -0.5]]
}

In [4]:

model.load_params("params.json")

In [5]:

model.params()

| parameter  |value |
---------------------
|a           |   6.0|
|b           |  -1.0|
|c           |   0.1|
|inf         |1000.0|
|type        |   RNA|
|Watson_Crick|     1|
|Wobble      |     1|

In [6]:

model.stacking_cols

Out[6]:

array(['CG', 'AU', 'GU'], dtype='<U2')

In [7]:

model.stacking_score

Out[7]:

array([[-3.e+00, -2.e+00, -2.e+00,  1.e+03],
       [-2.e+00, -5.e-01, -5.e-01,  1.e+03],
       [-2.e+00, -5.e-01, -5.e-01,  1.e+03],
       [ 1.e+03,  1.e+03,  1.e+03,  1.e+03]])

In [8]:

# サンプル配列
sequence = readMonoSeq(path="sample_seq4.fasta")

In [9]:

print(sequence)

['CCACCUCGCCGCGAAGAUUGGAUGUAAUGUUCACUUAAGCCUAGGCGUUCGCAAAGAAAUACCUAUGGUAAUUUGACUAGCGGUAACAAUGAAAGAAAAGGUGUAGGCGAGGUGGG']

In [10]:

# 最大ペアの配列
model.predict(sequence[0])

 Zuker Algorithm 
Score: -37.5

=================================================================
seq: CCACCUCGCCGCGAAGAUUGGAUGUAAUGUUCACUUAAGCCUAGGCGUUCGCAAAGAAAU
   : ((((((((((((((((..((((((...)))))))))..((((....((((((..((((((

seq: ACCUAUGGUAAUUUGACUAGCGGUAACAAUGAAAGAAAAGGUGUAGGCGAGGUGGG
   : (((...))))))))..)).))))..))...........)))))).)))))))))).
=================================================================

In [11]:

model.W

Out[11]:

array([[  0. ,   0. ,   0. , ..., -34.5, -37.5, -37.5],
       [  0. ,   0. ,   0. , ..., -34.5, -34.5, -34.5],
       [  0. ,   0. ,   0. , ..., -32.5, -32.5, -32.5],
       ...,
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ]])

In [12]:

model.M

Out[12]:

array([[1000. ,  999. ,  999. , ...,  -35.4,  -38.5,  -38.4],
       [1000. , 1000. ,  999. , ...,  -35.5,  -35.4,  -35.3],
       [1000. , 1000. , 1000. , ...,  -33.4,  -33.3,  -33.2],
       ...,
       [1000. , 1000. , 1000. , ..., 1000. ,  999. ,  999. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. ,  999. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ]])

In [13]:

model.V

Out[13]:

array([[1000. , 1000. , 1000. , ...,  -31.5,  -37.5,  -34.5],
       [1000. , 1000. , 1000. , ...,  -34.5,  -31.5,  -31.5],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ],
       ...,
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ],
       [1000. , 1000. , 1000. , ..., 1000. , 1000. , 1000. ]])

Zuker Algorithm

Free energy minimization of RNA seconary structure.¶

Free energy of RNA secondary structure¶

Five types of "loops"¶

Algorithm¶

Recursion¶

Initialization¶

Table of Contents

Published

Last Updated

Category

Tags

Contact

Other contents