Nussinov algorithm¶
- It finds the secondary structure with the most base-pairs.
- The assumption is too simplistic.
- But the Nussinov algorithm contains a basic idea for more realistic RNA secondary structure prediction algorithm.
G | G | G | A | A | A | U | C | C | U | G | G | A | A | A | C | C | G |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
( | ( | ( | . | . | . | ) | ) | ) | ( | ( | ( | . | . | . | ) | ) | ) |
The basic idea
Extending and combining "optimal" structures of shorter subsequences to obtain an optimal structure.Algorithm¶
Initialization¶
$$ \begin{array}{l}{\gamma(i, i)=0 \text { for } i=1 \text { to } L} \\ {\gamma(i, i-1)=0 \text { for } i=2 \text { to } L}\end{array} $$Recursion¶
$$ \gamma(i, j)= \max \begin{cases} {\gamma(i+1, j)} \\ {\gamma(i, j-1)} \\ {\gamma(i+1, j-1)+\delta(i, j)} \\ {\max _{i \leq k<j}[\gamma(i, k)+\gamma(k+1, j)]} \end{cases} $$- Let $\delta(i,j)=1$ if $i$-th and $j$-th nucleotides can form a base-pair.
- $\gamma(i,j)$ is the maximum number of base-pairs for subsequence from $i$ to $j$.
※ If you want know more, please look at here!!
Implementation¶
In [1]:
import numpy as np
In [2]:
class Nussinov():
def __init__(self, Array):
"""initialization"""
self.arr = Array
self.N = len(Array)
self.DP = np.zeros((self.N,self.N),dtype=int)
def _is_bp(self,array,i,j,DNA,Wobble):
"""check if i to j forms a base-pair."""
bases = array[i] + array[j]
# Watson-Crick
WC_1 = ("A" in bases) and ("T" in bases) if DNA else ("A" in bases) and ("U" in bases)
WC_2 = ("G" in bases) and ("C" in bases)
# Wobble
Wob = ("G" in bases) and ("U" in bases) if Wobble else 0
flag = WC_1 or WC_2 or Wob
return 1 if flag else 0
def fit(self,DNA=True,Wobble=True):
"""calcurate DP matrix.(recursion)"""
for ini_i in reversed(range(self.N)):
diff = self.N-ini_i
for i in reversed(range(ini_i)):
j = i+diff
delta = 1 if self._is_bp(self.arr,i,j,DNA=DNA,Wobble=Wobble) else 0
self.DP[i][j] = max(
self.DP[i+1][j],
self.DP[i][j-1],
self.DP[i+1][j-1] + delta,
max([self.DP[i][k] + self.DP[k+1][j] for k in range(i,j)]) # Bifurcation
)
def trackback(self,DNA=True,Wobble=True):
"""trackback to find which bases form base-pairs."""
bp = []
stack = [(0,self.N-1)]
while(stack):
i,j = stack.pop(0)
delta = 1 if self._is_bp(self.arr,i,j,DNA=DNA,Wobble=Wobble) else 0
if (i>=j): continue
elif self.DP[i+1][j] == self.DP[i][j]: stack.append((i+1,j))
elif self.DP[i][j-1] == self.DP[i][j]: stack.append((i,j-1))
elif self.DP[i+1][j-1]+delta == self.DP[i][j]: bp.append((i,j)); stack.append((i+1,j-1))
else:
for k in range(i,j):
if self.DP[i][k] + self.DP[k+1][j] == self.DP[i][j]:
stack.append((k+1,j))
stack.append((i,k))
break
#=== Print ===
S = list(" " * self.N)
for i,j in bp:
S[i] = "("; S[j] = ")"
print(self.arr)
print("".join(S))
In [3]:
Array = "AGGAACCCCCUUGGU"
In [4]:
model = Nussinov(Array)
In [5]:
model.fit(DNA=False)
In [6]:
model.DP
Out[6]:
In [7]:
model.trackback(DNA=False)
In [ ]: