3S
  • Portfolio Top
  • Categories
  • Tags
  • Archives

Nussinov algorithm

Nussinov algorithm¶

  • It finds the secondary structure with the most base-pairs.
  • The assumption is too simplistic.
  • But the Nussinov algorithm contains a basic idea for more realistic RNA secondary structure prediction algorithm.
G G G A A A U C C U G G A A A C C G
( ( ( . . . ) ) ) ( ( ( . . . ) ) )

The basic idea

Extending and combining "optimal" structures of shorter subsequences to obtain an optimal structure.

Algorithm¶

Initialization¶

$$ \begin{array}{l}{\gamma(i, i)=0 \text { for } i=1 \text { to } L} \\ {\gamma(i, i-1)=0 \text { for } i=2 \text { to } L}\end{array} $$

Recursion¶

$$ \gamma(i, j)= \max \begin{cases} {\gamma(i+1, j)} \\ {\gamma(i, j-1)} \\ {\gamma(i+1, j-1)+\delta(i, j)} \\ {\max _{i \leq k<j}[\gamma(i, k)+\gamma(k+1, j)]} \end{cases} $$
  • Let $\delta(i,j)=1$ if $i$-th and $j$-th nucleotides can form a base-pair.
  • $\gamma(i,j)$ is the maximum number of base-pairs for subsequence from $i$ to $j$.

※ If you want know more, please look at here!!

Implementation¶

In [1]:
import numpy as np
In [2]:
class Nussinov():
    def __init__(self, Array):
        """initialization"""
        self.arr = Array
        self.N = len(Array)
        self.DP = np.zeros((self.N,self.N),dtype=int)
    
    def _is_bp(self,array,i,j,DNA,Wobble):
        """check if i to j forms a base-pair."""
        bases = array[i] + array[j]
        # Watson-Crick
        WC_1 = ("A" in bases) and ("T" in bases) if DNA else ("A" in bases) and ("U" in bases)
        WC_2 = ("G" in bases) and ("C" in bases)
        # Wobble
        Wob  = ("G" in bases) and ("U" in bases) if Wobble else 0

        flag = WC_1 or WC_2 or Wob
        return 1 if flag else 0
    
    def fit(self,DNA=True,Wobble=True):
        """calcurate DP matrix.(recursion)"""
        for ini_i in reversed(range(self.N)):
            diff = self.N-ini_i
            for i in reversed(range(ini_i)):
                j = i+diff
                delta = 1 if self._is_bp(self.arr,i,j,DNA=DNA,Wobble=Wobble) else 0
                self.DP[i][j] = max(
                    self.DP[i+1][j],
                    self.DP[i][j-1],
                    self.DP[i+1][j-1] + delta,
                    max([self.DP[i][k] + self.DP[k+1][j] for k in range(i,j)]) # Bifurcation
                )
    def trackback(self,DNA=True,Wobble=True):
        """trackback to find which bases form base-pairs."""
        bp = []
        stack = [(0,self.N-1)]
        while(stack):
            i,j = stack.pop(0)
            delta = 1 if self._is_bp(self.arr,i,j,DNA=DNA,Wobble=Wobble) else 0
            
            if (i>=j): continue
            elif self.DP[i+1][j] == self.DP[i][j]: stack.append((i+1,j))
            elif self.DP[i][j-1] == self.DP[i][j]: stack.append((i,j-1))
            elif self.DP[i+1][j-1]+delta == self.DP[i][j]: bp.append((i,j)); stack.append((i+1,j-1))
            else:
                for k in range(i,j):
                    if self.DP[i][k] + self.DP[k+1][j] == self.DP[i][j]:
                        stack.append((k+1,j))
                        stack.append((i,k))
                        break
        #=== Print ===
        S = list(" " * self.N)
        for i,j in bp:
            S[i] = "("; S[j] = ")"
        print(self.arr)
        print("".join(S))           
In [3]:
Array = "AGGAACCCCCUUGGU"
In [4]:
model = Nussinov(Array)
In [5]:
model.fit(DNA=False)
In [6]:
model.DP
Out[6]:
array([[0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5],
       [0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 3, 4, 4],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [7]:
model.trackback(DNA=False)
AGGAACCCCCUUGGU
(((  ))   (()))
In [ ]:
 

  • « Zuker algorithm
  • ゲノム配列解析論Ⅱ 第3回 »
hidden
Table of Contents
Published
Jun 24, 2019
Last Updated
Jun 24, 2019
Category
ゲノム配列解析論Ⅱ
Tags
  • 3S 95
  • ゲノム配列解析論Ⅱ 8
Contact
Other contents
  • Home
  • Blog
  • Front-End
  • Kerasy
  • Python-Charmers
  • Translation-Gummy
    • 3S - Shuto's Notes
    • MIT
    • Powered by Pelican. Theme: Elegant