Microarray

2020-06-24(Wed)
Bio/microarray.py

1. Design of DNA Microarrays

The idea of DNA microarray technology is to monitor gene expression by measuring the levels of RNA species in biological samples. There are two main techniques for doing that.

[complementary-DNA arrays] RNA molecules in the samples are labeled by using appropriate techniques and presented to an array of spots, where complementary-DNA (cDNA) fragments corresponding to known coding DNA sequences are placed.
[oligonucleotide arrays] RNA sequences are copied back into a DNA strand by reverse transcription, the DNA is labeled with fluorescent dyes, and it is hybridized to a complementary-DNA probe fixed on the microarray.

After hybridization of the labeled target molecules, the level of RNA in the sample is estimated by reading the intensity of the signals from the spots of the DNA array.

The basic difference between the two microarray formats is in the lengths of the complementary-DNA probes.

[complementary-DNA arrays] In cDNA arrays, the lengths of the cDNA strands differ between spots, and the different probe lengths change the reaction rates between spots. Therefore, the results must be compared to controls.
[oligonucleotide arrays] In oligonucleotide arrays the probes are of constant length in the range of 20–50 base pairs, so the readouts of fluorescence intensities are comparable between different spots.

2. Kinetics of the Binding Process

The hybridization reaction can be represented by the following scheme:

$$R + L \overset{k_f}{\underset{k_r}{\rightleftarrows}}C$$

var	unit	description
$R$	number of molecules	The number of oligonucleotide strands available for reaction.
$L$	$[\text{mol}/\text{L}]$	The molar concentration of free target RNA in the sample.
$C$	number of molecules	The number of bound complementary complexes.
$k_f$	$[\text{L}\,\text{mol}^{-1}\text{time}^{-1}]$	The forward (binding) rate constant.
$k_r$	$[\text{time}^{-1}]$	The reverse (unbinding) rate constant.
$V$	$[\text{L}]$	The volume of the solution interacting with the probe, so the number of free target RNA molecules in the solution is $LVN_A$, where $N_A$ is Avogadro's number.
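As a quick check on the units in the table, the conversion from a molar concentration to a molecule count can be sketched as follows. The function name and the example values (1 pM of target in 100 microliters) are hypothetical and chosen only for illustration.

```python
# Avogadro's number (exact SI value), in molecules per mole.
N_A = 6.02214076e23


def free_target_molecules(L_molar, V_liters):
    """Number of free target RNA molecules in the solution, L * V * N_A."""
    return L_molar * V_liters * N_A


# Hypothetical example: 1 pM target concentration in 100 microliters.
n_targets = free_target_molecules(1e-12, 100e-6)
print(f"{n_targets:.3e} molecules")
```

The product has units of (mol/L) x L x (1/mol), so the result is a pure count that can be compared directly with $R$ and $C$.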

The forward (binding) rate of target molecules to immobilized probes is proportional to the product of the target concentration $L$ and the number of free probes $R$.
The reverse (unbinding) process is a first-order reaction with a rate proportional to $C$.

This results in the following balance of flows:

$$\frac{dC}{dt} = k_fRL - k_rC$$

We assume that at the beginning, i.e., at $t=0$, there are no hybridization complexes, and all $R_T$ oligonucleotides are available for hybridization.

$$C(0)=0, L(0)=L_0, R(0)=R_T$$

Since one RNA strand binds to one oligonucleotide, resulting in one binding complex, the following equalities for the flows hold:

$$\frac{dR}{dt} = VN_A\frac{dL}{dt} = -\frac{dC}{dt},$$

which results in $R(t)+C(t)=R_T$ and $VN_AL(t)+C(t)=VN_AL_0$. Hence $L(t)$ and $R(t)$ can be expressed in terms of $C(t)$, which gives the following equation:

$$ \begin{aligned} \frac{dC}{dt} &= k_fRL - k_rC \\ &= k_f\left[R_T-C(t)\right]\left[L_0 - \frac{C(t)}{VN_A}\right] - k_rC(t)\\ \end{aligned} $$
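The closed equation for $C(t)$ can be integrated numerically to see how the number of complexes approaches equilibrium. The following is a minimal sketch using a hand-written fixed-step fourth-order Runge-Kutta scheme. The function names and all parameter values are assumptions made for illustration, not measured quantities.

```python
N_A = 6.02214076e23  # Avogadro's number, molecules per mole


def dC_dt(C, k_f, k_r, R_T, L0, V):
    """Right-hand side of the binding equation with R and L eliminated."""
    R = R_T - C               # free probes, from R + C = R_T
    L = L0 - C / (V * N_A)    # free target concentration [mol/L]
    return k_f * R * L - k_r * C


def integrate_binding(k_f, k_r, R_T, L0, V, t_end, n_steps=10000):
    """Integrate dC/dt from C(0) = 0 with fixed-step RK4.

    Returns two lists: the time points and the values of C at those points.
    """
    h = t_end / n_steps

    def f(c):
        return dC_dt(c, k_f, k_r, R_T, L0, V)

    C = 0.0
    times, values = [0.0], [0.0]
    for i in range(1, n_steps + 1):
        k1 = f(C)
        k2 = f(C + 0.5 * h * k1)
        k3 = f(C + 0.5 * h * k2)
        k4 = f(C + h * k3)
        C += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        times.append(i * h)
        values.append(C)
    return times, values


# Hypothetical parameters: k_f in L/(mol s), k_r in 1/s, R_T in molecules,
# L0 in mol/L, V in liters, t_end in seconds.
params = dict(k_f=1e6, k_r=1e-4, R_T=1e9, L0=1e-12, V=1e-4)
times, values = integrate_binding(t_end=1e5, **params)
print(f"C at t = {times[-1]:.0f} s: {values[-1]:.3e} molecules")
```

With these assumed values the step size is small compared with the relaxation time, so a fixed step is adequate. A stiff or widely varying parameter set would call for an adaptive solver instead.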

Often it is possible to approximate the above dynamics by a single exponential. Specifically, one of the following two asymptotic situations may hold.

[Case 1, often] There is a large excess of immobilized probe molecules over the potential number of binding targets, i.e., $R_T\gg C$, or $(R_T - C)/R_T \approx 1$, which results in

$$ \begin{aligned} \frac{dC}{dt} &= k_f\left[R_T-C(t)\right]\left[L_0 - \frac{C(t)}{VN_A}\right] - k_rC(t)\\ &= k_f\left[\underbrace{\frac{R_T-C(t)}{R_T}}_{\approx 1}\cdot R_T\right]\left[L_0 - \frac{C(t)}{VN_A}\right] - k_rC(t)\\ &\approx k_fR_TL_0 - \left[\frac{k_fR_T}{VN_A} + k_r\right]C(t)\\ \therefore C(t) &= \frac{R_TL_0}{K_D+R_T/N_AV}\left[1-\exp\left(-\frac{t}{\tau_R}\right)\right]\\ & K_D=k_r/k_f, \quad \tau_R = \frac{1}{k_f\left(K_D + R_T/N_AV\right)} \end{aligned} $$

[Case 2] There is a large excess of free RNA strands with respect to the number of binding complexes, i.e., $L_0 \gg C/N_AV$, or $(L_0 - C/N_AV)/L_0 \approx 1$, which leads to

$$ \begin{aligned} C(t) &= \frac{R_TL_0}{K_D+L_0}\left[1-\exp\left(-\frac{t}{\tau_L}\right)\right]\\ \tau_L &= \frac{1}{k_f\left(K_D+L_0\right)} \end{aligned} $$
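The two single-exponential approximations can be written down directly as functions of time. This is a sketch under the same notation as above. The function names and the parameter values are hypothetical.

```python
import math

N_A = 6.02214076e23  # Avogadro's number, molecules per mole


def c_excess_probe(t, k_f, k_r, R_T, L0, V):
    """C(t) when probes are in large excess over bound complexes (R_T >> C)."""
    K_D = k_r / k_f
    denom = K_D + R_T / (N_A * V)
    tau_R = 1.0 / (k_f * denom)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return R_T * L0 / denom * -math.expm1(-t / tau_R)


def c_excess_target(t, k_f, k_r, R_T, L0):
    """C(t) when free targets are in large excess (L0 >> C / (N_A V))."""
    K_D = k_r / k_f
    tau_L = 1.0 / (k_f * (K_D + L0))
    return R_T * L0 / (K_D + L0) * -math.expm1(-t / tau_L)


# Hypothetical parameters: k_f in L/(mol s), k_r in 1/s, R_T in molecules,
# L0 in mol/L, V in liters.
k_f, k_r, R_T, L0, V = 1e6, 1e-4, 1e9, 1e-12, 1e-4
for t in (0.0, 3600.0, 36000.0):
    print(t, c_excess_probe(t, k_f, k_r, R_T, L0, V),
          c_excess_target(t, k_f, k_r, R_T, L0))
```

Both curves start with the same initial slope $k_fR_TL_0$ and differ only in the plateau and the time constant, which is a convenient sanity check when comparing either one against a numerical solution of the full equation.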

Microarray experiments are very often planned in such a way that there is a large excess of immobilized probe molecules over the potential number of binding targets, i.e., case $1$ holds. From the equation

$$ \begin{aligned} C(t) &= \frac{R_TL_0}{K_D+R_T/N_AV}\left[1-\exp\left(-\frac{t}{\tau_R}\right)\right]\\ &\underset{t\rightarrow\infty}{\Rightarrow}\frac{R_TL_0}{K_D+R_T/N_AV}, \end{aligned} $$

one can see that, after equilibrium has been reached, or at a predefined instant of time, the intensity of the fluorescence signal measured at a microarray spot is proportional to the level of the corresponding RNA species in the analyzed sample, i.e., $C \propto L_0$.
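The proportionality between the plateau and $L_0$ can be checked numerically, and the same relation can be inverted to recover $L_0$ from an equilibrium value of $C$. The sketch below assumes the excess-probe approximation holds and that the probe parameters are known. The function names and all numbers are hypothetical.

```python
N_A = 6.02214076e23  # Avogadro's number, molecules per mole


def plateau(L0, k_f, k_r, R_T, V):
    """Equilibrium number of complexes under the excess-probe approximation."""
    K_D = k_r / k_f
    return R_T * L0 / (K_D + R_T / (N_A * V))


def estimate_L0(C_inf, k_f, k_r, R_T, V):
    """Invert the plateau formula to recover the initial target concentration."""
    K_D = k_r / k_f
    return C_inf * (K_D + R_T / (N_A * V)) / R_T


# Hypothetical probe parameters: k_f in L/(mol s), k_r in 1/s,
# R_T in molecules, V in liters.
probe = dict(k_f=1e6, k_r=1e-4, R_T=1e9, V=1e-4)
for L0 in (1e-13, 2e-13, 4e-13):
    C_inf = plateau(L0, **probe)
    print(f"L0 = {L0:.1e} mol/L -> C = {C_inf:.3e}, "
          f"recovered L0 = {estimate_L0(C_inf, **probe):.1e}")
```

Doubling $L_0$ doubles the plateau because the proportionality factor depends only on the probe parameters. In practice the measured quantity is a fluorescence intensity rather than $C$ itself, so an additional calibration factor would be needed.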
