Nucleic Acids

Biopolymers, or large biomolecules, essential to all known forms of life.

Sequence of Expressions

Let $B = \{A, T, G, C\}$ be the set of nucleobases. Define the pairing function $\mathcal{P}: B \times B \to \{0, 1\}$ such that $\mathcal{P}(b_1, b_2) = 1$ if $b_1$ and $b_2$ form a canonical pair, and $\mathcal{P}(b_1, b_2) = 0$ otherwise. The canonical pairing rules are defined by the constraints:
$$\mathcal{P}(A, T) = 1, \quad \mathcal{P}(T, A) = 1, \quad \mathcal{P}(G, C) = 1, \quad \mathcal{P}(C, G) = 1$$
For all other pairs $(b_1, b_2) \notin \{(A, T), (T, A), (G, C), (C, G)\}$, $\mathcal{P}(b_1, b_2) = 0$. This pairing dictates the formation of a stable base pair $P = (b_1, b_2)$.
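The pairing function above can be sketched as a small lookup. This is a minimal illustration, not a reference implementation; the names `BASES`, `CANONICAL_PAIRS`, and `pairing` are chosen here for illustration and do not come from the text.

```python
# Set of nucleobases B and the four ordered canonical pairs listed above.
BASES = ("A", "T", "G", "C")
CANONICAL_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def pairing(b1: str, b2: str) -> int:
    """Return 1 if (b1, b2) is a canonical pair, 0 otherwise."""
    if b1 not in BASES or b2 not in BASES:
        raise ValueError(f"unknown base: {b1!r} or {b2!r}")
    return 1 if (b1, b2) in CANONICAL_PAIRS else 0


if __name__ == "__main__":
    # Print the full 4x4 pairing table, one row per first base.
    for b1 in BASES:
        print(b1, [pairing(b1, b2) for b2 in BASES])
```

Storing ordered pairs in a set keeps the function symmetric by construction, since both $(A, T)$ and $(T, A)$ are listed explicitly.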
Let $S_i$ be the sugar unit and $P_i$ be the phosphate group at position $i$, with indices increasing in the $5' \to 3'$ direction. The backbone is a polymer sequence defined by the repeating linkage $S_i - P_i - S_{i+1}$. The phosphate group $P_i$ links the $3'$-hydroxyl group of $S_i$ to the $5'$-hydroxyl group of $S_{i+1}$. The chemical structure is formalized by the phosphodiester bond formation:
$$S_i - O - P_i - O - S_{i+1}$$
where $P_i$ is represented by the phosphate group $\text{PO}_2^-$. The backbone connectivity is defined by the sequence of linkages $\mathcal{L} = \{(S_1, P_1, S_2), (S_2, P_2, S_3), \dots, (S_{N-1}, P_{N-1}, S_N)\}$. The overall charge density $\rho_{charge}$ is determined by the negative charges of the phosphate groups.
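As a rough sketch, the linkage set $\mathcal{L}$ can be enumerated directly from the number of sugar units. The function names are hypothetical, and the charge count assumes exactly one negative charge per internal phosphate, ignoring any terminal phosphate groups.

```python
def backbone_linkages(n_sugars: int) -> list:
    """Return the linkage triples (S_i, P_i, S_{i+1}) for i = 1 .. N-1."""
    if n_sugars < 1:
        raise ValueError("a backbone needs at least one sugar unit")
    return [(f"S{i}", f"P{i}", f"S{i + 1}") for i in range(1, n_sugars)]


def backbone_charge(n_sugars: int) -> int:
    """Net backbone charge, assuming -1 per linking phosphate (assumption)."""
    return -len(backbone_linkages(n_sugars))


if __name__ == "__main__":
    print(backbone_linkages(4))
    print(backbone_charge(4))
```

A chain of $N$ sugars has $N - 1$ linkages, so the charge in this simplified count grows linearly with chain length.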
Let $\Sigma = \{A, T, C, G\}$ be the finite alphabet of nucleotides. A nucleic acid sequence $S$ of length $L$ is defined as a vector $S = (s_1, s_2, \dots, s_L)$, where $s_i \in \Sigma$. The sequence can be represented mathematically as a word in the formal language $\Sigma^*$. The sequence $S$ determines the chemical structure $\mathcal{C}(S)$ and the associated thermodynamic properties $\Delta G(S)$ via the Hamiltonian $\mathcal{H}: \Sigma^L \to \mathbb{R}$.
Consider the polymerization reaction forming a polymer $P$ from $N$ monomers $M$. Let $[P]$ and $[M]$ be the concentrations of the polymer chain and the monomer, respectively, and let $[I]$ be the concentration of inhibiting side products. The rate of polymerization $R_{\text{poly}}$ is governed by the rate law:
$$R_{\text{poly}} = \frac{k_{\text{add}} [P] [M]}{1 + K_{\text{inhib}} [I]}$$
where $k_{\text{add}}$ is the rate constant for monomer addition, and the term $K_{\text{inhib}} [I]$ accounts for inhibition by side products. The change in polymer concentration $[P]$ over time $t$ is given by the differential equation:
$$\frac{d[P]}{dt} = R_{\text{poly}}$$
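The rate law can be evaluated numerically as follows. This is a sketch under stated assumptions: the function names are hypothetical, the forward-Euler integrator holds $[M]$ and $[I]$ constant over the integration window, and the numerical values in the demo are arbitrary placeholders rather than measured constants.

```python
def polymerization_rate(k_add, p, m, k_inhib=0.0, i=0.0):
    """R_poly = k_add * [P] * [M] / (1 + K_inhib * [I])."""
    return k_add * p * m / (1.0 + k_inhib * i)


def integrate_polymer(p0, m, k_add, k_inhib, i, dt, steps):
    """Forward-Euler integration of d[P]/dt = R_poly.

    Assumption: [M] and [I] are treated as constant during integration.
    """
    p = p0
    for _ in range(steps):
        p += dt * polymerization_rate(k_add, p, m, k_inhib, i)
    return p


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(polymerization_rate(k_add=2.0, p=0.5, m=1.0))
    print(polymerization_rate(k_add=2.0, p=0.5, m=1.0, k_inhib=1.0, i=1.0))
    print(integrate_polymer(p0=0.5, m=1.0, k_add=2.0, k_inhib=1.0, i=1.0,
                            dt=0.01, steps=100))
```

Setting `k_inhib` or `i` to zero recovers the uninhibited rate $k_{\text{add}} [P] [M]$.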
Let $\mathbf{r}_i$ be the spatial coordinates of the $i$-th base pair center, and let $N$ be the total number of base pairs. The double helix structure is defined by the coordinates $\mathbf{R}(i)$ of the $i$-th unit, parameterized by the helical index $i \in \mathbb{Z}$. The coordinates must satisfy the following geometric constraint:
$$\mathbf{R}(i) = \mathbf{R}_0 + i h \, \mathbf{v}_z + \frac{1}{2\pi} \left( \mathbf{v}_x \cos\left(\frac{2\pi i}{P}\right) + \mathbf{v}_y \sin\left(\frac{2\pi i}{P}\right) \right)$$
where $P$ is the helical pitch expressed in base pairs per turn, $h$ is the rise per base pair, and $\mathbf{v}_x, \mathbf{v}_y, \mathbf{v}_z$ are orthogonal unit vectors defining the cross-section ($\mathbf{v}_x, \mathbf{v}_y$) and the axis ($\mathbf{v}_z$) of the helix. Furthermore, the distance between paired bases must be constrained by the base pairing rules.
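A minimal sketch of the helix parameterization follows. It makes several assumptions that are not in the text: the axial displacement is taken as the index times the rise per base pair, the pitch is given in base pairs per turn, the unit vectors are the standard Cartesian axes, and the radial prefactor is exposed as a `radius` argument that defaults to the $1/(2\pi)$ factor in the expression above. The demo values (0.34 nm rise, 10 base pairs per turn) are approximate B-DNA figures used only for illustration.

```python
import math


def helix_point(i, rise, bp_per_turn, radius=1.0 / (2.0 * math.pi),
                origin=(0.0, 0.0, 0.0)):
    """Coordinates R(i) of the i-th unit on an ideal helix.

    Assumes v_x, v_y, v_z are the Cartesian x, y, z axes.
    """
    angle = 2.0 * math.pi * i / bp_per_turn  # rotation about the helix axis
    x0, y0, z0 = origin
    return (
        x0 + radius * math.cos(angle),
        y0 + radius * math.sin(angle),
        z0 + i * rise,  # axial displacement: index times rise per base pair
    )


if __name__ == "__main__":
    for i in range(0, 11, 5):
        print(i, helix_point(i, rise=0.34, bp_per_turn=10))
```

After `bp_per_turn` steps the point returns to its starting angle and has advanced by one full pitch along the axis.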
Consider two DNA strands, $\text{Strand}_1$ and $\text{Strand}_2$, represented by sequences of nucleotides. The directionality of each strand is defined by its index. For $\text{Strand}_1$, the sequence runs from $i = 1$ to $N$ (the $5' \to 3'$ direction). For $\text{Strand}_2$, the corresponding sequence runs from $j = 1$ to $N$ (also the $5' \to 3'$ direction). The antiparallel constraint requires that the $i$-th nucleotide of $\text{Strand}_1$ pairs with the $(N-i+1)$-th nucleotide of $\text{Strand}_2$. Mathematically, if $B_1(i)$ and $B_2(j)$ are the bases, then for a paired segment of length $N$:
$$B_1(i) \text{ pairs with } B_2(N-i+1)$$
This implies that the pairing index $j$ must be a linear function of the index $i$ such that $j = N-i+1$.
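The index relation $j = N - i + 1$ can be illustrated with a short sketch. The function names are hypothetical; both strands are assumed to be written in the $5' \to 3'$ direction, and positions are 1-based to match the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_index(i: int, n: int) -> int:
    """1-based index j on Strand 2 that pairs with position i on Strand 1."""
    if not 1 <= i <= n:
        raise ValueError("index out of range")
    return n - i + 1


def complementary_strand(strand1: str) -> str:
    """Strand 2 read 5'->3', i.e. the reverse complement of Strand 1."""
    return "".join(COMPLEMENT[b] for b in reversed(strand1))


def is_antiparallel_duplex(strand1: str, strand2: str) -> bool:
    """Check that B1(i) pairs canonically with B2(N - i + 1) for every i."""
    n = len(strand1)
    if len(strand2) != n:
        return False
    return all(
        COMPLEMENT[strand1[i - 1]] == strand2[partner_index(i, n) - 1]
        for i in range(1, n + 1)
    )


if __name__ == "__main__":
    s1 = "ATGC"
    s2 = complementary_strand(s1)
    print(s1, s2, is_antiparallel_duplex(s1, s2))
```

Reversing before complementing is what encodes the antiparallel constraint: the first base of one strand faces the last base of the other.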
Let $S$ be a sequence of length $L$, and let $P(s_i)$ be the probability of finding nucleotide $s_i$ at position $i$. The information content $I(S)$ stored in the sequence, measured in bits per position, is defined by the Shannon entropy $H$:
$$H(S) = -\sum_{s \in \Sigma} P(s) \log_2 P(s)$$
For a sequence generated by a Markov process of order $k$, the conditional probability $P(s_i \mid s_{i-k}, \dots, s_{i-1})$ governs the sequence generation, and the total information is related to the joint probability distribution $P(S) = \prod_{i=1}^{L} P(s_i \mid s_{i-k}, \dots, s_{i-1})$.
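The entropy formula can be sketched as below. One assumption is made that the text does not state: the probabilities $P(s)$ are estimated from the empirical nucleotide frequencies of the sequence itself. The function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per position, from empirical frequencies."""
    total = len(seq)
    if total == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    # Symbols with zero count never appear in `counts`, so log2(0) is avoided.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for s in ("ATGC", "AATT", "AAAA"):
        print(s, shannon_entropy(s))
```

With a four-letter alphabet the entropy is bounded above by $\log_2 4 = 2$ bits per position, reached when all four nucleotides are equally frequent.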
Define the standard B-DNA helix by its helical parameters $(\alpha_B, \beta_B, \gamma_B)$, where $\alpha$ is the rise per base pair, $\beta$ is the twist angle, and $\gamma$ is the roll angle. The Z-DNA conformation is characterized by a transition to a left-handed helix and a zigzag backbone structure. In this idealized model, the transition is expressed as a change in the helical parameters:
$$\alpha_Z = \alpha_B / 2, \quad \beta_Z = -\beta_B, \quad \gamma_Z = \pi/2$$
The sign change in the twist angle encodes the switch in handedness. This transformation results in a backbone vector $\mathbf{r}(n)$ that follows a zigzag path, deviating from the sinusoidal path of B-DNA.
Consider a donor-acceptor pair $(D, A)$ and a hydrogen bond interaction potential $E_{HB}$. The potential energy $E_{HB}$ between a donor $D$ (e.g., N-H) and an acceptor $A$ (e.g., O or N) is modeled by a sum of distance and angular terms:
$$E_{HB}(r_{DA}, \theta) = E_0 \left( \frac{r_{DA}}{r_0} \right)^n e^{-k(r_{DA} - r_0)} + E_{\theta} \left( 1 - \cos(\theta - \theta_0) \right)$$
where $r_{DA}$ is the distance between $D$ and $A$, $r_0$ is the equilibrium distance, $\theta$ is the donor-H-acceptor angle, $\theta_0$ is its equilibrium value, and $E_0, E_{\theta}, k, n$ are empirical constants. Stable pairing requires $E_{HB} < E_{crit}$.
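The potential can be evaluated with a short function, sketched here for illustration. The function names are hypothetical, angles are assumed to be in radians, and the constants in the demo are arbitrary placeholders rather than fitted force-field values.

```python
import math


def hbond_energy(r_da, theta, e0, r0, n, k, e_theta, theta0):
    """Distance term plus angular term of the hydrogen bond potential."""
    radial = e0 * (r_da / r0) ** n * math.exp(-k * (r_da - r0))
    angular = e_theta * (1.0 - math.cos(theta - theta0))
    return radial + angular


def is_stable(energy, e_crit):
    """Stability criterion from the text: E_HB < E_crit."""
    return energy < e_crit


if __name__ == "__main__":
    # Placeholder constants, for illustration only.
    params = dict(e0=-5.0, r0=2.9, n=2, k=1.0, e_theta=3.0, theta0=math.pi)
    e_eq = hbond_energy(2.9, math.pi, **params)        # equilibrium geometry
    e_bent = hbond_energy(2.9, math.pi / 2, **params)  # bent by 90 degrees
    print(e_eq, is_stable(e_eq, e_crit=-2.0))
    print(e_bent, is_stable(e_bent, e_crit=-2.0))
```

At the equilibrium geometry ($r_{DA} = r_0$, $\theta = \theta_0$) the angular term vanishes and the energy reduces to $E_0$.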
Let $S = (b_1, \dots, b_N, \boldsymbol{\theta})$ be a stack of $N$ adjacent base pairs, where $b_i$ is the $i$-th base pair and $\boldsymbol{\theta}$ represents the relative orientations. The stacking interaction energy $E_{\text{stack}}$ is modeled as a pairwise potential sum over adjacent base pairs:
$$E_{\text{stack}} = \frac{1}{2} \boldsymbol{\theta}^T \boldsymbol{K} \boldsymbol{\theta} + \frac{1}{N-1} \sum_{i=1}^{N-1} \boldsymbol{\tau} \, \text{Tr}\big(\boldsymbol{R}_{i, i+1}\big)$$
where $\boldsymbol{K}$ is the stiffness matrix governing torsional strain, $\boldsymbol{\tau}$ weights the overlap term, and $\boldsymbol{R}_{i, i+1}$ is the overlap matrix quantifying the $\pi$-electron overlap between $b_i$ and $b_{i+1}$, typically approximated by a function of the distance $d_{i, i+1}$ and the dihedral angle $\phi_{i, i+1}$.