Genome Assembly

The process of reconstructing a complete genome sequence from short DNA fragments, typically using computational algorithms and overlap detection.
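
As an illustration of the overlap detection mentioned above, here is a minimal sketch that finds the longest suffix of one read matching a prefix of another and merges the two. The function names `overlap` and `merge`, the `min_len` parameter, and the example reads are assumptions made for this illustration; real assemblers use indexed data structures and tolerate sequencing errors.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    # Try the longest candidate first and shrink until a match is found.
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a: str, b: str, min_len: int = 1) -> str:
    """Join two reads, writing their shared overlap only once."""
    n = overlap(a, b, min_len)
    return a + b[n:]


# Hypothetical reads: "ATGCG" ends with "GCG", which is how "GCGTA" begins.
print(overlap("ATGCG", "GCGTA"))  # 3
print(merge("ATGCG", "GCGTA"))    # ATGCGTA
```

This exact-match scan costs time proportional to the product of the read lengths in the worst case, which is fine for a toy example but not for millions of reads.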

The statement of the theorem

Let $R = \{r_1, r_2, \dots, r_m\}$ be the set of short read fragments, where each $r_i$ is a string of length $k$. Construct the De Bruijn graph $G = (V, E)$ whose nodes $V$ are the unique $(k-1)$-mers (substrings of length $k-1$) found in $R$, and where an edge $(u, v) \in E$ exists if the last $k-2$ characters of $u$ equal the first $k-2$ characters of $v$ and the $k$-mer formed by joining them occurs in $R$. The genome sequence $S$ is sought as an Eulerian path or cycle $\mathcal{P} = (v_1, e_1, v_2, e_2, \dots, v_L)$ in $G$ whose length $L$ maximizes the coverage of the input reads $R$, subject to the constraint that the path traverses every edge corresponding to an observed read, so that the number of unassigned reads is minimized.
Source: Wikipedia
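
The construction in the statement can be sketched in a few lines of Python. This is a minimal illustration and not a production assembler: it assumes a non-empty set of error-free reads that all have the same length $k$, one edge per read, and a graph that actually admits an Eulerian path or cycle. The function names (`de_bruijn_graph`, `eulerian_path`, `spell`, `assemble`) and the example reads are chosen for this sketch, and the path is found with Hierholzer's algorithm, one standard way to compute an Eulerian path.

```python
from collections import defaultdict


def de_bruijn_graph(reads):
    """Build a De Bruijn multigraph with one edge per read.

    Each read of length k contributes an edge from its first (k-1)-mer
    to its last (k-1)-mer; the two nodes overlap by k-2 characters.
    """
    graph = defaultdict(list)
    for read in reads:
        graph[read[:-1]].append(read[1:])
    return graph


def eulerian_path(graph):
    """Return the node sequence of an Eulerian path (Hierholzer's algorithm).

    Assumption: the graph is non-empty and an Eulerian path or cycle exists.
    """
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1

    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise (Eulerian cycle) any node with outgoing edges will do.
    start = next(iter(graph))
    for u, targets in graph.items():
        if len(targets) - in_deg[u] == 1:
            start = u
            break

    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit the node
    return path[::-1]


def spell(path):
    """Spell the sequence: first node, then the last character of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(reads):
    return spell(eulerian_path(de_bruijn_graph(reads)))


# Hypothetical reads with k = 3, given in arbitrary order.
reads = ["GCG", "ATG", "GTA", "TGC", "CGT"]
print(assemble(reads))  # ATGCGTA
```

In this example every read is used exactly once, so no read is left unassigned. With repeats in the genome the graph can admit several Eulerian paths, and this sketch simply returns one of them.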