Genome Assembly
The process of reconstructing a complete genome sequence from short DNA fragments, typically using computational algorithms and overlap detection.
The statement of the theorem
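Let $R = \{r_1, \dots, r_n\}$ be the set of short read fragments, where each $r_i$ is a string of length $\ell$. Construct the de Bruijn graph $G = (V, E)$, where the nodes $V$ represent all unique $(k-1)$-mers (substrings of length $k-1$) found in $R$, and a directed edge $(u, v) \in E$ exists if the string $u$ overlaps with $v$ by $k-2$ characters, that is, the last $k-2$ characters of $u$ equal the first $k-2$ characters of $v$ and the resulting $k$-mer occurs in some read. The genome sequence is sought as an Eulerian path or cycle in $G$: a walk that traverses every edge, and therefore every observed $k$-mer, exactly once. Spelling out the nodes along this walk yields a sequence that covers as much of the input reads $R$ as possible, so that the number of unassigned reads is minimized.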
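The construction above can be illustrated with a minimal sketch. It assumes error-free, single-stranded reads, uses each distinct $k$-mer as exactly one edge (ignoring multiplicity), and assumes that an Eulerian path actually exists. The function names `build_de_bruijn`, `eulerian_path`, and `spell` are hypothetical and chosen only for this illustration; the path search uses Hierholzer's algorithm, which is one common choice and not necessarily the method a given assembler uses.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer to its successor (k-1)-mers.

    One edge is created per distinct k-mer seen in the reads.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])  # prefix node -> suffix node
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path or cycle exists."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # A path (as opposed to a cycle) must start where out-degree exceeds in-degree by 1.
    start = next(
        (u for u in sorted(nodes) if len(graph.get(u, [])) - in_deg.get(u, 0) == 1),
        None,
    )
    if start is None:
        start = min(graph)  # Eulerian cycle: any node with an outgoing edge works
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges per node
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    path.reverse()
    return path


def spell(path):
    """Concatenate the first node with the last character of every later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGC", "GGCGT", "CGTGC", "GTGCA"]
graph = build_de_bruijn(reads, k=4)
print(spell(eulerian_path(graph)))  # ATGGCGTGCA
```

With $k = 4$ every 3-mer node in this toy example is distinct, so the Eulerian path is unique. With smaller $k$, repeated $(k-1)$-mers create branching nodes and several Eulerian paths may exist, which is one reason real assemblies are ambiguous.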
Source: Wikipedia