Beta Phase: Square45 is currently in beta testing. Expect some features or content to be incomplete or missing.
45

Genome Sequencing

The process of determining the complete DNA sequence of an organism's genome, typically using high-throughput sequencing technologies. \frac{Reads}{GenomeSize} represents sequencing accuracy.
📜

The statement of the theorem

Let R={r1,r2,...,rN}R = \{r_1, r_2, ..., r_N\} be the set of short reads, where each rir_i is a sequence of length LiL_i. Let GG be the unknown reference genome sequence. The goal is to reconstruct GG by solving the assembly problem, often modeled via a de Bruijn graph G=(V,E)\mathcal{G} = (V, E), where VV are k-mers and EE represents overlaps. The estimated genome size is G|G|. The sequencing accuracy A\mathcal{A} is defined by the ratio of observed reads to the expected genome size: A=NLG\mathcal{A} = \frac{N}{L_{G}} \nwhere NN is the total number of reads, and LGL_{G} is the length of the reconstructed genome.
Source: Wikipedia