Assembly Lecture Flashcards
All assembly approaches rely on the simple assumption that
highly similar DNA fragments originate from the same position within a genome
What has a fundamental impact on the complexity of assembly?
read length
longer reads contain more unique DNA, so they are easier to assemble
Contig
set of sequence reads that overlap to form a contiguous stretch of DNA sequence
Fewer contigs in an assembly is better = bigger contigs
N50
length of the shortest contig in the set of longest contigs that together contain at least 50% of the assembled bases, i.e. half of the assembly is in contigs of this length or longer (higher is better)
L50
smallest number of contigs whose lengths sum to at least 50% of the total assembly length (lower is better)
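A minimal sketch of how N50 and L50 can be computed from a list of contig lengths, using the standard definitions (sort contigs longest first, then add them up until half of the total assembly length is reached). The function name `n50_l50` and the example lengths are made up for illustration.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    # Walk through the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    for count, length in enumerate(lengths, start=1):
        running_total += length
        # The contig that pushes the running total to half of the
        # assembly gives N50 (its length) and L50 (its rank).
        if running_total >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly with six contigs (total length 290).
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
print(n50, l50)
```

Here the two longest contigs (80 + 70 = 150) already cover more than half of the 290 assembled bases, so N50 is 70 and L50 is 2.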
De Bruijn graph
assembly method that breaks sequence reads into smaller sub-sequences (k-mers) and uses them to find overlaps and build a graph
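A small sketch of the graph-building step, assuming the common formulation in which each k-mer from a read becomes an edge from its first k-1 bases to its last k-1 bases. The function name `de_bruijn_graph`, the toy reads, and the choice of k = 3 are illustrative assumptions; real assemblers also track k-mer counts and handle sequencing errors, which this sketch does not.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of size k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two toy reads that share the k-mer "GGC".
graph = de_bruijn_graph(["ATGGC", "GGCTA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because the shared k-mer produces the same edge from both reads, the two reads join into one path (AT, TG, GG, GC, CT, TA) without ever comparing the reads to each other directly.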
OLC (overlap-layout-consensus)
assembly method
-overlap: find all pairwise overlaps between all reads
-layout: use those overlaps to determine how the reads should be put together
-consensus: produce a consensus based on the layout and overlap of reads
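A rough sketch of the overlap step only, assuming exact suffix-to-prefix matches and a minimum overlap length of at least 1. The names `overlap_length`, `all_overlaps`, and `min_length` are hypothetical; real OLC assemblers tolerate mismatches and use indexing to avoid comparing every pair directly.

```python
from itertools import permutations


def overlap_length(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    # Try the longest possible overlap first, then shrink.
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0  # no overlap of at least min_length


def all_overlaps(reads, min_length=3):
    """Return {(a, b): overlap length} for every ordered pair that overlaps."""
    overlaps = {}
    for a, b in permutations(reads, 2):
        length = overlap_length(a, b, min_length)
        if length:
            overlaps[(a, b)] = length
    return overlaps


# Toy reads: each one overlaps the next by 3 bases.
print(all_overlaps(["ATGGC", "GGCTA", "CTAAT"]))
```

The layout step would then order the reads by following these overlaps (ATGGC, GGCTA, CTAAT), and the consensus step would merge them into a single sequence.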
BUSCO
Benchmarking Universal Single-Copy Orthologs
To better capture the variation missed by using one reference, we can create and utilize a
pan-genome
a collection of all the DNA sequences that occur in a species