de novo assembly Flashcards
Explain the greedy approach and its drawbacks
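Compute pairwise alignments between all reads.
Find the pair of reads with the best overlap and merge it; repeat until no overlaps remain.
Drawbacks: repeats are resolved only if they are shorter than the reads, and the all-against-all comparison has a high computational cost.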
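The greedy merging described above can be sketched in a few lines of Python. This is a minimal illustration, not a real assembler: it assumes error-free reads from a single strand, uses exact suffix/prefix matching in place of a true pairwise alignment, and the names `overlap`, `greedy_assemble` and `min_len` are made up for this example.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_a, best_b = 0, None, None
        # All-against-all comparison: this is the expensive step.
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_len)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            return reads  # nothing left to merge
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])


contigs = greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"])
print(contigs)
```

Every round compares all remaining sequences against each other, which is where the high computational cost comes from. Because the best overlap is always taken without looking ahead, a repeat longer than the reads can cause a wrong merge.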
Explain the general approach shared by Overlap Layout Consensus (OLC) and de Bruijn graph assemblers
Correct sequencing errors
Assemble contigs
Combine contigs to scaffolds
OLC, method and drawbacks
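Make a graph where the nodes are reads.
Add an edge between two reads if they overlap.
Lay out the reads along a path through the graph, then call the consensus sequence.
Not good for short reads (too many overlaps to compute) or for repeats.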
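The overlap step of OLC can be sketched as follows. This is a simplified illustration that assumes error-free reads and exact suffix/prefix matches; the function names and the `min_len` threshold are hypothetical, and the layout and consensus steps are left out.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a matching a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def build_overlap_graph(reads, min_len=3):
    """Nodes are reads; a directed edge a -> b is weighted by the overlap length."""
    graph = {read: {} for read in reads}
    for a in reads:
        for b in reads:
            if a != b:
                n = suffix_prefix_overlap(a, b, min_len)
                if n:
                    graph[a][b] = n
    return graph


graph = build_overlap_graph(["ATGGCG", "GCGTGC", "TGCAAT"])
for read, edges in graph.items():
    print(read, "->", edges)
```

The nested loop compares every read with every other read, so the work grows with the square of the number of reads. That is one reason the method scales poorly to large sets of short reads.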
de Bruijn, method and drawbacks
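Build the graph from k-mers instead of entire reads. Each distinct k-mer appears only once in the graph, so a path may have to pass through it several times (e.g. in repeats).
Simplify the graph after building it. Refine by removing parts of the assembly that are not supported by paired-end (PE) reads.
Drawbacks: RAM usage is a problem, and the optimal k-mer size is not known in advance.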
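Building the graph can be sketched as below. This uses one common convention, assumed here rather than taken from the cards: nodes are (k-1)-mers and each distinct k-mer is an edge from its prefix to its suffix. The function name `de_bruijn` is made up for this example, and graph simplification and error correction are omitted.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Return {(k-1)-mer: sorted list of successor (k-1)-mers}."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each distinct k-mer is stored once, however often it occurs.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


graph = de_bruijn(["ATGGCG", "GGCGTG"], k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

In this toy input the node `TG` occurs at two positions in the reads but exists only once in the graph, which creates a cycle. A walk that reconstructs the sequence has to visit that node twice.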
How to improve de novo assemblies?
Paired-end (PE) and mate-pair (MP) reads, better coverage, hybrid methods (e.g. combining short and long reads).
N50
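Length of the shortest contig among the largest contigs that together make up at least half of the total assembly length (sort contigs from longest to shortest and sum their lengths until 50% of the assembly sum is reached).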
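The calculation can be written as a short function. This is a minimal sketch that takes a plain list of contig lengths; the name `n50` and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Length of the contig at which the running sum reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Total is 290, so half is 145; 80 + 70 = 150 reaches it at the 70-long contig.
print(n50([80, 70, 50, 40, 30, 20]))
```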
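Influence of k-mer size on the de Bruijn graph
A large k gives limited overlap between k-mers, so the graph breaks into fragments; a small k gives a complex, tangled graph because repeats collapse onto the same k-mers.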
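The trade-off can be seen on a toy sequence by counting edges and branching nodes for several values of k. This is a rough sketch under the same assumed convention of (k-1)-mer nodes and k-mer edges; `graph_stats` and the example sequence are hypothetical.

```python
from collections import defaultdict


def graph_stats(reads, k):
    """Return (number of distinct edges, number of nodes with more than one successor)."""
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            successors[kmer[:-1]].add(kmer[1:])
    edges = sum(len(s) for s in successors.values())
    branching = sum(1 for s in successors.values() if len(s) > 1)
    return edges, branching


reads = ["ACGTTACGA"]  # "ACG" occurs twice in this read
for k in (3, 5, 10):
    print(k, graph_stats(reads, k))
```

With k = 3 the repeated `ACG` collapses and one node branches. With k = 5 the repeat is spanned and the graph is a simple path. With k = 10 the read is shorter than k and contributes no k-mers at all.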