Genome assembly Flashcards
FASTQ
Four lines per sequence record: line 1 starts with @ and holds the read identifier, line 2 is the sequence, line 3 starts with +, and line 4 encodes the quality values, one character per base.
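A minimal sketch of reading one such record in Python. The function name `parse_fastq_record` and the sample read are made up for illustration, and the Phred+33 quality offset is an assumption (it is the common modern encoding, but older files may use a different one).

```python
def parse_fastq_record(lines, offset=33):
    """Split one four-line FASTQ record into (read_id, sequence, scores)."""
    header, sequence, separator, quality = [ln.rstrip("\n") for ln in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33 encoding, so score = ASCII code minus 33.
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores


# Hypothetical record used only as an example.
record = ["@read1", "GATTACA", "+", "IIIIII#"]
read_id, seq, quals = parse_fastq_record(record)
print(read_id, seq, quals)
```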
FASTA file
Plain-text sequence format readable by many programs: each record is a header line starting with > followed by one or more lines of sequence, with no quality values.
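A rough sketch of a FASTA reader, assuming the usual layout of a header line beginning with > followed by sequence lines. The name `parse_fasta` and the sample text are illustrative, and using the first word of the header as the record name is a simplifying assumption.

```python
def parse_fasta(text):
    """Return {record_name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the record name is the first word of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Sequence lines of one record are joined into a single string.
    return {n: "".join(parts) for n, parts in records.items()}


example = ">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n"
sequences = parse_fasta(example)
print(sequences)
```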
Genome assembly
Process in which programs merge overlapping DNA reads to reconstruct the genome; long, high-quality reads make repeats and other complex regions easier to resolve.
Contig structure
Contiguous sequence formed by several overlapping reads without any gaps.
A scaffold in genome assembly
Ordered and oriented set of contigs, joined across gaps whose sizes are estimated and usually written as runs of N.
N50 statistic
Length-weighted median: the contig or scaffold length L such that contigs or scaffolds of length L or larger contain at least 50% of the total assembly length.
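The definition can be sketched as a short calculation: sort lengths from longest to shortest and accumulate until half the total is reached. The function name `n50` and the contig lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half is the N50.
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths: total 290, half 145; 80 + 70 = 150 reaches it.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 70
```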
De novo vs comparative genome assembly
De novo assembly builds a new sequence from reads without a reference genome, whereas comparative assembly aligns reads against an existing reference sequence.
De Bruijn graph
Graph representing overlaps between sequences of symbols, used in genome assembly by splitting reads into uniform-sized units (k-mers) that become edges between their overlapping prefixes and suffixes.
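One common way to build such a graph, sketched under simplifying assumptions: every k-mer in a read becomes an edge from its (k-1)-long prefix to its (k-1)-long suffix. The name `de_bruijn_edges` is illustrative, and this toy version keeps one edge per k-mer occurrence and ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it leads to."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy read: 3-mers ACG and CGT give edges AC -> CG and CG -> GT.
graph = de_bruijn_edges(["ACGT"], 3)
print(graph)
```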
Eulerian walk in a graph
Trail that uses every edge of the graph exactly once; when it also returns to its starting vertex it is an Eulerian circuit.
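A sketch of finding such a walk in a directed graph with Hierholzer's algorithm, which is one standard approach rather than the only one. The function name `eulerian_walk` and the edge list are illustrative, and the code assumes a walk exists instead of checking for one.

```python
from collections import defaultdict


def eulerian_walk(edges):
    """Return the nodes of a walk using each directed edge (u, v) exactly once."""
    out = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for u, v in edges:
        out[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the node with one extra outgoing edge, or anywhere for a circuit.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if out[node]:
            stack.append(out[node].pop())  # follow an unused edge
        else:
            walk.append(stack.pop())  # dead end: node is final in the walk
    return walk[::-1]


# Toy edges between 2-mers; the walk spells out the sequence ACGT.
walk = eulerian_walk([("AC", "CG"), ("CG", "GT")])
sequence = walk[0] + "".join(node[-1] for node in walk[1:])
print(walk, sequence)
```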