Data preprocessing Flashcards
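Explain FastQC
A quality-control report for raw reads. It shows read quality (e.g. per-base quality scores, adaptor content), which indicates how much should be trimmed or removed (adaptors/primers) etc.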
What does k-mer correction do? What is it used for? What does it require?
Slide a window of length k across all reads and count how often each k-mer occurs. K-mers that occur only rarely are likely to contain sequencing errors.
Used to correct sequencing errors.
It requires sufficient coverage.
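A minimal sketch of the counting step, assuming reads are plain strings. The function names, the toy reads and the `min_count` threshold are made up for illustration; real correction tools go further and also repair the flagged bases.

```python
from collections import Counter


def count_kmers(reads, k):
    """Slide a window of length k over every read and count each k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy data (hypothetical): three identical reads plus one with a single
# substitution (A -> T at position 4).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
suspect = rare_kmers(counts, min_count=2)
print(sorted(suspect))  # ['CGTT', 'GTTC', 'TCGT', 'TTCG']
```

Every k-mer that overlaps the substituted base occurs only once, while the correct k-mers occur several times. This is why the method needs sufficient coverage: with too few reads, correct k-mers are rare as well.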
As a rule of thumb, what coverage does k-mer correction require?
15X
What is sequencing depth? How do you calculate it?
The average number of times each base of the genome is covered by your reads.
Calculated from:
number of reads * read length / genome size
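A worked example of the formula, with made-up numbers (1 million reads of 150 bp on a 5 Mb genome); the function name is just for illustration.

```python
def sequencing_depth(num_reads, read_length, genome_size):
    """Average depth = number of reads * read length / genome size."""
    return num_reads * read_length / genome_size


# Hypothetical run: 1,000,000 reads of 150 bp, 5,000,000 bp genome.
depth = sequencing_depth(1_000_000, 150, 5_000_000)
print(depth)  # 30.0, i.e. 30X
```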
What is breadth of coverage?
The fraction of the reference that is covered by the data, i.e. by at least one read (or at some chosen minimum depth).
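A small sketch of the difference between breadth and depth, assuming alignments are given as hypothetical (start, end) intervals on the reference (0-based, end-exclusive). The function name and the toy intervals are illustrative only.

```python
def breadth_of_coverage(alignments, reference_length):
    """Fraction of reference positions covered by at least one alignment."""
    covered = [False] * reference_length
    for start, end in alignments:  # 0-based, end-exclusive
        for pos in range(max(start, 0), min(end, reference_length)):
            covered[pos] = True
    return sum(covered) / reference_length


# Toy example: three alignments on a 10 bp reference.
alignments = [(0, 4), (2, 6), (8, 10)]
breadth = breadth_of_coverage(alignments, 10)
mean_depth = sum(end - start for start, end in alignments) / 10
print(breadth)     # 0.8
print(mean_depth)  # 1.0
```

The average depth is 1X, yet only 80% of the reference is covered, because two alignments overlap and positions 6 and 7 are never hit.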
Why is assembly affected by the presence of adaptors but alignment is not (as much)?
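Because alignment typically uses local alignment (e.g. Smith-Waterman), which can leave non-matching read ends such as adaptors unaligned. Assembly has no reference to align against, so adaptor sequence shared between reads can create false overlaps.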
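A score-only sketch of Smith-Waterman, to illustrate why an adaptor tail does not hurt a local alignment. The scoring values (match 2, mismatch -1, linear gap -2) and the toy sequences are assumptions chosen for the example; real aligners use tuned scoring and faster heuristics.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between query and target (linear gaps)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            step = match if query[i - 1] == target[j - 1] else mismatch
            # The 0 lets an alignment start or stop anywhere (local).
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences (hypothetical). The tail mimics the start of an adaptor.
reference = "TTGCATTGACCGTAAGC"
read = "GACCGTAAGC"
adaptor_tail = "AGATCGGAAG"

clean_score = smith_waterman_score(read, reference)
tailed_score = smith_waterman_score(read + adaptor_tail, reference)
print(clean_score, tailed_score)  # 20 20
```

The read scores the same with or without the tail, because the local alignment simply stops where the adaptor begins.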