Data preprocessing Flashcards

1
Q

Explain FastQC

A

A quality-control tool for raw sequencing reads. Its report shows read quality and helps you decide how much should be trimmed and what should be removed (adaptors/primers) etc.

2
Q

What does k-mer correction do? What is it used for? What does it require?

A

Slide a window of length k across all reads and count how often each k-mer occurs. Rare k-mers are likely to contain sequencing errors.
It is used to correct sequencing errors.
It requires sufficient coverage.
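
The counting step can be sketched in Python as below. This is a minimal illustration, not how any particular error-correction tool works: the function names, the toy reads, and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Slide a window of width k over every read and count each k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count=2):
    """Return k-mers seen fewer than min_count times (candidate errors).

    min_count is a hypothetical threshold chosen for this toy example.
    """
    return {kmer for kmer, n in counts.items() if n < min_count}


# Three identical reads plus one read with a single substitution (A -> T).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
suspect = weak_kmers(counts)
```

Here the k-mers that overlap the substituted base each occur only once, while the correct k-mers occur several times. With too little coverage every k-mer is rare, so errors cannot be told apart from true sequence.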

3
Q

As a rule of thumb, what is the required coverage?

A

15X

4
Q

What is sequencing depth? How do you calculate it?

A

The average number of times your data covers the genome.
Calculated from:
(number of reads * read length) / genome size
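
As a worked example of the formula above, here is a small Python sketch. The function name and the numbers (one million 150 bp reads, a 5 Mb genome) are made up for illustration.

```python
def sequencing_depth(num_reads, read_length, genome_size):
    """Average depth = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


# Hypothetical run: 1,000,000 reads of 150 bp on a 5,000,000 bp genome.
depth = sequencing_depth(num_reads=1_000_000, read_length=150, genome_size=5_000_000)
print(f"{depth:.0f}X")
```

All three quantities must use the same unit (bases), and the result is an average: individual positions can have much higher or lower depth.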

5
Q

What is breadth of coverage?

A

The fraction of the reference that is covered by the data, i.e. by at least one read.
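
A minimal Python sketch of the idea, assuming you already have a per-base depth value for each reference position (the list below is invented, and `min_depth` is a hypothetical parameter):

```python
def breadth_of_coverage(per_base_depth, min_depth=1):
    """Fraction of reference positions covered by at least min_depth reads."""
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)


# Toy reference of 10 positions; three positions have no reads at all.
per_base_depth = [0, 3, 5, 0, 2, 7, 1, 0, 4, 6]
breadth = breadth_of_coverage(per_base_depth)
mean_depth = sum(per_base_depth) / len(per_base_depth)
```

The same data gives both numbers: breadth says how much of the reference is seen at all, mean depth says how many times on average. A high depth does not guarantee a high breadth.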

6
Q

Why is assembly affected by the presence of adaptors but alignment is not (as much)?

A

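Because alignment uses Smith-Waterman, a local alignment algorithm: only the part of the read that matches the reference has to align, so the adaptor part can be left out. Assembly relies on overlaps between whole reads, so adaptor sequence can create false overlaps.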
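
The effect of local alignment can be illustrated with a small Smith-Waterman scoring sketch in Python. This is a textbook version with linear gap penalties; the scoring values and the toy reference, read, and adaptor sequences are assumptions, and real read aligners use faster heuristics around the same idea.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment start fresh anywhere: this is what
            # makes the alignment local instead of global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


reference = "TTGACCATGGCAAT"      # toy reference
read = "GACCATGG"                 # matches the reference exactly
adaptor = "AGATCGGAAG"            # made-up adaptor left on the read

clean_score = smith_waterman_score(read, reference)
contaminated_score = smith_waterman_score(read + adaptor, reference)
```

The contaminated read scores at least as high as the clean read, because the local alignment simply ignores the adaptor tail. A global, end-to-end comparison would be penalised for every adaptor base.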
