Data basics Flashcards
Describe the workflow from question to answer
Raw reads –> preprocessing –> assembly (de novo) or alignment to a reference –> application-specific steps, e.g. variant calling –> compare samples
Describe fasta
Header line starting with ">" (sequence ID, optional description), followed by one or more lines of sequence
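A minimal sketch of reading that layout, assuming a well-formed file where every record starts with a ">" line. The function name `parse_fasta` and the example records are made up for illustration; real projects usually use an established parser instead.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect sequences keyed by header text (without the leading '>')."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # a new record begins
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Hypothetical example data, not from a real dataset
example = [
    ">seq1 hypothetical example record",
    "ACGTACGT",
    "ACGT",
    ">seq2",
    "GGCC",
]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

Note that the sequence of one record can be wrapped over several lines, so the lines are joined until the next ">" appears.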
Describe fastq
Four lines per read: header starting with "@", sequence, a "+" line (may repeat the header info), and qualities in ASCII, one character per base
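A minimal sketch of reading the four-line layout, assuming sequences are not wrapped over several lines. The names `FastqRecord` and `parse_fastq` and the example reads are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # text after the leading "@"
    sequence: str
    quality: str   # ASCII-encoded qualities, one character per base


def parse_fastq(lines: Iterable[str]) -> List[FastqRecord]:
    """Parse simple four-line FASTQ records (no line-wrapped sequences)."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    records = []
    for i in range(0, len(rows), 4):
        header, sequence, plus, quality = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        records.append(FastqRecord(header[1:], sequence, quality))
    return records


# Hypothetical example data, not from a real sequencing run
example = [
    "@read1 hypothetical example",
    "ACGT",
    "+",
    "II5!",
    "@read2",
    "GG",
    "+read2",
    "5I",
]
reads = parse_fastq(example)
for read in reads:
    print(read.header, read.sequence, read.quality)
```

The length check reflects that every base has exactly one quality character.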
What determines quality score?
Phasing and signal intensity relative to noise. This information is converted into a score; the exact conversion depends on the machine and software
Explain the encoding
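Each ASCII character is converted to a number (character code minus an offset) that gives the Phred score Q, which is converted to the probability that the base is wrong: P = 10^(-Q/10)
Note that there are different ASCII offsets: 33 (Sanger, Illumina 1.8+) and 64 (older Illumina pipelines).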
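A short sketch of the decoding step: subtract the offset from each character code to get the score. The function name `decode_qualities` and the quality strings are made up for illustration; which offset applies has to be known from the data source.

```python
from typing import List


def decode_qualities(quality: str, offset: int = 33) -> List[int]:
    """Turn an ASCII quality string into a list of Phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset is expected to be 33 or 64")
    return [ord(ch) - offset for ch in quality]


# The same three scores written with each offset
scores_33 = decode_qualities("!5I", offset=33)  # codes 33, 53, 73
scores_64 = decode_qualities("@Th", offset=64)  # codes 64, 84, 104
print(scores_33)
print(scores_64)
```

Decoding with the wrong offset shifts every score by 31, so a negative score after decoding with offset 64 is a strong hint that the data actually uses offset 33.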
What does Phred score of 20 mean?
1 % risk of the base being wrong (error probability 0.01)
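The relation between Phred score and error probability is P = 10^(-Q/10), so Q = -10 * log10(P). A small sketch to check the numbers; the function names are made up for illustration.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """P(base is wrong) for a given Phred score Q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score Q for a given error probability P (0 < P <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: P(error) = {p:g} ({p * 100:g} %)")
```

Every 10 points of Phred score lowers the error probability by a factor of 10: Q10 is 1 in 10, Q20 is 1 in 100, Q30 is 1 in 1000.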
Explain mate pair reads
Long-insert paired-end reads
DNA is fragmented, the ends are repaired using labelled dNTPs, and the fragments are circularized. The circle is then fragmented and the part containing the labelled bases is selected for cluster generation and Illumina sequencing. Note: Gives reverse-forward reads.
Good for scaffolding in de novo assembly.
Advantages of paired-end reads?
Precise mapping/alignment/SNP calls
Detection of indels
Easier to build scaffolds