Alignment post-processing & variant calling Flashcards
Why is it difficult to call SNPs?
It is hard to tell what is a real SNP and what is noise, such as sequencing or alignment errors.
Outline the workflow from mapping and alignment
Sorting, filtering and indexing
Local realignment
Remove duplicates
Recalibrate base quality scores
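The first step (sorting, filtering and indexing) can be sketched on toy data. This is only an illustration: alignments are hypothetical `(chrom, pos, mapq, name)` tuples, the `min_mapq` threshold is an arbitrary assumption, and a real pipeline would run a tool such as Samtools on BAM files instead.

```python
import bisect

def sort_and_filter(alignments, min_mapq=20):
    """Drop low mapping-quality alignments, then sort by (chrom, pos)."""
    kept = [a for a in alignments if a[2] >= min_mapq]
    kept.sort(key=lambda a: (a[0], a[1]))
    return kept

def reads_starting_in(sorted_alignments, chrom, start, end):
    """Binary search on the sorted coordinates, the idea behind an index.

    Only reads whose start position lies in [start, end] are returned.
    """
    keys = [(a[0], a[1]) for a in sorted_alignments]
    lo = bisect.bisect_left(keys, (chrom, start))
    hi = bisect.bisect_right(keys, (chrom, end))
    return sorted_alignments[lo:hi]

alignments = [
    ("chr2", 50, 60, "r1"),
    ("chr1", 300, 5, "r2"),   # removed: mapping quality below threshold
    ("chr1", 120, 40, "r3"),
    ("chr1", 80, 30, "r4"),
]
kept = sort_and_filter(alignments)
region_hits = reads_starting_in(kept, "chr1", 100, 200)
print([a[3] for a in kept])
print([a[3] for a in region_hits])
```

Sorting comes first because the fast region lookup only works on coordinate-sorted data.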
What is local realignment? How to do it?
Realign reads in regions where they span indels or show clusters of mismatched bases, since reads there are likely to be misaligned.
Identify these candidate regions, then perform a multiple sequence realignment of the reads within them.
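One way to find candidate regions is to look for clusters of mismatches. The sketch below is a simple sliding-window heuristic, not the algorithm of any particular realigner; `mismatch_clusters`, the window size and the mismatch threshold are all assumptions chosen for illustration.

```python
def mismatch_clusters(mismatch_positions, window=10, min_mismatches=3):
    """Return (start, end) regions holding >= min_mismatches within `window` bp."""
    positions = sorted(mismatch_positions)
    regions = []
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window from the left until it spans fewer than `window` bp.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_mismatches:
            start, end = positions[left], pos
            if regions and start <= regions[-1][1]:
                # Overlaps the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

# Reference positions where at least one read disagrees with the reference.
positions = [5, 100, 102, 104, 107, 300, 450, 452]
clusters = mismatch_clusters(positions)
print(clusters)
```

Isolated mismatches (positions 5 and 300 here) are left alone; only the dense stretch is flagged for realignment.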
Why do we have duplicates in our data?
How do we deal with them?
What are the drawbacks of doing this?
PCR amplification
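Keep only the read with the best quality from each set of duplicates.
Natural duplicates (independent fragments that happen to map to the same position) are removed as well, and sequencing errors are not taken into consideration.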
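A minimal sketch of duplicate removal, assuming reads with the same chromosome, start position and strand count as duplicates and that "best quality" means the highest summed base quality. The `Read` record and `remove_duplicates` function are hypothetical, not a real tool's API.

```python
from collections import namedtuple

# Hypothetical toy read record; real tools operate on BAM records.
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "quals"])

def remove_duplicates(reads):
    """Keep one read per (chrom, start, strand): the one with the highest summed quality."""
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in best or sum(read.quals) > sum(best[key].quals):
            best[key] = read
    return list(best.values())

reads = [
    Read("r1", "chr1", 100, "+", [30, 30, 20]),
    Read("r2", "chr1", 100, "+", [35, 35, 30]),  # same coordinates as r1
    Read("r3", "chr1", 100, "-", [10, 10, 10]),  # opposite strand, kept
    Read("r4", "chr1", 250, "+", [25, 25, 25]),
]
kept_reads = remove_duplicates(reads)
print(sorted(r.name for r in kept_reads))
```

The sketch cannot tell whether r1 and r2 came from PCR or from two independent fragments, which is exactly the drawback with natural duplicates.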
How to recalibrate base qualities?
Group bases by reported quality score, position in the read, and dinucleotide context.
For each group, count how many times a base was a mismatch to the reference.
Calculate the empirical Phred quality score as Q = -10 log10((mismatches to ref. + 1) / (observed bases + 2)).
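The counting and the Phred calculation can be sketched as follows. The input format, a list of `(reported_q, cycle, dinucleotide, is_mismatch)` tuples, and the function name `recalibrate` are assumptions made for this example.

```python
import math
from collections import defaultdict

def recalibrate(observations):
    """Map each (reported_q, cycle, dinuc) bin to an empirical Phred score."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, observed bases]
    for reported_q, cycle, dinuc, is_mismatch in observations:
        bin_counts = counts[(reported_q, cycle, dinuc)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1
    table = {}
    for key, (mismatches, bases) in counts.items():
        # The +1 / +2 pseudocounts keep the rate away from 0 in sparse bins.
        error_rate = (mismatches + 1) / (bases + 2)
        table[key] = -10 * math.log10(error_rate)
    return table

# 998 bases reported as Q30 at cycle 5 after an "AC" context, 9 of them mismatches.
observations = [(30, 5, "AC", i < 9) for i in range(998)]
table = recalibrate(observations)
print(round(table[(30, 5, "AC")], 2))
```

Here the empirical error rate is (9 + 1) / (998 + 2) = 0.01, so bases reported as Q30 in this bin would be recalibrated to about Q20.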
What is variant calling and genotyping?
Variant calling: Identifying polymorphic sites.
Genotyping: Finding the genotype of an individual at a specific site.
Name some programs for genotyping and describe their method
Samtools, GATK
Bayesian calculations based on conditional probabilities, Hardy-Weinberg equilibrium, the probability of a wrong base call, and more.
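A simplified sketch of such a Bayesian calculation for one diploid, biallelic site. It is not the actual Samtools or GATK implementation: the error model (a wrong call is equally likely to be any of the three other bases), the function name and the example allele frequency are assumptions.

```python
def genotype_posteriors(bases, quals, ref, alt, alt_freq):
    """Posterior probability of each genotype given the observed bases."""
    # Hardy-Weinberg priors from the assumed alt allele frequency.
    priors = {
        "ref/ref": (1 - alt_freq) ** 2,
        "ref/alt": 2 * alt_freq * (1 - alt_freq),
        "alt/alt": alt_freq ** 2,
    }
    alt_copies = {"ref/ref": 0, "ref/alt": 1, "alt/alt": 2}
    unnormalised = {}
    for genotype, prior in priors.items():
        frac_alt = alt_copies[genotype] / 2
        likelihood = 1.0
        for base, qual in zip(bases, quals):
            err = 10 ** (-qual / 10)  # probability of a wrong base call
            p_if_ref = 1 - err if base == ref else err / 3
            p_if_alt = 1 - err if base == alt else err / 3
            # Each read samples one of the two chromosomes at random.
            likelihood *= (1 - frac_alt) * p_if_ref + frac_alt * p_if_alt
        unnormalised[genotype] = prior * likelihood
    total = sum(unnormalised.values())
    return {g: p / total for g, p in unnormalised.items()}

# Six reads covering the site: three support A (ref), three support G (alt).
post = genotype_posteriors("AAGGAG", [30] * 6, ref="A", alt="G", alt_freq=0.1)
best = max(post, key=post.get)
print(best)
```

With an even split of high-quality reference and alternative bases, the heterozygous genotype dominates even though its prior is lower than that of the homozygous reference.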
What is variant filtering?
Removing false positives. Use known polymorphic sites as a training set and let the program filter away called sites that look different (soft filtering). If soft filtering is not possible, apply hard filters on quality, depth, strand bias, and more instead.
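The fallback of filtering directly on quality, depth and strand bias can be sketched like this. The variant records are hypothetical dictionaries, and the thresholds and filter names are illustrative assumptions, not recommended values.

```python
def failed_filters(variant, min_qual=30.0, min_depth=10, max_strand_frac=0.9):
    """Return the names of the filters a variant call fails (empty list = pass)."""
    failed = []
    if variant["qual"] < min_qual:
        failed.append("LowQual")
    if variant["depth"] < min_depth:
        failed.append("LowDepth")
    alt_reads = variant["alt_fwd"] + variant["alt_rev"]
    if alt_reads > 0:
        # Fraction of alt-supporting reads that come from the majority strand.
        strand_frac = max(variant["alt_fwd"], variant["alt_rev"]) / alt_reads
        if strand_frac > max_strand_frac:
            failed.append("StrandBias")
    return failed

calls = [
    {"pos": 101, "qual": 55.0, "depth": 24, "alt_fwd": 6, "alt_rev": 5},
    {"pos": 202, "qual": 12.0, "depth": 30, "alt_fwd": 4, "alt_rev": 4},
    {"pos": 303, "qual": 48.0, "depth": 18, "alt_fwd": 9, "alt_rev": 0},
]
passed = [c["pos"] for c in calls if not failed_filters(c)]
print(passed)
```

Only the call at position 101 passes: position 202 has low quality, and all alternative reads at position 303 come from one strand.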