Variant calling: quality-based filtering Flashcards
Question 1
What is it referring to variant calling ?
Variant calling refers to the identification of probable variants(deviation from the reference sequence) in an alignment.
Question 2
What is the point (the aim) of calling variant ?
Calling variants aims to identify differences with respect to the genome reference sequence in an aligned and sorted BAM file.
Question 3
Haplotype caller
What is the first step ?
The HaplotypeCaller works by first determining regions of the genome for which there is at least some evidence of variation, termed “active regions”.
Question 4
What is the second step of haplotype caller ?
Identify haplotypes that are consistent with the data and realigns each haplotype with the reference genome using the Smith–Waterman algorithm.
Question 5
What is the third step of haplotype caller ?
The HaplotypeCaller does a pairwise alignment of each read against each potential haplotype using a Hidden-Markov Model algorithm, which yields a matrix of probabilities of the haplotypes given the read data.
Question 6
What is the fourth step of haplotype caller ?
The likelihoods are marginalized in order to estimate the likelihoods of alleles at each site with a potential variant. Finally, the most likely genotype is assigned using Bayesian methods.
Question 7
Hard filtering versus VQSR
Most bioinformatic analysis procedures involve some trade-off between … and …
Most bioinformatic analysis procedures involve some trade-off between sensitivity and noise
Question 8
Hard filtering versus VQSR
What does increasing the sensitivity means ?
Reducing the false negative rate
Question 9
Hard filtering versus VQSR
What does add noise means ?
Increase the false positive rate
Question 10
Hard filtering versus VQSR
It is nearly inevitable that increasing the sensitivity (i.e., reducing the false negative rate) will also add …
It is nearly inevitable that increasing the sensitivity (i.e., reducing the false negative rate) will also add noise
Question 11
Hard filtering versus VQSR
What’s the aim of the hard filtering and variant-quality score recalibration ?
The goal of hard filtering and variant-quality score recalibration (VQSR) is to reduce the number of false-positive calls without greatly reducing the sensitivity.
Question 12
Why can’t we compare the quality values returned by different variant caller ?
VCF files assign a quality score (QUAL), which is a Phred-scaled quality score for the assertion made about the alternate (variant) base or sequence (see Equation 12.1).
Each variant caller determines this value using its own algorithms, and one cannot directly compare the QUAL values returned by different variant callers.
Question 13
Hard filtering vs VQSR
How can you evaluate the quality of the variant calling ?
We can use two ways: hard-filtering or VQSR
Question 14
Define the hard-filtering
A fixed threshold is applied to filter out variants.
It’s the hard filtering.
Question 15
Define the VQSR
VQSR is a more sophisticated machine learning procedure that attempts to learn the most appropriate thresholds from the data, using a set of “gold-standard” trusted calls.