Variant calling: quality-based filtering Flashcards

1
Q

Question 1

What does variant calling refer to?

A

Variant calling refers to the identification of probable variants (deviations from the reference sequence) in an alignment.

2
Q

Question 2

What is the aim of variant calling?

A

Calling variants aims to identify differences with respect to the genome reference sequence in an aligned and sorted BAM file.

3
Q

Question 3

HaplotypeCaller

What is the first step of HaplotypeCaller?

A

The HaplotypeCaller works by first determining regions of the genome for which there is at least some evidence of variation, termed “active regions”.

4
Q

Question 4

What is the second step of HaplotypeCaller?

A

HaplotypeCaller identifies haplotypes that are consistent with the data and realigns each haplotype against the reference genome using the Smith–Waterman algorithm.
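
To make the realignment step more concrete, here is a minimal sketch of Smith–Waterman local alignment scoring. It is not GATK's implementation: the function name, the scoring values, and the linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


# Hypothetical haplotype and reference fragments sharing the core "ACGT".
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A full aligner would also trace back through the matrix to recover the alignment itself; only the score is computed here.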

5
Q

Question 5

What is the third step of HaplotypeCaller?

A

HaplotypeCaller performs a pairwise alignment of each read against each candidate haplotype using a pair hidden Markov model (PairHMM), which yields a matrix of likelihoods of the haplotypes given the read data.

6
Q

Question 6

What is the fourth step of HaplotypeCaller?

A

The likelihoods are marginalized in order to estimate the likelihoods of alleles at each site with a potential variant. Finally, the most likely genotype is assigned using Bayesian methods.
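
The Bayesian assignment can be illustrated with a small sketch: each genotype's posterior is proportional to its likelihood times its prior. The function name, likelihoods, and priors below are hypothetical values chosen for illustration, not GATK's actual model or defaults.

```python
def genotype_posteriors(likelihoods: dict, priors: dict) -> dict:
    """Apply Bayes' rule: posterior is proportional to likelihood * prior."""
    unnormalized = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}


# Hypothetical P(reads | genotype) for one site, and hypothetical genotype priors.
likelihoods = {"0/0": 1e-6, "0/1": 0.02, "1/1": 1e-4}
priors = {"0/0": 0.998, "0/1": 0.0015, "1/1": 0.0005}

posteriors = genotype_posteriors(likelihoods, priors)
best_genotype = max(posteriors, key=posteriors.get)
print(best_genotype)
```

Even with a strong prior for the homozygous-reference genotype, the read evidence here favors the heterozygous call.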

7
Q

Question 7

Hard filtering versus VQSR

Most bioinformatic analysis procedures involve some trade-off between … and …

A

Most bioinformatic analysis procedures involve some trade-off between sensitivity and noise.

8
Q

Question 8

Hard filtering versus VQSR

What does increasing the sensitivity mean?

A

Reducing the false negative rate

9
Q

Question 9

Hard filtering versus VQSR

What does adding noise mean?

A

Increasing the false positive rate

10
Q

Question 10

Hard filtering versus VQSR

It is nearly inevitable that increasing the sensitivity (i.e., reducing the false negative rate) will also add …

A

It is nearly inevitable that increasing the sensitivity (i.e., reducing the false negative rate) will also add noise
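
The trade-off can be made concrete with a short sketch. The counts are hypothetical, and the false discovery rate (the fraction of calls that are wrong) is used here as a simple stand-in for "noise"; that choice is an assumption, not something the cards specify.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of calls that are not real variants: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)


# Hypothetical call sets evaluated against 1,000 known true variants.
strict = {"tp": 900, "fp": 10, "fn": 100}
lenient = {"tp": 980, "fp": 120, "fn": 20}

for name, c in (("strict", strict), ("lenient", lenient)):
    print(name,
          round(sensitivity(c["tp"], c["fn"]), 3),
          round(false_discovery_rate(c["tp"], c["fp"]), 3))
```

In this made-up example the lenient settings recover more true variants but a larger share of their calls are false positives.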

11
Q

Question 11

Hard filtering versus VQSR

What is the aim of hard filtering and variant-quality score recalibration?

A

The goal of hard filtering and variant-quality score recalibration (VQSR) is to reduce the number of false-positive calls without greatly reducing the sensitivity.

12
Q

Question 12

Why can’t we compare the quality values returned by different variant callers?

A

VCF files assign a quality score (QUAL), which is a Phred-scaled quality score for the assertion made about the alternate (variant) base or sequence (see Equation 12.1).

Each variant caller determines this value using its own algorithms, and one cannot directly compare the QUAL values returned by different variant callers.
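
The referenced equation is not reproduced in these cards, so the sketch below assumes the standard Phred definition, Q = -10 * log10(p), where p is the probability that the assertion is wrong. The function names are illustrative.

```python
import math


def phred_from_error_prob(p_error: float) -> float:
    """Convert an error probability to a Phred-scaled score."""
    return -10.0 * math.log10(p_error)


def error_prob_from_phred(q: float) -> float:
    """Convert a Phred-scaled score back to an error probability."""
    return 10.0 ** (-q / 10.0)


print(phred_from_error_prob(0.001))
print(error_prob_from_phred(20))
```

On this scale each additional 10 points corresponds to a tenfold lower error probability, although the probability model behind the number still differs from caller to caller.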

13
Q

Question 13

Hard filtering versus VQSR

How can you evaluate the quality of variant calls?

A

Two approaches can be used: hard filtering or VQSR.

14
Q

Question 14

Define hard filtering

A

Hard filtering applies fixed thresholds to variant annotations in order to filter out variants.
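
As a minimal sketch of the idea, the code below flags variants whose annotations cross fixed cutoffs. The cutoffs resemble starting points that GATK documents for SNPs, but they are used here only as illustrative assumptions; the records and the helper name are hypothetical.

```python
# Illustrative fixed thresholds: each entry maps an annotation name to a
# predicate that returns True when the variant FAILS that filter.
THRESHOLDS = {
    "QD": lambda v: v < 2.0,   # low quality relative to depth
    "FS": lambda v: v > 60.0,  # strong strand bias
    "MQ": lambda v: v < 40.0,  # poor mapping quality
}


def failed_filters(info: dict) -> list:
    """Return the names of the annotations whose fixed threshold is violated."""
    return [name for name, fails in THRESHOLDS.items()
            if name in info and fails(info[name])]


# Hypothetical variant records with a few INFO annotations.
variants = [
    {"id": "var1", "QD": 25.1, "FS": 1.2, "MQ": 60.0},
    {"id": "var2", "QD": 1.4, "FS": 75.0, "MQ": 58.0},
]

for v in variants:
    failed = failed_filters(v)
    print(v["id"], "PASS" if not failed else ";".join(failed))
```

The same cutoffs are applied to every variant regardless of the data set, which is the main contrast with VQSR.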

15
Q

Question 15

Define VQSR

A

VQSR is a more sophisticated machine learning procedure that attempts to learn the most appropriate thresholds from the data, using a set of “gold-standard” trusted calls.

16
Q

Question 16

BCFTOOLS

What is BCFtools used for?

A

BCFtools is a program for manipulating VCF files as well as binary variant call format (BCF) files.

17
Q

Question 17

BCFTOOLS

How do you extract the chromosome (CHROM), the position (POS), the reference base(s) in the genome (REF), and the alternate (variant) sequence (ALT)?

A

bcftools query -f '%CHROM %POS %REF %ALT\n' raw_variants.vcf | head -5

18
Q

Question 18

BCFTOOLS

How do you extract the FORMAT tags?

A

FORMAT tags in the sample columns of the VCF file can be extracted using the square brackets [] operator.

19
Q

Question 20

How do you extract the AD field of FORMAT?

A

bcftools query -f '[%AD]\n' raw_variants.vcf | head -5
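
To show what the per-sample lookup involves, here is a rough Python equivalent that parses a single VCF data line by hand. It is a simplified sketch, not how bcftools works internally: the function name and the example line are hypothetical, and it assumes every sample lists all FORMAT fields.

```python
def extract_format_field(vcf_line: str, tag: str) -> list:
    """Return the value of one FORMAT tag for every sample on a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")   # column 9 names the per-sample fields
    idx = keys.index(tag)         # position of the requested tag, e.g. AD
    # Columns 10 onward hold one colon-separated entry per sample.
    return [sample.split(":")[idx] for sample in fields[9:]]


# Hypothetical single-sample record: 18 reads support REF, 22 support ALT.
line = "chr1\t12345\t.\tA\tG\t812.77\t.\tDP=40\tGT:AD:DP:GQ\t0/1:18,22:40:99"
print(extract_format_field(line, "AD"))  # ['18,22']
```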

20
Q

Question 21

QualByDepth

What problem does deep coverage cause for the QUAL score?

A

The QUAL score is derived from a probability calculated by GATK (or by other variant callers). However, each read contributes a little to the QUAL score, so variants in regions with deep coverage can have artificially inflated QUAL scores.
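
The QualByDepth (QD) annotation addresses this by normalizing QUAL by depth. The sketch below is a simplification with hypothetical numbers: GATK's actual QD calculation is more specific about which reads count toward depth.

```python
def qual_by_depth(qual: float, depth: int) -> float:
    """Normalize a QUAL score by read depth (simplified QD)."""
    return qual / depth


# Hypothetical sites with the same per-read support but different coverage.
shallow = {"qual": 300.0, "depth": 10}
deep = {"qual": 3000.0, "depth": 100}

print(qual_by_depth(shallow["qual"], shallow["depth"]))
print(qual_by_depth(deep["qual"], deep["depth"]))
```

The raw QUAL values differ tenfold, but after normalization the two sites look equally well supported per read.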
