Week 8 (Variants & SNP Chips) Flashcards

1
Q

_____ _________ based on multiple metrics (need to be determined empirically) for variant filtering

A

hard filtering

2
Q

what are the two most important metrics in hard filtering?

A
  • QualByDepth (QD)
  • RMSMappingQuality (MQ)
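
As a rough illustration of hard filtering on these two metrics, the sketch below flags a variant when QD or MQ falls under a fixed cutoff. The cutoffs (QD < 2.0, MQ < 40.0) are commonly cited starting points for SNPs and are assumptions here, as are the record layout, the filter labels, and the function name; real thresholds need to be determined empirically.

```python
# Assumed cutoffs for illustration only; tune them per dataset.
QD_MIN = 2.0
MQ_MIN = 40.0

def hard_filter(variant, qd_min=QD_MIN, mq_min=MQ_MIN):
    """Return the list of filters a variant fails (empty list means it passes)."""
    failed = []
    if variant["QD"] < qd_min:
        failed.append("LowQD")
    if variant["MQ"] < mq_min:
        failed.append("LowMQ")
    return failed

# Hypothetical variant records with only the two annotations of interest.
variants = [
    {"id": "snp1", "QD": 25.3, "MQ": 60.0},
    {"id": "snp2", "QD": 1.2, "MQ": 58.7},
    {"id": "snp3", "QD": 12.0, "MQ": 22.5},
]

results = {v["id"]: hard_filter(v) or ["PASS"] for v in variants}
print(results)
```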
3
Q

VCF

A

variant call format

4
Q

MQ (RMSMappingQuality) has a value of 40 associated with it. What does that mean?

A

it allows us to evaluate how confident we are that the reads at that site mapped to the correct place in the genome
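
To make the "RMS" part concrete, here is a small sketch of how a root-mean-square mapping quality could be computed from the per-read mapping qualities at one site. The read values and the function name are made up for illustration.

```python
import math

def rms_mapping_quality(mapqs):
    """Root mean square of the per-read mapping qualities at a site."""
    if not mapqs:
        raise ValueError("no reads at this site")
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))

# Hypothetical reads: a unique region vs. a repetitive region where
# several reads could have been placed elsewhere (low mapping quality).
unique_region = [60, 60, 60, 60]
repeat_region = [60, 0, 0, 20]

print(rms_mapping_quality(unique_region))            # 60.0
print(round(rms_mapping_quality(repeat_region), 1))  # 31.6
```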

5
Q

where would you find a lower MQ (mapping quality)?

A

in repetitive sequences because there are multiple places it could go

6
Q

sensitivity

A

identifying true positives

7
Q

specificity

A

identifying true negatives

8
Q

all variant callers produce errors, which can be classified as false positives and false negatives. when performing a genomic analysis, or any similar analysis for that matter, one has to balance sensitivity and specificity. what do the terms sensitivity and specificity mean in the context of variant calling?

A

sensitivity: trying to discover all the real variants
specificity: trying to limit the false positives that creep in when filters get too lenient
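
The two terms can also be written as simple ratios. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the counts are invented for illustration.

```python
def sensitivity(tp, fn):
    """Fraction of real variants that were called (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-variant sites correctly left uncalled (true negative rate)."""
    return tn / (tn + fp)

# Hypothetical counts from comparing a call set to a truth set.
tp, fn = 90, 10    # real variants found / real variants missed
tn, fp = 980, 20   # non-variant sites left alone / wrongly called

print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.98
```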

9
Q

if you call a variant where one doesn’t exist this is a false __________

A

positive

10
Q

if you fail to identify where a variant exists it is a false __________

A

negative

11
Q

what type of error is considered the worst?

A

Type 1 error (we don’t want to say something is true if it is not)

12
Q

Variant Quality Score Recalibration

A

does not actually recalibrate QUAL but creates a new score

13
Q

what is the purpose of the variant quality score recalibration?

A

the purpose of this new score is to enable variant filtering in a way that allows analysts to balance sensitivity and specificity

14
Q

sensitivity in variant quality score recalibration

A

trying to discover all the real variants

15
Q

specificity in variant quality score recalibration

A

trying to limit the false positives that creep in when filters get too lenient

16
Q

________ _______ ________ ___________ uses machine learning algorithms to learn from each dataset what the annotation profile of good variants vs bad variants is

A

variant quality score recalibration

17
Q

VQSR

A

variant quality score recalibration

18
Q

key to VQSR is that you need a “________ _____” for training the model

A

truth set

19
Q

what is 100% sensitivity?

A

calling every difference a variant

20
Q

the recalibrated variant quality score provides a continuous estimate of the probability that each variant is true, allowing one to partition the call sets into quality ____________

A

tranches

21
Q

__________ are essentially slices of variants, ranked by VQSLOD

A

tranches

22
Q

high tranche

A

if you want more variants and are willing to accept false positives

23
Q

middle tranche

A

if you want to remove most false positives but are also willing to remove some true variants

24
Q

low tranche

A

if you only want highly accurate true variants with few false positives and willing to miss perhaps many true positives

25
Q

what are tranches?

A

slices in the variant quality scores, where to set the threshold to identify the amount of true positives and accept a number of false positives
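
One way to picture a tranche threshold: rank the truth-set variants by VQSLOD and find the lowest score you must accept to keep a target fraction of them. The sketch below is a simplified illustration, not the actual VQSR implementation; the scores, truth labels, and function name are all hypothetical.

```python
import math

def tranche_threshold(variants, target_sensitivity):
    """Lowest VQSLOD cutoff that keeps the target fraction of truth-set variants.

    variants: list of (vqslod, in_truth_set) tuples.
    """
    truth_scores = sorted((s for s, in_truth in variants if in_truth), reverse=True)
    if not truth_scores:
        raise ValueError("need at least one truth-set variant")
    # round() guards against floating-point noise before taking the ceiling
    n_keep = max(1, math.ceil(round(target_sensitivity * len(truth_scores), 9)))
    return truth_scores[n_keep - 1]

# Hypothetical calls: (VQSLOD, whether the site is in the truth set)
calls = [(9.1, True), (7.4, True), (5.0, False), (3.2, True),
         (1.5, True), (0.4, False), (-2.0, True), (-4.5, False)]

for target in (0.6, 0.8, 1.0):
    cutoff = tranche_threshold(calls, target)
    kept = [s for s, _ in calls if s >= cutoff]
    print(target, cutoff, len(kept))
```

Raising the target sensitivity lowers the cutoff, so more calls are kept, including more that are not in the truth set.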

26
Q

slices in the variant quality scores, where to set the threshold to identify the amount of true positives and accept a number of false positives

27
Q

what is a genotyping model and software that google has released?

A

DeepVariant

28
Q

what is the standard genotyping model and software used in humans?

A

DeepVariant

29
Q

WGS

A

whole genome sequence

30
Q

what is a whole genome sequence (WGS)?

A

the sequence library

31
Q

what is a SNP Chip used for?

A

to build a (relatively) low cost assay to genotype a large number of individuals

32
Q

sample size is statistical ________

33
Q

what is deep coverage?

34
Q

approximately how much does it cost to run a whole genome sequencing on a mammalian genome at 30x coverage?

35
Q

what is the difference between WGS and SNP chips related to variants?

A
  • WGS captures “all” variation
  • SNP chips have lower number of variants but also a lower cost per sample
36
Q

_______ is the largest genotype provider in the world

37
Q

what was the purpose of in-silico digest of reference genome with multiple restriction enzymes?

A

every time a recognition sequence was seen, the enzyme would cut it. from there you could compile the number of reads you had from each segment, and the repetitive elements (which you are not interested in) could be found because they had the most sections cut out

38
Q

what is Illumina Infinium chemistry?

A

small beads have unique barcodes. for each SNP, a 50-mer oligo that flanks the SNP (called a probe) is synthesized and attached to the beads. the chip has microwells for each of the beads to sit in, and depositing the beads on the chip produces an array

39
Q

what are the basics of the beads used in Illumina Infinium chemistry?

A

the small beads have oligos hanging off of them that correspond to the section of the sequence that you want

40
Q

for each SNP, synthesize a ____ mer oligo that flanks the SNP (probe)

41
Q

what is a probe?

A

50 mer oligo that flanks the SNP

42
Q

_____ base probe

43
Q

Infinium I = ____ probe(s)

44
Q

Infinium II = ____ probe(s)

45
Q

what color bead is G and C in Illumina Infinium chemistry?

46
Q

what color bead is A and T in Illumina Infinium chemistry?

47
Q

G/C and A/T = ____________

A

Infinium I

48
Q

in Illumina Infinium chemistry, what happens if your variant is As AND Ts? How do you solve this?

A

it would all show up as red, so you would need to use two probes

49
Q

why do we cluster SNPs?

A

so we can determine genotype

50
Q

is this a well clustered SNP or a poorly clustered SNP?

A

well clustered SNP

51
Q

are these examples of a well clustered SNP or a poorly clustered SNP?

A

poorly clustered SNP

52
Q

what are these clustered SNPs an example of?

A

improperly clustered SNPs, the automated system just got it wrong, so you should manually fix it

53
Q

SNP chips are really accurate but things can go wrong. remember this when making decisions based on chip genotypes for any single SNP specifically. Why?

A

it depends on what error rate you are comfortable with, your willingness to be wrong

54
Q

call rate per SNP

A

best indicator of genotype quality

55
Q

call rate per individual

A

best indicator of sample DNA quality

56
Q

2 key metrics for looking for errors:

A
  1. call rate per SNP
  2. call rate per individual
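
Both call rates are just the fraction of non-missing genotypes, taken along different axes of the genotype table. A minimal sketch, assuming a small made-up table where `None` marks a no-call (sample names, SNP names, and genotypes are hypothetical):

```python
# Rows are individuals, columns are SNPs; None marks a no-call.
snp_ids = ["snp1", "snp2", "snp3", "snp4"]
genotypes = {
    "cow1": ["AA", "AB", "BB", "AB"],
    "cow2": ["AA", None, "BB", "AB"],
    "cow3": [None, None, "BB", None],
}

def call_rate(calls):
    """Fraction of genotypes that were successfully called."""
    return sum(c is not None for c in calls) / len(calls)

# Call rate per individual: across all SNPs for one sample.
per_individual = {ind: call_rate(calls) for ind, calls in genotypes.items()}

# Call rate per SNP: across all samples for one SNP.
per_snp = {
    snp: call_rate([calls[i] for calls in genotypes.values()])
    for i, snp in enumerate(snp_ids)
}

print(per_individual)
print(per_snp)
```

Here cow3 stands out with a low per-individual call rate, which would point to a poor DNA sample rather than a bad SNP assay.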
57
Q

why might an individual have a low call rate?

A

poor-quality DNA (for example, a sample taken from a live cow vs a cow that has been dead for 1000 years)

58
Q

what does it mean to impute?

A

inferring missing data from data that you have already observed in order to fill in the gaps

59
Q

what do the signs on the right symbolize?

A

whole genome sequence: it is high density and all the variants have been found

60
Q

what do the signs on the left symbolize?

A

SNP Chip: low density