Week 11 (1000 Genomes Project) Flashcards

1
Q

what is the 1000 Genomes Project?

A

The 1000 Genomes Project is an international research consortium that was set up in 2007 with the aim of sequencing the genomes of at least 1,000 volunteers from multiple populations worldwide in order to improve our understanding of the genetic contribution to human health and disease.

2
Q

what was the first model for the 1000 Genomes Project? why?

A

humans! human research is better funded, so they had the money to do this.

3
Q

what combination of sequencing strategies did they use to complete the 1000 Genomes Project?

A
  • low coverage whole genome
  • exome sequencing
4
Q

the 1000 Genomes Project validated a haplotype map of ____ _____ single nucleotide polymorphisms

A

38 million

5
Q

why do low frequency variants tend to be recent?

A

a new mutation starts as a single copy, so its frequency starts very low (1/(2N)); it takes many generations of drift or selection for it to rise, so a variant that is still rare is more likely to be new
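To see why rare variants tend to be young, here is a minimal sketch assuming a neutral Wright-Fisher model (random mating, constant population size N, no selection); the card itself does not name a model. A new mutation enters at frequency 1/(2N), most copies are lost by drift within a few generations, and the few that rise do so slowly.

```python
import random


def simulate_new_mutation(n_individuals, rng, max_generations=10_000):
    """Track the frequency of one new neutral mutation under a simple
    Wright-Fisher model (an assumed model, not taken from the lecture).

    Returns the allele frequency in each generation, starting at 1/(2N)
    and ending when the mutation is lost (0.0) or fixed (1.0).
    """
    copies = 2 * n_individuals          # diploid: 2N chromosome copies
    count = 1                           # a new mutation starts as one copy
    trajectory = [count / copies]
    for _ in range(max_generations):
        if count == 0 or count == copies:
            break                       # lost or fixed: drift has stopped
        p = count / copies
        # Each of the 2N copies in the next generation picks a parent copy
        # at random, so it carries the mutation with probability p.
        count = sum(rng.random() < p for _ in range(copies))
        trajectory.append(count / copies)
    return trajectory


rng = random.Random(42)
runs = [simulate_new_mutation(100, rng) for _ in range(200)]
lost = sum(1 for run in runs if run[-1] == 0.0)
print(f"start frequency: {runs[0][0]}")  # 1/(2*100) = 0.005
print(f"lost in {lost} of {len(runs)} runs")
```

Under this model most new mutations disappear while still rare, so at any moment the rare class is dominated by mutations that arose recently.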

6
Q

is it possible for mutations to occur over time? if so, how?

A

yes! mutations can occur during cell division, for example from errors in DNA replication

7
Q

what is the equation that you use to determine the frequency of a new mutation in a population?

A

1/(2N), where N = number of diploid individuals (there are 2N chromosome copies, and the new mutation is on just one of them)
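The 1/(2N) formula is one copy divided by the number of chromosome copies in a diploid population. A small worked sketch (the population sizes are arbitrary illustrations):

```python
def new_mutation_frequency(n_individuals):
    """Frequency of a mutation present as a single copy in a diploid
    population of N individuals: one copy out of 2N chromosomes."""
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    return 1 / (2 * n_individuals)


for n in (10, 1_000, 1_000_000):
    print(n, new_mutation_frequency(n))
```

The larger the population, the lower the starting frequency of any single new mutation.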

8
Q

what is the chance of transmission from parent to offspring?

A

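50/50 for a heterozygous parent (each child gets one of the parent's two copies, so the allele is either transmitted or not with equal probability)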
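A minimal simulation of Mendelian segregation, assuming each child receives one of the parent's two copies chosen uniformly at random. The allele labels "A" and "a" are placeholders. A heterozygous parent passes "a" to about half of the children; a homozygous parent always passes the same allele.

```python
import random


def transmit(parent_genotype, rng):
    """Pick one of the parent's two allele copies uniformly at random.
    parent_genotype is a pair such as ("A", "a")."""
    return rng.choice(parent_genotype)


rng = random.Random(0)
children = [transmit(("A", "a"), rng) for _ in range(10_000)]
share = children.count("a") / len(children)
print(f"fraction of children receiving 'a': {share:.3f}")  # close to 0.5
```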

9
Q

in every generation recombination occurs, which breaks down _______ __________

A

linkage disequilibrium
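A sketch of how recombination erodes linkage disequilibrium, using the standard two-locus result D_t = D_0 (1 - r)^t (assumptions: large population, random mating, no selection; r is the recombination fraction between the two loci). The starting value D_0 = 0.25 is just an example.

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium D remaining after t generations of random
    mating: D_t = D_0 * (1 - r) ** t, with 0 <= r <= 0.5."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5]")
    return d0 * (1 - r) ** generations


# Tightly linked loci (r = 0.01) keep LD far longer than unlinked loci (r = 0.5).
for r in (0.01, 0.5):
    print(r, [round(ld_after(0.25, r, t), 4) for t in (0, 1, 10, 100)])
```

Unlinked loci lose half their LD every generation, while tightly linked loci can stay associated for hundreds of generations, which is why nearby variants travel together on haplotypes.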

10
Q

in the 1000 Genomes Project, they found about 3.6 million SNPs per individual. On average, how different is an individual's genome from the reference?

A

0.1%
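The arithmetic behind the 0.1% figure, assuming a human genome size of about 3.1 billion bp (not stated on the card) and counting SNPs only (indels and structural variants are ignored):

```python
# Assumed haploid human genome size of ~3.1 billion bp (not stated on the card).
GENOME_SIZE_BP = 3.1e9
snps_per_individual = 3.6e6  # figure from the card

percent_different = snps_per_individual / GENOME_SIZE_BP * 100
print(f"{percent_different:.2f}% of positions differ")  # about 0.1%
```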

11
Q

what is low coverage?

A

< 5x (on average, fewer than about 5 reads cover each base)

12
Q

what is high coverage?

A

> 20x (on average, more than about 20 reads cover each base)
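Coverage is a fold depth (x), not a percentage: total sequenced bases divided by genome size. A minimal sketch, assuming a 3.1 Gb genome and 100 bp reads (both illustrative values); the "intermediate" label for depths between the low and high thresholds is a hypothetical name, not a term from the cards.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads needed to reach a target average depth."""
    return target_coverage * genome_size_bp / read_length_bp


def coverage_class(depth):
    """Label depth with the card thresholds (<5x low, >20x high);
    'intermediate' is a hypothetical label for the range in between."""
    if depth < 5:
        return "low"
    if depth > 20:
        return "high"
    return "intermediate"


GENOME_SIZE_BP = 3.1e9  # assumed human genome size
READ_LENGTH_BP = 100    # assumed short-read length

for target in (4, 30):
    n = reads_needed(target, READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f"{target}x needs about {n:.2e} reads -> {coverage_class(target)}")
```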

13
Q

why did the 1000 Genomes Project use 5x coverage?

A

it was really expensive to do more than that! (it cost $5 million)

14
Q

what is the typical amount of coverage that we use today?

A

30x

15
Q

we transmit ________ NOT _______ to the next generation

A

chromosomes; alleles

16
Q

what amount of coverage did the 1000 Genomes Project use?

A

low coverage (2-6x)

17
Q

the 1000 Genomes Project used wide sampling and low coverage, why?

A

they wanted to characterize common variation; for the same cost, sampling more individuals at lower coverage finds more common variants than sequencing fewer individuals deeply
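A rough sketch of the sampling trade-off under a hypothetical fixed budget of 1,000x total sequencing. It only asks whether a variant at 1% frequency is present in the sample at all (random sampling of 2N chromosomes), and it ignores that a 5x genome can miss a variant a carrier actually has, so it overstates the advantage of wide sampling somewhat.

```python
def prob_variant_in_sample(allele_freq, n_individuals):
    """Chance that at least one of the 2N sampled chromosomes carries an
    allele with the given population frequency (assumes random sampling)."""
    return 1 - (1 - allele_freq) ** (2 * n_individuals)


# Hypothetical fixed budget of 1,000x total sequencing, split two ways.
BUDGET_X = 1000
for depth in (5, 30):
    n = BUDGET_X // depth
    p = prob_variant_in_sample(0.01, n)
    print(f"{n} people at {depth}x: P(1% variant is present) = {p:.2f}")
```

Spreading the budget over 200 people at 5x makes it very likely the 1% variant is in the sample, while 33 people at 30x catch it only about half the time.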

18
Q

how did the 1000 Genomes Project construct an integrated map of variation?

A
  1. primary data
  2. candidate variants and quality metrics
  3. variant calls and genotype likelihoods
  4. integrated haplotypes
19
Q

which would produce more accurate variant calls, low coverage WGS or high coverage exome?

A

high coverage exome

20
Q

pro and con of low coverage WGS?

A
  • pro: cost-effective, so large-scale studies are possible
  • con: less accurate variant calls
21
Q

pro and con to high coverage exome?

A
  • pro: more accurate variant calls
  • con: only sequencing 2% of the genome
22
Q

what is exome sequencing?

A

it sequences only exons (the protein-coding regions) and nothing else in the genome, so only about 2% of the genome is sequenced

23
Q

why can an individual carry only 0, 1, or 2 copies of a variant?

A

that is the number of copies of each chromosome we have (we are diploid), so the variant can be on neither, one, or both
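A minimal sketch of counting variant copies per person and turning them into a population allele frequency (ALT copies / 2N). It assumes VCF-style genotype strings such as "0|1" (0 = reference, 1 = alternative), a common way to store this kind of data; only biallelic 0/1 calls are handled.

```python
def alt_allele_count(genotype):
    """Count ALT copies in a diploid genotype such as "0|1" or "1/1".
    The result is always 0, 1, or 2 because there are two chromosome copies."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype, got {genotype!r}")
    count = 0
    for allele in alleles:
        if allele not in ("0", "1"):
            raise ValueError(f"only biallelic 0/1 calls are handled, got {genotype!r}")
        count += int(allele == "1")
    return count


def alt_allele_frequency(genotypes):
    """ALT allele frequency across individuals: ALT copies / 2N chromosomes."""
    total_alt = sum(alt_allele_count(g) for g in genotypes)
    return total_alt / (2 * len(genotypes))


sample = ["0|0", "0|1", "1|0", "1|1"]
print([alt_allele_count(g) for g in sample])  # [0, 1, 1, 2]
print(alt_allele_frequency(sample))           # 4 / 8 = 0.5
```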

24
Q

why is the evidence for a single genotype typically weak in low coverage regions?

A

(low coverage = ~5x) at each position we sequence only about 5 reads on average, so only about 5 reads are available to support each genotype call
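Because 5x is only an average, many positions get even fewer reads. A sketch assuming read depth at a site follows a Poisson distribution with mean equal to the average coverage (the classic Lander-Waterman idealization; real data are more uneven):

```python
import math


def prob_depth(k, mean_depth):
    """Poisson probability that a position is covered by exactly k reads
    when the genome-wide average depth is mean_depth."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


def prob_depth_at_most(k, mean_depth):
    """Probability that a position is covered by k or fewer reads."""
    return sum(prob_depth(i, mean_depth) for i in range(k + 1))


for mean in (5, 30):
    print(f"{mean}x: P(0 reads) = {prob_depth(0, mean):.2e}, "
          f"P(<=1 read) = {prob_depth_at_most(1, mean):.2e}")
```

At 5x about 4% of positions would have at most one read, while at 30x this becomes vanishingly rare.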

25
Q

the evidence for a single genotype is typically weak in low coverage regions. why is it more difficult for heterozygous genotypes?

A

a single read showing a non-reference base could be a sequencing error, or it could mean the site is heterozygous; with so few reads you can't tell which, so your confidence in the call is low
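A minimal genotype-likelihood sketch, assuming a simple binomial read model with a 1% per-read error rate and assuming every error shows the ALT base (both simplifications; real callers use per-base quality scores). With 1 ALT read out of 5, heterozygous is only about 3 times more likely than homozygous reference, a weak call; with 1 ALT read out of 30, the single ALT read looks like an error.

```python
from math import comb


def genotype_likelihoods(n_reads, n_alt, error_rate=0.01):
    """Likelihood of seeing n_alt ALT reads out of n_reads for each diploid
    genotype (0, 1 or 2 ALT copies) under an assumed binomial model."""
    likelihoods = {}
    for alt_copies in (0, 1, 2):
        frac = alt_copies / 2
        # A read shows ALT if it comes from an ALT copy and is read correctly,
        # or comes from a REF copy and is misread (simplification: every
        # error is assumed to produce the ALT base).
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods[alt_copies] = (comb(n_reads, n_alt)
                                   * p_alt ** n_alt
                                   * (1 - p_alt) ** (n_reads - n_alt))
    return likelihoods


for n_reads in (5, 30):
    lik = genotype_likelihoods(n_reads, n_alt=1)
    best = max(lik, key=lik.get)
    print(n_reads, "reads, 1 ALT:", {g: f"{v:.2e}" for g, v in lik.items()},
          "-> best:", best)
```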

26
Q

the evidence for a single genotype is typically weak in low coverage regions. how can we address this?

A

sequence deeper (increase coverage)

27
Q

what is it called when you try to determine whether a variant call is true or not?

A

variant quality score recalibration (VQSR)

28
Q

the 1000 Genomes Project identified 38 million variants. how many variants (SNPs) have been discovered today?

A

1.1 billion

29
Q

remember that other type of variation we said we were NOT going to talk about?

A

structural variation

30
Q

what was another name we gave to “regions of low complexity”?

A

repetitive sequence

31
Q

what technology should we use in regions with low complexity? why?

A

long-read sequencing, because a single long read can span the entire repeat

32
Q

when we make a call about DNA at a position, what are the possible outcomes of that call?

A
  • true positive
  • false positive
  • false negative
  • true negative
33
Q

FDR

A

false discovery rate

34
Q

FDR equation

A

FDR = FP / (FP + TP)

(FDR= false discovery rate, FP=false positive, TP = true positive)
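A minimal sketch of how TP, FP, and FN fall out of comparing a call set with a truth set, and how FDR follows. The positions are hypothetical, and returning 0.0 when there are no calls is a convention chosen here. True negatives are not counted because that would need the full list of positions assessed.

```python
def compare_calls(called, truth):
    """Split variant positions into true positives, false positives and
    false negatives by comparing a call set with a truth set."""
    called, truth = set(called), set(truth)
    return {
        "TP": len(called & truth),  # called and real
        "FP": len(called - truth),  # called but not real
        "FN": len(truth - called),  # real but missed
    }


def false_discovery_rate(fp, tp):
    """FDR = FP / (FP + TP): the share of calls that are wrong."""
    total_calls = fp + tp
    return fp / total_calls if total_calls else 0.0


# Hypothetical positions, purely for illustration.
truth_set = {101, 250, 333, 480, 512}
call_set = {101, 250, 333, 700}
counts = compare_calls(call_set, truth_set)
print(counts, "FDR =", false_discovery_rate(counts["FP"], counts["TP"]))
```

Here 3 calls are correct and 1 is wrong, so FDR = 1 / (1 + 3) = 0.25; the two missed positions are false negatives and do not affect FDR.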

35
Q

de novo

36
Q

accessible genome

A

the fraction of the reference genome in which short-read data can lead to reliable variant discovery

37
Q

the 1000 Genomes Project had challenges identifying large and complex structural variants and shorter indels in regions of low complexity. so what conservative but high-quality subsets did they focus on?

A
  • biallelic indels
  • large deletions
38
Q

everyone carries “bad” variants. however, not everyone shows their effects, and often they never cause issues. why can this happen?

A

we have two copies of each chromosome, so if the other copy is functioning, it can mask the bad variant (the variant acts recessively)