1000 Genome Slidedeck Flashcards

1
Q

Long haplotypes = what type of frequency?

A

low

2
Q

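Length of the haplotype that a mutation is present on is related to

A

how old the mutation is (older mutations sit on shorter haplotypes, because recombination has had more time to break them up)
Recent = long haplotype = low frequency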

3
Q

Why was wide and shallow coverage used in the 1000 Genomes Project?

A

Wide = more people and more data; shallow = low coverage per person, which keeps the cost down
Having more people means more variation in the data and allows for the identification of common variants

4
Q

Why was exon sequencing used in the 1000 Genomes Project?

A

Sequencing is expensive
Exons are the coding regions, so targeting them is a cost-effective way to find meaningful variants

5
Q

Average distance (in nucleotides) between variants

A

total space divided by the number of variants (the inverse of variant density, which is the number of variants over total space)
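
The arithmetic on this card can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are made up here and do not come from the slide deck.

```python
def average_spacing(region_length_bp: int, n_variants: int) -> float:
    """Average number of nucleotides between variants in a region."""
    if n_variants <= 0:
        raise ValueError("need at least one variant")
    return region_length_bp / n_variants


def variant_density(region_length_bp: int, n_variants: int) -> float:
    """Variants per nucleotide: the inverse of the average spacing."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return n_variants / region_length_bp


# Hypothetical numbers: 3,000 variants found in a 3,000,000 bp region.
spacing = average_spacing(3_000_000, 3_000)   # one variant per 1000 bp on average
density = variant_density(3_000_000, 3_000)   # variants per bp
```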

6
Q

Which would produce more accurate variant calls: low coverage WGS or high coverage exome?

A

SNP chips are more accurate so variant calls

7
Q

Pros of WGS

A

errors become big with little data

8
Q

Pros of high coverage exomes

A

average coverage of 80x

9
Q

cons of high coverage exome

A

more expensive

10
Q

Pros of SNP chips

A

high confidence and cheap

11
Q

Why did the 1000 Genomes Project summarize variant sites with 0, 1, and 2?

A

Diploid = 2 chromosomes, so each person carries 0, 1, or 2 copies of the alternate allele (B)
AA = 0
AB = 1
BB = 2
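
The 0/1/2 coding on this card amounts to counting copies of the alternate allele. A minimal sketch, assuming genotypes are written as two-letter strings with "B" as the alternate allele as on the card; the function name and the sample list are hypothetical.

```python
def encode_genotype(genotype: str, alt_allele: str = "B") -> int:
    """Return the number of copies (0, 1, or 2) of the alternate allele."""
    if len(genotype) != 2:
        raise ValueError("a diploid genotype has exactly two alleles")
    return genotype.count(alt_allele)


# Hypothetical sample of four individuals at one variant site.
genotypes = ["AA", "AB", "BB", "BA"]
dosages = [encode_genotype(g) for g in genotypes]

# Each person contributes 2 chromosomes, so the alternate allele
# frequency is the sum of the codes over 2 * (number of people).
alt_allele_freq = sum(dosages) / (2 * len(dosages))
```

One convenience of this coding is that the allele frequency falls straight out of the sum of the codes.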

12
Q

Why is the evidence for a single genotype typically weak in low coverage regions?

A

With only a few reads, an observed allele can be a sequencing error
There is not enough data to confirm the genotype
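
One way to see why a few reads are weak evidence is a simple binomial model of read counts. This is a sketch under stated assumptions, not the project's actual calling method: the 1% per-read error rate, the flat prior over genotypes, and the function names are all chosen here for illustration.

```python
from math import comb


def genotype_likelihoods(n_ref: int, n_alt: int, error_rate: float = 0.01) -> dict:
    """Binomial likelihood of the read counts under genotypes 0, 1, and 2.

    Assumes each read independently shows the alternate allele with
    probability error_rate (genotype 0), 0.5 (genotype 1), or
    1 - error_rate (genotype 2).
    """
    n = n_ref + n_alt
    p_alt = {0: error_rate, 1: 0.5, 2: 1.0 - error_rate}
    return {
        g: comb(n, n_alt) * p ** n_alt * (1.0 - p) ** n_ref
        for g, p in p_alt.items()
    }


def normalized(likelihoods: dict) -> dict:
    """Rescale likelihoods to sum to 1 (equivalent to assuming a flat prior)."""
    total = sum(likelihoods.values())
    return {g: v / total for g, v in likelihoods.items()}


# 2x coverage, both reads show the alternate allele:
# a heterozygote is still quite plausible.
low_depth = normalized(genotype_likelihoods(n_ref=0, n_alt=2))

# 20x coverage, all reads show the alternate allele:
# a heterozygote is essentially ruled out.
high_depth = normalized(genotype_likelihoods(n_ref=0, n_alt=20))
```

Under these assumptions, two alternate reads leave roughly a one-in-five chance that the sample is really heterozygous, while twenty alternate reads make that explanation negligible.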

13
Q

How can we address the problem of evidence being weak in low coverage regions?

A

genotyping the samples with SNP chips

14
Q

type of variation we didn’t talk about

A

structural

15
Q

What is another name given to regions of low complexity?

A

repetitive regions

16
Q

What technology is used to help with repetitive regions

A

longer reads that can map out more unique data

17
Q

Accessible genome

A

the fraction of the reference genome in which short-read data can lead to reliable variant discovery

18
Q

Accessible genome percentage went from 85% to

A

94% now

19
Q

Why would individual calls be more accurate at common variants than at low frequency variants?

A

Common variants are seen in more samples, so there is more data, and an observed alternate allele is more likely to be real than a sequencing error

20
Q

Variation among samples in genotype accuracy is primarily driven by sequencing depth. Why is this true?

A

More data = less impact from sequencing errors
It allows you to determine which sites are real variants

21
Q

Moderate to high frequency variants tend to be

A

old

22
Q

low frequency variants tend to be

A

new

23
Q

New mutation equation

A

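1/(2N): the starting frequency of a new mutation, since it is one copy among the 2N chromosomes of a diploid population of N individuals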
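
As a quick numeric check of the formula on this card, a minimal sketch; the function name and the population size are made up for illustration.

```python
def new_mutation_frequency(n_individuals: int) -> float:
    """Starting frequency of a brand-new mutation in a diploid population.

    The mutation exists as a single copy among 2N chromosomes.
    """
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    return 1 / (2 * n_individuals)


# Hypothetical population of 1,000 diploid individuals (2,000 chromosomes).
freq = new_mutation_frequency(1_000)
```

The larger the population, the lower the frequency at which every new variant starts out.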

24
Q

Lower frequency variants are

A

population-specific: they show up in one population and have not yet spread to others

25
Q

Why would we expect many low frequency variants

A

Different environments and more people are what produce new variants, and each new variant starts out at low frequency

26
Q

What would you expect for a population that is contracting

A

fewer new variants, with more variants at a higher frequency

27
Q

Are all variants equally important?

A

NO

28
Q

Wobble

A

the third base of the codon is changed (a position where a change often leaves the amino acid unchanged)

29
Q

Synonymous

A

same amino acid is coded for

30
Q

nonsynonymous

A

different amino acid is coded for
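
The synonymous / nonsynonymous distinction can be sketched with the standard genetic code. This is an illustrative helper, not something from the slide deck: the function name is hypothetical, codons are written as DNA (T rather than U), and changes that create a stop codon are reported separately as "nonsense" even though they are strictly a kind of nonsynonymous change.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


wobble_change = classify_substitution("GAA", "GAG")    # Glu -> Glu, third base changed
missense_change = classify_substitution("GAA", "GTA")  # Glu -> Val
stop_change = classify_substitution("TGG", "TGA")      # Trp -> stop
```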

31
Q

How do you know if an individual has more or fewer variants than expected?

A

Intron vs. exon placement of variation
wobble
nonsense
synonymous
nonsynonymous

32
Q

How is it that we can have an average of 150 broken genes but still be normal?

A

It depends on other genes or factors
environment also plays a role

33
Q

Everyone carries

A

bad variants

34
Q

Lots of variants in regulatory regions. Why

A
35
Q

Why would regulatory sequence tolerate deleterious variations?

A
36
Q

What is the primary reason to do imputation

A

Fine mapping existing association signals and detecting new associations

It can fill in missing genotypes and find variants that were not directly typed

37
Q

Rare variants need to be evaluated using

A

the correct null distribution