1000 Genomes Slide Deck Flashcards
Long haplotypes = what type of frequency?
low
The length of the haplotype that a mutation is present on is inversely related to
how old the mutation is
Recent=long=low frequency
Why was wide and shallow coverage used in the 1000 Genomes Project?
Wide = more people and more data
Having more people captures more of the variation and allows common variants to be identified
Why was exon sequencing used in the 1000 Genomes Project?
Sequencing is expensive
Exons are the coding regions, so targeting them is an efficient way to find meaningful variants
Average distance in nucleotides between variants
Total space divided by the number of variants (the inverse of variants over total space)
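A minimal sketch of the arithmetic behind this card: variants over total space gives a density, and the average distance between variants is its inverse. The function name and the round numbers are hypothetical, chosen only for illustration.

```python
def average_spacing(total_bases: int, n_variants: int) -> float:
    """Average number of bases between neighbouring variants."""
    if n_variants <= 0:
        raise ValueError("need at least one variant")
    return total_bases / n_variants


# Hypothetical round numbers, not figures from the project.
total_bases = 3_000_000_000
n_variants = 3_000_000

density = n_variants / total_bases                  # variants per base
spacing = average_spacing(total_bases, n_variants)  # bases per variant

print(density)
print(spacing)  # 1000.0
```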
Which would produce more accurate variant calls, low coverage WGS or high coverage exome?
SNP chips are more accurate for variant calls
Pros of WGS
with little data per site, errors have a big effect
Pros of high coverage exomes
average of 80x
cons of high coverage exome
more expensive
Pros of SNP chips
high confidence and cheap
Why did the 1000 Genomes Project summarize variant sites with 0, 1, and 2?
Diploid = 2 chromosomes, so a genotype carries 0, 1, or 2 copies of the B allele
AA=0
AB=1
BB=2
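The 0/1/2 coding can be sketched as a count of B alleles per genotype. This is an illustrative helper, not the project's own code; `encode_genotype` and the sample genotypes are hypothetical.

```python
def encode_genotype(genotype: str, alt: str = "B") -> int:
    """Count copies of the alternate allele in a diploid genotype such as 'AB'."""
    if len(genotype) != 2:
        raise ValueError("a diploid genotype has exactly two alleles")
    return genotype.count(alt)


# Four hypothetical individuals at one variant site.
codes = [encode_genotype(g) for g in ["AA", "AB", "BA", "BB"]]
print(codes)  # [0, 1, 1, 2]

# The coding makes allele frequency a simple sum: B copies over total chromosomes.
alt_freq = sum(codes) / (2 * len(codes))
print(alt_freq)  # 0.5
```

One reason this coding is convenient: the numbers add up directly to allele counts across samples.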
Why is the evidence for a single genotype typically weak in low coverage regions?
A read showing a variant can be a sequencing error
Not enough reads to confirm the genotype
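A rough sketch of why few reads give weak evidence, assuming a simple binomial read model, a flat prior over the three genotypes, and a 1% per-read error rate. All three are illustrative assumptions, not the project's actual caller.

```python
from math import comb


def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> dict:
    """Binomial likelihood of the read counts under genotypes 0, 1, and 2."""
    n = n_ref + n_alt
    # Chance that one read shows the alternate allele, per genotype (assumed model).
    p_alt = {0: error, 1: 0.5, 2: 1.0 - error}
    return {g: comb(n, n_alt) * p**n_alt * (1 - p) ** n_ref for g, p in p_alt.items()}


def normalise(likelihoods: dict) -> dict:
    """Rescale so the values sum to 1 (a posterior under a flat prior)."""
    total = sum(likelihoods.values())
    return {g: v / total for g, v in likelihoods.items()}


# Every read shows the reference allele; only the depth differs.
low = normalise(genotype_likelihoods(n_ref=2, n_alt=0))
high = normalise(genotype_likelihoods(n_ref=20, n_alt=0))

print({g: round(p, 4) for g, p in low.items()})
print({g: round(p, 4) for g, p in high.items()})
```

With two reads, a heterozygote that happened to show only its reference allele is still quite plausible; with twenty reads it is effectively ruled out.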
How can we address the problem of weak evidence in low coverage regions?
Genotyping with SNP chips
type of variation we didn’t talk about
structural
What is another name given to regions of low complexity?
repetitive regions
What technology is used to help with repetitive regions
longer reads that can map out more unique data
Accessible genome
the fraction of the reference genome in which short-read data can lead to reliable variant discovery
Accessible genome percentage went from 85% to
94% now
Why would individual calls be more accurate at common variants than at low frequency variants?
Common variants have more data, and a variant read is more likely to be real than to be a sequencing error
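One way to see this is with Bayes' rule: the same reads are weighed against how common the variant is. The sketch below assumes a binomial read model, a 1% error rate, and Hardy-Weinberg genotype priors; these are illustrative assumptions and the function names are hypothetical.

```python
from math import comb


def posterior(n_ref: int, n_alt: int, alt_freq: float, error: float = 0.01) -> dict:
    """Posterior over genotypes 0/1/2 given read counts and an allele frequency."""
    n = n_ref + n_alt
    p_alt = {0: error, 1: 0.5, 2: 1.0 - error}   # per-read chance of seeing alt
    q = alt_freq
    prior = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q**2}  # Hardy-Weinberg (assumed)
    joint = {
        g: prior[g] * comb(n, n_alt) * p_alt[g] ** n_alt * (1 - p_alt[g]) ** n_ref
        for g in (0, 1, 2)
    }
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}


# Same evidence in both cases: 3 reference reads and 1 alternate read.
common = posterior(n_ref=3, n_alt=1, alt_freq=0.3)
rare = posterior(n_ref=3, n_alt=1, alt_freq=0.001)

print(round(common[1], 3))  # heterozygote is the likely call
print(round(rare[1], 3))    # the lone alternate read is more likely an error
```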
Variation among samples in genotype accuracy is primarily driven by sequencing depth. Why is this true?
More reads per site = sequencing errors have less effect on the call
Allows you to determine which sites are real variants
Moderate to high frequency variants tend to be
old
low frequency variants tend to be
new
Frequency of a new mutation (equation)
1/(2N): one copy among the 2N chromosomes of N diploid individuals
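A small sketch of the new-mutation frequency: one new copy among the 2N chromosomes carried by N diploid individuals. The population sizes below are hypothetical.

```python
def new_mutation_frequency(n_individuals: int) -> float:
    """Starting frequency of a mutation that arises on a single chromosome."""
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    return 1 / (2 * n_individuals)


# Hypothetical population sizes, chosen only for illustration.
for n in (50, 1_000, 10_000):
    print(n, new_mutation_frequency(n))
```

The larger the population, the lower the frequency at which every new variant starts.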
Lower frequency variants are
population specific: they show up in one population and have not spread to others
Why would we expect many low frequency variants?
Different environments and more people are what produce new variants
What would you expect for a population that is contracting?
Fewer new variants, more variants at a higher frequency
Are all variants equally important?
NO
Wobble
third base of the codon is changed
Synonymous
same amino acid is coded for
nonsynonymous
different amino acid is coded for
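The wobble, synonymous, and nonsynonymous cards can be checked against the standard genetic code. The sketch below builds the standard codon table and labels a single-codon change; the `classify` helper and the example codons are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


print(classify("GGT", "GGC"))  # third-base (wobble) change, glycine kept
print(classify("GAG", "GTG"))  # glutamate to valine
print(classify("TGG", "TGA"))  # tryptophan to stop
```

Many third-base changes are synonymous, but not all: TGG to TGA changes only the third base and still creates a stop codon.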
How do you know if an individual has more or fewer variants than expected?
Intron vs. exon placement of variation
wobble
nonsense
synonymous
nonsynonymous
How is it that we can have an average of 150 broken genes but still be normal?
It depends on other genes or factors
environment also plays a role
Everyone carries
bad variants
Lots of variants in regulatory regions. Why?
Why would regulatory sequence tolerate deleterious variations?
What is the primary reason to do imputation?
Fine mapping existing association signals and detecting new associations
Can fill in missing genotypes and find variants
Rare variants need to be evaluated using
the correct null distribution