1000 Genomes Slide Deck Flashcards
Long haplotypes = what type of frequency?
low
The length of the haplotype that a mutation is present on is inversely related to
how old the mutation is
Recent = long haplotype = low frequency
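A rough sketch of why recent mutations sit on long haplotypes. It assumes a simple model that is not from the slides: crossovers fall at a rate of one per Morgan per generation, so the intact stretch on each side of a mutation after g generations averages 1/g Morgans. The function name is hypothetical.

```python
def expected_haplotype_length_cm(generations):
    """Expected length (in centimorgans) of the unbroken haplotype around a
    mutation that arose `generations` generations ago.

    Assumption: crossovers are Poisson with rate 1 per Morgan per generation,
    so each flank stays intact for 1/generations Morgans on average.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    one_side_cm = 100.0 / generations  # 1 Morgan = 100 cM
    return 2 * one_side_cm  # left flank plus right flank


# Younger mutations (fewer generations) keep longer haplotypes.
for g in (10, 100, 1000):
    print(g, "generations ->", expected_haplotype_length_cm(g), "cM")
```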
Why was wide and shallow coverage used in the 1000 Genomes Project?
Wide = more people and more data
having more people means more variation in the data and allows for the identification of common variants
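A minimal sketch of why sampling more people helps find variants. It assumes each of the 2n chromosomes is an independent draw from the population and ignores whether shallow reads would actually detect the allele; the function name and example numbers are hypothetical.

```python
def prob_variant_sampled(allele_freq, n_people):
    """Probability that at least one of the 2 * n_people sampled chromosomes
    carries a variant with population frequency `allele_freq`."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    n_chromosomes = 2 * n_people  # diploid: two chromosomes per person
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# The same 1% variant is far more likely to appear in a wider sample.
for n in (10, 100, 1000):
    print(n, "people ->", round(prob_variant_sampled(0.01, n), 3))
```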
Why was exon sequencing used in the 1000 Genomes Project?
sequencing is expensive
exons are the coding regions, so it makes sense to look there for meaningful variants
Average distance in nucleotides between variants
total space (sequence length) divided by the number of variants
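A small worked example of this card. The average spacing between variants is the total sequence length divided by the number of variants, which is the reciprocal of the variant density (variants per base). The numbers below are made up for illustration, not taken from the project.

```python
def average_variant_spacing(total_bases, n_variants):
    """Average number of nucleotides between neighbouring variants."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return total_bases / n_variants


def variant_density(total_bases, n_variants):
    """Variants per base: the reciprocal of the average spacing."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return n_variants / total_bases


# Hypothetical region: 3,000,000 bases containing 3,000 variants.
print(average_variant_spacing(3_000_000, 3_000))  # one variant per 1000 bases
print(variant_density(3_000_000, 3_000))
```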
Which would produce more accurate variant calls: low-coverage WGS or high-coverage exome?
SNP chips are more accurate for variant calls
Pros of WGS
errors become big with little data
Pros of high coverage exomes
average coverage of 80x
cons of high coverage exome
more expensive
Pros of SNP chips
high confidence and cheap
Why did the 1000 Genomes Project summarize variant sites with 0, 1, and 2?
Diploid = 2 chromosomes, so each person carries 0, 1, or 2 copies of the B allele
AA=0
AB=1
BB=2
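The 0/1/2 coding above can be sketched as a count of B alleles in a two-letter genotype. The function name and the choice of "B" as the counted allele are assumptions for illustration.

```python
def encode_genotype(genotype, counted_allele="B"):
    """Return how many copies of `counted_allele` a diploid genotype carries.

    `genotype` is a two-letter string such as "AA", "AB", or "BB".
    """
    if len(genotype) != 2:
        raise ValueError("a diploid genotype has exactly two alleles")
    return genotype.count(counted_allele)


for g in ("AA", "AB", "BB"):
    print(g, "=", encode_genotype(g))
```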
Why is the evidence for a single genotype typically weak in low-coverage regions?
an observed difference can be a sequencing error
not enough reads to confirm
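A minimal sketch of the "not enough data" point. It assumes each read at a heterozygous (AB) site comes from either chromosome with probability 1/2 and ignores sequencing error; the function name is hypothetical. With few reads there is a real chance that every read shows the same allele, so the site looks homozygous.

```python
def prob_het_looks_homozygous(n_reads):
    """Probability that all `n_reads` reads at a heterozygous site happen to
    come from the same chromosome (all A or all B)."""
    if n_reads < 1:
        raise ValueError("n_reads must be at least 1")
    # Two ways to see only one allele: every read is A, or every read is B.
    return 2 * 0.5 ** n_reads


# Shallow coverage leaves a large chance of missing one allele.
for depth in (1, 2, 4, 30):
    print(depth, "reads ->", prob_het_looks_homozygous(depth))
```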
How can we address the problem of weak evidence in low-coverage regions?
genotyping using SNP chips
type of variation we didn’t talk about
structural
What is another name given to regions of low complexity?
repetitive regions