W7L2 Thurs human population genetic P4 Flashcards
genetic background
-the human genome consists of about 3 billion base pairs
-chromosomes taken from different people are very similar: up to 999 in every 1000 bases are identical
-a significant part of the variation in physical characteristics and susceptibility to disease is associated with these genetic differences between individuals
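A quick back-of-envelope check of what the two figures above imply, using only the numbers in these notes (3 billion bases, 999 in 1000 identical). The variable names are illustrative.

```python
# Rough count of sites that differ between two copies of the genome.
GENOME_LENGTH = 3_000_000_000   # bases, figure from the notes
IDENTICAL_PER_1000 = 999        # similarity figure from the notes

differing_sites = GENOME_LENGTH * (1000 - IDENTICAL_PER_1000) // 1000
print(differing_sites)  # prints 3000000
```

So even at 99.9% similarity, two chromosomes can differ at roughly 3 million positions.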
How to look at Genomic data
-it is straightforward to measure SNPs: positions in the genome where it is known in advance that chromosomes in the population carry one of two possible nucleotides
Genotyping array
-typical modern human data sets contain thousands of individuals, each typed at a lot of SNPs
-costs 50 to measure up to 1M SNPs in a single human individual on a genotyping array
-only measures a small fraction of the genome, but at positions where the population is known in advance to be variable
-misses rare and complex genetic variation
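A minimal sketch of how array genotypes at one SNP are often summarised, assuming the common 0/1/2 coding (number of copies of one of the two alleles) with None for a missing call. The coding and the function name are assumptions for illustration, not something stated in these notes.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele at one SNP.

    genotypes: one entry per individual, 0, 1 or 2 copies of the
    counted allele, or None where the array gave no call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    # Each individual carries two chromosomes, hence 2 * number of people.
    return sum(called) / (2 * len(called))


# Hypothetical genotypes for six individuals (not real data).
snp = [0, 1, 2, 1, None, 0]
freq = allele_frequency(snp)  # 4 copies out of 10 chromosomes
```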
causes of patterns of genetic variation
-mutation (new variants then spread through drift or positive selection)
-recombination
-SNP data has a complex correlation structure. SNPs near each other on the same chromosome are typically inherited together, so genotypes at these SNPs can be highly correlated (linkage disequilibrium)
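One standard way to quantify linkage disequilibrium between two SNPs is the r-squared statistic, computed from allele frequencies on phased chromosomes. The sketch below assumes alleles coded 0/1, one entry per chromosome; the function name and data are illustrative.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two SNPs from 0/1 alleles, one entry per chromosome."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty lists of equal length")
    p_a = sum(hap_a) / n                                # freq of allele 1 at SNP A
    p_b = sum(hap_b) / n                                # freq of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b                                # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is not variable")
    return d * d / denom


# Hypothetical haplotypes on eight chromosomes (not real data).
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 0, 1, 1, 0, 0, 1, 1]   # always inherited together with snp_a
snp_c = [0, 1, 0, 1, 0, 1, 0, 1]   # unrelated to snp_a

r2_linked = ld_r2(snp_a, snp_b)    # 1.0
r2_unlinked = ld_r2(snp_a, snp_c)  # 0.0
```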
Geographic differences
-individuals who live near each other tend to be more similar genetically than individuals who live in different parts of the world, for several reasons. one of these is shared genetic ancestry
-in genetics, this is called geographic population structure
-there can be very subtle differences in patterns of genetic variation even within a country
-these patterns are useful for learning about the history of the population within the country
-also important for studying the genetic basis of disease
geographic population structure study in UK
-UK population structure was studied in large disease association studies to check that it would not compromise the analysis
-with the exception of a few particular small regions of the genome and a slight south-north gradient in allele frequency, any population structure in the UK was quite subtle
Previous approaches in detecting structure
-previous methods (STRUCTURE and PCA) can detect population structure at continental or country scale
-implicitly ignore the correlation between nearby SNPs
-typically thin the data to obtain a set of SNPs which are close to independent of each other
-treating SNPs marginally throws away a lot of information
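Thinning can be sketched as a greedy filter: walk along the SNPs and keep one only if it is weakly correlated with every SNP already kept. This is a simplified illustration, not the procedure of any particular tool; the 0.2 threshold, the function names and the data are assumptions, and real pipelines usually compare only SNPs within a window along the chromosome.

```python
def r2(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a non-variable SNP carries no correlation signal
    return sxy * sxy / (sxx * syy)


def thin_snps(snps, max_r2=0.2):
    """Return indices of SNPs kept by greedy thinning.

    snps: list of SNPs, each a list of 0/1/2 genotypes per individual.
    A SNP is kept only if its r^2 with every kept SNP is below max_r2.
    """
    kept = []
    for i, column in enumerate(snps):
        if all(r2(column, snps[j]) < max_r2 for j in kept):
            kept.append(i)
    return kept


# Hypothetical genotypes: 4 SNPs typed in 6 individuals (not real data).
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],   # identical to SNP 0, so dropped
    [1, 0, 1, 2, 1, 1],   # uncorrelated with SNP 0, so kept
    [0, 1, 2, 1, 0, 1],   # strongly correlated with SNP 0, so dropped
]
kept = thin_snps(snps)
```

The dropped SNPs are exactly the information that thinning throws away.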
disadvantages of STRUCTURE
-hard to interpret, especially for admixed samples
-does not cope well with continuous differences in allele frequency
disadvantages of PCA
-hard to interpret, especially without labels
-cannot assign individuals to clusters
-influenced by sampling biases, and the projection can have an undetermined effect on the conclusions one may draw
fineSTRUCTURE: the new approach
-models the correlation between nearby SNPs (adds power to detect subtle structure)
-no need to thin the data
-explicitly assigns individuals to clusters
-determines the optimal number of clusters in the data in a principled manner
-captures a lot of information about shared ancestry
-records how often each individual in the sample has each other individual as its nearest relative
fineSTRUCTURE: exploiting correlation
-estimate the genetic variants carried on each of the two separate chromosomes within each individual (phasing)
-for each individual, determine the number of chunks copied from each other individual
-treat these chunk vectors as summaries for each individual, and use a clustering algorithm to partition individuals into clusters
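A toy sketch of the chunk-counting step. It assumes the "painting" is already given, i.e. for each recipient, the label of the individual it copies from at each successive SNP; producing that painting is the hard, model-based part and is not shown. A chunk is taken to be a maximal run of consecutive SNPs copied from the same donor. All names and data are hypothetical.

```python
from itertools import groupby


def chunk_counts(painting, donors):
    """Count the chunks a recipient copies from each donor.

    painting: donor label at each successive SNP along the recipient.
    donors: labels of all possible donors (the other individuals).
    """
    counts = {donor: 0 for donor in donors}
    # groupby yields one group per maximal run of identical labels.
    for donor, _run in groupby(painting):
        counts[donor] += 1
    return counts


# Hypothetical paintings for three individuals (not real data).
paintings = {
    "ind1": ["ind2", "ind2", "ind3", "ind2", "ind2", "ind2"],
    "ind2": ["ind1", "ind1", "ind1", "ind3", "ind1", "ind1"],
    "ind3": ["ind1", "ind2", "ind2", "ind2", "ind1", "ind1"],
}
individuals = sorted(paintings)

# One chunk vector per individual; together they form a matrix of
# recipients by donors that a clustering algorithm can work on.
chunk_vectors = {
    recipient: chunk_counts(painting, [d for d in individuals if d != recipient])
    for recipient, painting in paintings.items()
}
```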
subtle population structure
-a powerful means of detecting population structure is to control and carefully document the provenance of the samples involved
The POBI sample
-4371 samples
-fit the criterion of 4 grandparents born within 60 km of each other, from rural areas
-2886 individuals genotyped at 600K SNPs
-2039 passed QC
-samples clustered with fineSTRUCTURE
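The grandparent criterion can be sketched as a pairwise distance check. This assumes birthplaces are available as (latitude, longitude) pairs and uses the haversine great-circle distance. The coordinates, the function names and the choice of distance formula are illustrative assumptions, and the rural-area condition is not checked.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def meets_criterion(birthplaces, max_km=60.0):
    """True if there are 4 birthplaces and every pair is within max_km."""
    return len(birthplaces) == 4 and all(
        haversine_km(p, q) <= max_km for p, q in combinations(birthplaces, 2)
    )


# Hypothetical grandparent birthplaces (not real participants).
local = [(53.0, -2.0), (53.1, -2.1), (53.2, -2.0), (53.05, -1.9)]
spread = [(53.0, -2.0), (53.1, -2.1), (55.0, -2.0), (53.05, -1.9)]

local_ok = meets_criterion(local)    # True: all pairs are close together
spread_ok = meets_criterion(spread)  # False: one birthplace is about 2 degrees north
```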
fineSTRUCTURE and POBI
-applying fineSTRUCTURE breaks the POBI sample into 53 clusters
-fix the clusters and draw a tree by successively joining pairs of clusters
-plot the samples on a map and see how the genetic clustering reflects geography
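The tree-drawing step can be sketched as successive pairwise joining. The version below joins the two clusters whose profiles are closest in squared Euclidean distance, which is a stand-in: fineSTRUCTURE's own merge criterion is model-based and is not reproduced here. Cluster names and profiles are hypothetical.

```python
def build_tree(profiles):
    """Successively join the two closest clusters.

    profiles: dict of cluster name -> list of floats (for example the
    mean chunk counts of the cluster's members).
    Returns the merges as (name_a, name_b) pairs in the order made.
    """
    # Store each cluster as (profile, number of original clusters inside).
    clusters = {name: (list(vec), 1) for name, vec in profiles.items()}
    merges = []
    while len(clusters) > 1:
        names = sorted(clusters)
        best = None
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dist = sum((x - y) ** 2 for x, y in zip(clusters[a][0], clusters[b][0]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        (va, na), (vb, nb) = clusters.pop(a), clusters.pop(b)
        # The joined cluster's profile is the size-weighted mean of the pair.
        merged = [(x * na + y * nb) / (na + nb) for x, y in zip(va, vb)]
        clusters[f"({a},{b})"] = (merged, na + nb)
        merges.append((a, b))
    return merges


# Hypothetical cluster profiles (not POBI results).
profiles = {
    "A": [10, 1, 1],
    "B": [9, 2, 1],
    "C": [1, 10, 2],
    "D": [2, 9, 1],
}
merges = build_tree(profiles)
```

Reading the merges in order gives the tree from the tips upward: the most similar clusters join first, and the last merge is the root.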
Observations from fineSTRUCTURE and POBI
-for the most part, the inferred clusters are supported by historical, archeological and linguistic evidence
-sheds light on archeological debates