Week 3.5: Ancestry and genealogy Flashcards
The importance of ancestry and genealogy in human genomics and human health: ancestry is also a topic of interest for many people.
To understand patterns of genomic variability in humans, we need to understand history. Most of the variation is due to our history, which tells us where our lineage came from in the world, and geography plays a major role in this. We need to look backwards in time to understand where our genetic variants have come from. Variability has been surveyed by large-scale genotyping and re-sequencing projects:
- HapMap project
- Human Genome Diversity Project
- 1000 Genomes Project
Common methods to analyse genomic variation: it is not easy to look at variation across 3.2 billion base pairs, so various methods are used to simplify things.
List three methods
- Principal components analysis (PCA) – a data-summary technique, normally shown as a scatter plot. It pools together all the sites of variation and combines similar patterns of variation, so instead of looking at the variability in 1000 dimensions it looks at it in far fewer dimensions.
- Coalescent theory – a model specific to genetics that traces alleles back in time until they coalesce into a common ancestor.
- STRUCTURE (computer program) analysis – sorts individuals into populations (clusters) whose alleles are in Hardy-Weinberg proportions.
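The first of these methods can be made concrete with a short Python sketch of PCA on a genotype matrix. It assumes genotypes are coded as 0/1/2 allele counts, only centres each SNP (many genetic PCA tools also scale by allele frequency), and uses simulated data for two hypothetical populations; the function name `genotype_pca` is illustrative.

```python
import numpy as np

def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return each individual's coordinates on the top principal components.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    """
    g = genotypes.astype(float)
    # Centre each SNP so the components describe deviations from the mean.
    centred = g - g.mean(axis=0)
    # SVD of the centred matrix; U * S gives the principal component scores.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical data: two populations whose allele frequencies differ.
rng = np.random.default_rng(0)
n_snps = 2000
freqs_a = rng.uniform(0.1, 0.9, n_snps)
freqs_b = np.clip(freqs_a + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)
pop_a = rng.binomial(2, freqs_a, size=(50, n_snps))
pop_b = rng.binomial(2, freqs_b, size=(50, n_snps))

scores = genotype_pca(np.vstack([pop_a, pop_b]))
pc1 = scores[:, 0]
print("mean PC1, population A:", pc1[:50].mean())
print("mean PC1, population B:", pc1[50:].mean())
```

The analysis is never told which individual belongs to which population, yet with enough SNPs the first component places the two simulated groups on opposite sides of zero.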
Lecture Outline
Genome variation and geography
How geography reflects genealogy
Coalescent theory
Global patterns and historical scenarios
Migration: Out of Africa hypothesis
Gene flow: Hybridisation with Neanderthals?
Bottlenecks: Past population sizes
Mutation: What is the human mutation rate?
Starting point: genomic variation and geography
‘Genes mirror geography within Europe’ Novembre et al. Nature 456, 98-101 (2008)
“Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome.”
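When they rotated the axes of the PCA plot, they found that the pattern of individuals reflected a map of Europe: for example, Spanish individuals (ES, in blue) lie in the bottom left and Irish individuals (IE) are shown in red, with the other countries placed similarly.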
“Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual’s DNA can be used to infer their geographic origin with surprising accuracy—often to within a few hundred kilometres.”
An individual’s DNA could therefore be used to infer where they come from, which could be useful in criminal investigations.
How are these geographic patterns a reflection of history?
The simplest way of looking at this is in terms of coalescence, which traces genealogy back in time: the closer people live to each other, the more closely related they tend to be.
Coalescence
“Coalesce” – to come together. Coalescent theory traces alleles back in time until they coalesce into a common ancestor. It looks at gene genealogies and identity by descent (IBD): it assumes that if two alleles are identical, they are identical because they descend from a common ancestor. Tracing genealogies back from the present, the coalescent can also be related to principal components analysis.
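A minimal sketch of tracing lineages back until they coalesce, assuming the standard (Kingman) coalescent for one randomly mating population of constant size. Time is in coalescent units of 2N generations; while k lineages remain, the next coalescence happens at rate k(k-1)/2, and the expected time to the most recent common ancestor (TMRCA) of n alleles is 2(1 - 1/n). The function names are hypothetical.

```python
import random

def expected_tmrca(n: int) -> float:
    """Expected time to the most recent common ancestor of n alleles (coalescent units)."""
    return 2.0 * (1.0 - 1.0 / n)

def simulate_tmrca(n: int, rng: random.Random) -> float:
    """Trace n lineages back in time until they all coalesce into one ancestor."""
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2  # number of pairs of lineages that could coalesce
        t += rng.expovariate(rate)  # exponential waiting time to the next coalescence
        k -= 1  # two lineages merge into their common ancestor
    return t

rng = random.Random(42)
n = 10
sims = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_sim = sum(sims) / len(sims)
print(f"expected TMRCA: {expected_tmrca(n):.3f}, simulated mean: {mean_sim:.3f}")
```

Multiplying by 2N converts coalescent units into generations. Because the last two lineages take on average one unit to coalesce, at least half of the expected TMRCA is spent waiting for that final merger.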
Different processes that can account for geographic structure
List four processes
- Geographical isolation – if people live in an isolated village that nobody leaves or enters, the population will follow its own evolutionary trajectory.
- Founder events – when a few founders, for example a couple, move somewhere new and start a village, the whole village will reflect the founders' genealogy.
- Migration – if other people start coming to the village (or island) and add their genes to the population, this introduces new variability that was not present in the founders.
- Admixture – when genes from different populations are brought together and mixed, by hybridisation.
- All of these can contribute to geographical patterns of genomic variation. PCA cannot disentangle these processes, but the coalescent can, because each process can be modelled explicitly.
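The contrast between isolation and migration can be illustrated with Wright's island model, a standard textbook approximation that is not taken from the lecture: at equilibrium, differentiation between subpopulations is roughly F_ST ≈ 1/(1 + 4Nm), where N is the size of each subpopulation and m the proportion of migrants per generation. This sketch covers only isolation versus migration; founder events and admixture need explicit (for example coalescent) models.

```python
def island_model_fst(n: float, m: float) -> float:
    """Approximate equilibrium F_ST in Wright's island model.

    n: size of each subpopulation (diploid individuals)
    m: proportion of each subpopulation made up of migrants each generation
    """
    return 1.0 / (1.0 + 4.0 * n * m)

# Hypothetical subpopulations of 1,000 people with increasing migration.
for m in (0.0, 0.0001, 0.001, 0.01):
    migrants = 1000 * m  # number of migrants per generation (Nm)
    print(f"m = {m}: Nm = {migrants:g}, F_ST = {island_model_fst(1000, m):.3f}")
```

With no migration (complete isolation) the approximation gives F_ST = 1, while about one migrant per generation (Nm = 1) already brings it down to 0.2.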
PCA shortcomings: PCA is sensitive to sample size. If one area is sampled much more heavily than another, the heavily sampled area will look more different from the other areas; this is just an artefact of the way PCA works. In the example shown, RED was sampled a lot more than the others, so it looks more different from them, and uneven sampling can change the overall shape of the plot.
In short, be careful with PCA plots and don't read too much into them.
When we look at global patterns, we tend to find that African populations carry more genetic variation than populations anywhere else, consistent with modern humans having migrated out of Africa.
There is still controversy about what happened in the past when it is inferred from genomic information alone.
“The defining genetic feature of populations historically residing outside of Africa is the tremendous reduction in genetic diversity compared with populations residing in sub-Saharan Africa.”
“Genetic evidence indicates that the vast majority of our ancestry is likely derived from a recent, common ancestral population that gave rise to modern humans”
“Genetic data indicate that, approximately 45 to 60 kya, a very rapid population expansion occurred outside of Africa, and spread in all directions across the Eurasian continents, eventually populating the entire world.”
Serial founder effect – each new population is founded by a sub-sample of the previous one, then another sub-sample is taken from that, and so on; each step loses some variation, so diversity declines with distance from Africa.
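The loss of variation through repeated founder events can be sketched with the textbook result that sampling N diploid founders keeps, on average, a fraction 1 - 1/(2N) of the parent population's expected heterozygosity. The founder number and starting heterozygosity below are hypothetical, and mutation and migration between events are ignored.

```python
def heterozygosity_after_founders(h0: float, founders: int, events: int) -> float:
    """Expected heterozygosity after a chain of founder events.

    Each event samples `founders` diploid individuals (2 * founders gene copies),
    retaining on average a fraction 1 - 1/(2 * founders) of the heterozygosity.
    """
    retained = 1.0 - 1.0 / (2 * founders)
    return h0 * retained ** events

# Hypothetical values: starting heterozygosity 0.8, 10 founders per new colony.
for step in range(6):
    h = heterozygosity_after_founders(0.8, 10, step)
    print(f"founder events: {step}, expected heterozygosity: {h:.3f}")
```

Each additional step away from the source multiplies heterozygosity by the same factor, so diversity falls steadily with the number of founder events.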
“Genetic data can directly address the time and rate of population growth in the African ancestral population; however, despite recent interest in this topic, current analyses are extremely limited and produce conflicting results.”
Parasites also show more diversity in Africa, following the same pattern as humans:
“Both Plasmodium falciparum (a malarial parasite) and Helicobacter pylori (a bacterium that occupies the human digestive tract) follow a strikingly similar pattern to the human DNA.”
“Families of languages that are similar enough that most linguists recognize them as such have a common origin in the range of 10,000 y ago.”
“A recent analysis of phonemic diversity in 504 worldwide languages shows that this diversity exhibits the same serial founder effect discussed earlier for genetic variation, namely a loss of phonemic diversity proportional to distance from Africa”
This parallels the genetic findings but is completely independent evidence.
Major Study
Human Genome Diversity Project
Li, J.Z. et al., Science, 319, 1100-1104 (2008)
http://www.sciencemag.org/content/319/5866/1100
Jakobsson, M. et al. Nature 451, 998-1003 (2008).
http://www.nature.com/nature/journal/v451/n7181/full/nature06742.html
525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci were analysed in a worldwide sample of 29 populations.
The SNPs were also analysed as haplotypes: if three SNPs are linked, there may be only two haplotypes, e.g. ACA or TGG, rather than all possible combinations. Haplotypes can give much more information about ancestry than individual SNPs, because there is much more variability among haplotypes.
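A small sketch of why haplotypes can carry more information than single SNPs, using expected heterozygosity (1 - Σ p_i²) as the diversity measure. The eight phased chromosomes are hypothetical: mostly ACA and TGG, as with complete linkage, plus two recombinant-looking haplotypes.

```python
from collections import Counter

def diversity(alleles) -> float:
    """Expected heterozygosity, 1 - sum(p_i ** 2), of a list of observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Hypothetical phased chromosomes at three linked SNPs.
chromosomes = ["ACA", "ACA", "TGG", "ACA", "TGG", "ACG", "TGA", "ACA"]

haplotype_counts = Counter(chromosomes)
per_snp = [diversity([chrom[i] for chrom in chromosomes]) for i in range(3)]
haplotype_diversity = diversity(chromosomes)

print("haplotypes observed:", dict(haplotype_counts))  # 4 of the 8 possible
print("diversity at each SNP:", [round(x, 3) for x in per_snp])
print("haplotype diversity:", round(haplotype_diversity, 3))
```

Each SNP on its own has only two alleles, but the haplotypes distinguish four allele combinations, so haplotype diversity (about 0.66 here) exceeds the diversity at any single SNP (about 0.47).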
Venn diagram of the percentages of alleles with particular geographic distributions:
Africa > Eurasia > East Asia > Oceania > America
For SNPs, 81.17% of alleles are found in all five regions of the world; of the remainder, many are found only in Africa or in Africa plus one other region.
For haplotypes, a lower percentage is found in all five regions and more are found only in Africa, but a similar pattern is observed.
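The Venn-diagram percentages come from classifying each allele by the set of regions it was observed in. A sketch of that bookkeeping, with hypothetical allele names and presence data; only a few of the possible categories are labelled here.

```python
from collections import Counter

REGIONS = frozenset({"Africa", "Eurasia", "East Asia", "Oceania", "America"})

def sharing_category(present_in: frozenset) -> str:
    """Label an allele by the set of regions in which it was observed."""
    if present_in == REGIONS:
        return "all five regions"
    if len(present_in) == 1:
        (region,) = present_in
        return f"{region} only"
    if "Africa" in present_in and len(present_in) == 2:
        return "Africa + one other region"
    return "other combination"

def category_percentages(alleles: dict) -> dict:
    """Percentage of alleles falling into each sharing category."""
    counts = Counter(sharing_category(regions) for regions in alleles.values())
    total = sum(counts.values())
    return {category: 100.0 * count / total for category, count in counts.items()}

# Hypothetical presence/absence data for four alleles.
alleles = {
    "snp1_A": REGIONS,
    "snp2_T": frozenset({"Africa"}),
    "snp3_G": frozenset({"Africa", "Eurasia"}),
    "snp4_C": REGIONS,
}
percentages = category_percentages(alleles)
print(percentages)
```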
If we make trees of population relationships:
Neighbour-joining trees of population relationships. Internal branch lengths are proportional to bootstrap support. Lines of intermediate thickness represent internal branches with more than 50% bootstrap support, and the thickest lines represent more than 95% support.
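Neighbour-joining, the method named in the caption, repeatedly joins the pair of nodes that minimises Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k). A minimal plain-Python sketch follows; it omits bootstrapping, and the five-taxon distance matrix is a small made-up example rather than population data.

```python
def neighbour_joining(names, dist):
    """Neighbour-joining on a symmetric distance matrix.

    Returns (joins, final_edge): each join is (node_i, node_j, length_i, length_j),
    and final_edge connects the last two remaining nodes.
    """
    names = list(names)
    d = [[float(x) for x in row] for row in dist]
    joins = []
    while len(names) > 2:
        n = len(names)
        totals = [sum(row) for row in d]
        # Find the pair (i, j) with the smallest Q value.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        len_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        joins.append((names[i], names[j], len_i, len_j))
        # Distance from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        names = [names[k] for k in keep] + [f"({names[i]},{names[j]})"]
    return joins, (names[0], names[1], d[0][1])

taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbour_joining(taxa, matrix)
for join in joins:
    print(join)
print(final_edge)
```

Bootstrap support, as in the figure, would come from resampling SNPs, recomputing the distance matrix and re-running the same procedure many times.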
Population structure inferred by Bayesian clustering. Each individual is shown as a thin vertical line partitioned into K coloured components representing inferred membership in K genetic clusters. The bottom row provides inferred population structure for each geographic region.
Final Paragraph
The availability of worldwide high-density SNP data will be important for improving the prospects for disease-gene mapping in a broad set of populations. By employing methods that make use of high-resolution data sets to impute genotypes in study samples, it will be possible to increase power to detect associations in diverse populations for which such data have not previously been available. The data also provide the basis for refining informative marker sets in contexts such as multi-population SNP tagging, admixture mapping and ancestry inference, and for evaluating SNP tagging of CNVs for disease association tests.
Middle East – a lot of overlap
Regional ancestry inferred with the frappe program at K = 7 and plotted with the Distruct program. Each individual is represented by a vertical line partitioned into colored segments whose lengths correspond to his/her ancestry coefficients in up to seven inferred ancestral groups. Population labels were added only after each individual’s ancestry had been estimated; they were used to order the samples in plotting.
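frappe estimates ancestry coefficients by maximum likelihood using an EM algorithm. The sketch below is a simplified illustration of that idea, not frappe's implementation: it assumes the ancestral allele frequencies are already known and estimates only one individual's ancestry proportions q, under the model that each allele copy comes from group k with probability q_k. The data are simulated and the function name is hypothetical.

```python
import numpy as np

def estimate_ancestry(genotypes, freqs, n_iter=200):
    """EM estimate of one individual's ancestry proportions q.

    genotypes: (n_snps,) copies of the reference allele (0, 1 or 2)
    freqs:     (n_snps, K) reference-allele frequency in each ancestral group,
               assumed known and held fixed
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(freqs, dtype=float)
    q = np.full(p.shape[1], 1.0 / p.shape[1])  # start from equal ancestry
    for _ in range(n_iter):
        # E-step: probability that a reference / alternative allele copy
        # at each SNP came from each ancestral group.
        ref = q * p
        alt = q * (1.0 - p)
        ref /= ref.sum(axis=1, keepdims=True)
        alt /= alt.sum(axis=1, keepdims=True)
        # M-step: q_k is the average of these probabilities over all allele copies.
        q = (g[:, None] * ref + (2.0 - g)[:, None] * alt).sum(axis=0) / (2.0 * len(g))
    return q

# Simulated individual with 70% / 30% ancestry from two hypothetical groups.
rng = np.random.default_rng(1)
n_snps = 2000
freqs = rng.uniform(0.05, 0.95, size=(n_snps, 2))
true_q = np.array([0.7, 0.3])
ref_prob = freqs @ true_q  # chance that one allele copy is the reference allele
genotypes = rng.binomial(2, ref_prob)
q_hat = estimate_ancestry(genotypes, freqs)
print("estimated ancestry proportions:", np.round(q_hat, 3))
```

The estimated proportions always sum to one, because every allele copy is shared out across the K groups; these are the segment lengths drawn for each individual in a plot like the one above.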
Gene flow: Did modern humans hybridise with Neandertals?
When the Neanderthal genome was sequenced, researchers found segments that were very similar to the genomes of present-day humans, suggesting that there was gene flow into modern humans before Eurasian groups diverged from each other.
“We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.”
This suggests that when modern humans migrated out of Africa, they hybridised with Neanderthals.
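Excess allele sharing of the kind described in the quote is commonly measured with the D-statistic (ABBA-BABA test). A minimal sketch, assuming four aligned sequences related as (((H1, H2), H3), outgroup), for example H1 = an African genome, H2 = a non-African genome, H3 = Neanderthal and the outgroup = chimpanzee; the sequences here are hypothetical toy strings. A positive D means H2 shares more derived alleles with H3 than H1 does.

```python
def d_statistic(h1: str, h2: str, h3: str, outgroup: str) -> float:
    """Patterson's D for aligned sequences related as (((H1, H2), H3), outgroup).

    The outgroup allele is taken as ancestral ('A'); a different allele is derived ('B').
    """
    abba = baba = 0
    for a1, a2, a3, out in zip(h1, h2, h3, outgroup):
        if a3 == out or a1 == a2:
            continue  # H3 not derived, or H1 and H2 agree: uninformative site
        if a1 == out and a2 == a3:
            abba += 1  # pattern ABBA: H2 shares the derived allele with H3
        elif a2 == out and a1 == a3:
            baba += 1  # pattern BABA: H1 shares the derived allele with H3
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / (abba + baba)

# Hypothetical toy alignment (C = ancestral, T = derived at every site).
african = "CCCTTCC"
non_african = "TTTCTCT"
neanderthal = "TTTTTTC"
chimp = "CCCCCCC"
print(d_statistic(african, non_african, neanderthal, chimp))  # prints 0.5
```

Real analyses use millions of sites and assess significance with a block jackknife, which this sketch leaves out.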