Week 3.5: Ancestry and genealogy Flashcards

1
Q

The importance of ancestry and genealogy in human genomics and human health: ancestry is a topic of interest for many people.
To understand patterns of genomic variability in humans, we need to understand history. Most of the variation is due to our history, which tells us where our lineages came from in the world, and geography plays a major role in this. We need to look backwards in time to understand where our genetic variants came from. Human variability has been surveyed by large-scale genotyping and re-sequencing projects:

A
  • HapMap project
  • Human Genome Diversity Project
  • 1000 Genomes Project
2
Q

Common methods to analyse genomic variation: it is not easy to look at variation across a genome of 3.2 billion base pairs, so various methods are used to simplify things.

List three methods

A
  1. Principal components analysis (PCA) – a data-summary technique, normally shown as a scatter plot. It pools together all the sites of variation and combines similar (correlated) variation into a few new axes, the principal components, so instead of looking at the variability in 1000 dimensions it looks at it in far fewer.
  2. Coalescent theory – specific to genetics; it traces alleles back in time until they coalesce into a common ancestor.
  3. STRUCTURE (computer programme) analysis – sorts individuals into populations (clusters) that each show Hardy-Weinberg proportions of genotypes.
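To make the PCA idea concrete, here is a minimal pure-Python sketch (an illustration, not the method used in any particular study). It assumes genotypes coded as 0/1/2 allele counts, with rows as individuals and columns as SNPs, and finds each individual's coordinate on the first principal component by power iteration on the individual-by-individual covariance matrix. The toy genotype matrix and the function names are invented for illustration.

```python
import random

# Invented toy data: rows = individuals, columns = SNPs, values = allele counts (0/1/2).
# The first three individuals come from one hypothetical population, the last three from another.
toy_genotypes = [
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 2, 1],
    [0, 0, 0, 2, 1, 2],
    [2, 2, 2, 0, 0, 1],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 0],
]


def center_columns(matrix):
    """Subtract each SNP's mean so PCA summarises variation rather than averages."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def first_principal_component(genotypes, n_iter=200, seed=0):
    """Coordinates of each individual on PC1 (unit length, sign arbitrary)."""
    x = center_columns(genotypes)
    n_ind, n_snp = len(x), len(x[0])
    # Covariance between individuals, pooled over all SNPs.
    cov = [
        [sum(x[i][k] * x[j][k] for k in range(n_snp)) / n_snp for j in range(n_ind)]
        for i in range(n_ind)
    ]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_ind)]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n_ind)) for i in range(n_ind)]
        norm = sum(value * value for value in w) ** 0.5
        v = [value / norm for value in w]
    return v


if __name__ == "__main__":
    for individual, score in enumerate(first_principal_component(toy_genotypes)):
        print(individual, round(score, 3))
```

With these toy data the two invented populations fall on opposite sides of zero on PC1. Real analyses use hundreds of thousands of SNPs and several components, and normally rely on linear-algebra libraries rather than explicit loops.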
3
Q

Lecture Outline

Genome variation and geography
How geography reflects genealogy

Coalescent theory

Global patterns and historical scenarios

Migration: Out of Africa hypothesis
Gene flow: Hybridisation with Neanderthals?
Bottlenecks: Past population sizes
Mutation: What is the human mutation rate?

A

Migration: Out of Africa hypothesis
Gene flow: Hybridisation with Neanderthals?
Bottlenecks: Past population sizes
Mutation: What is the human mutation rate?

4
Q

Starting point: genomic variation and geography
‘Genes mirror geography within Europe’ Novembre et al. Nature 456, 98-101 (2008)

A

“Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome.”

5
Q

‘Genes mirror geography within Europe’ Novembre et al. Nature 456, 98-101 (2008)

When they rotated (tilted) the axes, they found that the pattern reflected a map of Europe: many ES (Spain) samples, in blue, sit in the bottom left; IE (Ireland) samples are in red; and so on.

A

“Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual’s DNA can be used to infer their geographic origin with surprising accuracy—often to within a few hundred kilometres.”

An individual’s DNA could therefore be used to infer where they come from (their geographic location), which could be useful in criminal investigations.

6
Q

How are these geographic patterns a reflection of history?

A

The simplest way to look at this is in terms of coalescence, which traces genealogy back in time: the closer people lived to each other, the more closely related they are.

7
Q

Coalescence

A

“Coalesce” – to come together. Coalescent theory traces alleles back in time until they coalesce into a common ancestor. It looks at different gene genealogies and identity by descent (IBD): it assumes that if two alleles are identical, they are so because of shared descent. Going back from the present, the coalescent also gives a genealogical interpretation of principal components analysis.
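The backward-in-time process described above can be sketched with the standard (Kingman) coalescent, which is an assumption here rather than something this card specifies. With k lineages, the waiting time to the next coalescence is exponential with rate k(k-1)/2 when time is measured in units of 2N generations, and two randomly chosen lineages then merge into their common ancestor. The function names are illustrative.

```python
import random


def simulate_genealogy(n_samples, rng):
    """Trace n sampled gene copies back in time until they all coalesce.

    Returns a list of (time, merged_set) events; time is measured backwards
    from the present in units of 2N generations (standard coalescent scaling).
    """
    lineages = [frozenset([i]) for i in range(n_samples)]
    time = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        # Any of the k*(k-1)/2 pairs can coalesce, so waiting times are short when k is large.
        time += rng.expovariate(k * (k - 1) / 2)
        i, j = sorted(rng.sample(range(k), 2))
        merged = lineages[i] | lineages[j]
        # Remove the higher index first so the lower index is still valid.
        lineages.pop(j)
        lineages.pop(i)
        lineages.append(merged)
        events.append((time, merged))
    return events


def expected_tmrca(n_samples):
    """Expected time to the most recent common ancestor, in units of 2N generations."""
    return 2 * (1 - 1 / n_samples)


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [simulate_genealogy(10, rng)[-1][0] for _ in range(20000)]
    print(sum(runs) / len(runs), expected_tmrca(10))
```

The last event always contains every sample, i.e. it is the most recent common ancestor, and the simulated mean for ten samples should sit close to the analytical value of 1.8.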

8
Q

Different processes that can account for geographic structure

List four processes

A
  1. Geographical isolation – if a population lives in an isolated village that nobody leaves or enters, it will follow its own evolutionary trajectory.
  2. Founder events – when a few founders start a new population somewhere (e.g. a couple go there and start a village), the whole village will reflect the founders' genealogy.
  3. Migration – if other people start coming to the island and add their genes to the population, this introduces new variability that was not present in the founders.
  4. Admixture – when genes from different populations are brought in and mixed, e.g. by hybridisation.

All of these can contribute to the geographical patterns of genomic variation. PCA cannot untangle these different processes, but the coalescent can, because each process can be modelled explicitly.

PCA shortcomings: PCA (like principal coordinate analysis) is sensitive to sample size. If one area is sampled much more heavily than another, the heavily sampled area will look more different from the other areas; this is just an artefact of the way PCA works. In the example shown, the population in red was sampled a lot more than the others, so it looks more different than the others. With lots of variability, many different shapes can also appear in the principal components.

Basically, be careful with PCAs and don't read too much into them.

9
Q

When we look at global patterns, we tend to find more genetic variation in Africa than anywhere else, reflecting the migration of humans out of Africa.

There is still controversy about what happened in the past when it is inferred simply from genomic information.

A

“The defining genetic feature of populations historically residing outside of Africa is the tremendous reduction in genetic diversity compared with populations residing in sub-Saharan Africa.”

“Genetic evidence indicates that the vast majority of our ancestry is likely derived from a recent, common ancestral population that gave rise to modern humans”

“Genetic data indicate that, approximately 45 to 60 kya, a very rapid population expansion occurred outside of Africa, and spread in all directions across the Eurasian continents, eventually populating the entire world.”

Serial founder effect – each migrating group is a sub-sample of the previous population, then another sub-sample is taken from that group, and another, so you can easily see how this reduces variation at every step.
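A back-of-envelope sketch of why serial founder events erode diversity. It assumes (a simplification, not from the source) that each founder event acts like one generation of genetic drift in a diploid group of N founders, so expected heterozygosity is multiplied by (1 - 1/(2N)) at each step, with no new mutation or migration. The starting heterozygosity and founder sizes are hypothetical.

```python
def serial_founder_heterozygosity(h0, founder_sizes):
    """Expected heterozygosity after each of a chain of founder events.

    h0: starting expected heterozygosity (hypothetical value).
    founder_sizes: number of diploid founders at each successive step.
    Each step multiplies H by (1 - 1/(2N)); mutation and migration are ignored.
    """
    h = h0
    trajectory = [h]
    for n_founders in founder_sizes:
        h *= 1 - 1 / (2 * n_founders)
        trajectory.append(h)
    return trajectory


if __name__ == "__main__":
    # Hypothetical chain: four successive founder groups of 5 people each.
    for step, h in enumerate(serial_founder_heterozygosity(0.8, [5, 5, 5, 5])):
        print(step, round(h, 4))
```

Each step removes a fixed fraction of the remaining diversity, so groups further along the chain (further from the source population) retain less of it.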

“Genetic data can directly address the time and rate of population growth in the African ancestral population; however, despite recent interest in this topic, current analyses are extremely limited and produce conflicting results.”

10
Q

Parasites also have more diversity in Africa

A

“Both Plasmodium falciparum (a malarial parasite) and Helicobacter pylori (a bacterium that occupies the human digestive tract) follow a strikingly similar pattern to the human DNA.”

“Families of languages that are similar enough that most linguists recognize them as such have a common origin in the range of 10,000 y ago.”

“A recent analysis of phonemic diversity in 504 worldwide languages shows that this diversity exhibits the same serial founder effect discussed earlier for genetic variation, namely a loss of phonemic diversity proportional to distance from Africa”

This parallels the genetic studies but is completely independent of them.

11
Q

Major Study

Human Genome Diversity Project

Li, J.Z. et al. Science 319, 1100-1104 (2008).

http://www.sciencemag.org/content/319/5866/1100

Jakobsson, M. et al. Nature 451, 998-1003 (2008).

http://www.nature.com/nature/journal/v451/n7181/full/nature06742.html

A

525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci, in a worldwide sample of 29 populations.

The data were also analysed as haplotypes. If three SNPs are linked, there may be only two haplotypes, e.g. ACA or TGG, rather than all eight possible combinations.

There is much more variability in haplotypes than in single SNPs, so haplotypes can give much more information about ancestry than considering SNPs individually.
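A small sketch of the haplotype point, using invented phased chromosomes: two hypothetical populations can have identical allele frequencies at every single SNP and still carry completely different haplotypes, so haplotypes can separate groups that single-SNP summaries cannot.

```python
from collections import Counter


def snp_allele_counts(haplotypes):
    """Allele counts at each SNP, looking at one SNP at a time."""
    n_sites = len(haplotypes[0])
    return [Counter(h[site] for h in haplotypes) for site in range(n_sites)]


def haplotype_counts(haplotypes):
    """Counts of whole haplotypes (alleles at linked SNPs read together)."""
    return Counter(haplotypes)


# Invented phased chromosomes for two hypothetical populations (three linked SNPs each).
population_1 = ["ACA", "TGG", "ACA", "TGG"]
population_2 = ["ACG", "TGA", "ACG", "TGA"]

if __name__ == "__main__":
    print(snp_allele_counts(population_1) == snp_allele_counts(population_2))  # True
    print(haplotype_counts(population_1))
    print(haplotype_counts(population_2))
```

At every site both populations are 50:50, yet they share no haplotype, which is the sense in which haplotypes carry more ancestry information.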

12
Q

Venn diagram of the percentages of alleles with particular geographic distributions:

A

Africa > Eurasia > East Asia > Oceania > America

81.17% of SNP alleles are found in all five regions of the world; many of the remaining alleles are found only in Africa, or in Africa and one other part of the world.

For haplotypes, a lower percentage is found in all five regions and more are found only in Africa, but a similar pattern can be observed.

13
Q

If we make trees of population relationships:

A

Neighbour-joining trees of population relationships. Internal branch lengths are proportional to bootstrap support. Lines of intermediate thickness represent internal branches with more than 50% bootstrap support, and the thickest lines represent more than 95% support.

Population structure inferred by Bayesian clustering. Each individual is shown as a thin vertical line partitioned into K coloured components representing inferred membership in K genetic clusters. The bottom row provides inferred population structure for each geographic region.

14
Q

Final Paragraph

The availability of worldwide high-density SNP data will be important for improving the prospects for disease-gene mapping in a broad set of populations. By employing methods that make use of high-resolution data sets to impute genotypes in study samples, it will be possible to increase power to detect associations in diverse populations for which such data have not previously been available. The data also provide the basis for refining informative marker sets in contexts such as multi-population SNP tagging, admixture mapping and ancestry inference, and for evaluating SNP tagging of CNVs for disease association tests.

A

Middle East – a lot of overlap

Regional ancestry inferred with the frappe program at K = 7 and plotted with the Distruct program. Each individual is represented by a vertical line partitioned into colored segments whose lengths correspond to his/ her ancestry coefficients in up to seven inferred ancestral groups. Population labels were added only after each individual’s ancestry had been estimated; they were used to order the samples in plotting.
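The idea behind ancestry coefficients can be sketched for a single individual with K = 2 groups. This is a simplification: it assumes the allele frequencies of the two ancestral groups are already known and finds the ancestry fraction q by maximising a binomial likelihood over a grid, whereas programs such as frappe (by maximum likelihood) and STRUCTURE (in a Bayesian framework) estimate the frequencies and coefficients together. All frequencies and genotypes below are invented.

```python
import math


def log_likelihood(q, genotypes, freqs_a, freqs_b):
    """Binomial log-likelihood (up to a constant) of one individual's genotypes.

    q: fraction of ancestry from group A (1 - q from group B).
    genotypes: count (0, 1 or 2) of the reference allele at each SNP.
    freqs_a, freqs_b: reference-allele frequencies in the two groups,
    assumed known and strictly between 0 and 1.
    """
    total = 0.0
    for g, p_a, p_b in zip(genotypes, freqs_a, freqs_b):
        p = q * p_a + (1 - q) * p_b  # expected allele frequency in an admixed genome
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def estimate_ancestry(genotypes, freqs_a, freqs_b, steps=1000):
    """Grid-search maximum-likelihood estimate of q in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freqs_a, freqs_b))


# Invented frequencies: 20 SNPs common in group A, then 20 SNPs common in group B.
toy_freqs_a = [0.9] * 20 + [0.1] * 20
toy_freqs_b = [0.1] * 20 + [0.9] * 20

if __name__ == "__main__":
    mixed = [1] * 40  # heterozygous everywhere: looks half A, half B
    print(estimate_ancestry(mixed, toy_freqs_a, toy_freqs_b))
```

An individual whose genotypes match one group gets q at 0 or 1, while a mixed individual gets an intermediate q, which is what the lengths of the coloured segments in such plots represent.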

15
Q

Gene flow: Did modern humans hybridise with Neandertals?

When the Neanderthal genome was sequenced, segments were found that were very similar to the modern human genome, suggesting gene flow from Neanderthals into modern humans before the divergence of Eurasian groups.

A

“We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.”

In other words, when modern humans came out of Africa, they hybridised with Neanderthals.
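One standard way to quantify "shared more genetic variants" is Patterson's D-statistic (the ABBA-BABA test); the card does not say which statistic was used, so treat this as an illustrative sketch. For populations (H1, H2, H3, outgroup), for example (African, Eurasian, Neanderthal, chimpanzee), the outgroup allele is taken as ancestral (A) and any other allele as derived (B). An excess of ABBA sites over BABA sites (D > 0) means H3 shares more derived alleles with H2 than with H1. The site patterns below are invented.

```python
def d_statistic(sites):
    """Patterson's D from (H1, H2, H3, outgroup) alleles at each site.

    The outgroup allele is treated as ancestral ("A"); anything else is derived ("B").
    ABBA: H2 and H3 share the derived allele. BABA: H1 and H3 share it.
    All other site patterns are ignored.
    """
    abba = 0
    baba = 0
    for h1, h2, h3, outgroup in sites:
        pattern = "".join("A" if allele == outgroup else "B" for allele in (h1, h2, h3, outgroup))
        if pattern == "ABBA":
            abba += 1
        elif pattern == "BABA":
            baba += 1
    if abba + baba == 0:
        return 0.0
    return (abba - baba) / (abba + baba)


# Invented site patterns: (H1, H2, H3, outgroup) alleles.
toy_sites = (
    [("C", "T", "T", "C")] * 6    # ABBA
    + [("T", "C", "T", "C")] * 4  # BABA
    + [("C", "C", "T", "C")] * 5  # derived only in H3: ignored
)

if __name__ == "__main__":
    print(d_statistic(toy_sites))
```

Under a simple model with no gene flow and no ancient population structure, D is expected to be close to zero. As the next card points out, ancient structure can also produce excess shared polymorphism without hybridisation, so a non-zero D on its own does not settle the question.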

16
Q

Controversial

This pattern could instead be explained by the inheritance of alleles that Neanderthals and modern humans shared from their common ancestral population, rather than by hybridisation.


A

“We show that the excess polymorphism shared between Eurasians and Neanderthals is compatible with scenarios in which no hybridization occurred, and is strongly linked to the strength of population structure in ancient populations.”

17
Q

“We find that observed low levels of Neanderthal ancestry in Eurasians are compatible with a very low rate of interbreeding (

A
18
Q

Something else that can influence present-day genetic diversity is bottlenecks.

Bottlenecks: What about past population sizes?

A

A bottleneck occurs when a population is reduced to a small size; for example, when the Black Death hit London, the population of London shrank a great deal. Bottlenecks affect (reduce) the diversity of a population.

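As the population gets smaller, coalescence becomes more likely, because there are fewer gene copies (2N in a diploid population of size N) from which each lineage can inherit, so two lineages are more likely to share the same parent copy.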

Because the probability that a coalescence event occurs at a particular time is inversely proportional to the population size at that time, the pattern of observed coalescence and sampling events can be used to estimate the demographic history of the population.

If a population has been declining, many alleles will coalesce recently, whereas if a population has been expanding, coalescent events will have happened further back in the past and there won't be many recent ones. This can help us to look at human history.
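A minimal sketch of why population size shapes coalescence times. It assumes a simple discrete-generation model (not spelled out on the card) in which two lineages find a common parent in a given generation with probability 1/(2N), where N is the diploid population size in that generation. The two size histories are hypothetical: "declining" means small now and large in the past, "expanding" means large now and small in the past.

```python
import random


def declining(t):
    """Hypothetical history: 50 individuals for the last 100 generations, 500 before."""
    return 50 if t <= 100 else 500


def expanding(t):
    """Hypothetical history: 500 individuals for the last 100 generations, 50 before."""
    return 500 if t <= 100 else 50


def prob_coalesced_by(generations, size_history):
    """Probability that two lineages have coalesced within the given number of generations."""
    not_yet = 1.0
    for t in range(1, generations + 1):
        not_yet *= 1 - 1 / (2 * size_history(t))
    return 1 - not_yet


def simulate_pair(size_history, rng, max_generations=100_000):
    """Generations back until two lineages pick the same parental gene copy."""
    for t in range(1, max_generations + 1):
        if rng.random() < 1 / (2 * size_history(t)):
            return t
    return max_generations


if __name__ == "__main__":
    for history in (declining, expanding):
        print(history.__name__, round(prob_coalesced_by(100, history), 3))
```

Under these toy histories, about 63% of pairs coalesce within the last 100 generations in the declining population but under 10% in the expanding one, so the timing of coalescences carries information about past population sizes.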

19
Q

The coalescent and history

To estimate coalescence times for different parts of the human genome, you would use present estimates of:

  • mutation rates
  • past effective population sizes
  • past population sub-structure
  • past recombination rates

A

We can’t estimate all of them at once, but if we know some we can try to estimate the others. One coalescent-based way to estimate past population sizes in humans was put forward by Richard Durbin (Li and Durbin, 2011), and there are many variants of this method. They used diploid genome sequences of a Chinese male, a Korean male, three Europeans and two Yoruba (African) males, and reconstructed the TMRCA (time to the most recent common ancestor) for segments of the genome by comparing the two alleles at each polymorphic site in each individual. The more heterozygous a segment was, the longer the inferred TMRCA.
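The link between heterozygosity and TMRCA can be sketched as follows. This is a crude window-by-window illustration, not the pairwise sequentially Markovian coalescent (PSMC) model used in the 2011 study. It assumes infinite-sites mutation at a constant rate μ per bp per year, so the two alleles in a window are expected to differ at a fraction 2μT of sites and T ≈ heterozygosity / (2μ). The heterozygosity flags are invented, and using the slow rate quoted later in these cards (0.5 × 10⁻⁹) is an assumption.

```python
def window_tmrca_years(het_flags, window_size, mu_per_year):
    """Crude TMRCA estimate (years) for consecutive windows of one diploid genome.

    het_flags: 1 where the individual's two alleles differ, 0 where they match.
    Assumes infinite sites and a constant rate: heterozygosity = 2 * mu * T.
    Any incomplete window at the end is ignored.
    """
    estimates = []
    for start in range(0, len(het_flags) - window_size + 1, window_size):
        window = het_flags[start:start + window_size]
        heterozygosity = sum(window) / window_size
        estimates.append(heterozygosity / (2 * mu_per_year))
    return estimates


SLOW_RATE = 0.5e-9  # mutations per bp per year (family-based estimate quoted in these cards)

if __name__ == "__main__":
    # Invented data: a 10 kb window with 10 heterozygous sites, then one with 5.
    flags = [1] * 10 + [0] * 9_990 + [1] * 5 + [0] * 9_995
    print(window_tmrca_years(flags, 10_000, SLOW_RATE))
```

The window with twice the heterozygosity gets twice the inferred TMRCA (about one million versus half a million years at this rate), and every estimate scales inversely with the assumed mutation rate.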

20
Q

Looking at Y chromosomes to work out when the most recent common male ancestor of modern humans lived, relative to the most recent common female ancestor.

A

Y-chromosome “Adam” and mitochondrial “Eve”: different studies give different estimates. This particular study used whole Y-chromosome sequences and found an old Y-chromosome coalescence time, fairly similar to the mitochondrial coalescence time.

This estimate is based on the coalescent and could therefore be affected by bottlenecks.

“Dogma has held that the common ancestor of human patrilineal lineages, popularly referred to as the Y-chromosome “Adam,” lived considerably more recently than the common ancestor of female lineages, the so-called mitochondrial “Eve.” However, we conclude that the mitochondrial coalescence time is not substantially greater than that of the Y chromosome.”

21
Q

Mutation: what is the human mutation rate?

It is actually quite hard to get a good estimate of the mutation rate. There are two ways to calculate it:

A

1. Whole-genome sequencing of present-day families: look for recent (new) mutations by comparing related individuals.

  • Interestingly, the two methods give different rates. Looking at human families gives a SLOW mutation rate; this rate seems to fit well with the fossil record, whereas the fast rate does not.

  • 0.5 × 10⁻⁹ mutations per bp per year

2. Comparison of human and other primate genomes: the rate is estimated from the timing of the human/chimp split, assumed to be 6 mya.

  • This split-based method gives a FAST mutation rate

  • 1 × 10⁻⁹ mutations per bp per year
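The arithmetic behind the disagreement can be sketched with the simple relationship d = 2μT, where d is the fraction of sites that differ between two lineages and T is the time since they split (this ignores ancestral polymorphism, which is an assumption). The divergence value of 0.012 is an illustrative round number, not a measurement from these cards.

```python
FAST_RATE = 1e-9    # mutations per bp per year (primate-comparison estimate)
SLOW_RATE = 0.5e-9  # mutations per bp per year (family-sequencing estimate)


def split_time_years(divergence, mu_per_year):
    """Years since two lineages split, given per-bp divergence d = 2 * mu * T."""
    return divergence / (2 * mu_per_year)


def calibrate_rate(divergence, assumed_split_years):
    """Phylogenetic calibration: the rate implied by an assumed split time."""
    return divergence / (2 * assumed_split_years)


if __name__ == "__main__":
    d = 0.012  # illustrative divergence (hypothetical round number)
    print(calibrate_rate(d, 6e6))          # rate implied by a 6 mya split
    print(split_time_years(d, FAST_RATE))  # date using the fast rate
    print(split_time_years(d, SLOW_RATE))  # date using the slow rate
```

Halving the rate doubles every date inferred from the same divergence. This is why a slow clock that suits recent human history pushes older splits, such as human and orang-utan, far beyond fossil-based dates, as in Reich's comment.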

“Although a slowed molecular clock may harmonize the story of human evolution, it does strange things when applied further back in time, says David Reich, an evolutionary geneticist at Harvard Medical School in Boston, Massachusetts. “You can’t have it both ways.”

For instance, the slowest proposed mutation rate puts the common ancestor of humans and orang-utans at 40 million years ago, he says: more than 20 million years before dates derived from abundant fossil evidence. This very slow clock has the common ancestor of monkeys and humans co-existing with the last dinosaurs. “It gets very complicated,” deadpans Reich.”

“My strong view right now is that the true value of the human mutation rate is an open question.”

David Reich, Harvard, quoted in Nature 489, 343–344 (20 September 2012)

22
Q

Case study: India (an interesting study; have a look on QMplus)


“We warn that ‘models’ in population genetics should be treated with caution. Although they provide an important framework for testing historical hypotheses, they are oversimplifications. For example, the true ancestral populations of India were probably not homogeneous as we assume in our model, but instead were probably formed by clusters of related groups that mixed at different times. However, modelling them as homogeneous fits the data and seems to capture meaningful features of history.”

Nature 461, 489-494 (24 September 2009)

A

“The distribution of genetic variation across geographical location and ethnic background provides a rich source of information about the historical demographic events and processes experienced by a species. However, while colonization, isolation, migration and admixture all lead to a structuring of genetic variation, in which groups of individuals show greater or lesser relatedness to other groups, making inferences about the nature and timing of such processes is notoriously difficult. There are three key problems. First, there are many different processes that one might want to consider as explanations for patterns of structure in empirical data and efficient inference, even under simple models can be difficult. Second, different processes can lead to similar patterns of structure. For example, equilibrium models of restricted migration can give similar patterns of differentiation to non-equilibrium models of population splitting events (at least in terms of some data summaries such as Wright’s F_ST). Third, any species is likely to have experienced many different demographic events and processes in its history and their superposition leads to complex patterns of genetic variability. Consequently, while there is a long history of estimating parameters of demographic models from patterns of genetic variation, such models are often highly simplistic and restricted to a subset of possible explanations.”

McVean G (2009) A Genealogical Interpretation of Principal Components Analysis. PLoS Genet 5(10): e1000686