Genetic Drift Flashcards
What is the definition of genetic drift?
- Genetic drift is a kind of random sampling of alleles entering the next generation
- Drift acts as a dispersive force that removes variation
- Any population of finite size will be subject to genetic drift
- Can be thought of as ‘accidents of sampling’ that influence which alleles make it into the next generation
Name 3 sources of randomness that contribute to genetic drift
- Which alleles in gametes come together by chance - e.g., for a heterozygous individual there is a 50:50 chance as to which allele ends up in the fertilised zygote
- Variation in chance of survival
- Variation in chance of reproductive success
What effect does drift have on variants?
- Rare variants are easily lost due to chance events
- Common variants are less sensitive to chance events
Give a famous example of drift in humans
Blood groups:
- Allele B of ABO, N allele of MN and Rh- allele are absent in Polynesia
- Alleles lost due to founder effects during colonisation of islands across the Pacific by small groups
- Ioannidis et al., 2021 - Nature 597
How does the strength of drift change dependent on population size?
- Drift is stronger in small populations (faster loss of variation)
- Over time, genetic drift leads to a decline in genetic variation due to the fixation or loss of alleles
- The larger the population, the longer this takes
What is the Wright-Fisher model?
- Drift affects all DNA - whether under selection or not
- Wright-Fisher model describes a population evolving from drift alone (no selection)
- Used as a null scenario for testing patterns of genetic variation
What assumptions are made for the Wright-Fisher model?
- Non-overlapping generations
- Constant size population (N individuals, 2N lineages)
- Random union of gametes (‘random mating’) - each child has 2 parents
- Sexual reproduction with all individuals hermaphrodite and able to self fertilise
- Poisson distribution for reproductive success
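The assumptions above can be sketched as a small simulation. This is a minimal illustration, assuming a single locus with two alleles in a diploid population of N individuals (2N allele copies); the function names `next_generation` and `simulate` are hypothetical and not from the original notes.

```python
import random

def next_generation(count_a, n_individuals, rng):
    """Draw the number of 'A' copies in the next generation.

    Each of the 2N copies in the offspring generation picks a parental
    copy at random with replacement (binomial sampling, no selection).
    """
    total = 2 * n_individuals
    freq = count_a / total
    return sum(1 for _ in range(total) if rng.random() < freq)

def simulate(n_individuals, start_freq, generations, seed=None):
    """Return the allele-frequency trajectory under drift alone."""
    rng = random.Random(seed)
    total = 2 * n_individuals
    count = round(start_freq * total)
    trajectory = [count / total]
    for _ in range(generations):
        count = next_generation(count, n_individuals, rng)
        trajectory.append(count / total)
    return trajectory

if __name__ == "__main__":
    # Compare a small and a large population over 50 generations.
    for n in (10, 1000):
        path = simulate(n, 0.5, 50, seed=1)
        print(n, [round(p, 2) for p in path[::10]])
```

The small population should typically wander much further from the starting frequency of 0.5 than the large one, which is the sense in which drift is stronger when N is small.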
What is the effective population size (Ne)?
Ne is the size of the Wright-Fisher population equivalent to the real population being studied
- It is unlikely that a real population will conform exactly to the assumptions of the Wright-Fisher model
- However, these populations behave in a similar way to Wright-Fisher populations but with reduced population sizes
- Ne is usually smaller than the real (census) population size - N
How does N vs Ne change dependent on pop size?
- Drift is stronger in small populations - so genetic variation and Ne are lower in fluctuating populations than in constant-size populations with the same maximum size (N)
- Implies individuals from populations with smaller Ne are more likely to share a common ancestor in the recent past
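The notes do not give a formula for this card. A standard textbook approximation is that Ne over several generations of fluctuating size is the harmonic mean of the per-generation sizes, which is dominated by the smallest values. A sketch under that assumption (the function name and the population sizes are hypothetical):

```python
from statistics import harmonic_mean

def effective_size_fluctuating(sizes):
    """Approximate Ne as the harmonic mean of per-generation sizes."""
    return harmonic_mean(sizes)

constant = [1000, 1000, 1000, 1000]
fluctuating = [1000, 50, 1000, 1000]  # one bottleneck generation

print(effective_size_fluctuating(constant))     # stays at 1000
print(effective_size_fluctuating(fluctuating))  # about 174
```

A single generation at N = 50 pulls Ne far below the maximum size, even though three of the four generations are at 1000.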
Give an example of how variation in mating systems can cause deviation from Wright-Fisher assumptions
E.g., elephant seals
- Have highly polygynous mating systems - small number of males monopolise matings with a large number of females
- This leads to a larger variation in reproductive success between individuals
- So deviates from Wright-Fisher - can have consequences for the expected amount of genetic variation in the population
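One common way to quantify this effect, not given in the notes, is the standard formula for Ne with unequal numbers of breeding males and females: Ne = 4 x Nm x Nf / (Nm + Nf). The sketch below assumes that formula; the function name and the seal numbers are made up for illustration.

```python
def effective_size_sex_ratio(n_males, n_females):
    """Ne for unequal numbers of breeding males and females."""
    return 4 * n_males * n_females / (n_males + n_females)

# Hypothetical colony: 10 breeding males monopolise 400 females.
print(effective_size_sex_ratio(10, 400))    # about 39
# Same number of breeders with an even sex ratio.
print(effective_size_sex_ratio(205, 205))   # 410.0
```

With an even sex ratio Ne equals the number of breeders, while the skewed case behaves like a much smaller Wright-Fisher population.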
What are the 4 key features of genetic drift?
- Random - unpredictable changes in allele frequencies between generations
- Dispersive force - reduces variation within populations - causes allele frequencies to diverge between populations
- Neutral - all alleles influenced in same way
- Related inversely to Ne - drift stronger in small populations
What are the probabilities of fixation?
- 1/(2N) for a specific allele copy
- Equal to its frequency in the population for a particular allelic variant
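These probabilities can be checked by simulation. The following is a rough sketch, assuming binomial Wright-Fisher sampling of 2N allele copies with no selection or mutation; `fixes` and `estimate_fixation_probability` are hypothetical helper names.

```python
import random

def fixes(count_a, total_copies, rng):
    """Run drift until allele A is fixed (True) or lost (False)."""
    while 0 < count_a < total_copies:
        freq = count_a / total_copies
        count_a = sum(1 for _ in range(total_copies) if rng.random() < freq)
    return count_a == total_copies

def estimate_fixation_probability(count_a, total_copies, replicates, seed=0):
    """Fraction of replicate populations in which allele A fixes."""
    rng = random.Random(seed)
    wins = sum(fixes(count_a, total_copies, rng) for _ in range(replicates))
    return wins / replicates

if __name__ == "__main__":
    # N = 10 individuals, so 2N = 20 allele copies.
    print(estimate_fixation_probability(1, 20, 2000))  # expect a value near 1/20
    print(estimate_fixation_probability(5, 20, 2000))  # expect a value near 5/20
```

The estimates are noisy, but with enough replicates they should sit close to the starting frequency of the allele.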
In what different ways can you predict the expected amount of genetic variation in neutrally evolving populations (drift in constant sized populations)?
- Wright-Fisher model and ‘forward in time’ perspective of genetic drift
- Neutral theory and infinite alleles model
- Mutation-drift equilibrium
- Molecular clocks
- Coalescent theory - ‘backwards in time’ perspective of genetic drift
- Gene genealogies
What is the decay of heterozygosity?
- Heterozygosity tending to 0 over time
- Tends to 0 faster with a smaller N (pop size)
- Ht = H0 x (1 - 1/(2N))^t
- Decay is geometric
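The decay formula above can be evaluated directly. This sketch assumes N is the diploid population size and t is in generations; the `half_life` helper is an illustrative addition obtained by solving the same formula for Ht = H0/2.

```python
import math

def heterozygosity(h0, n, t):
    """Expected heterozygosity after t generations of drift."""
    return h0 * (1 - 1 / (2 * n)) ** t

def half_life(n):
    """Generations until expected heterozygosity halves."""
    return math.log(0.5) / math.log(1 - 1 / (2 * n))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(heterozygosity(0.5, n, 100), 4), round(half_life(n), 1))
```

The half-life is roughly 1.39 x N generations, so a tenfold larger population takes about ten times longer to lose half of its heterozygosity.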
What is a population bottleneck and what effect does it have on genetic variation?
A sharp reduction in population size due to an event - e.g., an earthquake/flood
- Pop size and genetic variation drops
- Pop size recovers fast
- Genetic variation recovers more slowly than population size - as only way to gain variation is through mutation
What is the infinite alleles (or sites - when referring to sequence variation) model?
- Is the case where each mutation is to a novel state
- In this case, under neutrality, a large number of alleles can be maintained in large populations
- However, when heterozygotes are the fittest genotype, a ‘genetic load’ is created due to the existence of homozygotes for less fit alleles
- Crow and Kimura - showed this creates an upper limit for the number of alleles - since selective advantage of fitter alleles is balanced out by the genetic load
- This limit appeared inconsistent with high levels of variation seen in protein variation of Drosophila - led to Kimura suggesting most mutations had to be neutral
What is the Nearly Neutral theory?
The idea that, in large populations with short generation times, noncoding DNA evolves faster while protein evolution is slowed by selection, which is more significant than drift in large populations
- Tomoko Ohta
Explain the Mutation-Drift balance?
- Mutation inputs new alleles into population
- Drift removes alleles from population
- Therefore, in neutrally evolving population, the amount of diversity will move to an equilibrium value - the magnitude of which depends on the balance of the 2 processes
- Larger populations receive more new mutations per generation (2Nu) and are less sensitive to drift - so should have greater equilibrium levels of variation than small populations in the neutral case
What is theta?
Population mutation parameter:
- Key parameter needed to estimate the level of genetic variation under neutral model
- Theta = 4Nu
What can you use to predict the amount of genetic variation that should be present in a population?
Mutation-Drift balance in the Wright-Fisher model
- Drift - decreases diversity (1/2N)
- Mutation increases diversity (2Nu) - u = mutation rate
- From infinite alleles model use 4Nu
- 4Nu = theta
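The notes stop at theta = 4Nu. Under the infinite alleles model, the standard equilibrium result (an addition here, not stated in the notes) is that expected heterozygosity is H = theta / (1 + theta). A minimal sketch under that assumption, with hypothetical function names:

```python
def theta(n, u):
    """Population mutation parameter for a diploid population."""
    return 4 * n * u

def equilibrium_heterozygosity(n, u):
    """Expected heterozygosity at mutation-drift equilibrium
    under the infinite alleles model."""
    th = theta(n, u)
    return th / (1 + th)

if __name__ == "__main__":
    u = 1e-5  # illustrative mutation rate per locus per generation
    for n in (100, 10_000, 1_000_000):
        print(n, round(equilibrium_heterozygosity(n, u), 4))
```

Larger N gives a larger theta and therefore a higher equilibrium level of variation, approaching 1 when theta is very large.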
What is the neutral theory of evolution?
- Mutation = new allele
- What is the probability that this new allele will become fixed?
- Mutation rate = mu (u) - expected number of new alleles arising per generation = 2Nu
- Probability of fixation of a new neutral allele = 1/(2N)
- Rate at which new alleles fix = 2Nu x 1/(2N) = u
Describe the molecular clock with its parameters
The hypothesis that DNA and protein sequences evolve at a constant rate over time and in different organisms
- p = rate of evolution (accumulation of mutations fixed between species)
- p = u - since, under neutrality, the rate at which new alleles fix is equal to the mutation rate
- For T1 in Species A - mutations are not substitutions but polymorphisms within species (transient entities)
- The number of mutations fixed between two species along one branch: T2 x u
- i.e. in the neutral case, the expected number of mutations (genetic differences) along a branch is proportional to the time that separates them - so genetic variation accumulates in a clock-like way, where the ticks on the clock relate to the magnitude of the mutation rate
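The clock relationship above can be turned into simple arithmetic. This sketch assumes u is a per-site, per-generation mutation rate, that time is measured in generations, and that pairwise divergence between two species accumulates along both branches since their split; the function names and example values are hypothetical.

```python
def expected_substitutions_per_branch(t_generations, u):
    """Expected fixed mutations per site along one branch: T x u."""
    return t_generations * u

def divergence_time(d, u):
    """Generations since two species split, given pairwise divergence d
    per site, which accumulated along two branches (d = 2 x T x u)."""
    return d / (2 * u)

if __name__ == "__main__":
    u = 1e-8  # illustrative value only
    print(expected_substitutions_per_branch(1_000_000, u))
    print(divergence_time(0.02, u))
```

Dating a split this way is only as good as the assumed mutation rate and the assumption that the sites are evolving neutrally.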
What is coalescent theory and how does it differ from the Wright-Fisher model?
- Alternative way of looking at drift - looking backwards in time
- Works out the time to the most recent common ancestor (TMRCA)
- Follow haplotypes back in time - seeing them ‘merge’ as they lose unique mutations
- The Wright-Fisher model looks forward in time to predict future variation, but this has limitations: under genetic drift we cannot predict which lineage/allele will persist in the future, and it is difficult to infer what happened in the past. From an empirical standpoint, we can only collect samples from the present, so we are usually interested in projecting histories back in time
Why is coalescence important?
- Real world data - we only have access to contemporary sequences or alleles, which form the tips of the genealogy - can't see the full genealogy for every generation in the past
- Coalescent allows us to reconstruct the history of surviving lineages and make inferences about the evolutionary processes which influenced them
- The coalescent provides important framework for working with sequence and other genetic data
Define the coalescent and what can you calculate?
The probability of any haplotype pair coalescing in the next (previous) generation
Can calculate:
- The probability of coalescence at a generation t in the past
- Mean time for a coalescence event to occur
- Time for all haplotypes to coalesce into a single lineage - Time to Most Recent Common Ancestor (TMRCA)
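For a single pair of lineages these quantities have simple closed forms. The sketch below assumes a diploid Wright-Fisher population, where two lineages pick the same parental copy in the previous generation with probability 1/(2N), so the waiting time is geometric; the function names are hypothetical.

```python
def prob_coalesce_at(t, n):
    """P(two lineages first share an ancestor exactly t generations back)."""
    p = 1 / (2 * n)
    return (1 - p) ** (t - 1) * p

def mean_pair_coalescence_time(n):
    """Mean of the geometric waiting time, in generations."""
    return 2 * n

if __name__ == "__main__":
    n = 50
    print(prob_coalesce_at(1, n))
    print(prob_coalesce_at(100, n))
    print(mean_pair_coalescence_time(n))
```

The probability is highest for the most recent generation and falls off geometrically, while the mean waiting time for a pair is 2N generations.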
What assumptions need to be made for deriving a neutral coalescent model?
- Lineages coalesce independently
- Coalescence is rare - no more than a single coalescent event per generation
How can you draw a coalescent genealogy?
- Start with k lineages (k = number of sampled sequences) - go back T generations to the next coalescence - combine two lineages at random - decrease k by 1 - repeat until k = 1
- If k=1, then all the lineages meet back at a common ancestor
- T(MRCA) = 4N(1 - 1/k)
- If there are a large number of lineages (k is high), then the coalescence time is ~4N - same as WF model
- However, if there are only two lineages, the average coalescence time is 2N - i.e. half of the total coalescence time is taken up by the last coalescence event
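The procedure and the T(MRCA) formula above can be sketched as follows. This assumes the usual continuous-time approximation for a diploid population, in which the waiting time while i lineages remain is exponential with mean 4N / (i(i - 1)) generations; the function names are hypothetical.

```python
import random

def expected_tmrca(k, n):
    """Sum the expected waiting times 4N / (i(i-1)) for i = k down to 2."""
    return sum(4 * n / (i * (i - 1)) for i in range(2, k + 1))

def simulate_tmrca(k, n, rng):
    """One random genealogy: draw an exponential wait, merge one pair,
    and repeat until a single lineage remains."""
    total = 0.0
    while k > 1:
        rate = k * (k - 1) / (4 * n)  # number of pairs x 1/(2N)
        total += rng.expovariate(rate)
        k -= 1
    return total

if __name__ == "__main__":
    n = 100
    rng = random.Random(0)
    for k in (2, 10, 100):
        sims = [simulate_tmrca(k, n, rng) for _ in range(2000)]
        print(k, expected_tmrca(k, n), round(sum(sims) / len(sims), 1))
```

The summed expectation matches 4N(1 - 1/k): it is 2N for two lineages and approaches 4N as k grows, so most of the depth of the tree comes from the final pair.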
What are the features of neutral coalescent trees?
- Very variable in shape
- Easy to simulate (not computationally intensive)
- Amenable to statistical modelling via Likelihood and Bayesian analysis
Describe the features of the neutral coalescent
- Lineages coalesce very rapidly at start
- A small sample will have high probability of containing the deepest MRCA
- Adding another sequence usually adds a short branch
- Adding a new branch changes the total length of the tree only slightly - by an amount proportional to 1/k
- Implies that inferences can be improved by getting lots of independent trees (from different genes), rather than having very large samples for a single gene
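The diminishing return from extra sequences can be illustrated with the expected total branch length of the tree. This is a sketch using the standard neutral coalescent expectation (not stated in the notes): while i lineages remain, the expected interval is 4N / (i(i - 1)) generations and i branches are present, so the interval contributes 4N / (i - 1). The function name is hypothetical.

```python
def expected_tree_length(k, n):
    """Expected total branch length for a sample of k sequences."""
    return sum(4 * n / (i - 1) for i in range(2, k + 1))

if __name__ == "__main__":
    n = 100
    for k in (2, 10, 50):
        gain = expected_tree_length(k + 1, n) - expected_tree_length(k, n)
        print(k, round(expected_tree_length(k, n), 1), round(gain, 1))
```

Going from k to k + 1 sequences adds only 4N / k to the expected total length, so each extra sequence from the same gene contributes less new information than the last.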
What are some uses of the coalescent?
- Mathematical modelling describing diversity in observed data - derivation of parameters for describing genetic diversity, estimates of tree shapes and population sequence parameters in the neutral case
- Simulation tool for hypothesis testing - tests for selection, changes in demography, migration and gene flow
- Rosenberg and Nordborg 2002
What is the F statistic?
Derived from the Wright-Fisher model: a measure of the amount of shared co-ancestry between alleles within a population, or the probability that two alleles are identical by descent
- E.g., At generation 0, all allele copies are independent so none of them are identical by descent - so F=0
- This model also predicts that the average time to fixation of a lineage is approximately 4N generations
- So F is increasing over time - similar to homozygosity
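The increase in F can be sketched with the standard Wright-Fisher recurrence, which is an addition here rather than something stated in the notes: each generation two allele copies descend from the same parental copy with probability 1/(2N), and otherwise inherit the previous level of identity by descent. The function names are hypothetical.

```python
def inbreeding_coefficient(t, n, f0=0.0):
    """Iterate F_t = 1/(2N) + (1 - 1/(2N)) x F_(t-1) for t generations."""
    f = f0
    for _ in range(t):
        f = 1 / (2 * n) + (1 - 1 / (2 * n)) * f
    return f

def closed_form(t, n):
    """Equivalent closed form when F starts at 0."""
    return 1 - (1 - 1 / (2 * n)) ** t

if __name__ == "__main__":
    for t in (0, 10, 100, 1000):
        print(t, round(inbreeding_coefficient(t, 50), 4), round(closed_form(t, 50), 4))
```

Because 1 - F shrinks by the same factor (1 - 1/(2N)) each generation as heterozygosity does, F rises towards 1 as heterozygosity decays towards 0.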