Population genetics and molecular evolution Flashcards
Population genetics is
the study of genetic diversity in biological populations, and of the processes that cause genetic diversity to change.
- A mathematical discipline, built on deductive reasoning from first principles
- These fundamental principles were outlined before the discovery of DNA’s structure in 1953
Arose
- Arose from the Modern Synthesis in 1930s/40s
- Synthesised Mendelian genetics and Darwinian natural selection
- Key figures were Haldane, Fisher, and Wright
• Remarkably, the fundamental principles of population genetics were outlined before the discovery of DNA’s structure in 1953, yet have remained unchanged to this day.
why is pop genetics important
- give examples of uses
Theoretical foundation for modern evolutionary biology (underpins all phenomena in evolutionary biology)
(Although not all evolutionary questions are best studied using population genetic theory – it’s not always the right tool)
Also of great practical use - essential tool for
- Livestock & crop breeding
- Conservation biology
- Ecology
- Epidemiology
What are genetic markers
- where are they used?
- Genetic markers
- Genome regions that vary amongst the population that are useful for measuring and investigating genetic variation
- Quantity and resolution of markers has improved
- Human blood groups (1900)
- Allozymes (1966)
- Variant forms of an enzyme, encoded by different alleles at the same locus, that can be told apart by electrophoresis
- Massively-parallel pyrosequencing (2004)
Measuring genetic variation
- Method 1 & when it becomes inconvenient
- Method 2
1) Allele frequencies
- Alleles are different versions of a gene or sequence
- Diploid organisms with two different alleles at a locus are heterozygous
- A population is polymorphic at a specific genetic locus if more than one allele is commonly found (the usual cut-off is a frequency >1% or >5%)
Allele frequencies are inconvenient for measuring genetic variation when there are many alleles
2) Heterozygosity (h)
- more useful measure when lots of alleles
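Sketch of Method 1 (allele frequencies), assuming you have genotype counts for a diploid sample. The function name and the sample numbers are made up for illustration, not real data.

```python
from collections import Counter

def allele_frequencies(genotype_counts):
    """Count alleles directly from diploid genotype counts.

    genotype_counts maps a pair of allele labels, e.g. ("1", "2"),
    to the number of individuals carrying that genotype.
    """
    allele_counts = Counter()
    for (a, b), n in genotype_counts.items():
        # Each individual contributes two alleles, one per chromosome copy.
        allele_counts[a] += n
        allele_counts[b] += n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}

# Hypothetical sample of 100 individuals (illustrative numbers only).
sample = {("1", "1"): 60, ("1", "2"): 30, ("2", "2"): 10}
freqs = allele_frequencies(sample)
print(freqs)  # allele "1": 150/200, allele "2": 50/200
```

With only two alleles this is easy to read; with dozens of alleles the table of frequencies gets unwieldy, which is why a single summary number like h is handy.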
Example of measuring genetic variation using allele frequencies
E.g. Human Aldehyde Dehydrogenase (ALDH2)
- Found on chromosome 12, ALDH2 is involved in the breakdown of alcohol
- removes toxic acetaldehyde from the bloodstream by converting it to acetic acid
DIFFERENT ALLELES
- Some Asian populations are polymorphic for ALDH2
- ALDH2*2 differs in ONE amino acid from ALDH2*1
- globally *2 is v rare, but has a high frequency in Chinese, Korean, and Japanese populations
- 43% of native Japanese carry at least one variant allele!!
WHY *2 is BAD
- one a.a difference is enough to affect behaviour of dehydrogenase enzyme
- *2 homozygotes thus metabolise acetaldehyde poorly (low dehydrogenase enzyme function)
- Leads to alcohol flush reaction, caused by acetaldehyde building up in the body (it is not a true allergy)
- even a small amount of alcohol can make *2 carriers feel v ill
Heterozygosity (h)
give equation
understand it lol
- h is the fraction of individuals in a population that are heterozygous
- If there are m alleles at a locus and f_i is the frequency of allele i, then:
$h = 1 - \sum_{i=1}^{m} f_i^2$
(the sum runs over all m alleles, from i = 1 to m)
- This is the same as 1 minus the summed frequency of all the homozygous genotypes
- If there were 26 alleles, A-Z, then h would equal 1 – ((freq. of A)^2+ (freq of B)^2 + …. +(freq. of Z)^2)
- The frequency is squared due to the TWO alleles present in EACH individual (think hardy weinberg p^2 and q^2 are being taken away!!)
- h is equivalent to the probability that any two alleles randomly sampled from the population are different
- In other words, the probability that an offspring of a random mating is heterozygous!!
- h is greatest when there are many alleles, all at the same frequency (because squaring small numbers makes them really small and if you have lots of small numbers squared you’re taking less away from 1 so h is big!!)
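Quick sketch of the h equation, to check the intuition that h is biggest with many equally common alleles. The function name and the example frequencies are made up for illustration.

```python
def heterozygosity(freqs):
    """h = 1 - sum of squared allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)

h_two_equal = heterozygosity([0.5, 0.5])      # two alleles, same frequency
h_two_skewed = heterozygosity([0.9, 0.1])     # two alleles, one rare
h_26_equal = heterozygosity([1 / 26] * 26)    # alleles A-Z, all equally common

print(h_two_equal, h_two_skewed, h_26_equal)
```

The skewed locus gives the smallest h and the 26-allele locus the largest (1 - 1/26), matching the reasoning above.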
How do you calculate average heterozygosity (H)
- what does H represent?
- mammal vs invertebrate H value
Average the values of h across many loci to give average heterozygosity (H)
- This represents the proportion of loci that are expected to be heterozygous in an average individual
Polymorphism at a single gene is related to heterozygosity across whole genome
- Mammals have a fairly low H,
- Invertebrates are more diverse (this is because they have really big populations because they’re small, so more average heterozygosity as higher % of loci are polymorphic)
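Sketch of averaging h over loci to get H. The four loci below are invented purely to show the arithmetic (two monomorphic, two polymorphic); they are not real data.

```python
def average_heterozygosity(loci):
    """H = mean of h over loci; each locus is a list of allele frequencies."""
    h_values = [1.0 - sum(f * f for f in freqs) for freqs in loci]
    return sum(h_values) / len(h_values)

# Hypothetical loci: monomorphic loci have a single allele at frequency 1.
loci = [
    [1.0],        # monomorphic, h = 0
    [0.5, 0.5],   # h = 0.5
    [0.9, 0.1],   # h = 0.18
    [1.0],        # monomorphic, h = 0
]
H = average_heterozygosity(loci)
print(H)  # (0 + 0.5 + 0.18 + 0) / 4
```

Monomorphic loci pull H down, so a higher % of polymorphic loci means a higher H.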
Sequence variation
- The most common type of genetic marker investigated today is DNA sequence variation
Type? Can study…
- Long contiguous (next to each other) stretches of sequences
- OR SNPs that are known to be polymorphic are studied
Measures of sequence variation:
- Either count the number of distinct sequences, OR use one of two measures of sequence diversity (based on the no. differences between sequences)
Sequence diversity measure
1) Proportion of segregating sites
- Number of nucleotide sites across the samples with genetic differences (S)/ length of sequence (L)
2) Average pairwise difference (PD)
- take each pair of sequences and calculate how many sites they differ by
- PD = average of all these
- PD/L is also a useful metric (divide PD by length of sequence - ie. total no. sites)
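Sketch of both sequence diversity measures, assuming the sequences are already aligned, all the same length, and have no gaps. The four toy sequences and the function names are made up for illustration.

```python
from itertools import combinations

def segregating_sites(seqs):
    """S = number of alignment columns where not all sequences agree."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def average_pairwise_difference(seqs):
    """PD = mean number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs]
    return sum(diffs) / len(pairs)

# Toy alignment (hypothetical): 4 sequences, L = 8 sites.
seqs = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
L = len(seqs[0])

S = segregating_sites(seqs)               # sites 3 and 8 vary
PD = average_pairwise_difference(seqs)    # 7 differences over 6 pairs

print(S, S / L)
print(PD, PD / L)
```

Dividing by L puts both measures on a per-site scale, so sequences of different lengths can be compared.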
Hardy-Weinberg background
- Before Mendel’s work was rediscovered in 1900, Darwin and others believed offspring to be blends of their parents’ characteristics
- However, such blending would quickly reduce variation amongst individuals
- First major achievement of population genetics was to explain why variation is retained through time
What did Hardy and Weinberg show?
- In the absence of any evolutionary forces, Mendelian inheritance alone can maintain genetic diversity
- It is, therefore, a null model
Like all good null models
- it is never accurate but always relevant
- ie. doesn’t actually reflect real life, but allows you to understand natural systems by how they deviate from the model
Assumptions of the HW model!!
o No selection
o No mutation
o No migration (closed pop)
o Random mating (no inbreeding)
o Infinite pop. size (genetic drift becomes negligible)
o Diploid organism with sexual reproduction
o Non-overlapping generations
o Males and females have equal allele frequencies
- If a locus has two alleles P and Q with frequencies p and q (p + q = 1), what genotype frequencies are seen after one generation of random mating?
p = frequency of allele P (here the dominant allele)
q = frequency of allele Q (here the recessive allele)
PP is homo dominant genotype
PQ hetero genotype
QQ homo recessive genotype
Freq. of PP = p x p = p^2
Freq. of PQ = (p x q) + (q x p) = 2pq
Freq. of QQ = q x q = q^2
Therefore, p^2 + 2pq + q^2 = 1
- The principle also extends to more alleles and to independently segregating loci
How to extend HW to 3 alleles (p,q,r)
More alleles can be added in the same manner, extending the equation
- E.g. (p^2 + q^2 + r^2) + 2(pq + pr + qr) = 1
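Sketch of the HW genotype frequencies for any number of alleles: homozygotes get f^2 and each heterozygote gets 2 x (product of the two allele frequencies). The function name and the allele frequencies are made up for illustration.

```python
from itertools import combinations_with_replacement

def hardy_weinberg(allele_freqs):
    """Expected genotype frequencies under HW for any number of alleles.

    allele_freqs maps allele label -> frequency (must sum to 1).
    """
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        fa, fb = allele_freqs[a], allele_freqs[b]
        # Homozygote: f^2. Heterozygote: 2 * fa * fb (two ways to make it).
        expected[(a, b)] = fa * fa if a == b else 2 * fa * fb
    return expected

two = hardy_weinberg({"P": 0.7, "Q": 0.3})
three = hardy_weinberg({"P": 0.5, "Q": 0.3, "R": 0.2})

print(two)                   # PP = p^2, PQ = 2pq, QQ = q^2
print(sum(three.values()))   # genotype frequencies still sum to 1
```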
The HW principle shows that, in the absence of evolutionary forces
- Genotype frequencies are in EQUILIBRIUM, they will remain unchanged indefinitely if not disturbed
- Equilibrium genotype frequencies are created after only one generation of random mating, regardless of the genotype frequencies of the parental generation
- Thus, if genotype frequencies in a real population differ from those predicted, then at least one evolutionary force must be acting (eg. mutation, selection)
- The null model/hypothesis is rejected
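Sketch showing the "one generation is enough" point for two alleles, under all the HW assumptions listed earlier. The starting population (no heterozygotes at all) is an invented extreme case.

```python
def next_generation(genotype_freqs):
    """One round of random mating; genotype_freqs = (PP, PQ, QQ)."""
    pp, pq, qq = genotype_freqs
    # Allele frequencies in the parents: heterozygotes carry one copy of each.
    p = pp + pq / 2
    q = qq + pq / 2
    return (p * p, 2 * p * q, q * q)

start = (0.5, 0.0, 0.5)        # hypothetical parents: no heterozygotes
gen1 = next_generation(start)  # HW proportions appear immediately
gen2 = next_generation(gen1)   # and then stay put

print(gen1)
print(gen2)
```

The parents are far from HW proportions, yet gen1 already matches p^2, 2pq, q^2 and gen2 is identical to gen1.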
If you’re given observed genotype frequencies, how do you calculate genotype frequencies predicted by HW?
- start from the observed genotype frequencies (eg. the homozygote classes p^2, q^2, r^2)
- if only phenotypes can be scored, use the most recessive allele to estimate its allele frequency (because all individuals showing its phenotype must be homozygous recessive)
eg. r = sqrt(r^2) (note this step itself assumes HW proportions)
- then calculate the other allele frequencies
- put them into the HW equation to see what genotype frequencies are predicted given the ACTUAL allele frequencies in the population
- if the predictions are sig different from the observed frequencies (eg. using a chi-squared test), can conclude selection etc is taking place
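Sketch of the comparison for two alleles. Note this is a swapped-in variant: it assumes all three genotypes can be counted, so it gets p by counting alleles directly instead of taking a square root. The counts are invented, and 3.841 is the standard chi-squared critical value for 1 degree of freedom at the 5% level (3 genotype classes - 1 - 1 estimated allele frequency).

```python
def hw_chi_square(n_pp, n_pq, n_qq):
    """Chi-squared statistic for observed vs HW-expected genotype counts.

    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_pp + n_pq + n_qq
    p = (2 * n_pp + n_pq) / (2 * n)   # allele counting, no HW assumption
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_pp, n_pq, n_qq)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2

CRITICAL_1DF_05 = 3.841  # chi-squared, 1 degree of freedom, alpha = 0.05

# Hypothetical counts with a clear shortage of heterozygotes.
p, expected, chi2 = hw_chi_square(50, 20, 30)
reject_hw = chi2 > CRITICAL_1DF_05

print(p, expected)
print(chi2, reject_hw)
```

Rejecting the null model only says that some assumption is violated (selection, non-random mating, etc.); it does not say which one.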
Processes that generate and modulate genetic variation within populations
- Mutation
- Natural Selection
- Recombination
- Non-random mating (related to inbreeding)
- Random genetic drift
- Population structure
Introducing these 6 population genetic processes one by one shows how each affects genetic diversity. In real life most if not all are occurring at the same time (but if we modelled them all together we would not be able to pick apart the relative contributions of the different processes).
Mutation
- what
- how common
- relationship w genome size
- Ultimate source of all DNA sequence variation
Here we are thinking about single nucleotide change
For MOST cellular organisms, mutation is a super super slow process, very inefficient way of introducing new variants into populations and changing allele frequencies
- The spontaneous rate of nucleotide mutation ranges between 10^-9 and 10^-11 changes per base pair per generation
- viruses mutate millions of times faster
Relationship w genome size
- Clear inverse relationship between genome size and mutation rate (increasing genome size decreases mutation rate)
Other types of mutation (e.g. chromosomal change) may be more frequent
Taking mutation into account in models
- tracking the frequency of allele P through time
- introducing back mutations?
- If allele P mutates to allele Q at a rate of u (a really small number!!), with no back mutation
The fraction of alleles that change from P->Q is u
The fraction stay as allele P is 1-u
If mutation is the only evolutionary force, then the frequency of allele P after t generations (pt) is:
$p_t = p_0 (1 - u)^t \approx p_0 e^{-ut}$
- Suppose there is back mutation from Q to P at rate v
u (P-to-Q change) and v (Q-to-P change)
- Back mutation means that neither allele P nor Q will achieve fixation (i.e. be present in all individuals), but will instead reach a stable equilibrium where the freq of P is
$p_{eqm} = \frac{v}{u + v}$
- However, mutation is often assumed to be negligible on the time scales considered, so is often excluded from models
- Such models are obviously oversimplified (many loci have >2 alleles) but still help us to understand the evolutionary behaviour of mutations
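Sketch of both mutation models. The per-generation update p' = p(1 - u) + (1 - p)v follows from the definitions of u and v above. The rates in the two-way example are deliberately far larger than real mutation rates so that the equilibrium shows up quickly; all numbers and function names are illustrative.

```python
import math

def p_one_way(p0, u, t):
    """Frequency of P after t generations, P -> Q at rate u, no back mutation."""
    return p0 * (1 - u) ** t

def p_equilibrium(u, v):
    """Equilibrium frequency of P with P -> Q at rate u and Q -> P at rate v."""
    return v / (u + v)

def p_two_way(p0, u, v, generations):
    """Iterate p' = p(1 - u) + (1 - p)v one generation at a time."""
    p = p0
    for _ in range(generations):
        p = p * (1 - u) + (1 - p) * v
    return p

# One-way mutation: exact formula vs the exponential approximation.
exact = p_one_way(1.0, 1e-6, 100_000)
approx = 1.0 * math.exp(-1e-6 * 100_000)

# Two-way mutation with exaggerated rates (illustrative only).
p_eq = p_equilibrium(0.002, 0.001)             # v / (u + v) = 1/3
p_sim = p_two_way(1.0, 0.002, 0.001, 10_000)   # converges towards p_eq

print(exact, approx)
print(p_eq, p_sim)
```

Even after 100,000 generations at u = 10^-6, P has only dropped to about 90%, which is the "super slow" point made above.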
Natural Selection
- Alleles that enhance an organism’s survival and successful reproduction in the current environment contribute disproportionally to the next generation’s gene pool
• Population genetics is interested in understanding this component
- Repetition of this process leads to positive selection of beneficial alleles, leading eventually to fixation
- The current environment of a certain gene comprises both the external world and the other genotypes that it shares the genome with
• Although natural selection acts on the whole organism, it is best understood by studying how it affects alleles at a single locus
Terminology differences between single locus population genetics and quantitative genetics
Single locus population genetics (allele focus!!)
Positive selection
Negative selection
Balancing selection
Much more relevant to Quantitative genetics (thinking about phenotypes and traits)
Directional selection
Disruptive selection
Stabilising selection
What is Positive selection ?
Used in single locus population genetics
Allele is selected for. If successful, the beneficial mutation will be selected to fixation and will remove the other allele (this is in contrast to Hardy-Weinberg, which predicts that both will persist)
What is Negative selection ?
Used in single locus population genetics
Allele is selected against. If successful, gets rid of negative allele
What is Balancing selection ?
Used in single locus population genetics
Selection that favours the co-existence of both alleles in the population
Classic case
heterozygote is more fit than either homozygote (Overdominant selection ‘heterozygote advantage’)
The opposite case
both homozygotes are more fit than the heterozygote (rare, Underdominant selection ‘heterozygote disadvantage’); note that here the polymorphic equilibrium is unstable, so one allele is eventually lost
Directional selection
Quantitative genetics
– average trait value increases or decreases (eg. height, weight, crop yield)
Disruptive selection
Quantitative genetics
– extreme trait values selected for
Stabilising selection
Quantitative genetics
– average trait value selected for
Relative Fitness
definition
Relative fitness is the average no. offspring produced by individuals with a particular genotype relative to the number produced by individuals with another genotype.
key parameter = selection coefficient
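Sketch of the definition, assuming the common convention that the selection coefficient is s = 1 - w, where w is fitness relative to a chosen reference genotype. The notes above do not spell out the formula, so treat that convention, the function name, and the offspring numbers as assumptions for illustration.

```python
def relative_fitness(mean_offspring, reference):
    """Scale each genotype's mean offspring number by the reference genotype's."""
    ref = mean_offspring[reference]
    return {genotype: n / ref for genotype, n in mean_offspring.items()}

# Hypothetical average offspring numbers per genotype (not real data).
offspring = {"PP": 2.0, "PQ": 2.0, "QQ": 1.6}

w = relative_fitness(offspring, reference="PP")
# Assumed convention: selection coefficient s = 1 - w.
s = {genotype: 1 - wg for genotype, wg in w.items()}

print(w)  # reference genotype has w = 1
print(s)  # QQ is selected against
```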