Population genetics and molecular evolution Flashcards

1
Q

Population genetics is

A

the study of genetic diversity in biological populations, and of the processes that cause genetic diversity to change.

  • A mathematical discipline, based on deductive reasoning from first principles
  • These fundamental principles were outlined before the discovery of DNA’s structure in 1953
2
Q

How did population genetics arise?

A
  • Arose from the Modern Synthesis in 1930s/40s
  • Synthesised Mendelian genetics and Darwinian natural selection
  • Key figures were Haldane, Fisher, and Wright

• Remarkably, the fundamental principles of population genetics were outlined before the discovery of DNA’s structure in 1953, yet have remained unchanged to this day.

3
Q

why is pop genetics important

- give examples of uses

A

Theoretical foundation for modern evolutionary biology (underpins all phenomena in evolutionary biology)

(Although not all evolutionary questions are best studied using population genetic theory – it’s not always the right tool)

Also of great practical use - essential tool for

  • Livestock & crop breeding
  • Conservation biology
  • Ecology
  • Epidemiology
4
Q

What are genetic markers

- where are they used?

A
  • Genetic markers are genome regions that vary amongst individuals in a population, making them useful for measuring and investigating genetic variation
  • The quantity and resolution of markers have improved over time:
    • Human blood groups (1900)
    • Allozymes (1966): electrophoretically distinct forms of a protein encoded by different alleles at a locus
    • Massively-parallel pyrosequencing (2004)
5
Q

Measuring genetic variation

  • Method 1 & when it becomes inconvenient
  • Method 2
A

1) Allele frequencies
- Alleles are different versions of a gene or sequence
- Diploid organisms with different alleles at a locus are heterozygous
- A population is polymorphic at a specific genetic locus if more than one allele is commonly found (threshold frequency of >1% or >5%)

Allele frequencies are inconvenient for measuring genetic variation when there are many alleles

2) Heterozygosity (h)
- more useful measure when lots of alleles

6
Q

Example of measuring genetic variation using allele frequencies

A

E.g. Human Aldehyde Dehydrogenase (ALDH2)

  • Found on chromosome 12, ALDH2 is involved in the breakdown of alcohol
  • removes toxic acetaldehydes from the blood stream by converting it to acetic acid

DIFFERENT ALLELES

  • Some Asian populations are polymorphic for ALDH2
  • ALDH2*2 differs in ONE amino acid from ALDH2*1
  • globally *2 is v rare, but has a high frequency in Chinese, Korean, and Japanese populations
  • 43% of native Japanese carry at least one variant allele!!

WHY *2 is BAD

  • one a.a difference is enough to affect behaviour of dehydrogenase enzyme
  • *2 homozygotes thus metabolise acetaldehydes poorly (low dehydrogenase enzyme function)
  • Leads to the alcohol flush reaction, caused by the build-up of acetaldehyde (it resembles, but is not, a true allergic reaction)
  • even a small amount of alcohol can make *2 carriers feel v ill
7
Q

Heterozygosity (h)

give equation
understand it lol

A
  • h is the fraction of individuals in a population that are heterozygous
  • If there are m alleles at a locus and fi is the frequency of allele i, then:

$h = 1 - \sum_{i=1}^{m} f_i^2$ (the sum runs over all m alleles, from i = 1 to m)
- This is the same as 1 - the sum of homozygotes

  • If there were 26 alleles, A-Z, then h would equal 1 – ((freq. of A)^2+ (freq of B)^2 + …. +(freq. of Z)^2)
  • The frequency is squared due to the TWO alleles present in EACH individual (think hardy weinberg p^2 and q^2 are being taken away!!)
  • h is equivalent to the probability that any two alleles randomly sampled from the population are different
  • In other words, the probability that an offspring of a random mating is heterozygous!!
  • h is greatest when there are many alleles, all at the same frequency (because squaring small numbers makes them really small and if you have lots of small numbers squared you’re taking less away from 1 so h is big!!)
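
The formula above can be tried out with a few lines of Python. This is a minimal sketch; the function name and the example frequencies are made up for illustration.

```python
def heterozygosity(freqs):
    """Expected heterozygosity h = 1 - sum(f_i^2) for allele frequencies f_i."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)


# Two alleles at equal frequency: h = 1 - (0.25 + 0.25)
two_equal = heterozygosity([0.5, 0.5])

# One common and one rare allele: h is much lower
skewed = heterozygosity([0.9, 0.1])

# 26 alleles (A-Z), all at frequency 1/26: h = 1 - 1/26
many_equal = heterozygosity([1 / 26] * 26)

print(two_equal, skewed, many_equal)
```

Comparing the three values shows the last bullet in action: h rises as alleles become more numerous and more evenly distributed.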
8
Q

How do you calculate average heterozygosity (H)

  • what does H represent?
  • mammal vs invertebrate H value
A

Average the values of h across many loci to give average heterozygosity (H)

  • This represents the proportion of loci that are expected to be heterozygous in an average individual

Polymorphism at a single gene is related to heterozygosity across whole genome

  • Mammals have a fairly low H,
  • Invertebrates are more diverse (this is because they have really big populations because they’re small, so more average heterozygosity as higher % of loci are polymorphic)
9
Q

Sequence variation

A
  • The most common type of genetic marker investigated today is DNA sequence variation

Type? Can study…

  • Long contiguous (next to each other) stretches of sequences
  • OR SNPs that are known to be polymorphic are studied

Measures of sequence variation:
- Either the number of distinct sequences is counted, OR one of two measures of sequence diversity is used (based on the no. differences between sequences)

Sequence diversity measure

1) Proportion of segregating sites
- Number of nucleotide sites across the samples with genetic differences (S)/ length of sequence (L)

2) Average pairwise difference (PD)
- take each pair of sequences and calculate how many sites they differ by
- PD = average of all these
- PD/L is also a useful metric (divide PD by length of sequence - ie. total no. sites)
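
Both diversity measures can be computed directly from an alignment. The sketch below assumes the sequences are already aligned and of equal length; the four sequences and the function names are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns where at least two sequences differ (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def average_pairwise_difference(seqs):
    """Mean number of differing sites over all pairs of sequences (PD)."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs]
    return sum(diffs) / len(pairs)


# Hypothetical aligned sample of four sequences, each L = 10 sites long
sample = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACCTAC",
]
L = len(sample[0])
S = segregating_sites(sample)
PD = average_pairwise_difference(sample)

print("S =", S, " S/L =", S / L)
print("PD =", PD, " PD/L =", PD / L)
```

Here two of the ten sites vary, so S/L = 0.2, while the six pairwise comparisons differ at 7 sites in total, giving PD = 7/6.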

10
Q

Hardy-Weinberg background

A
  • Before Mendel’s work was rediscovered in 1900, Darwin and others believed offspring to be blends of their parents’ characteristics
  • However, such blending would quickly reduce variation amongst individuals
  • First major achievement of population genetics was to explain why variation is retained through time
11
Q

What did Hardy and Weinberg show?

A
  • In the absence of any evolutionary forces, Mendelian inheritance alone can maintain genetic diversity
  • It is, therefore, a null model

Like all good null models

  • it is never accurate but always relevant
  • ie. doesn’t actually reflect real life, but allows you to understand natural systems by how they deviate from the model
12
Q

Assumptions of the HW model!!

A

o No selection
o No mutation
o No migration (closed pop)

o Random mating (no inbreeding)
o Infinite pop. size (genetic drift becomes negligible)

o Diploid organism with sexual reproduction
o Non-overlapping generations
o Males and females have equal allele frequencies

13
Q
  • If a locus has two alleles P and Q with frequencies p and q (p + q = 1), what genotype frequencies are seen after one generation of random mating?
A
p = frequency of dominant allele
q = frequency of recessive allele

PP is homo dominant genotype
PQ hetero genotype
QQ homo recessive genotype

Freq. of PP = p x p = p^2
Freq. of PQ = (p x q) + (q x p) = 2pq
Freq. of QQ = q x q = q^2
Therefore, p^2 + 2pq + q^2 = 1

  • The principle also extends to more alleles and to independently segregating loci
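
A short numerical check of the result above. The starting population is hypothetical and deliberately far from Hardy-Weinberg proportions (no heterozygotes at all); the function names are illustrative only.

```python
def hardy_weinberg(p):
    """Genotype frequencies after one generation of random mating."""
    q = 1.0 - p
    return {"PP": p * p, "PQ": 2 * p * q, "QQ": q * q}


def allele_freq_P(genotypes):
    """Frequency of allele P: all of PP plus half of PQ."""
    return genotypes["PP"] + 0.5 * genotypes["PQ"]


# Hypothetical parental generation with no heterozygotes
parents = {"PP": 0.5, "PQ": 0.0, "QQ": 0.5}

gen1 = hardy_weinberg(allele_freq_P(parents))
gen2 = hardy_weinberg(allele_freq_P(gen1))

print(gen1)
print(gen2)
```

The first generation already shows p^2, 2pq, q^2, and the second generation is identical: the proportions are reached in one round of random mating and then stay put.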
14
Q

How to extend HW to 3 alleles (p,q,r)

A

More alleles can be added in the same manner, extending the equation
- E.g. (p^2 + q^2 + r^2) + 2(pq + pr + qr) = 1

15
Q

The HW principle shows that, in the absence of evolutionary forces

A
  • Genotype frequencies are in EQUILIBRIUM: they will remain unchanged indefinitely if not disturbed
  • Equilibrium genotype frequencies are created after only one generation of random mating, regardless of the genotype frequencies of the parental generation
  • Thus, if genotype frequencies in a real population differ from those predicted, then at least one evolutionary force must be acting (eg. mutation, selection)
  • The null model/hypothesis is rejected
16
Q

If you’re given observed genotype frequencies, how do you calculate genotype frequencies predicted by HW?

A

Start from the observed genotype (or phenotype) frequencies

Use the most recessive allele to calculate its allele frequency, because all individuals showing the recessive phenotype must be homozygous for it
eg. r = sqrt(observed frequency of rr)
(this shortcut itself assumes HW proportions; if every genotype can be told apart, count alleles directly instead)

Then calculate the other allele frequencies

Put these ACTUAL allele frequencies into the HW equation to see what genotype frequencies it predicts

If the predicted frequencies are significantly different from the observed ones, you can conclude that at least one HW assumption is violated (selection, non-random mating etc.)
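
One common way to judge "significantly different" is a chi-square goodness-of-fit test. The sketch below is not the card's square-root shortcut: it assumes all three genotypes of a two-allele locus can be distinguished and counts alleles directly. The genotype counts are hypothetical, and 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def hw_chi_square(n_PP, n_PQ, n_QQ):
    """Chi-square statistic comparing observed counts with HW expectations."""
    n = n_PP + n_PQ + n_QQ
    # Allele counting: each PP carries two P alleles, each PQ carries one
    p = (2 * n_PP + n_PQ) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_PP, n_PQ, n_QQ)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


# Hypothetical sample of 100 individuals with too few heterozygotes
p, expected, chi2 = hw_chi_square(30, 40, 30)

CRITICAL_5_PERCENT_1_DF = 3.841
reject_hw = chi2 > CRITICAL_5_PERCENT_1_DF

print("p =", p, "expected =", expected, "chi2 =", chi2, "reject:", reject_hw)
```

With these counts p = 0.5, the expected counts are 25, 50, 25, and the statistic is 4.0, so the HW null model would be rejected at the 5% level. The function divides by the expected counts, so it is only meaningful when both alleles are present in the sample.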

17
Q

Processes that generate and modulate genetic variation within populations

A
  • Mutation
  • Natural Selection
  • Recombination
  • Non-random mating (related to inbreeding)
  • Random genetic drift
  • Population structure

Introducing these 6 population genetic processes one by one shows how each affects genetic diversity. In real life most if not all are occurring at the same time (but if we modelled them all at once we would not be able to pick apart the relative contributions of the different processes).

18
Q

Mutation

  • what
  • how common
  • relationship w genome size
A
  • Ultimate source of all DNA sequence variation

Here we are thinking about single nucleotide change

For MOST cellular organisms, mutation is a super super slow process, very inefficient way of introducing new variants into populations and changing allele frequencies

  • The spontaneous rate of nucleotide mutation ranges between 10^-9 and 10^-11 changes per base pair per generation
  • viruses mutate millions of times faster

Relationship w genome size
- Clear inverse relationship between genome size and mutation rate (increasing genome size decreases mutation rate)

Other types of mutation (e.g. chromosomal change) may be more frequent

19
Q

Taking mutation into account in models

  • tracking the frequency of allele P through time
  • introducing back mutations?
A
  • If allele P mutates to allele Q at a rate of u (a really small number!!), with no back mutation

The fraction of alleles that change from P->Q each generation is u
The fraction that stay as allele P is 1-u

If mutation is the only evolutionary force, then the frequency of allele P after t generations (p_t), starting from p_0, is:
$p_t = p_0 (1-u)^t \approx p_0 e^{-ut}$

  • Suppose there is back mutation from Q to P at rate v
    u (P-to-Q change) and v (Q-to-P change)
  • Back mutation means that neither allele P nor Q will achieve fixation (i.e. be present in all individuals), but will instead reach a stable equilibrium where the frequency of P is
    $p_{eqm} = \frac{v}{u+v}$
  • However, mutation is often assumed to be negligible on the time scales considered, so is often excluded from models
  • Such models are obviously oversimplified (many loci have >2 alleles) but still help us to understand the evolutionary behaviour of mutations
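
The two mutation formulas can be checked numerically. In this sketch the rates u and v are hypothetical and far larger than real mutation rates, purely so that the equilibrium is reached in a reasonable number of iterations.

```python
import math


def p_after_forward_mutation(p0, u, t):
    """Frequency of P after t generations of one-way P->Q mutation."""
    return p0 * (1 - u) ** t


def p_equilibrium(u, v):
    """Equilibrium frequency of P with forward rate u and back rate v."""
    return v / (u + v)


def iterate_two_way(p0, u, v, generations):
    """Each generation P loses a fraction u and gains v of the Q alleles."""
    p = p0
    for _ in range(generations):
        p = p * (1 - u) + (1 - p) * v
    return p


u, v = 1e-3, 5e-4  # illustrative values only

exact = p_after_forward_mutation(1.0, u, 100)
approx = 1.0 * math.exp(-u * 100)

predicted = p_equilibrium(u, v)
simulated = iterate_two_way(1.0, u, v, 20000)

print(exact, approx)
print(predicted, simulated)
```

The exponential approximation agrees closely with the exact expression, and the iterated frequency settles at v/(u+v) = 1/3 whatever the starting value.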
20
Q

Natural Selection

A
  • Alleles that enhance an organism’s survival and successful reproduction in the current environment contribute disproportionally to the next generation’s gene pool
    • Population genetics is interested in understanding this component
  • Repetition of this process leads to positive selection of beneficial alleles, leading eventually to fixation
  • The current environment of a given gene comprises both the external world and the other genotypes that it shares the genome with

• Although natural selection acts on the whole organism, it is best understood by studying how it affects alleles at a single locus

21
Q

Terminology differences between single locus population genetics and quantitative genetics

A

Single locus population genetics (allele focus!!)
Positive selection
Negative selection
Balancing selection

Much more relevant to Quantitative genetics (thinking about phenotypes and traits)
Directional selection
Disruptive selection
Stabilising selection

22
Q

What is Positive selection ?

A

Used in single locus population genetics

Allele is selected for. If successful, a beneficial mutation will be selected to fixation and will replace the other allele (this is in contrast to Hardy-Weinberg, which predicts that both will persist)

23
Q

What is Negative selection ?

A

Used in single locus population genetics

Allele is selected against. If successful, gets rid of negative allele

24
Q

What is Balancing selection ?

A

Used in single locus population genetics

Selection that favours the co-existence of both alleles in the population

Either where

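heterozygote is fitter than either homozygote (Overdominant selection ‘heterozygote advantage’)

OR

where both homozygotes are fitter than the heterozygote (rare, Underdominant selection ‘heterozygote disadvantage’). Note that under underdominance the equilibrium with both alleles is unstable, so variation is not stably maintained in the way it is under overdominance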

25
Q

Directional selection

A

Quantitative genetics

– average trait value increases or decreases (eg. height, weight, crop yield)

26
Q

Disruptive selection

A

Quantitative genetics

– extreme trait values selected for

27
Q

Stabilising selection

A

Quantitative genetics

– average trait value selected for

28
Q

Relative Fitness

definition

A

Relative fitness is the average no. offspring produced by individuals with a particular genotype relative to the number produced by individuals with another genotype.

key parameter = selection coefficient

29
Q

Selection coefficient (s)

A

In population genetics, the fitness of a new allele is expressed as a selection coefficient (s)

s is the fractional increase or decrease in relative fitness given by a new allele

s = 0.12 means a new allele is 12% more fit  ie. slightly advantageous mutation
s = -0.04 means a new allele is 4% less fit
s = 0 means a new allele has the same fitness as the previous one.

For simplicity, one genotype is given a fitness of 1.0.
This means each new allele has the fitness 1+s

30
Q

Selection in haploids

Change in allele frequency between generations of haploids when you introduce a beneficial new allele (Q)

  • when is frequency of change fastest?
  • what does plot of allele frequency against time look like?
A

Suppose an allele Q has the frequency q and the fitness 1+s relative to allele P, which has the frequency p.

The change in q is, therefore:
Δ𝑞≈𝑠𝑝𝑞

q increases when s is positive and decreases when negative

Frequency change is fastest when

  • s values are bigger (ie. if new allele is really advantageous)
  • when p = q = 0.5 (as max of pq = 0.25), and freq change is slowest when either allele P OR Q are rare

This means
Plots of allele frequency against time are SIGMOIDAL (rare, middle, non-rare) (slow, fast, slow)
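
The sigmoidal shape can be seen by iterating the model. This sketch uses the exact haploid recursion q' = q(1+s)/(1+sq), of which Δq ≈ spq is the small-s approximation; the starting frequency and s are hypothetical.

```python
def haploid_trajectory(q0, s, generations):
    """Frequency of allele Q (fitness 1+s) over time in a haploid population."""
    q = q0
    trajectory = [q]
    for _ in range(generations):
        # Q individuals leave (1+s) offspring for every 1 left by P individuals
        q = q * (1 + s) / (1 + s * q)
        trajectory.append(q)
    return trajectory


traj = haploid_trajectory(q0=0.001, s=0.1, generations=200)

# Per-generation change, and the frequency at which that change is largest
steps = [later - earlier for earlier, later in zip(traj, traj[1:])]
fastest = steps.index(max(steps))
q_at_fastest = traj[fastest]

print("fastest change at generation", fastest, "when q =", round(q_at_fastest, 3))
print("final frequency:", round(traj[-1], 4))
```

The largest step occurs when q is close to 0.5, with slow change at both ends of the trajectory: slow, fast, slow.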

31
Q

Evidence in nature that allele freq. has sigmoidal behaviour

A
  • Gene in influenza called haemagglutinin gene (encodes glycoprotein on surface of viral envelope that allows attachment to host cell membrane)
  • Positive selection in the haemagglutinin (HA) gene of human influenza virus is common and very rapid
  • Fixation of new amino acid changes allow the virus to escape host antibodies (antigenic variation!!)
  • New allele frequency is remarkably sigmoidal in nature (fits model!) and remarkably fast.
  • Other amino acid changes fall in frequency before reaching fixation as s value is not high enough
32
Q

Selection in diploids

general formula for fitness in positive selection with new allele Q:

  • what does h mean?
  • example with codominance, recessive effect, dominant effect
A
  • Diploid selection more complex as have to define the fitness of the 2 alleles and the fitness of the 3 genotypes (homo, homo, hetero)
  • There are many different ways in which you ascribe selection coefficients to the 3 different genotypes
  • Can be simplified by introducing h
  • h is the DEGREE OF DOMINANCE (degree to which beneficial phenotype is shown), ranges from zero to 1 (0.5 = codom effect, 1= dom effect, but can have any value within that range)

The general formula for fitness in positive selection with new allele Q:

  • PP: 1 PQ: 1+hs QQ: 1+s
  • h is the degree of dominance with 0 ≤ h ≤ 1

Eg. Co-dominant is intermediate between dominant and recessive, so fitness may be half way between the two PP: 1 PQ: 1+0.5s QQ: 1+s

Eg. Recessive effect, h=0

  • PP: 1 PQ: 1 QQ: 1+s
  • P is dominant over new allele Q therefore h is low, leading to Q allele being obscured in the heterozygote

Eg. Dominant effect, h=1

  • PP: 1 PQ: 1+s QQ: 1+s
  • Q is dominant therefore h is high, the Q allele is thus expressed
33
Q

Selection in diploids

Change in allele frequency between generations of diploids when you introduce a beneficial new allele (Q)

A

• Standard haploid selection process modified to show what’s going on in dominance effect

Due to the degree of dominance (h) being a contributing factor in diploids, change in beneficial new allele Q frequency between generations of diploids is:
Δ𝑞≈𝑠𝑝𝑞 (𝑝ℎ+𝑞(1−ℎ))

• Look at diff values of h to see what it does to equation

Recessive effect, h = 0
Δ𝑞=𝑝𝑞^2𝑠
- A new recessive allele occurs almost entirely in heterozygotes, where it is hidden from selection, so it stays at low frequency
- Once enough heterozygotes are in the population, recessive homozygotes begin to appear and the allele quickly increases in frequency

Dominant effect, h = 1
Δ𝑞= 𝑝^2𝑞𝑠
- New dominant alleles increase in frequency very quickly
- The dominance of the new allele allows the less-fit recessive allele to ‘hide’ in heterozygotes, which greatly slows the final approach to fixation

Codominance, h = 0.5
Δ𝑞=𝑠𝑝𝑞/2
- New codominant alleles achieve higher frequencies than dominant alleles because there is no ‘hiding’ due to the additive effect of the alleles

• Making h=0.5 (codominant selection) is almost like making pop pseudo haploid, by removing the effect of heterozygotes
codominant selection and haploid selection v similar
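
The three cases can be compared by iterating the diploid model with fitnesses PP: 1, PQ: 1+hs, QQ: 1+s. This is a sketch using the exact one-locus recursion (the Δq formula above is its small-s approximation); q0 and s are hypothetical values.

```python
def diploid_trajectory(q0, s, h, generations):
    """Frequency of Q each generation with fitnesses PP: 1, PQ: 1+hs, QQ: 1+s."""
    q = q0
    trajectory = [q]
    for _ in range(generations):
        p = 1.0 - q
        mean_fitness = p * p + 2 * p * q * (1 + h * s) + q * q * (1 + s)
        # Q alleles come from all of QQ and half of PQ after selection
        q = (q * q * (1 + s) + p * q * (1 + h * s)) / mean_fitness
        trajectory.append(q)
    return trajectory


def generations_to_reach(trajectory, threshold):
    """First generation at which the frequency is at least `threshold`."""
    for generation, q in enumerate(trajectory):
        if q >= threshold:
            return generation
    return None


results = {}
for label, h in [("recessive", 0.0), ("codominant", 0.5), ("dominant", 1.0)]:
    traj = diploid_trajectory(q0=0.01, s=0.1, h=h, generations=5000)
    results[label] = (
        generations_to_reach(traj, 0.5),
        generations_to_reach(traj, 0.99),
    )
    print(label, "reaches 0.5 and 0.99 at generations", results[label])
```

Under these assumptions the dominant allele reaches a frequency of 0.5 first and the recessive allele last, but the codominant allele gets to 0.99 well before the dominant one, because the less-fit allele cannot hide in heterozygotes.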

34
Q

What’s the recessive effect?

A

Old allele P is dominant over new allele Q therefore h is low, leading to Q allele being obscured in the heterozygote. Only genotype that is selected for is homozygous QQ

PP: 1 PQ: 1 QQ: 1+s
Δ𝑞=𝑝𝑞^2𝑠

  • A new recessive allele occurs almost entirely in heterozygotes, so it stays at low frequency at first
  • This is because when beneficial recessive allele Q is rare, it is only found in heterozygotes and therefore ‘invisible’ to selection. Selection can’t get any traction – incredibly slow process to create handful of recessive QQ homozygotes, when this happens selection can kick off
  • allele frequency rapidly increases and reaches fixation
35
Q

What’s the dominant effect?

A

New allele Q is dominant over old allele P and therefore h is high, leading to allele Q being expressed in the heterozygote

PP: 1 PQ: 1+s QQ: 1+s
Δ𝑞= 𝑝^2𝑞𝑠

  • Selection can act very quickly, new dominant alleles initially increase in frequency generating lots of QQ homozygotes.
  • The dominance of the new allele allows for less-fit recessive alleles to ‘hide’, preventing complete fixation!! (as the hetero is still beneficial so not selected against)
    • Change is very slow as it nears fixation (sigmoidal!)
  • At end, still heterozygotes around, v difficult to get rid of as they still have the same beneficial phenotype as homo dom.
36
Q

What is the codominant effect?

A

Co-dominant is intermediate between dominant and recessive, so fitness may be half way between the two
h = 0.5

PP: 1 PQ: 1+0.5s QQ: 1+s

Δ𝑞=𝑠𝑝𝑞/2
- New codominant alleles achieve higher frequencies than dominant alleles because there is no ‘hiding’ due to the additive effect of the alleles

Making h=0.5 (codominant selection) is almost like making pop pseudo haploid, by removing the effect of heterozygotes.

Codominant selection and haploid selection v similar (no squared term in either!!)

• A new, rare codominant allele Q initially creates many heterozygotes.
- Q achieves higher frequencies than dominant alleles because there is no ‘hiding’ due to the additive effect of the alleles!

37
Q

Frequency of beneficial new allele Q over no. generations

How do the recessive effect, dominant effect and codominant affect look on graph?

A

Recessive effect
- still sigmoidal, but takes 100s of generations to see change in allele frequency
- really low on graph then shoots up and reaches fixation
This is because when beneficial recessive allele Q is rare, it is only found in heterozygotes and therefore ‘invisible’ to selection. Selection can’t get any traction – incredibly slow process to create handful of recessive QQ homozygotes, when this happens selection can kick off
- allele frequency rapidly increases and reaches fixation

Dominant effect
- Selection can act very quickly, new dominant alleles initially increase in frequency generating lots of QQ homozygotes. Curve shoots up to v high allele frequency really fast
- The dominance of the new allele allows for less-fit recessive alleles to ‘hide’, preventing complete fixation!! (as the hetero is still beneficial so not selected against)
• Near fixation, dominance allows the less-fit allele to ‘hide’ in heterozygotes, making it difficult to remove as they still have the same beneficial phenotype as homo dom

Codominant effect
- partly on one side of dom curve and partly on other
- slightly slower than dominant to shoot up in freq, but then goes above dom curve to reach fixation faster
• A new, rare codominant allele Q initially creates many heterozygotes.
- Q achieves higher frequencies than dominant alleles because there is no ‘hiding’ due to the additive effect of the alleles!

38
Q

Balancing selection

Underdominance

A

Balancing selection favours the co-existence of both alleles in the population

Both homozygotes are fitter than the heterozygote ‘heterozygote disadvantage’

PP: 1+s PQ:1 QQ: 1+t

39
Q

Balancing selection

Overdominance

A

Balancing selection favours the co-existence of both alleles in the population

Heterozygote fitness is greater than homozygote ‘heterozygote advantage’

PP: 1-s PQ: 1 QQ: 1-t

  • Leads to excess of heterozygotes compared to the Hardy-Weinberg prediction. Heterozygote freq is way bigger than 2pq
  • Allele P and Q will stably coexist, at frequencies set by the relative fitnesses of the two homozygotes

Equilibrium frequency of allele P = t/(s+t): the bigger the fitness cost (t) to QQ relative to the cost (s) to PP, the more common P is
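
The equilibrium t/(s+t) can be checked by iterating the overdominance model from very different starting frequencies. This is a sketch with hypothetical values of s and t.

```python
def overdominance_step(p, s, t):
    """One generation of selection with fitnesses PP: 1-s, PQ: 1, QQ: 1-t."""
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # P alleles come from all of PP and half of PQ after selection
    return (p * p * (1 - s) + p * q) / mean_fitness


def iterate_overdominance(p0, s, t, generations):
    p = p0
    for _ in range(generations):
        p = overdominance_step(p, s, t)
    return p


s, t = 0.1, 0.3  # illustrative fitness costs for PP and QQ
predicted = t / (s + t)

from_rare = iterate_overdominance(0.1, s, t, 2000)
from_common = iterate_overdominance(0.95, s, t, 2000)

print(predicted, from_rare, from_common)
```

Both runs converge on p = 0.75, which is what makes the polymorphism stable: the allele frequency returns to the same point whether P starts rare or common.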

40
Q

Example of balancing selection overdominance

A

Sickle cell anaemia
• In its heterozygous form the sickle cell allele confers some protection against malaria, but in homozygous form it causes serious illness. Hence the allele is stably maintained in malaria-endemic regions
• Being heterozygous is beneficial!

41
Q

Name 2 other types of selection that can maintain genetic variation in a population

(apply to haploid and diploid)

A

Frequency-dependent selection: allele fitness is high when the allele is rare, low when common

eg. influenza mutations! If novel, can escape immune system. If common, we will have antibodies against them. This also works in predator-prey and host-parasite systems.

Fluctuating selection: allele fitness depends on an aspect of the environment that is rapidly and constantly changing.

42
Q

Example of balancing selection underdominance

A

Examples of underdominance are hard to find

False Wanderer butterfly

  • species has two alleles, each conferring a likeness to a different toxic species
  • Heterozygotes have an intermediate appearance and a lower fitness
43
Q

Mutation-selection balance

Diploid, If allele Q is strongly deleterious

what’s the rate of change of q?

What’s qeqm if it is recessive/dominant?

A

These two processes (mutation and selection) can combine to determine how common harmful mutations are in a population

Diploid, If allele Q is strongly deleterious and u is the P-to-Q mutation rate then:

The rate of change of q = selective process + (mutation rate x no. Ps in population that can mutate to Qs)
Δ𝑞≈𝑠𝑝𝑞 (𝑝ℎ+𝑞(1−ℎ))+𝑝𝑢

  • h = dominance effect
  • s = selection coefficient

The equilibrium frequency of allele Q (q_eqm) can be calculated by setting ∆q=0 and making the approximation p≈1 (Q is very deleterious, so it stays rare)

RECESSIVE
If Q is recessive ie. degree of dominance h=0, then

$q_{eqm} \approx \sqrt{u/|s|}$

This allows the strongly deleterious mutation to hide out in heterozygotes. Selection does a v bad job of removing the mutant qs.

PARTIALLY DOMINANT
• If Q exhibits just a small amount of dominance (h>0), then

$q_{eqm} \approx \frac{u}{h|s|}$

(ie. take away the root and add an h)
• Heterozygotes now show a fraction (h) of the deleterious effect. This allows selection to clear the deleterious allele much more efficiently, and the equilibrium frequency of the mutation is a lot lower
• Thus raising the dominance of a harmful allele very slightly leads to a significant drop in the frequency of the allele due to it being revealed to selection
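
A numerical comparison of the two equilibria. The sketch iterates the Δq recursion given on this card and compares the result with the approximate formulas; u, s and h are hypothetical values chosen for illustration, with s negative because Q is deleterious.

```python
import math


def q_eqm_recessive(u, s):
    """Approximate equilibrium frequency of a fully recessive harmful allele."""
    return math.sqrt(u / abs(s))


def q_eqm_partial_dominance(u, s, h):
    """Approximate equilibrium frequency when the allele is partly dominant."""
    return u / (h * abs(s))


def iterate_balance(q0, u, s, h, generations):
    """Apply dq = s*p*q*(p*h + q*(1-h)) + p*u repeatedly."""
    q = q0
    for _ in range(generations):
        p = 1.0 - q
        q += s * p * q * (p * h + q * (1 - h)) + p * u
    return q


u = 1e-6   # P-to-Q mutation rate (illustrative)
s = -0.5   # strongly deleterious

recessive_formula = q_eqm_recessive(u, s)
recessive_iterated = iterate_balance(0.01, u, s, h=0.0, generations=50000)

partial_formula = q_eqm_partial_dominance(u, s, h=0.05)
partial_iterated = iterate_balance(0.01, u, s, h=0.05, generations=50000)

print("recessive:", recessive_formula, recessive_iterated)
print("h = 0.05: ", partial_formula, partial_iterated)
```

With these values a dominance of only 5% lowers the equilibrium frequency by more than thirty-fold compared with the fully recessive case.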

44
Q

Recombination

  • what is it?
  • why did it evolve?
  • how does it allow us to think about evolutionary processes?
A
  • Most multicellular, many unicellular organisms, and some viruses undergo some kind of genetic exchange (transformation, transduction, mitosis, meiosis etc.).
  • Recombination creates new combinations of genetic diversity (rather than generating new mutations per se), including combinations very unlikely to be generated by mutation and selection alone
  • It may have evolved to help organisms ameliorate effects of deleterious mutations
  • Recombination allows us to think about evolutionary processes at more than 1 locus at once
  • The likelihood that any given allele is inherited is reduced by half
45
Q

Linkage disequilibrium (LD)

A

Linkage disequilibrium (LD) is used to study recombination. It is measured as the deviation (D) from the haplotype frequencies expected if alleles at the two loci were randomly associated

  • Reflects the non-random association between the alleles at two loci, and thus the extent to which the genome is recombining

If knowing the allele at one locus lets you make a good prediction of the allele at another locus, the two loci are in LD

  • Suppose at one locus there are alleles A and B and alleles P and Q at another locus, there are four possible haplotypes (combinations of alleles) from this
  • PA, PB, QA and QB

We define the frequencies of the haplotypes in a population as fpa, fpb, fqa, fqb such that they add to 1 (fpa+fpb+fqa+ fqb=1)

UNLINKED ie. expected
If the alleles at the two loci are randomly associated and freely recombining, then knowing the allele at one locus tells us nothing about the allele at the other locus, and
(𝑓𝑃𝐴 × 𝑓𝑄𝐵) = (𝑓𝑄𝐴 × 𝑓𝑃𝐵)

LINKED
Linkage disequilibrium can be measured as the deviation (D) from the above expectation of independent Mendelian inheritance
𝐷= (𝑓𝑃𝐴 × 𝑓𝑄𝐵)−(𝑓𝑄𝐴 × 𝑓𝑃𝐵)

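  • Low D means that the two loci are recombining freely, so alleles at one locus are shuffled among haplotypes independently of alleles at the other (recombination changes haplotype frequencies, not allele frequencies)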

Genes on different chromosomes will usually have D ≈ 0, because of independent segregation at meiosis
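
A small worked example of D. The haplotype frequencies below are hypothetical: the first set is built by multiplying allele frequencies (random association), the second has only the PA and QB haplotypes present.

```python
def linkage_disequilibrium(f_PA, f_PB, f_QA, f_QB):
    """D = (f_PA x f_QB) - (f_QA x f_PB) for the four haplotype frequencies."""
    return f_PA * f_QB - f_QA * f_PB


# Random association: allele P at frequency 0.6, allele A at frequency 0.3
f_P, f_A = 0.6, 0.3
random_assoc = linkage_disequilibrium(
    f_PA=f_P * f_A,
    f_PB=f_P * (1 - f_A),
    f_QA=(1 - f_P) * f_A,
    f_QB=(1 - f_P) * (1 - f_A),
)

# Complete association: P is always found with A, and Q with B
complete_assoc = linkage_disequilibrium(f_PA=0.5, f_PB=0.0, f_QA=0.0, f_QB=0.5)

print(random_assoc, complete_assoc)
```

The first value is zero (up to floating-point rounding) and the second is 0.25, the largest value D can take for two alleles per locus.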

46
Q

Why is D not a very useful metric? What’s better?

A
  • D is dependent on allele frequencies so can’t be compared across studies (diff organisms, chromosomes etc)
  • Other metrics like the normalised D’ that takes allele freq into account can be more useful
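
A sketch of the normalised D' under its usual definition: D is divided by the largest value it could take given the allele frequencies. The definition is supplied here as an assumption (it is not spelled out on the card), the sign of D is kept, and the haplotype frequencies are hypothetical.

```python
def d_prime(f_PA, f_PB, f_QA, f_QB):
    """D divided by its maximum possible magnitude for these allele frequencies."""
    d = f_PA * f_QB - f_QA * f_PB
    # Allele frequencies at each locus, summed from the haplotype frequencies
    p_P, p_Q = f_PA + f_PB, f_QA + f_QB
    p_A, p_B = f_PA + f_QA, f_PB + f_QB
    if d >= 0:
        d_max = min(p_P * p_B, p_Q * p_A)
    else:
        d_max = min(p_P * p_A, p_Q * p_B)
    return d / d_max if d_max > 0 else 0.0


# Two populations with complete association but different allele frequencies
d_even = 0.5 * 0.5 - 0.0 * 0.0      # D when P and A are both at 0.5
d_skewed = 0.9 * 0.1 - 0.0 * 0.0    # D when P and A are both at 0.9

dp_even = d_prime(0.5, 0.0, 0.0, 0.5)
dp_skewed = d_prime(0.9, 0.0, 0.0, 0.1)

print("D: ", d_even, d_skewed)
print("D':", dp_even, dp_skewed)
```

Raw D differs between the two cases (0.25 against 0.09) even though association is complete in both, whereas D' is 1 each time, which is why the normalised value is easier to compare across studies.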
47
Q

Population genetic forces that increase D

A
Linkage disequilibrium can be raised by 
- recent positive selection
- population bottlenecks
- population subdivision
- balancing selection
- epistasis
(these processes all make it more likely that certain combinations of alleles are found than is expected by chance)
48
Q

Population genetic force that decreases D

A

Recombination

It breaks apart combinations of alleles, and is pretty much the only evolutionary process that can do this.

In the absence of any other evolutionary force, D would be reduced at rate defined by the rate of recombination (crossovers per nucleotide per generation)
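
The decay can be sketched with the standard two-locus result D_t = D_0 (1 - r)^t. Here r is assumed to be the recombination fraction between the two loci per generation (0.5 for unlinked loci), which is a per-pair-of-loci quantity rather than the per-nucleotide rate mentioned above; the values are hypothetical.

```python
import math


def ld_after(d0, r, t):
    """D after t generations, given recombination fraction r between the loci."""
    return d0 * (1 - r) ** t


d0 = 0.25

# Unlinked loci (r = 0.5): D halves every generation
unlinked = [ld_after(d0, 0.5, t) for t in range(4)]

# Tightly linked loci (r = 0.001): D persists for a long time
linked_after_100 = ld_after(d0, 0.001, 100)
half_life = math.log(2) / -math.log(1 - 0.001)

print(unlinked)
print(linked_after_100, round(half_life))
```

For unlinked loci D drops to an eighth of its starting value in three generations, while at r = 0.001 it takes several hundred generations just to halve.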

49
Q

Human genetics

- importance of LD?

A
  • LD is important for finding mutations that cause genetic diseases, it is possible to locate disease-causing mutations through their strong LD with nearby genetic markers eg. SNP
50
Q
Human genetics
Haplotype blocks
- what is it
- why
-  human genome prevalence
A

Haplotype blocks - blocks of the genome are likely to be inherited together, where recombination is rare (hence LD is high)

Why?
Crossing over is far more likely to occur in some parts of the genome than others

Human genome

  • has discrete haplotype blocks within which recombination is rare, separated by hot spots of recombination
  • Each block is descended from a single ancestral chromosome
  • LD differs between populations
  • Nigerian population has low LD and shorter haplotype blocks (~5kb)
  • US Caucasian population has high LD and long haplotype blocks (~60kb)
  • European populations (and descendant US populations) have been through recent strong ‘out of africa’ population bottleneck, which raised LD. Recombination has yet to lower LD to levels seen in Africa
51
Q

Recombination and Selection

  • What happens if these evolutionary forces both occur at the same time?
  • what graph shows this?
A

The interaction between recombination and selection leads to complex evolutionary phenomena.

Shown by graph of freq. of haplotypes over time

52
Q

Without recombination (asexual populations)

freq. of haplotypes over time

two advantageous alleles A and B

A

In the absence of recombination beneficial mutations compete with each other for fixation (clonal interference) and are hindered by being linked to deleterious mutations

  • Big A and Big B are advantageous, and both are trying to increase in frequency.
  • AB is the most fit haplotype
53
Q

With recombination (sexual populations)

freq. of haplotypes over time

two advantageous alleles A and B

A

Increases the efficiency of fixation of new alleles by natural selection

  • Both carriers of mutations A and B have a higher fitness and therefore a bigger chance to survive and to produce offspring.
    • Recombination of aB and Ab can regenerate AB haplotype (don’t have to wait for each to fix). This happens a lot faster than having to fix one mutation at a time.

Recombination can

  • avoid clonal interference to give the most beneficial combinations quickly!
  • combine competing mutations into a single haplotype, and also free them of freeloading deleterious mutations (by freeing a beneficial allele from being tied to something really deleterious on a chromosome!)
54
Q

Define clonal interference

- what does it explain?

A

Clonal interference occurs in an asexual lineage (“clone”) with a beneficial mutation.

This mutation would be likely to get fixed if it occurred alone, but it may fail to be fixed, or even be lost, if another beneficial-mutation lineage arises in the same population; the multiple clones interfere with each other.

It explains why beneficial mutations can take a long time to become fixed, or can even disappear, in asexually reproducing populations.

55
Q

Non-random mating

  • what’s it with respect to
  • an example
A
  • Random mating is defined as random with respect to GENOTYPE.
  • It is not the same as no mate choice
  • In non-random mating, organisms may prefer to mate with others of the same genotype (positive assortative) or of different genotypes (negative assortative).
  • Non-random mating won’t by itself make allele frequencies change.

E.g. human mating seems to be random with respect to most genetic markers, but is non-random with respect to phenotypic traits such as height. I.e. you might choose to mate with someone who's tall, but that choice is independent of genotype at most marker loci

56
Q

Inbreeding

  • what is it a type of
  • what loci does it affect
A

Inbreeding is a type of non-random mating that occurs when individuals mate with relatives more often than expected by chance

• Affects all loci in a genome of an organism and may be an adaptive trait. Eg. self-fertilisation in plants enables isolated individuals to reproduce.

57
Q

Positive assortative mating

  • what is it
  • what loci does it affect
A

type of non-random mating

• Positive assortative mating occurs when individuals choose mates with similar traits to themselves.

  • This is not inbreeding - individuals might be more closely related, but only by chance
  • It affects only the subset of genes that influence the chosen trait, rather than the whole genome (as in inbreeding!)

e.g. brown birds may only mate with other brown birds, but this will only affect colour genes

58
Q

Effect of inbreeding and positive assortative mating on allele frequencies
- what’s IBD?

A

They do not change allele frequencies, but they do change how those alleles are apportioned between homozygotes and heterozygotes.

They increase homozygosity (compared to that predicted by the Hardy-Weinberg principle)

Identical by descent (IBD)

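  • closely related parents are likely to carry copies of the same ancestral allele, so their offspring can inherit two copies of it, e.g. aa
  • such alleles are said to be identical by descent (IBD) (i.e. both copies derive from the same allele in a common ancestor)
  • a locus carrying two IBD alleles is said to be autozygous (rather than simply homozygous)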
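
A minimal sketch of the point made in this card: inbreeding redistributes alleles among genotypes without changing allele frequencies. It assumes the standard one-locus, two-allele formulation in which F (the inbreeding coefficient) is the probability that the two alleles at a locus are IBD. The function names and the example value p = 0.7 are hypothetical.

```python
def genotype_frequencies(p, F):
    """Return (AA, Aa, aa) frequencies for allele frequency p and inbreeding coefficient F."""
    q = 1.0 - p
    AA = p * p + p * q * F        # extra homozygotes created by inbreeding
    Aa = 2.0 * p * q * (1.0 - F)  # heterozygotes reduced by a factor (1 - F)
    aa = q * q + p * q * F
    return AA, Aa, aa


def allele_frequency(AA, Aa, aa):
    """Frequency of allele A recovered from the genotype frequencies."""
    return AA + 0.5 * Aa


for F in (0.0, 0.25, 1.0):
    AA, Aa, aa = genotype_frequencies(0.7, F)
    print(F, round(AA, 4), round(Aa, 4), round(aa, 4),
          round(allele_frequency(AA, Aa, aa), 4))
```

Whatever the value of F, the recovered frequency of A stays at 0.7; only the split between homozygotes and heterozygotes moves.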
59
Q

Effect of outbreeding and negative assortative mating on allele frequencies

A

Neg assortative mating is a preference for mates with different traits to your own

Both will increase heterozygosity and decrease homozygosity.

60
Q

Compared to HW predictions

excess of heteros means

excess of homos means

A

excess of heteros means

  • Overdominant balancing selection, ‘heterozygote advantage’
  • outbreeding
  • negative assortative mating

excess of homos means

  • Underdominant selection, 'heterozygote disadvantage' (this is disruptive rather than balancing selection)
  • inbreeding
  • positive assortative mating
  • genetic drift
  • pop bottlenecks (v strong GD)
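
One way to quantify such an excess or deficit is to compare observed heterozygosity with the Hardy-Weinberg expectation. The sketch below uses the common index 1 - H_obs / H_exp, which is not given in the card itself; the function name and the genotype counts are made up for illustration.

```python
def heterozygote_deficit(n_AA, n_Aa, n_aa):
    """Return 1 - H_obs / H_exp for one locus with two alleles.

    Positive values mean fewer heterozygotes than HW predicts (excess of
    homozygotes); negative values mean an excess of heterozygotes.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    h_exp = 2 * p * (1 - p)          # HW expected heterozygosity
    h_obs = n_Aa / n                 # observed heterozygosity
    return 1 - h_obs / h_exp


# Hypothetical samples of 100 individuals each:
print(heterozygote_deficit(25, 50, 25))  # matches HW
print(heterozygote_deficit(30, 40, 30))  # excess of homozygotes
print(heterozygote_deficit(20, 60, 20))  # excess of heterozygotes
```

The index says only that a deviation exists. Working out which of the causes listed above is responsible needs further evidence.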
61
Q

Inbreeding coefficient (F)

  • what F means random mating? complete inbreeding? full-sib mating?
A

The inbreeding coefficient (F) is used to measure the level of recent inbreeding

  • It equals the probability that the two alleles at a randomly chosen locus in an individual are IBD (identical by descent)

In the absence of any other evolutionary process:
F = 0 signifies random mating, genotype frequencies are determined by HW principle

F = 1 signifies complete inbreeding, there are zero heterozygotes

  • Full-sib mating: F = 0.25 (for the offspring)
  • Frequency of homozygous recessives among offspring (q = frequency of the recessive allele) is:
  • f = q^2(1 − F) + qF
62
Q

Inbreeding depression

- why bad?

A

Reduced biological fitness in a given population as a result of inbreeding, or breeding of related individuals

Harmful as it gives increased frequency of homozygotes

  • this leads to phenotypic expression of deleterious recessive alleles!!
  • These alleles persist in outbred populations in heterozygotes e.g. through mutation-selection balance
  • Frequency of homozygous recessives among offspring is:
    𝑓= 𝑞^2(1−𝐹)+𝑞𝐹
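
A short worked example of the formula above shows why inbreeding exposes rare recessive alleles. The value q = 0.01 is an assumed illustration and the function name is hypothetical.

```python
def recessive_homozygote_frequency(q, F):
    """f = q^2 (1 - F) + q F, as given in the card."""
    return q * q * (1.0 - F) + q * F


q = 0.01  # assumed frequency of a rare deleterious recessive allele
outbred = recessive_homozygote_frequency(q, 0.0)       # random mating
full_sib = recessive_homozygote_frequency(q, 0.25)     # offspring of full sibs
fully_inbred = recessive_homozygote_frequency(q, 1.0)  # complete inbreeding

print("random mating:     ", outbred)
print("full-sib offspring:", full_sib, "fold increase:", full_sib / outbred)
print("complete inbreeding:", fully_inbred)
```

With F = 0 the result reduces to q^2, and with F = 1 it reduces to q, so the rarer the allele, the larger the relative increase caused by inbreeding.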
63
Q

Random genetic drift

A
  • HW principle assumes infinite population size, so the sampling effect is absent (drift is not taken into account)
  • Chance events can lead to random fluctuations in allele frequencies ('drift')
  • The change is due to the "sampling effect" of selecting the alleles for the next generation from the gene pool of the current generation.
  • Although genetic drift happens in populations of all sizes, its effects tend to be stronger in small populations, because fewer alleles are sampled each generation and random fluctuations in allele frequency are therefore proportionally larger
  • Fluctuations due to drift accumulate through time, eventually leading to fixation or elimination of alleles (even in the absence of selection!)
  • Genetic drift always has the net effect of reducing genetic variation and increasing homozygosity (although it usually acts more slowly than nat sel)
  • Drift is most important for neutral alleles (s = 0), i.e. genetic variants that confer no fitness benefit or disadvantage compared to the WT. There is no selection acting on these alleles, so random genetic drift is key to explaining how their frequencies vary from gen to gen
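
The sampling effect described above can be sketched with a simple Wright-Fisher style simulation. This is a minimal illustration assuming a neutral locus, constant population size and non-overlapping generations; the function name, seed and population sizes are arbitrary choices.

```python
import random


def wright_fisher(n_individuals, p0, generations, rng):
    """Track the frequency of one neutral allele under pure drift.

    Each generation, 2N allele copies are drawn at random (with
    replacement) from the gene pool of the previous generation.
    """
    copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _generation in range(generations):
        count = sum(1 for _copy in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


rng = random.Random(1)  # fixed seed so that runs are repeatable
small = wright_fisher(10, 0.5, 100, rng)    # N = 10
large = wright_fisher(1000, 0.5, 100, rng)  # N = 1000
print("N = 10:   final frequency", small[-1])
print("N = 1000: final frequency", large[-1])
```

Plotting several replicate trajectories for each population size makes the contrast visible: small populations wander quickly to 0 or 1, while large ones stay close to the starting frequency for much longer.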
64
Q

Fixation probability and how small populations experience genetic drift

A
  • A small population of N diploid individuals contains 2N alleles (at autosomal loci)
  • Some individuals will leave no offspring, some 1, some 2, and this variation in reproductive output is not inherited (i.e. it has nothing to do with genes)
  • In the example, mutation produces a new (blue) allele. After 5 or 6 generations, the entire population is fixed for the blue allele through random genetic drift.
  • All new mutations start at a frequency of 1/(2N)

• We can use maths to work out the probability of fixation of a new neutral allele
- this is also 1/(2N) (its starting frequency)

  • The average time to fixation (for neutral alleles that do fix) is ≈4N generations (quite slow compared to nat sel)
  • Therefore, the effects of genetic drift are inversely proportional to population size
  • the smaller the population, the more likely a new neutral allele is to be fixed
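
The 1/(2N) result can be checked roughly by simulation. The sketch below assumes a neutral allele, constant N and Wright-Fisher sampling; the function names, seed and number of replicates are arbitrary, and the estimate will only approximate the theoretical value.

```python
import random


def new_mutation_fixes(n_individuals, rng):
    """Follow one new neutral mutation until it is lost (False) or fixed (True)."""
    copies = 2 * n_individuals
    count = 1  # a new mutation starts as a single copy, frequency 1/(2N)
    while 0 < count < copies:
        p = count / copies
        count = sum(1 for _copy in range(copies) if rng.random() < p)
    return count == copies


def estimate_fixation_probability(n_individuals, replicates, seed=0):
    """Fraction of independent new mutations that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(new_mutation_fixes(n_individuals, rng) for _rep in range(replicates))
    return fixed / replicates


N = 10
estimate = estimate_fixation_probability(N, replicates=5000)
print("simulated:", estimate, "theory 1/(2N):", 1 / (2 * N))
```

Most replicates lose the new allele within a few generations, which is why the fixation probability is so low even without any selection against it.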
65
Q

The balance between genetic drift and mutation (to give neutral fitness alleles)

A
  • In the absence of natural selection, the level of genetic variation in a population is determined by the balance between mutation (which generates variation) and random genetic drift (which removes variation, by either fixation or elimination of alleles)

In diploids this leads to an equilibrium level of average heterozygosity (H), set by the combination of mutation and genetic drift:

H = 4Nu / (4Nu + 1)

Heterozygosity of neutral alleles depends only on population size (N) and mutation rate (u), and will be close to 1 when 4Nu is large (pop size is big, so genetic drift is weak, and/or the mutation rate is high, so lots of variation is generated)
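
A quick numerical sketch of the equilibrium formula above. The function name and the example values of N and u are assumptions chosen only to show the two extremes.

```python
def equilibrium_heterozygosity(N, u):
    """H = 4Nu / (4Nu + 1) for a diploid population of size N and mutation rate u."""
    theta = 4 * N * u
    return theta / (theta + 1)


# Assumed example values:
low = equilibrium_heterozygosity(N=10_000, u=1e-8)     # 4Nu much smaller than 1
mid = equilibrium_heterozygosity(N=25_000, u=1e-5)     # 4Nu = 1
high = equilibrium_heterozygosity(N=1_000_000, u=1e-5) # 4Nu much larger than 1
print(low, mid, high)
```

When 4Nu is small, H is approximately 4Nu itself; when 4Nu = 1, H is one half; and H approaches 1 only as 4Nu becomes large.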

  • 4Nu (or 2Nu for haploids) recurs throughout population genetic theory as a measure of genetic variation; it is often denoted Θ
  • Drift also affects advantageous alleles: even tiny random fluctuations can push rare alleles to extinction, so even in large populations most beneficial mutations are lost

66
Q

The term 4Nu (ϴ)

A

Fundamental parameter that reflects the relative strengths of mutation and genetic drift

4Nu (or 2Nu for haploids) recurs throughout population genetic theory as a measure of genetic variation (often denoted ϴ).

67
Q

The balance between genetic drift and selection

A
  • When alleles are rare (e.g. recently created) then even tiny random fluctuations can push them to extinction
  • Thus even in large populations most new beneficial mutations are quickly lost (only a tiny bit of GD might push it to extinction)
  • However, beneficial alleles are more likely to be fixed, and will reach fixation faster, than neutral ones - once an allele gets above a certain frequency it is highly likely to reach fixation
  • Probability of fixation of a new mutation (which starts at frequency 1/(2N)) is a function of two parameters (N = size of pop, s = selection coefficient, i.e. the fitness advantage of the allele)
  • prob. of fixation = (1 − e^(−2s)) / (1 − e^(−4Ns))
  • When Ns»1 or Ns<
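
The fixation formula in this card can be evaluated numerically. The sketch below treats N as both the census and the effective population size and handles s = 0 separately using the neutral result 1/(2N); the function name and the example values are assumptions.

```python
import math


def fixation_probability(N, s):
    """(1 - e^(-2s)) / (1 - e^(-4Ns)) for a new mutation at frequency 1/(2N)."""
    if s == 0:
        return 1.0 / (2 * N)  # neutral limit of the formula
    # expm1(x) = e^x - 1, which stays accurate when x is close to zero.
    # Very large negative N*s would overflow; not handled in this sketch.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)


N = 1000
for s in (-0.01, 0.0, 1e-6, 0.01):
    print(s, fixation_probability(N, s))
```

With these assumed values, a beneficial allele with s = 0.01 fixes with probability close to 2s (about 2%), a nearly neutral allele behaves almost like a neutral one, and a deleterious allele has a far smaller chance than 1/(2N).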
68
Q

Effective population size (Ne)

A

So far assumed in all models that…
- all individuals in all generations have an equal propensity to reproduce.
And that N = absolute no. individuals that can reproduce.

Often not the case (variation in fecundity due to harems, unequal sex ratios, fluctuations in pop size over time)

Thus in population genetics effective population size (Ne) is often used in place of census population size (N). Typically Ne<

69
Q

Name 3 reasons why not all individuals in a pop have equal propensity to reproduce

A
  • Individuals vary greatly in fecundity (e.g. harem-less male gorillas)
  • Unequal sex ratios (therefore diff contribution to next gen)
  • Fluctuations in population size through time, so some individuals are more fecund than others
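
Two of these effects have standard textbook approximations for Ne, which are not given in the cards and are included here only as an illustrative sketch: Ne = 4·Nm·Nf / (Nm + Nf) for unequal numbers of breeding males and females, and the harmonic mean of N over generations for a fluctuating population. The function names and numbers are hypothetical.

```python
def ne_unequal_sex_ratio(n_males, n_females):
    """Standard approximation: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_fluctuating_size(sizes):
    """Standard approximation: harmonic mean of population size across generations."""
    return len(sizes) / sum(1.0 / n for n in sizes)


# A harem of 1 breeding male and 9 females (census size 10):
harem = ne_unequal_sex_ratio(1, 9)
# A population of 1000 that crashes to 10 for one generation:
crash = ne_fluctuating_size([1000, 10, 1000])
print(harem, crash)
```

In both examples Ne comes out well below the census size, because the rarer sex and the smallest generation dominate the result.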
70
Q

What is the Wahlund effect?

A

An apparent deficit of heterozygotes (compared to Hardy-Weinberg equilibrium) due to population structure, even if each subpopulation is itself in HW equilibrium.

If the allele frequencies differ between pop A and pop B, then the heterozygosity observed across the pooled sample will always be lower than that expected if the two populations were well mixed
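
A small numerical sketch of the Wahlund effect, assuming equal-sized subpopulations that are each in HW equilibrium. The function name and the allele frequencies 0.2 and 0.8 are made up for illustration.

```python
def wahlund(p_values):
    """Compare observed and expected heterozygosity for pooled subpopulations.

    p_values holds the frequency of allele A in each subpopulation;
    subpopulations are assumed to be equal in size and each in HW equilibrium.
    Returns (observed heterozygosity in the pooled sample,
             HW expectation from the pooled allele frequency).
    """
    k = len(p_values)
    h_observed = sum(2 * p * (1 - p) for p in p_values) / k
    p_pooled = sum(p_values) / k
    h_expected = 2 * p_pooled * (1 - p_pooled)
    return h_observed, h_expected


h_obs, h_exp = wahlund([0.2, 0.8])
print("observed:", h_obs, "expected if well mixed:", h_exp)
```

The gap between the two values equals twice the variance in allele frequency among subpopulations, so it vanishes only when the subpopulations share the same allele frequencies.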