Selection Flashcards
Name the key features of selection (5)
- Non-random
- Directional
- Determined by gene pool and environment
- Dependent on heterozygous effect and allele frequency
- Increases, decreases or stabilizes the frequency of a specific allele
What does selection drive / affect?
- Drives adaptation
- Affects specific loci/gene sequences - (said to be under selection pressure)
What does selection act upon?
- Probability of survival until adulthood
- Number of offspring - (lifetime reproductive success)
What is selection?
Process that acts on individuals carrying specific alleles, changing how often those alleles are inherited
At what different stages of the life cycle can selection occur?
- Viability selection: from fertilization to reproduction
- Sexual selection: choosing mates - selection in both sexes
- Fertility and gamete selection: ability to fertilize
- Fecundity selection: number of offspring
How is selection measured?
‘fitness’ - differences among genotypes
What are absolute and relative fitness?
- Absolute fitness: average number of offspring in next generation per individual or specified genotype born in this generation
- Relative fitness (W): absolute fitness of specified genotype / absolute fitness of reference genotype
What is the selection coefficient (s)?
The relative reproductive disadvantage a genotype has against the most fit genotype
- 1-relative fitness
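The definitions of relative fitness and s can be sketched in Python. The genotype labels and absolute fitness values here are made up for illustration, and the fittest genotype is assumed to be the reference.

```python
def relative_fitness(absolute):
    """Scale absolute fitness values so the fittest genotype has W = 1."""
    reference = max(absolute.values())  # assumption: fittest genotype is the reference
    return {genotype: w / reference for genotype, w in absolute.items()}


def selection_coefficient(relative):
    """s = 1 - relative fitness, for each genotype."""
    return {genotype: 1 - w for genotype, w in relative.items()}


# Hypothetical absolute fitness values (mean offspring per individual)
absolute = {"A1A1": 2.0, "A1A2": 1.8, "A2A2": 1.0}
W = relative_fitness(absolute)
s = selection_coefficient(W)
print(W)
print(s)
```

The fittest genotype ends up with W = 1 and s = 0, so s reads directly as the disadvantage of every other genotype.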
What are the different types of selection, how do they affect alleles and how common are they?
- Positive selection: Increases the frequency of a given beneficial allele (adaptation) - less common
- Balancing selection: Maintains allele frequencies at an equilibrium - rare
- Negative (purifying) selection: Decreases the frequency of a given deleterious allele - common
What factors does the intensity of selection relate to?
- Presence of other alleles
- Environmental conditions
- Heterozygous effect
- Frequency of the allele
- Effective population size / genetic drift
- Linkage to other loci - Hill-Robertson effect
How can you model selection at one/multiple loci and what theories were adopted?
- Simplest model - one locus, two alleles, fitness difference between them
- More complicated - multiple alleles, multiple loci etc
- Use selection coefficient (s) - defined so that lowest fitness genotype has fitness 1-s
- Diploids: viability selection acts on different genotypes, and outcomes will depend on relative fitness differences between the different homozygous/heterozygous genotypes
- Theory developed by Wright, Fisher and Haldane
How is the mean fitness of a population calculated and what is its symbol?
- p^2 and q^2 = homozygous, 2pq = heterozygous
- p^2 + 2pq + q^2 = 1
- w = omega
- p^2 w11 + 2pq w12 + q^2 w22 = mean w (each genotype fitness weighted by its genotype frequency)
- mean w is mean fitness of pop
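Mean fitness is each genotype's fitness weighted by its Hardy-Weinberg frequency. A minimal sketch, with illustrative values for p and the three fitnesses (the function name is hypothetical):

```python
def mean_fitness(p, w11, w12, w22):
    """Mean fitness: genotype fitnesses weighted by p^2, 2pq and q^2."""
    q = 1 - p
    return p**2 * w11 + 2 * p * q * w12 + q**2 * w22


# Illustrative values: p = 0.5, fitnesses 1, 0.9 and 0.5
w_bar = mean_fitness(0.5, 1.0, 0.9, 0.5)
print(w_bar)
```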
What is relative fitness?
Relative fitness is the absolute fitness normalized - e.g., absolute fitness of each genotype divided by the absolute fitness of the fittest genotype
s = selection coefficient (0<s<1)
h = heterozygous effect
- A1A1 = 1
- A1A2 = 1 - hs
- A2A2 = 1-s
What is the heterozygous effect?
The fitness of the heterozygote measured relative to the selective difference between the two homozygotes
What happens during positive selection and how does it differ for dominant / recessive alleles?
- Fittest allele pushed to fixation (reaches frequency of 1)
- Dominance and magnitude of fitness difference affects the speed of fixation of a beneficial allele
- Deterministic model - i.e. no drift
- Common
- ‘A’ - Dominant : selection acts straight away - pushed to fixation
- Intermediate dominance - not straight away but still quite quickly
- ‘A’ - Recessive : selection doesn't act straight away but increases quickly later on
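The effect of dominance on the early spread of a beneficial allele can be sketched with the deterministic one-locus recursion. This assumes the h/s fitness scheme (A1A1 = 1, A1A2 = 1 - hs, A2A2 = 1 - s), so h = 0 means the beneficial allele A1 is dominant and h = 1 means it is recessive. The starting frequency, s and number of generations are illustrative choices.

```python
def next_p(p, s, h):
    """One generation of viability selection on allele A1 (frequency p).

    Fitnesses: A1A1 = 1, A1A2 = 1 - h*s, A2A2 = 1 - s. No drift, no mutation.
    """
    q = 1 - p
    w11, w12, w22 = 1.0, 1 - h * s, 1 - s
    w_bar = p**2 * w11 + 2 * p * q * w12 + q**2 * w22
    return p * (p * w11 + q * w12) / w_bar


def trajectory(p0, s, h, generations):
    """Allele frequency of A1 in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_p(freqs[-1], s, h))
    return freqs


# Illustrative values only: beneficial allele starts rare (p0 = 0.01), s = 0.1
dominant = trajectory(0.01, 0.1, h=0.0, generations=50)
intermediate = trajectory(0.01, 0.1, h=0.5, generations=50)
recessive = trajectory(0.01, 0.1, h=1.0, generations=50)
for name, traj in [("dominant", dominant),
                   ("intermediate", intermediate),
                   ("recessive", recessive)]:
    print(name, round(traj[-1], 4))
```

After 50 generations the dominant allele has risen furthest and the recessive allele has barely moved, because a rare recessive allele is almost always hidden in heterozygotes.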
What happens during purifying (negative) selection and how does it affect alleles?
- Without drift (deterministic model) - new deleterious alleles under purifying selection drop out immediately
- But amount of deleterious variation in a population will approach mutation-selection equilibrium as the loss of deleterious alleles due to selection is balanced by the input of new mutations
- With drift - dominance affects the probability of loss of a deleterious allele
- Recessive/partially recessive deleterious alleles can spread to an appreciable frequency in the population by drift (e.g., Tay-Sachs disease in some Ashkenazi Jewish enclaves)
- Estimated between 2 and 20 deleterious mutations in humans per generation - many accumulated deleterious mutations - high mutational load, which is exposed when inbreeding occurs
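Mutation-selection equilibrium can be put into numbers with the standard textbook approximations. This is a sketch, not part of the lecture material: the mutation rate mu is an added symbol, the values are illustrative, and the formulas only hold when mu is much smaller than s (or than hs for a partially dominant allele).

```python
from math import sqrt


def mutation_selection_equilibrium(mu, s, h):
    """Approximate equilibrium frequency of a deleterious allele.

    Fully recessive (h = 0): sqrt(mu / s).
    Partially dominant (h > 0): mu / (h * s).
    """
    if h == 0:
        return sqrt(mu / s)
    return mu / (h * s)


mu = 1e-6  # hypothetical mutation rate per generation
recessive_lethal = mutation_selection_equilibrium(mu, s=1.0, h=0.0)
partly_dominant = mutation_selection_equilibrium(mu, s=1.0, h=0.1)
print(recessive_lethal, partly_dominant)
```

The recessive allele settles at a much higher frequency because selection rarely sees it in heterozygotes.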
How do positive and negative selection affect fitness difference and genetic variation?
- Both remove fitness differences - reduce genetic variation
- Only one allele remains in the end - so the fitness difference is removed
How does balancing selection work?
- Frequency of fittest allele maintained at an equilibrium rather than going to fixation under positive selection
- Is due to heterozygous advantage (heterozygote has highest fitness)
- Two or more alleles maintained in the population
- Is rare
- Occurs when h<0 - overdominance
What does the equilibrium frequency of the genotypes and time taken to reach equilibrium depend on?
The fitness difference between the heterozygote and homozygous genotypes
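Under the h/s fitness scheme (A1A1 = 1, A1A2 = 1 - hs, A2A2 = 1 - s), setting the marginal fitnesses of the two alleles equal gives an equilibrium frequency for A1 of (1 - h) / (1 - 2h). The sketch below checks that value against the deterministic recursion. The choice h = -1, s = 0.1 and the starting frequency are illustrative.

```python
def equilibrium_p(h):
    """Equilibrium frequency of A1 under heterozygote advantage (h < 0)."""
    return (1 - h) / (1 - 2 * h)


def next_p(p, s, h):
    """One generation of viability selection, fitnesses 1, 1 - h*s, 1 - s."""
    q = 1 - p
    w11, w12, w22 = 1.0, 1 - h * s, 1 - s
    w_bar = p**2 * w11 + 2 * p * q * w12 + q**2 * w22
    return p * (p * w11 + q * w12) / w_bar


s, h = 0.1, -1.0   # heterozygote fitness 1.1, homozygotes 1.0 and 0.9
p = 0.1            # arbitrary starting frequency
for _ in range(500):
    p = next_p(p, s, h)

print(round(p, 6), round(equilibrium_p(h), 6))
```

In this parameterisation the equilibrium depends only on h, the position of the heterozygote relative to the homozygotes, while s sets how quickly the population gets there.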
What is balanced variation?
When selection actively maintains variation: Some types of balancing selection are:
- Heterozygote advantage - e.g., sickle cell anaemia, maybe CF
- Negative frequency-dependent selection - fitness of allele depends on its frequency in population - allele has its highest fitness when it's rare in the population - as frequency increases - fitness decreases - e.g., host-parasite interactions - MHC - cyclical shape
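Negative frequency dependence can be illustrated with a toy haploid model. The linear fitness function w = 1 - s * frequency is an assumption made for this sketch, not a model from the lecture, and because it has no time lag it settles to an equilibrium rather than cycling as host-parasite systems can.

```python
def next_freq(p, s):
    """One generation for two types whose fitness falls with their own frequency."""
    q = 1 - p
    w_a = 1 - s * p   # type A is fitter when rare
    w_b = 1 - s * q   # type B is fitter when rare
    return p * w_a / (p * w_a + q * w_b)


p = 0.05          # type A starts rare (illustrative value)
history = [p]
for _ in range(200):
    p = next_freq(p, 0.5)
    history.append(p)

print(round(history[1], 4), round(history[-1], 4))
```

The rare type increases at first and both types are then held at intermediate frequency, so variation is maintained rather than removed.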
Give an example of negative frequency dependent selection
E.g., coevolution of hosts and parasites:
- Parasite wants to be adapted to most common genotype in population - to infect most hosts
- So host genotypes that are infrequent will become more and more selected for, but the parasite will be constantly trying to catch up with the hosts' change in selection
- See graph in lecture
How are malaria and HbS alleles distributed worldwide?
- Sickle cell anaemia allele - HbS
- Homozygous for HbS allele have very low fitness - poor health outcomes - mortality at early age - exacerbated by malaria - allow homozygous HbS allele to persist
- Heterozygous genotypes offer protection against malaria, so HbS maintained in malarial areas
- Selection against HbS outside malarial areas
- Papers in lecture
What is disruptive selection?
This form of selection causes allele to go to fixation or to be lost from population - depending on starting frequency of allele
- Rare - but important for sympatric speciation
- Direction of selection depends on the initial frequency of the allele - if low freq = lost, if high freq = fixation
- Occurs when h>1, underdominance - heterozygotes have lowest fitness
How can environment affect selection? Give an example
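The dependence on starting frequency can be sketched with the deterministic recursion, assuming the h/s fitness scheme (A1A1 = 1, A1A2 = 1 - hs, A2A2 = 1 - s). With h = 2 the unstable threshold for A1 is (1 - h) / (1 - 2h) = 1/3. The values of s, h and the two starting frequencies are illustrative.

```python
def next_p(p, s, h):
    """One generation of viability selection, fitnesses 1, 1 - h*s, 1 - s."""
    q = 1 - p
    w11, w12, w22 = 1.0, 1 - h * s, 1 - s
    w_bar = p**2 * w11 + 2 * p * q * w12 + q**2 * w22
    return p * (p * w11 + q * w12) / w_bar


def final_frequency(p0, s=0.1, h=2.0, generations=2000):
    """Frequency of A1 after many generations of underdominant selection."""
    p = p0
    for _ in range(generations):
        p = next_p(p, s, h)
    return p


below = final_frequency(0.2)   # starts below the 1/3 threshold
above = final_frequency(0.5)   # starts above the 1/3 threshold
print(below, above)
```

Starting below the threshold the allele is lost, and starting above it the allele goes to fixation.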
- One allele is fitter in one environment and deleterious in another, then selection may maintain genetic variation across different habitat patches, or allele frequencies may change through time if environmental conditions change
- Selection coefficients are therefore always related to a given population within some specific environmental conditions
- Example: Peppered moth - Biston betularia
- Here: light moths are cryptic (can't be seen) on lichen covered trees but easily observed on trees where lichens have disappeared due to pollution, and vice versa for dark melanistic moth
- Observability affects predation rates by birds - so light forms are rarer in areas where trees have lost lichen cover
What different ways can you test selection, and what tests are used for each of these methods?
- Frequency/distribution tests (mainly comparisons of theta) - Tajima’s D test, Fu and Li statistic
- Haplotype diversity - mismatch distribution
- Haplotype length based tests - linkage disequilibrium - extended haplotype homozygosity test
- Codon-based tests - dN/dS ratios, MK test, HKA test
How does Tajima’s D statistic work?
Tajima’s D is the difference between two estimators of genetic diversity (theta):
- The average pairwise differences (PI) in a sample
- The number of segregating sites (Sn) divided by An
These are scaled so that they are expected to be the same in a neutrally evolving population of constant size
What does the Tajima’s D statistic tell you about variants?
- If a sample has an excess of rare variants - theta > PI and so D<0 - this suggests positive selection or population growth
- If a sample has an excess of intermediate frequency variants - PI > theta and D>0 - suggests balancing selection or population subdivision
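Tajima's D can be sketched from a small alignment using the standard constants from Tajima (1989). The two toy samples are invented for illustration, the function name is hypothetical, and the sign of D alone is not a significance test.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    # S: number of segregating (variable) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    # pi: average number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants that scale S and the variance of (pi - S/a1)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy samples: every variant is a singleton vs every variant is at 50%
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
intermediate_variants = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]
d_rare = tajimas_d(rare_variants)
d_intermediate = tajimas_d(intermediate_variants)
print(d_rare, d_intermediate)
```

The singleton-only sample gives D < 0 and the sample with intermediate-frequency variants gives D > 0.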
How does the Fu and Li statistic work?
A comparison of the number of derived singleton mutations and the total number of derived nucleotide variants
- Similar concept to Tajima’s D
- Uses assumption that the expected number of derived mutations that are present only once in a sample (singletons), n is equal to theta in the neutral case
What do the values of the Fu and Li statistic suggest about the population?
- Negative value indicates excess of singletons - similar to Tajima’s D - since selective sweeps tend to generate an excess of singletons
- A positive value indicates lack of singletons - e.g., balancing selection
- More sensitive than Tajima’s D in the selective sweep scenario
How can you compare haplotype number and diversity, describe the extreme cases?
- Comparison of the number of expected haplotypes and the observed number of haplotypes given the number of segregating sites
Extremes:
- Two or few haplotypes; fewer haplotypes than number of segregating sites - balancing selection or population structure
- Each segregating site defines a new haplotype - selective sweeps or population growth
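The two extremes can be illustrated by counting distinct haplotypes and segregating sites in toy alignments. The sequences are invented for illustration and the function name is hypothetical.

```python
def haplotype_summary(seqs):
    """Return (number of distinct haplotypes, number of segregating sites)."""
    n_haplotypes = len(set(seqs))
    n_segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    return n_haplotypes, n_segregating


# Each segregating site defines a new haplotype
one_per_site = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
# Only two haplotypes despite four segregating sites
two_haplotypes = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]

print(haplotype_summary(one_per_site))
print(haplotype_summary(two_haplotypes))
```

Both samples have four segregating sites, but the first has five haplotypes and the second only two.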
When is haplotype diversity high and low?
- High under balancing selection
- Low under selective sweeps - (or population growth)
How can you use the mismatch distribution to show selection and how are the trees affected?
- Spiky distribution = balancing selection or population structure
- Unimodal distribution - possible positive selection or pop growth
- Positive selection changes tree shape more than negative selection
- Negative selection mostly removes individual terminal branches
- Effect of positive selection is a reduction in number of lineages: one lineage is fixed, which then starts diversifying again
- Similar to population bottleneck - but it only affects one length of sequence
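A mismatch distribution is the histogram of pairwise differences between sequences in a sample. A minimal sketch with two invented toy samples (the function name is hypothetical):

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Count how many sequence pairs differ at 0, 1, 2, ... sites."""
    return Counter(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(seqs, 2)
    )


# Star-like sample: closely related sequences, each with its own new mutation
star_like = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
# Two divergent haplotypes held at intermediate frequency
two_clusters = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]

print(sorted(mismatch_distribution(star_like).items()))
print(sorted(mismatch_distribution(two_clusters).items()))
```

The star-like sample gives a single hump at 1 to 2 differences, while the two-cluster sample gives separate spikes at 0 and 4 differences.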
How can you use codon-based tests to identify selection?
Measure and compare multiple sequences to find fraction of non-synonymous sites that are variable (dN) and also fraction of synonymous sites that are variable (dS)
- Main effect of purifying selection is to reduce genetic diversity
- But selection only expected on non-synonymous sites - where mutation changes the protein
- Synonymous sites should be neutral - codon bias excepted
What is the equation for these codon-based tests and what do the values suggest?
- dN/dS = Ka/Ks = w
- w = 1 - neutral evolution
- w > 1 - positive selection
- w < 1 - purifying selection
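The reading of w can be written as a small helper. The dN and dS values are made up for illustration, and in practice a likelihood or statistical test is needed before concluding that w differs from 1.

```python
def omega(dn, ds):
    """w = dN / dS."""
    if ds == 0:
        raise ValueError("dS must be greater than zero")
    return dn / ds


def interpret_omega(w):
    """Map w onto the three cases listed above."""
    if w > 1:
        return "positive selection"
    if w < 1:
        return "purifying selection"
    return "neutral evolution"


# Hypothetical estimates for two genes
conserved_gene = interpret_omega(omega(0.02, 0.2))
fast_gene = interpret_omega(omega(0.3, 0.2))
print(conserved_gene, "/", fast_gene)
```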
What do genes with different codon-based test values suggest?
- Nearly all active genes in humans may show some evidence of purifying selection (w<1) - meaning most mutations to protein sequences are deleterious
- A few genes have w >1 - often associated with host pathogen-interactions - e.g., MHC class 1/class 2
- Genes mostly have a low w (around 0.1). An increase in the typical w for that gene could be due to positive selection or a weakening of selection - e.g., pseudogenes
Why is variation in dN/dS in MHC genes important?
- Major Histocompatibility complex class 1 and 2 genes recognise foreign antigens and stimulate immune response
- Within species, loci typically show dN/dS ratio > 1, suggesting diversifying selection driven by the need to recognise pathogens
What is the McDonald-Kreitman (MK) test used for and what do the results suggest?
- Compares dN/dS within and among species to identify +ve selection in related species lineages
- Test compares ratio of sites fixed by selection versus drift
- Find ratio of non-synonymous to synonymous substitutions between and within species
- Positive selection - ratio of non-synonymous to synonymous variation within species is lower than the ratio between species
- Weakly deleterious mutations can reduce power of MK test - as causes +ve selection to be underestimated
- Neutral: NI ~ 1
- Positive: NI << 1
- Balancing: NI >> 1
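NI is the neutrality index: the ratio of non-synonymous to synonymous polymorphisms within species (Pn/Ps) divided by the same ratio for fixed differences between species (Dn/Ds). A minimal sketch with hypothetical counts (not data from any study):

```python
def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn / Ps) / (Dn / Ds)."""
    return (pn / ps) / (dn / ds)


# Hypothetical counts: polymorphisms within species, fixed differences between
pn, ps = 5, 40    # non-synonymous, synonymous polymorphisms
dn, ds = 20, 40   # non-synonymous, synonymous fixed differences
ni = neutrality_index(pn, ps, dn, ds)
print(ni)
```

Here NI is below 1 because non-synonymous changes are over-represented among fixed differences, the pattern expected under positive selection. A full MK test would also test the 2x2 table of counts for significance.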
How is variation in dN/dS important in the evolution of FOXP2 in humans?
- FOXP2 - gene with functions strongly linked with cognition and complex language
- Highly conserved across mammals
- Enard et al., 2002 - MK test - showed excess of non-synonymous mutations in FOXP2 in modern humans - driven by recent +ve selection
- But - a more recent study with larger sample sizes suggests this was driven by sample composition - no evidence for +ve selection - Atkinson et al., 2018
- But - high diversity of FOXP2 in echolocating bats - Wang et al 2007
What is the Hudson-Kreitman-Aguade (HKA) test?
- Similar to MK test - but compares rates of evolution between different loci
- Within species diversity (polymorphism) should be correlated to inter-species diversity (divergence) in a neutrally evolving gene - unless selection is acting
- Requirement for sequencing from multiple loci and data from outgroup species
- Based on comparisons of ratio of polymorphism within focal species to divergence with outgroup (rpd)
- rpd - constant across neutrally evolving loci
- Choose reference locus that is evolving neutrally
- Lots of different tests carried out on genes - refs in lecture
How can genome scale data be used to identify genes under selection and give an example?
- Branch-site models detect positive selection that affects only certain sites on predefined lineages within a phylogenetic tree
- Comparisons among species - identify non-synonymous mutations unique to specific lineages (species)
e.g., Bowhead whales:
- Comparison of bowhead whale genome to other cetaceans and mammals - identified 14 genes which showed elevated rates of evolution - many of these associated with cancer susceptibility and aging - ERCC1
- Bowheads are potentially the longest lived mammals with lifespans over 200 years