Evolution Exam 3 Flashcards
Calculating mutation rate for neutral mutations
If u is the mutation rate per gene copy per generation, then a diploid population of size Ne carries 2Ne gene copies, so the number of new mutations per generation is 2Neu
Probability of fixation
For a neutral allele, the probability of fixation equals its current frequency, so for a newly arisen allele (one copy out of 2Ne) it is 1/(2Ne)
The number of mutations that arise each generation and eventually become fixed is (2Neu)(1/[2Ne]) = u. Under the neutral theory, the rate of fixation (substitution rate) equals the mutation rate for neutral mutations
Implications of the neutral theory of molecular evolution
The substitution rate does not depend on population size. A larger population produces more mutants, but each one has a lower probability of fixation, so the two effects cancel out
We often expect to find neutral polymorphisms segregating within species, since a new neutral mutation that does reach fixation takes an average of about 4Ne generations to do so
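Both the 1/(2Ne) fixation probability and the roughly 4Ne-generation time to fixation can be checked with a small simulation. This is a minimal sketch assuming an idealized diploid Wright-Fisher population (random mating, constant size, Ne equal to the census size N, no selection); the population size, number of trials, and random seed are arbitrary illustrative choices.

```python
import random


def follow_new_mutation(N: int, rng: random.Random) -> tuple[bool, int]:
    """Track one new neutral allele in a diploid Wright-Fisher population of
    N individuals (2N gene copies) until it is lost or fixed.
    Returns (fixed, generations until loss or fixation)."""
    copies = 1  # a newly arisen mutation is 1 of the 2N gene copies
    generations = 0
    while 0 < copies < 2 * N:
        freq = copies / (2 * N)
        # Pure drift: each of the 2N copies in the next generation is drawn
        # from the current allele frequency (binomial sampling)
        copies = sum(rng.random() < freq for _ in range(2 * N))
        generations += 1
    return copies == 2 * N, generations


def fixation_summary(N: int, trials: int, seed: int = 1) -> tuple[float, float]:
    """Estimate P(fixation) and the mean time to fixation among alleles that fix."""
    rng = random.Random(seed)
    fixation_times = []
    for _ in range(trials):
        fixed, t = follow_new_mutation(N, rng)
        if fixed:
            fixation_times.append(t)
    p_fix = len(fixation_times) / trials
    mean_time = sum(fixation_times) / len(fixation_times) if fixation_times else float("nan")
    return p_fix, mean_time


if __name__ == "__main__":
    N = 10  # hypothetical, kept small so the simulation runs quickly
    p_fix, mean_time = fixation_summary(N, trials=5000)
    print(f"P(fix): simulated {p_fix:.4f}, expected 1/(2N) = {1 / (2 * N):.4f}")
    print(f"time to fixation: simulated {mean_time:.1f}, expected about 4N = {4 * N}")
```

With only 2N = 20 gene copies the estimates are noisy, but the fixation probability should land near 0.05 and the mean fixation time near 40 generations; the 4Ne figure is an approximation that is most accurate in large populations.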
Problem with neutral theory
Neutral theory describes the rate of mutation per gamete per generation, not per unit of absolute time, yet the molecular clock seems to tick in absolute time even though species vary widely in generation time. This raises the question of why more mutations are not accumulating in lineages with short generation times
How does the nearly neutral model explain why more mutations do not accumulate in lineages with shorter generation times?
If most fixed mutations are slightly deleterious rather than strictly neutral, their probability of drifting to fixation will depend on population size
In a small population, drift overrides weak selection, so most of these mutations evolve as if they were neutral; they are effectively neutral. Effectively neutral mutations have a small, nonzero selection coefficient but behave as if s = 0. Mathematically, a mutation is effectively neutral if 2Ne is less than or equal to 1/s (using the magnitude of s).
In a large population, selection is stronger than drift, so slightly deleterious mutations are not effectively neutral and are selected against
Species with short generation times tend to have larger population sizes. This produces many mutations per year, but fewer of them are effectively neutral
Species with long generation times tend to have smaller populations. These populations produce fewer mutations per year, but a higher proportion of them behave as effectively neutral
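The dependence of fixation probability on population size can be made concrete with Kimura's diffusion approximation, a standard formula that these notes do not state. The sketch below assumes additive selection with coefficient s and Ne equal to the census size N, and applies this card's 2Ne ≤ 1/s rule to the magnitude of s; the values of s and N in the example are hypothetical.

```python
import math


def fixation_probability(s: float, N: int) -> float:
    """Kimura's diffusion approximation for a new mutation (initial frequency
    p = 1/(2N)) with additive selection coefficient s, assuming Ne = N:
    u = (1 - exp(-4*N*s*p)) / (1 - exp(-4*N*s))."""
    p = 1 / (2 * N)
    if s == 0:
        return p  # neutral case: fixation probability equals initial frequency
    x = -4 * N * s
    if x > 700:
        return 0.0  # strongly deleterious: exp(x) would overflow; u is effectively 0
    # expm1(y) = exp(y) - 1, accurate when y is close to zero
    return math.expm1(x * p) / math.expm1(x)


def effectively_neutral(s: float, N: int) -> bool:
    """Rule of thumb from these notes: effectively neutral if 2Ne <= 1/|s|."""
    return 2 * N * abs(s) <= 1


if __name__ == "__main__":
    s = -1e-5  # hypothetical slightly deleterious mutation
    for N in (10_000, 1_000_000):  # hypothetical small and large populations
        relative = fixation_probability(s, N) / fixation_probability(0.0, N)
        print(f"N={N}: effectively neutral={effectively_neutral(s, N)}, "
              f"P(fix) relative to neutral={relative:.3g}")
```

For the same slightly deleterious mutation, the smaller population fixes it at roughly 80% of the neutral rate, while in the larger population its fixation probability is vanishingly small. That asymmetry is what the nearly neutral model relies on.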
dN/dS ratios
dN is the number of nonsynonymous substitutions per nonsynonymous site, and dS is the number of synonymous substitutions per synonymous site. Each is estimated from the proportion of sites of that class that differ between the sequences being compared
dN/dS < 1 indicates that natural selection is likely eliminating nonsynonymous mutations: purifying selection, stronger the further the ratio falls below 1
dN/dS ≈ 1 indicates that replacements are neutral and there is little functional constraint, as in pseudogenes or genes under only weak purifying selection
dN/dS > 1 indicates that replacements are advantageous and favored by selection: positive selection
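A minimal sketch of computing and interpreting dN/dS from counted differences. It assumes the numbers of nonsynonymous and synonymous sites and differences have already been counted, applies the Jukes-Cantor correction for multiple hits (one common choice, not specified in these notes), and uses a hypothetical tolerance band around 1 for the "neutral" call; the counts in the example are made up.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits.
    Undefined when p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> float:
    """dN/dS from counts of differing sites and numbers of sites of each class."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # substitutions per nonsynonymous site
    d_s = jukes_cantor(syn_diffs / syn_sites)        # substitutions per synonymous site
    return d_n / d_s  # raises ZeroDivisionError if there are no synonymous differences


def interpret(ratio: float, tolerance: float = 0.1) -> str:
    """Map a ratio onto the categories used in these notes.
    The +/- tolerance band around 1 is a hypothetical choice."""
    if ratio < 1 - tolerance:
        return "purifying selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "neutral / little functional constraint"


if __name__ == "__main__":
    # Hypothetical counts: 15 of 300 nonsynonymous and 10 of 100 synonymous sites differ
    ratio = dn_ds(15, 300, 10, 100)
    print(f"dN/dS = {ratio:.2f}: {interpret(ratio)}")
```

Here the example gives dN/dS of about 0.48, which falls in the purifying-selection category. The correction fails if 75% or more of sites differ, and a gene with no synonymous differences has an undefined ratio.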
MHC
Plays an important role in antigen presentation; MHC proteins contain antigen recognition sites (ARS) that bind antigen peptides
At the antigen recognition sites, dN/dS = 3.8, showing positive selection: amino acid changes at the ARS are favored because of an arms race with co-evolving pathogens
In other protein domains, dN/dS = 0.64, indicating the purifying selection typical of most genes
How can dN/dS be used to detect pseudogene evolution?
Genes or regions with relatively high dN/dS ratios (close to 1) are more likely to be pseudogenes, because loss of function removes the constraint on amino acid changes; candidates can be confirmed by loss of function in phenotypes
Pseudogene
Segment of DNA that structurally resembles a gene but does not encode a functional protein
McDonald-Kreitman (MK) test
Posits that, under the neutral theory, the ratio of nonsynonymous to synonymous polymorphisms within a species should equal the ratio of nonsynonymous to synonymous fixed differences between that species and a sister species, because polymorphisms and fixed differences are different stages of the same neutral process
Within species, the numbers of nonsynonymous and synonymous polymorphisms are counted; between species, the numbers of nonsynonymous and synonymous fixed differences are counted. An excess of nonsynonymous fixed differences relative to the within-species ratio indicates that selection is driving amino acid change
The Adh gene showed a high ratio of nonsynonymous fixed differences between D. melanogaster and D. simulans, which is interpreted to mean that selection favored the amino acid changes that distinguish the two species at that gene
Helpful for detecting past positive selection that is no longer detectable using dN and dS ratios within a species
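The MK comparison is a 2x2 table of fixed versus polymorphic and replacement versus synonymous changes. The sketch below uses the Adh counts as they are usually reproduced from McDonald and Kreitman (1991): 7 replacement and 17 synonymous fixed differences, 2 replacement and 42 synonymous polymorphisms. It tests the table with Fisher's exact test, swapped in here because it needs only the standard library; the original study used a G-test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))  # small tolerance for float ties


def mk_test(fixed_repl: int, fixed_syn: int, poly_repl: int, poly_syn: int):
    """Compare replacement/synonymous ratios for fixed differences vs polymorphisms."""
    fixed_ratio = fixed_repl / fixed_syn
    poly_ratio = poly_repl / poly_syn
    p_value = fisher_exact_two_sided(fixed_repl, fixed_syn, poly_repl, poly_syn)
    return fixed_ratio, poly_ratio, p_value


if __name__ == "__main__":
    # Adh counts as commonly reproduced from McDonald and Kreitman (1991)
    fixed_ratio, poly_ratio, p = mk_test(7, 17, 2, 42)
    print(f"fixed R/S = {fixed_ratio:.2f}, polymorphic R/S = {poly_ratio:.2f}, p = {p:.4f}")
```

The replacement-to-synonymous ratio is about 0.41 for fixed differences but about 0.05 for polymorphisms, and the exact test gives p below 0.01, consistent with the conclusion that selection drove amino acid changes between the species.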
Tests of selection on DNA sequences
Neutral or nearly neutral evolution serves as the null hypothesis: we assume that most genes across the genome are evolving in accordance with the nearly neutral model
Researchers look for specific loci that deviate from this null hypothesis; these are genes where natural selection is favoring or disfavoring specific alleles
Overall, researchers look for patterns of evolution that cannot be explained by genetic drift and background purifying selection alone
Three methods of testing selection
dN/dS ratios, McDonald-Kreitman (MK) test, Haplotype tree shape
How can haplotype tree shape be used to test for selection?
Tree shape can test whether there is an excess of old or young polymorphisms compared with neutral expectations. Under neutral evolution, a sample contains a mixture of closely related haplotypes, which diverged recently from an ancestral haplotype, and more distantly related haplotypes, which diverged longer ago
How can positive selection and balancing selection be identified when haplotype trees cannot be constructed?
As a proxy, researchers can look at the frequencies of polymorphisms. Mutations on tip branches produce low-frequency (rare) polymorphisms, which raise sigma relative to pi, and mutations on internal branches produce higher-frequency polymorphisms, which raise pi
What do S-locus alleles tell us about negative frequency-dependent selection (FDS)?
S alleles have been maintained by negative FDS for so long that the same allele may be present in different species and even completely different genera (tomatoes, potatoes, chili peppers, etc.). Haplotype sharing can be caused by incomplete lineage sorting or gene flow, but in this case it is caused by balancing selection, and the allele genealogy shows no correspondence to the genus-level phylogeny
Ways to statistically compare the frequencies of rare vs high-frequency polymorphisms
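Pi = average number of sites that differ between two randomly chosen sequences, calculated as $\pi = \sum_{i<j} 2 p_i p_j \pi_{ij}$, where $p_i$ and $p_j$ are the frequencies of haplotypes i and j and $\pi_{ij}$ is the number of sites at which they differ. This value depends on the frequencies of the polymorphisms as well as the number of nucleotide differences
This is compared to sigma, the total number of polymorphic (segregating) sites in a gene, which is not sensitive to the frequencies of polymorphisms. Because sigma counts sites rather than pairwise differences, it is first scaled to Watterson’s estimator $\theta_W = \sigma / a_n$, where $a_n = \sum_{i=1}^{n-1} 1/i$ for a sample of n sequences
Under neutrality pi ≈ $\theta_W$; under positive selection pi < $\theta_W$ because there is an excess of rare polymorphisms (a shallower, star-like tree); and under balancing selection pi > $\theta_W$ because there is an excess of high-frequency polymorphisms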
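A minimal sketch of computing pi and sigma from a toy alignment of equal-length sequences. It averages differences over all pairs of sampled sequences, which is the sample version of pi's haplotype-frequency definition (it equals that sum multiplied by n/(n - 1)). Sigma is also scaled to Watterson's theta_W = sigma / a_n, with a_n the sum of 1/i for i = 1 to n - 1, so the two numbers are on the same scale. The sequences are hypothetical.

```python
from itertools import combinations


def differences(a: str, b: str) -> int:
    """Number of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: mean number of differences over all pairs of sampled sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def segregating_sites(seqs: list[str]) -> int:
    """sigma: number of alignment columns containing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def watterson_theta(seqs: list[str]) -> float:
    """theta_W = sigma / a_n, with a_n = sum of 1/i for i = 1 .. n-1."""
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n


if __name__ == "__main__":
    sample = ["ACGTA", "ACGTT", "ACCTT", "ACCTT"]  # hypothetical aligned sequences
    print("pi =", nucleotide_diversity(sample))
    print("sigma =", segregating_sites(sample))
    print("theta_W =", watterson_theta(sample))
```

For the four example sequences there are 2 segregating sites, pi = 7/6 (7 differences over 6 pairs), and theta_W = 12/11.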
Tajima’s D
Refers to the statistic that compares pi with Watterson’s $\theta_W$ (sigma scaled by sample size). D is the difference pi − $\theta_W$ divided by its standard deviation, so D ≈ 0 indicates neutral evolution, D < 0 indicates positive selection, and D > 0 indicates balancing selection
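Tajima's D in full divides the difference between pi and Watterson's theta_W (sigma divided by a_n, the sum of 1/i for i = 1 to n - 1) by an estimate of its standard deviation. The sketch below implements Tajima's standard normalizing constants; pi is the mean number of pairwise differences, S is the number of segregating sites (sigma in these notes), and n is the number of sequences. The example inputs are hypothetical.

```python
import math


def tajimas_d(pi: float, S: int, n: int) -> float:
    """Tajima's D from mean pairwise differences (pi), number of segregating
    sites (S) and sample size n (number of sequences)."""
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = S / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # Hypothetical inputs: 4 sequences, 2 segregating sites, pi = 7/6
    print(f"D = {tajimas_d(7 / 6, 2, 4):.3f}")
    # Star-shaped sample: 5 sequences, four of which each carry their own
    # singleton (S = 4, pi = 1.6), i.e. an excess of rare variants
    print(f"D = {tajimas_d(1.6, 4, 5):.3f}")
```

The star-shaped sample gives D < 0, the pattern expected after a selective sweep or a population expansion.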
Problems with Tajima’s D
Tajima’s D is sensitive to population size fluctuations (demographic effects) and to population structure. After a recent population expansion, many new mutations arise on different gene copies and remain rare, creating D < 0. In this case, a neutrally evolving gene can appear positively selected simply because it carries an excess of low-frequency polymorphisms
Population structure can have the opposite effect: when populations become isolated and diverge, genetic drift differentiates loci between them, producing long internal branches and D > 0 at a neutrally evolving gene
To identify selection, Tajima’s D should be calculated for multiple loci, because demographic effects act on all genes while selection is gene specific
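One way to put the multi-locus comparison into practice is to treat the genome-wide distribution of D as the demographic baseline and flag outlier loci. The sketch below uses a median/MAD robust z-score with a cutoff of 3; both the method and the cutoff are illustrative choices, and the locus names and D values are hypothetical.

```python
import statistics


def tajima_outliers(d_by_locus: dict[str, float], cutoff: float = 3.0) -> list[str]:
    """Return loci whose Tajima's D lies far from the genome-wide center,
    using a robust z-score based on the median and median absolute deviation."""
    values = list(d_by_locus.values())
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for normal data
    if scale == 0:
        raise ValueError("all loci have the same D; no spread to compare against")
    return [locus for locus, d in d_by_locus.items()
            if abs(d - center) / scale > cutoff]


if __name__ == "__main__":
    # Hypothetical genome scan: most loci share a negative D (e.g. from expansion)
    d_values = {
        "locus01": -0.6, "locus02": -0.5, "locus03": -0.4, "locus04": -0.7,
        "locus05": -0.55, "locus06": -0.45, "locus07": -0.65, "locus08": -2.4,
        "locus09": 1.9, "locus10": -0.5,
    }
    print(tajima_outliers(d_values))
```

Every locus in the example is shifted below zero, as a population expansion would do, but only locus08 and locus09 stand out from that shared baseline.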
Detecting positive selection in the genome
For a favored mutation, positive selection leads to a selective sweep: as the favored allele rises to fixation, linked neutral variants hitchhike along with it, leaving a genomic region with low nucleotide diversity and a large block of linkage disequilibrium. One example is rice domestication, where humans selected for mutations in the Waxy gene, which controls starch synthesis
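Scanning an alignment in windows for unusually low diversity is a simple way to look for the sweep signature. The sketch below computes per-site pi in non-overlapping windows and flags windows below a chosen fraction of the mean; it ignores linkage disequilibrium, and the window size, cutoff fraction, and sequences are hypothetical.

```python
from itertools import combinations


def window_diversity(seqs: list[str], window: int) -> list[float]:
    """Per-site nucleotide diversity (mean pairwise differences divided by
    window length) in consecutive non-overlapping windows of an alignment."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    result = []
    for start in range(0, length - window + 1, window):
        diffs = sum(
            sum(x != y for x, y in zip(a[start:start + window], b[start:start + window]))
            for a, b in pairs
        )
        result.append(diffs / len(pairs) / window)
    return result


def sweep_candidates(pis: list[float], fraction: float = 0.25) -> list[int]:
    """Indices of windows whose diversity falls below `fraction` of the mean."""
    mean_pi = sum(pis) / len(pis)
    return [i for i, pi in enumerate(pis) if pi < fraction * mean_pi]


if __name__ == "__main__":
    # Hypothetical alignment: the middle 4 sites are identical in every sequence
    seqs = ["ACGTAAAAACGT", "TCGAAAAATCGA", "AGGTAAAAAGGT", "ACCTAAAAACCT"]
    pis = window_diversity(seqs, 4)
    print(pis, sweep_candidates(pis))
```

In the toy alignment the middle window has no variation at all and is the only one flagged; a real scan would use much larger windows and would also check for an extended haplotype block.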
AZT
A drug used early in the HIV/AIDS epidemic that blocks reverse transcriptase: it mimics thymidine (T), so it is incorporated into the growing DNA strand but stops polymerization. However, resistance evolved rapidly, and AZT’s ability to suppress viral growth steadily declined over 20 months
Resistance develops rapidly because the amino acid changes that confer it occur in the binding domain of reverse transcriptase, and the altered conformation prevents reverse transcriptase from binding AZT effectively
HIV Life Cycle Steps
HIV begins as a virion (the extracellular stage)
The gp120 protein on the virion surface binds to CD4 and to a coreceptor on the host cell; one such coreceptor is the CCR5 protein
HIV’s RNA genome, reverse transcriptase, integrase, and protease then enter the host cell
Reverse transcriptase synthesizes HIV DNA from the HIV RNA template
Integrase splices HIV DNA into the host genome, and the HIV DNA is transcribed into HIV mRNA by the host cell’s RNA polymerase
HIV mRNA is translated by host-cell ribosomes into HIV precursor proteins, and HIV protease cleaves the precursors into mature viral proteins
A new generation of virions assembles in the host cell
New virions then bud from the host cell membrane
Evolution of HIV
Refers to rapid, immune-system-driven evolution within patients; long branches in the phylogeny indicate positive selection
In the phylogeny, each shaded color corresponds to a different patient, and the within-patient branches show non-neutral evolution
Transmission between hosts is also associated with founder events and genetic drift
Why are drug cocktails more effective than single drug treatments?
Drug cocktails target multiple mechanisms using several drug classes (fusion inhibitors that block virus surface proteins, reverse transcriptase inhibitors, protease inhibitors, and integrase inhibitors)
They are more effective because HIV resistance would require simultaneous mutations conferring resistance to each drug, which is far less likely than resistance to a single drug
But even multi-drug treatments can lose their effectiveness after around 3 years, because side effects lead people to stop taking the drugs and because a dormant viral reservoir persists within the body
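The argument for combination therapy is multiplicative. Assuming each drug's resistance mutation arises independently with the same probability p per virion, resistance to all k drugs at once has probability p^k. The population size and per-drug probability below are hypothetical round numbers, not measured HIV values, and the independence assumption ignores cross-resistance among drugs of the same class.

```python
def expected_resistant(virions: float, p_per_drug: float, n_drugs: int) -> float:
    """Expected number of virions resistant to all n_drugs simultaneously,
    assuming each drug's resistance mutation arises independently with
    probability p_per_drug in each virion."""
    return virions * p_per_drug ** n_drugs


if __name__ == "__main__":
    virions = 1e9      # hypothetical viral population size in a patient
    p_per_drug = 1e-4  # hypothetical probability of resistance to one drug
    for k in (1, 2, 3):
        count = expected_resistant(virions, p_per_drug, k)
        print(f"{k} drug(s): about {count:.3g} resistant virions")
```

With these numbers about 100,000 virions resist one drug, about 10 resist two drugs at once, and about 0.001 resist three, so triple-resistant virus is unlikely to be present when treatment starts.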
Evolutionary origins of HIV
The major strains are HIV-1 (the epidemic form, transferred from chimpanzees via bushmeat, i.e., hunting of wild animals) and HIV-2 (found primarily in West Africa, less virulent, transferred from sooty mangabeys)
gp120 sequences provide better phylogenetic resolution of the chimpanzee and human clade
Zoonotic disease
Transmissible to humans from other animals
Molecular clock estimate
To estimate the date of origin, an unrooted tree was first built showing the genetic distances among HIV-1 group M strains collected in the 1980s and 1990s. Divergence from the inferred common ancestor was then plotted against sampling year, and the 95% confidence interval placed the common ancestor between 1915 and 1941
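The date estimate comes from extrapolating divergence back to zero. A minimal sketch of that idea is an ordinary least-squares regression of root-to-tip divergence on sampling year, where the x-intercept estimates the year of the common ancestor; this is not necessarily the exact method used in the study. The data below are made-up values lying exactly on a line, not the HIV-1 measurements, and the sketch does not compute a confidence interval.

```python
def root_to_tip_regression(years: list[float], divergence: list[float]) -> tuple[float, float]:
    """Least-squares fit of divergence = rate * year + intercept.
    Returns (rate, estimated year of the common ancestor), the latter being
    the year at which the fitted line reaches zero divergence."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(divergence) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, divergence))
    sxx = sum((x - mean_x) ** 2 for x in years)
    rate = sxy / sxx  # substitutions per site per year
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate


if __name__ == "__main__":
    # Made-up values lying exactly on a line (not HIV-1 data)
    years = [1983, 1985, 1988, 1990, 1993, 1997]
    divergence = [0.106, 0.110, 0.116, 0.120, 0.126, 0.134]
    rate, origin = root_to_tip_regression(years, divergence)
    print(f"rate = {rate:.4f} substitutions/site/year, common ancestor about {origin:.0f}")
```

Because the toy points were generated with a rate of 0.002 per year starting from zero divergence in 1930, the fit recovers exactly those values; real data scatter around the line, which is where the confidence interval comes from.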
Natural resistance to HIV
A 32-bp deletion in the CCR5 coreceptor gene (CCR5-Δ32) confers resistance to HIV, most strongly in homozygotes, and the allele appears to have undergone positive selection. It predates the HIV epidemic, so it may have been favored for resistance to bubonic plague or smallpox. It is common in European populations and mostly absent in Asia and Africa. CCR5-Δ32 has also been linked to COVID resistance, but homozygotes have increased susceptibility to West Nile virus
Timothy Brown
Patient with leukemia who was also infected with HIV. He received 2 bone marrow (stem cell) transplants, in 2007 and 2008, from a donor homozygous for the CCR5-Δ32 deletion, and his HIV infection was cleared after the transplants. Another patient was also cured by the same method
HIV-controllers
Patients who become infected but remain essentially asymptomatic. HIV controllers have an MHC protein configuration that allows only low-fitness HIV to escape MHC detection, and this results in HIV viral DNA being present in non-transcribed (heterochromatic) regions
Influenza A
Contains 8 RNA segments encoding 11 proteins. Segments can be exchanged between viruses infecting the same cell (reassortment), which contributes to its evolution. Two major coat proteins are neuraminidase and hemagglutinin (the major protein recognized by the immune system)
Antigenic sites
protein regions that are recognized by the human immune system
Strain numbers
Broad groupings of the hemagglutinin (H) and neuraminidase (N) proteins based on human antibody recognition, for example H1N1