molecular evolution L4-6 Flashcards by Denise Ravinale

How fast an advantageous allele is fixed in a population depends on two factors. Which are they?

The selection coefficient
The degree of dominance

A larger selection coefficient and a higher degree of dominance both give faster fixation.

Is genetic drift more important in small populations or large populations?

Small populations.

Selection is more effective in large populations, where drift is weaker. The strength of both depends on the effective population size: the larger the population, the less allele frequencies fluctuate by chance, so selection can act on smaller fitness differences.

What tests for natural selection are there for:

Population genomic data (within species)
Species divergence data (between species).
Population genomic + sequence divergence data

Site Frequency Spectrum (SFS)
Tajima’s D
dn / ds test statistic
HKA test
McDonald-Kreitman test
other differentiation tests.

What is selectionism and how does the neutral theory differ from it?

Selectionism says that natural selection is the driving force behind genetic variation. The majority of mutations are deleterious, a small proportion are advantageous, and almost none are neutral and fixed by random chance.

The neutral theory says that the amount of genetic variation and the high rate of molecular evolution cannot be explained by natural selection alone, but can be explained by stochastic drift. It therefore states that most of the mutations that contribute to variation are neutral and that their frequency in a population is determined by random genetic drift.

How does the neutral theory explain that advantageous and deleterious mutations contribute very little to molecular variation?

The neutral theory acknowledges the role of natural selection in adaptation, but it says that deleterious mutations are quickly eliminated and advantageous mutations are quickly fixed, so neither spends much time segregating and neither contributes much to molecular variation.

What does the nearly neutral theory say?

It states that nearly neutral mutations that are slightly advantageous/disadvantageous are influenced by both genetic drift and selection and these mutations might contribute to the molecular variation we see.

So the nearly neutral theory divides the mutations into deleterious, advantageous (very small part), neutral and nearly neutral.

Nearly neutral mutations behave differently depending on population size. Explain how they behave in smaller vs larger populations.

Slightly disadvantageous mutations behave as deleterious in large populations and as neutral in small populations.

Slightly advantageous mutations behave as advantageous in large populations and as neutral in small populations.
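
The population-size effect can be illustrated numerically. The sketch below is not part of the original cards: it uses Kimura's diffusion approximation for the fixation probability of a new mutation, and it assumes a diploid population with Ne = N and additive selection. The function names and the example values of s and N are illustrative.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Approximate fixation probability of a new mutation (start frequency 1/(2N)).

    Assumptions (not from the cards): diploid population, Ne = N,
    additive selection with coefficient s.
    """
    if s == 0:
        return 1.0 / (2 * n)          # neutral case: pure drift
    if -4 * n * s > 700:              # avoid float overflow for strongly deleterious alleles
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for numerical accuracy
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral value 1/(2N)."""
    return fixation_probability(s, n) * 2 * n

if __name__ == "__main__":
    for n in (100, 1_000_000):
        for s in (1e-4, -1e-4):
            print(f"N={n:>9} s={s:+.0e} relative fixation probability={relative_to_neutral(s, n):.3g}")
```

With these values, the ratio stays close to 1 in the small population for both signs of s (the mutation is effectively neutral), while in the large population the slightly advantageous mutation fixes far more often than a neutral one and the slightly deleterious one almost never fixes.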

Why do we want to detect natural selection?

determine if genes are evolving by natural selection within populations

estimate when natural selection started

determine if different populations are evolving differently in different environments.

identify which genes are evolving the most since species split from each other.

determine functionally important regions of the genome.
etc.

There are different tests for detecting selection using genetic data. What do they all have in common?

They all depend on the ability to discount genetic drift: the neutral theory is the null hypothesis when we test for natural selection.

What are selective sweeps and genetic hitchhiking?

Selective sweeps and genetic hitchhiking can be thought of as signs that natural selection is occurring.

Genetic hitchhiking is the process by which a neutral or deleterious allele that is sufficiently tightly linked to a positively selected allele increases in frequency or is swept to fixation.

Selective sweeps reduce genetic variation because an advantageous allele carries linked sites with it towards fixation: as the positively selected allele rises in frequency, linked neutral or even deleterious alleles hitchhike along, and the other variants in the region are lost. Over time, recombination breaks down the linkage disequilibrium between the selected site and its neighbours, and the hitchhikers can decrease in frequency again.
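
How quickly recombination erodes the association between a selected site and a hitchhiker can be sketched with the standard two-locus result that linkage disequilibrium shrinks by a factor (1 - r) per generation. This assumes random mating and no further selection or drift; the function name and the example values are illustrative and not from the cards.

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium after a number of generations.

    d0: initial disequilibrium coefficient D
    r:  recombination fraction between the two sites (0 to 0.5)
    Assumes random mating and ignores selection and drift.
    """
    return d0 * (1 - r) ** generations

if __name__ == "__main__":
    # Tightly linked sites (r = 0.001) keep their association far longer
    # than loosely linked ones (r = 0.1).
    for r in (0.001, 0.1):
        print(r, [round(ld_after(0.25, r, t), 4) for t in (0, 10, 100)])
```

Tightly linked hitchhikers therefore stay associated with the selected allele for many generations, which is why a sweep leaves a footprint of reduced variation around the selected site.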

What tests can we do to detect natural selection within species?

SFS
Tajima’s D test

What are segregating sites, nucleotide diversity and the concept of theta?

We can view genetic variation in sequencing data as either segregating sites or nucleotide diversity.

Segregating sites are positions in the sequence where different alleles are present within the population sample, and nucleotide diversity is the average number of pairwise differences between sequences.

Theta is the expected level of diversity when mutation and genetic drift are in balance, meaning that mutation adds variation at the same rate as drift removes it. Theta therefore describes the variation we expect under neutrality and is defined as 4Neu, where Ne is the effective population size and u is the mutation rate.

What is the benefit of using the SFS for looking at genetic variation over segregating sites and nucleotide diversity?

Segregating sites and nucleotide diversity tell us little about allele frequencies, which we can see with the SFS.

Since selection acts to change allele frequencies, we can detect signals of it in the SFS.

What is SFS? What does it look like under neutrality, positive/negative and balancing selection?

The site frequency spectrum (SFS) is the distribution of allele frequencies across the variable sites of a locus or genome region in a population sample: it counts how many sites carry a variant seen once, twice, three times, and so on. It can be used to detect signals of selective pressure.

Under neutrality, singletons (variants seen only once) are the most common class: most mutations are at low frequency in the population and high-frequency mutations are rare. Here Tajima’s estimator = Watterson’s estimator and Tajima’s D = 0.

After a selective sweep (positive selection) or under purifying (negative) selection, the SFS is skewed towards low frequencies, with an excess of rare variants compared with neutrality. After a sweep, the new mutations arising on the swept background are still rare, and alleles that hitchhiked with the selected allele can also segregate at high frequency; under purifying selection, deleterious alleles are kept rare. Here Tajima’s estimator is lower than Watterson’s estimator and D is less than 0.

Balancing selection maintains multiple alleles in a population, so the SFS has an excess of alleles segregating at intermediate frequencies. Here Tajima’s estimator is larger than Watterson’s estimator and D is greater than 0.
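
As a minimal sketch of what the SFS is in practice, the snippet below tabulates an unfolded spectrum from derived-allele counts and gives the standard neutral expectation, in which the expected number of sites with i derived copies is theta / i. The input counts are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS for a sample of n chromosomes.

    derived_counts: one entry per site, the number of sampled chromosomes
    carrying the derived allele. Entry i-1 of the result is the number of
    sites where the derived allele appears exactly i times (1 <= i <= n-1).
    """
    counts = Counter(c for c in derived_counts if 0 < c < n)
    return [counts.get(i, 0) for i in range(1, n)]

def neutral_expectation(theta, n):
    """Expected SFS under the standard neutral model: theta / i for class i."""
    return [theta / i for i in range(1, n)]

if __name__ == "__main__":
    # Hypothetical data: 9 variable sites in a sample of 6 chromosomes.
    derived_counts = [1, 1, 1, 2, 1, 3, 2, 1, 5]
    print(site_frequency_spectrum(derived_counts, 6))  # [5, 2, 1, 0, 1]
    print(neutral_expectation(4.0, 6))
```

Comparing the observed classes with the theta / i expectation shows at a glance whether rare or intermediate-frequency variants are over-represented.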

What is Tajima’s D test?

A test to detect selection within species, designed to detect departures from neutral expectations.

Under the neutral model, Tajima’s estimator (nucleotide diversity) and Watterson’s estimator (based on the number of segregating sites) both estimate theta, so their difference is expected to be zero (D = 0).

D < 0: positive or purifying selection
D > 0: balancing selection

What are Tajima’s estimator and Watterson’s estimator of genetic variation?

Tajima’s estimator is the average pairwise nucleotide diversity. It is a frequency-based estimator, so rare alleles have little effect on it.

Watterson’s estimator is based on the number of segregating sites. Every segregating site counts equally regardless of its frequency, so rare alleles have a big impact on it.

If the two estimates differ substantially, it suggests that the neutral model does not hold.
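
The two estimators and Tajima's D can be computed directly from a small alignment. The sketch below assumes at least two aligned sequences of equal length with no gaps or missing data, and uses the normalising constants from Tajima's original formulation; the function name and the toy alignments are illustrative, not from the cards.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Return (pi, theta_w, D) for a list of aligned, equal-length sequences."""
    n = len(sequences)
    length = len(sequences[0])

    # Number of segregating sites S: positions with more than one allele.
    seg = sum(1 for pos in range(length)
              if len({seq[pos] for seq in sequences}) > 1)

    # Tajima's estimator: average number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator: S scaled by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg / a1
    if seg == 0:
        return pi, theta_w, float("nan")  # D is undefined without variation

    # Constants for the variance of (pi - theta_w).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi - theta_w) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi, theta_w, d

if __name__ == "__main__":
    only_singletons = ["AAAA", "TAAA", "ATAA", "AATA"]   # every variant is rare
    intermediate = ["AA", "AA", "TT", "TT"]              # variants at 50%
    print(tajimas_d(only_singletons))   # D is negative
    print(tajimas_d(intermediate))      # D is positive
```

In the first toy alignment every variant is a singleton, so the segregating-site estimate exceeds the pairwise estimate and D is negative; in the second, variants sit at intermediate frequency and D is positive.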

What are the tests for detecting selection between species?

HKA test
dn / ds
MK test

What kind of data do you need to perform the HKA test?

population genomics data + sequence divergence data.

You need at least two genes each from two different species.

Explain the HKA-test

HKA is a test for detecting natural selection.

The neutral theory predicts both:
- rates of substitution (between species)
- amount of polymorphisms (within species)
Both of these depend on the mutation rate, so under neutrality genes with a high mutation rate should have both high between-species divergence and high within-species polymorphism.

To test this, we compare the ratio of within-species polymorphism to between-species divergence for two genes. If the ratios differ significantly, it is evidence against the neutral theory.

Explain the dn / ds test

Test for detecting selection between species.

By comparing the rates of synonymous and nonsynonymous changes between homologous genes in two species, we can make inferences about natural selection. ds is the number of synonymous changes per synonymous site and dn is the number of nonsynonymous changes per nonsynonymous site.

The ratio dn / ds is called omega, and its value indicates the type of selection. If ds is higher than dn, omega is less than 1, indicating purifying selection, because mutations that would change the protein sequence are purged.

If dn is higher than ds, omega is greater than 1, indicating positive selection, because mutations that change the protein are retained. If omega = 1, the gene is evolving neutrally.
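
The interpretation of omega can be written as a small helper. This is only a sketch of the decision rule described above: the function name and the input values are hypothetical, and a real analysis would also test whether omega differs significantly from 1 rather than compare point estimates.

```python
def interpret_omega(dn: float, ds: float):
    """Return (omega, label) for given dn and ds estimates.

    dn: nonsynonymous changes per nonsynonymous site
    ds: synonymous changes per synonymous site (must be > 0)
    """
    if ds <= 0:
        raise ValueError("ds must be positive to compute dn / ds")
    omega = dn / ds
    if omega < 1:
        label = "purifying selection"
    elif omega > 1:
        label = "positive selection"
    else:
        label = "neutral evolution"
    return omega, label

if __name__ == "__main__":
    # Hypothetical estimates for three genes.
    for dn, ds in [(0.02, 0.10), (0.30, 0.10), (0.10, 0.10)]:
        print(dn, ds, interpret_omega(dn, ds))
```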

There are different models for performing dn / ds tests. What are they?

Site models and branch models, each available at different levels of complexity.

What is the difference between site models and branch models for dn / ds tests?

Branch models aim to detect variation in selection pressure across the branches of a phylogenetic tree: a dn / ds ratio is estimated for each branch, as opposed to each site as in site models. Branch models therefore show how selection pressure varies across lineages, which a site model cannot.

To see variation in omega both between lineages and across sites we need branch-site models, in which omega can differ among sites and among branches (lineages).

What is the MK test?

The McDonald-Kreitman test is a test for detecting selection between species that compares divergence between species with polymorphism within species.

Under neutral evolution, the ratio of nonsynonymous to synonymous changes should be the same for between-species divergence as for within-species polymorphism. The MK test aims to detect departures from this neutral expectation.

Under neutrality: (dn / ds)substitutions = (dn / ds)SNPs.

In prokaryotes, genome size is a good predictor for the total number of genes in the genome.

Why is genome size not a good predictor for the total number of genes across all eukaryotic genomes?

List some other differences between the prokaryotic and eukaryotic genome.

Prokaryotic genomes are very compact, with few introns and transposable elements. Almost all of the genome is coding, so the number of genes scales closely with genome size.

Eukaryotic genomes contain many introns, transposable elements and other non-coding regions, and the amount of this non-coding DNA varies greatly between species. Genome size therefore says little about the number of genes.

Prokaryotic genomes are typically circular and have a single origin of replication. Eukaryotic genomes are linear and have multiple origins of replication.

In what way is the “nearly neutral theory of evolution” an improvement of the neutral theory?

The nearly neutral theory states that mutations are neutral, deleterious, advantageous or nearly neutral, where nearly neutral means slightly disadvantageous or slightly advantageous. It is an improvement because the fate of these nearly neutral mutations is shaped by both drift and selection, with the balance depending on population size, so the variation we see is not attributed to random chance alone. This makes the theory more accurate.

Different methods are used to detect natural selection at different evolutionary time scales. Why is there this distinction and what type of data do the different methods require?

We use different time-scales to look at evolution within species (shorter time-scale) and between species (longer time-scale). To look at molecular evolution within species we need polymorphism data (population genomics data) and to look between species we need species divergence data. There are also tests where we need both population genomics data and species divergence data.

How can demographic events affect the SFS?

Demographic events such as population bottlenecks (events that very few individuals survive) and expansions can skew the SFS without any selection being involved. A rapidly growing population can look as if it is under positive/negative selection, and a drastically reduced population can look as if it is under balancing selection.

Describe the possible fates of a duplicated gene.

Nonfunctionalization: one of the copies loses its function, so the overall function remains the same.
Conservation: both copies keep the original function, doubling the function of the gene.
Subfunctionalization: both copies lose some function and together they maintain the original function.
Neofunctionalization: one copy keeps the original function and the other gains a new function.

What is a monophyletic, paraphyletic and polyphyletic group?

Monophyletic: an ancestor and all of its descendants.
Paraphyletic: an ancestor and some, but not all, of its descendants.
Polyphyletic: taxa from different clades grouped together without their most recent common ancestor.

Explain how to perform the MK test.

1. Count the observed differences for both polymorphisms and fixed differences, and sum each for the total number of polymorphisms and the total number of fixed differences.
2. Sum the total number of dn and the total number of ds differences. Add these together for the total number of differences in the dataset.
3. Calculate the frequency of dn and ds differences, e.g. total number of dn / total number of differences.
4. Build the 2x2 contingency table and calculate the expected values for dn and ds for both polymorphisms and fixed differences (the frequency from step 3 multiplied by the corresponding total from step 1).
5. Calculate (observed - expected)^2 / expected for each of dn(P), dn(F), ds(P) and ds(F).
6. Sum the calculated values to get the X^2 statistic and check for a significant departure from the neutral expectation.
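
The steps above can be sketched in a few lines of Python. The counts are made up for illustration, the function name is hypothetical, and 3.841 is the usual chi-square critical value for one degree of freedom at the 5% level. With small counts an exact test would normally be preferred over the chi-square approximation.

```python
def mk_chi_square(dn_fixed, ds_fixed, dn_poly, ds_poly):
    """Chi-square statistic for the 2x2 McDonald-Kreitman table."""
    observed = {
        ("fixed", "dn"): dn_fixed, ("fixed", "ds"): ds_fixed,
        ("poly", "dn"): dn_poly, ("poly", "ds"): ds_poly,
    }
    # Steps 1-2: row totals, column totals and the grand total.
    row_totals = {"fixed": dn_fixed + ds_fixed, "poly": dn_poly + ds_poly}
    col_totals = {"dn": dn_fixed + dn_poly, "ds": ds_fixed + ds_poly}
    total = sum(observed.values())

    chi2 = 0.0
    for (row, col), obs in observed.items():
        # Steps 3-4: expected count = class frequency * row total.
        expected = col_totals[col] / total * row_totals[row]
        # Steps 5-6: accumulate (observed - expected)^2 / expected.
        chi2 += (obs - expected) ** 2 / expected
    return chi2

CRITICAL_VALUE_DF1_5PCT = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05

if __name__ == "__main__":
    # Hypothetical counts: 20 dn and 30 ds fixed differences,
    # 10 dn and 40 ds polymorphisms.
    chi2 = mk_chi_square(20, 30, 10, 40)
    print(round(chi2, 3))                    # 4.762
    print(chi2 > CRITICAL_VALUE_DF1_5PCT)    # True: departs from neutrality
```

In this made-up example the dn / ds ratio is higher among fixed differences than among polymorphisms, which is the pattern expected when some nonsynonymous substitutions were fixed by positive selection.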

Explain how the allele frequency spectrum differs from standard neutral expectations when Tajima's D is negative.

When Tajima’s D is negative, Tajima’s estimator (nucleotide diversity) is smaller than Watterson’s estimator (segregating sites). This happens when the frequency spectrum contains an excess of rare alleles compared with the standard neutral expectation. Each rare allele adds a segregating site, which raises Watterson’s estimator, but it contributes little to the average number of pairwise differences, so Tajima’s estimator stays low. A negative D is therefore consistent with positive selection, for example after a selective sweep, when the new mutations in the region are still rare. If Tajima’s D > 0, then Tajima’s estimator > Watterson’s estimator, which indicates an excess of intermediate-frequency alleles and would suggest balancing selection.