VL 7 Inferring evolutionary processes from DNA sequence data Flashcards

Question 1

Q

What is the neutral theory of molecular evolution?

Answer

A

The neutral theory, proposed by Motoo Kimura, states that
* most evolutionary changes at the molecular level are selectively neutral, meaning they have no effect on fitness.
* Deleterious mutations are purged by natural selection, while beneficial mutations are rare.

The Neutral Theory assumes that
* most genetic variation and evolution at the molecular level are due to neutral mutations that do not affect an organism’s fitness and therefore evolve primarily through genetic drift.
* When using Tajima’s D or other statistical tests, if the results align with the Neutral Theory (e.g., Tajima’s D is close to 0), it suggests that the population is evolving as expected under neutrality, with no strong selection, migration, or population size changes influencing the genetic variation.

Outdated as a theory, but still used as the NULL MODEL for statistical tests.

Question 2

Q

How do mutation and genetic drift contribute to evolution?

Answer

A

Mutation introduces new genetic variations, while genetic drift causes random fluctuations in allele frequencies, potentially leading to the fixation of neutral mutations.

𝝁: neutral mutation rate per site per generation

Ne𝝁: total number of new mutations in a haploid population per site per generation (diploid: 2Ne𝝁)

Under drift alone, the fixation probability of a new neutral mutation is inversely related to population size: 1/(2Ne) in a diploid population.

e.g. effective population size Ne = 1000
fixation probability: 1/(2Ne)
1/2000 = 0.05%
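
The arithmetic on this card can be checked with a few lines of Python. This is only a minimal sketch: the function names and the value of the mutation rate are assumptions chosen for illustration, not part of the lecture.

```python
def new_mutations_per_site(ne, mu, ploidy=2):
    """Expected number of new mutations per site per generation (2*Ne*mu if diploid)."""
    return ploidy * ne * mu


def neutral_fixation_probability(ne, ploidy=2):
    """Fixation probability of a single new neutral mutation: 1/(2*Ne) if diploid."""
    return 1.0 / (ploidy * ne)


ne = 1000   # effective population size from the card
mu = 1e-8   # hypothetical neutral mutation rate, for illustration only

p_fix = neutral_fixation_probability(ne)
print(f"fixation probability: {p_fix:.2%}")

# New mutations per generation times their fixation probability.
substitution_rate = new_mutations_per_site(ne, mu) * p_fix
print(f"neutral substitution rate: {substitution_rate:.2e}")
```

Multiplying the two quantities cancels Ne, so under these assumptions the neutral substitution rate equals 𝝁 regardless of population size.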

Question 3

Q

What is the molecular clock?

Answer

A

The molecular clock estimates divergence times between species based on the rate of molecular evolution, assuming a constant mutation rate. It helps determine the time of divergence by counting accumulated substitutions.

Estimating the mutation rate:
- directly: parent-offspring trios
- fossils (lower limit on split time)

Assumptions:
* constant mutation rates (across species/loci)
* substitutions accumulate by drift only
These assumptions are often violated!
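
As a rough sketch of how a clock estimate works: if two lineages have each accumulated substitutions at the same constant rate since their split, the per-site divergence d between them is 2 × rate × time, so time = d / (2 × rate). The function name and the numbers below are hypothetical and only illustrate the calculation under the assumptions listed above.

```python
def divergence_time(substitutions_per_site, rate_per_site_per_year):
    """Split time assuming a strict clock: d = 2 * rate * t, so t = d / (2 * rate).

    The factor 2 accounts for substitutions accumulating on both lineages.
    """
    return substitutions_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical values: 2% sequence divergence, 1e-9 substitutions per site per year.
t = divergence_time(0.02, 1e-9)
print(f"estimated split time: {t:.3g} years")
```

If the rate differs between lineages or loci, or if selection contributes to substitutions, this simple estimate is biased.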

Question 4

Q

What is the nearly neutral model of evolution?

Answer

A

Proposed by Tomoko Ohta, it suggests
* many mutations are slightly deleterious.
* In species with large population sizes, natural selection is more efficient at removing these mutations, resulting in a slower overall rate of molecular evolution.

Tests like Tajima’s D are often used under the assumption of the Neutral Theory as a null hypothesis, and deviations from neutral expectations may be explained by the Nearly Neutral Theory (such as slightly deleterious mutations affecting genetic diversity).

outdated

Question 5

Q

What is coalescent theory?

Answer

A

Coalescent theory models the genealogical relationships of alleles in a population, tracing their lineage back to a most recent common ancestor (MRCA). The expected coalescent time depends on population size (Ne).

probability that two sequences are:
- from the same parent: 1 ÷ 2Ne
- from two different parents: 1 - (1 ÷ 2Ne)
- have a common ancestor t generations ago:
(1 - (1 ÷ 2Ne))^(t-1) × (1 ÷ 2Ne)
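
The probability above describes a geometric distribution: the two lineages fail to coalesce for t - 1 generations and then coalesce in generation t. A minimal sketch (function name and the value of Ne are assumptions for illustration) that sums the distribution numerically:

```python
def coalescence_probability(t, ne):
    """Probability that two sequences first share an ancestor exactly t generations ago."""
    p = 1.0 / (2 * ne)                 # chance of the same parent in one generation
    return (1.0 - p) ** (t - 1) * p    # t - 1 generations apart, then coalescence


ne = 100                               # hypothetical effective population size
generations = range(1, 20001)          # long enough that the remaining tail is negligible
probs = [coalescence_probability(t, ne) for t in generations]

total = sum(probs)
mean_time = sum(t * p for t, p in zip(generations, probs))
print(f"sum of probabilities: {total:.6f}")
print(f"mean coalescence time: {mean_time:.2f} generations (2Ne = {2 * ne})")
```

The mean of a geometric distribution is 1/p, so the expected coalescence time for a pair of sequences is 2Ne generations, which is why larger populations carry deeper genealogies.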

Question 6

Q

What is Tajima’s D?

Answer

A

Tajima’s D is a test that compares
* nucleotide diversity (π)
* to the number of segregating sites (S)

to detect deviations from neutrality.

D = 0: population is evolving neutrally (variation matches the neutral expectation)

D > 0: fewer low-frequency polymorphisms than expected -> balancing selection or population contraction

D < 0: excess of low-frequency polymorphisms -> population expansion or purifying selection

Genome-wide: if D is very similar across the whole genome, the cause is probably demographic (population expansion or contraction) rather than selection.
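
A sketch of the calculation, using the standard normalising constants from Tajima's 1989 definition. D is the difference between the two estimates of θ (π and S divided by the sum 1 + 1/2 + ... + 1/(n-1)), scaled by its expected standard deviation. The function name and the example inputs are assumptions for illustration.

```python
import math


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences pi."""
    if n < 4 or s < 1:
        # For n < 4 the variance term is zero; for s = 0 there is no variation to test.
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)


# Hypothetical sample: 5 sequences, 5 segregating sites.
print(tajimas_d(5, 5, 2.0))   # pi below S/a: excess of rare variants, negative D
print(tajimas_d(5, 5, 3.0))   # pi above S/a: deficit of rare variants, positive D
```

In practice D is computed per locus or per window and compared with the genome-wide distribution, since demography shifts D everywhere while selection acts locally.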

Question 7

Q

What is the HKA test?

Answer

A

The Hudson-Kreitman-Aguade (HKA) test compares levels of polymorphism within species to divergence between species across multiple loci to test for neutrality.
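Under neutrality, polymorphism and divergence should be proportional across loci; a locus with unusually high or low polymorphism relative to its divergence may indicate selection.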

Question 8

Q

What is the MK test?

Answer

A

The McDonald-Kreitman (MK) test compares the ratio of non-synonymous to synonymous changes within species (polymorphisms) to between species (fixed differences).

A higher ratio of non-synonymous to synonymous changes among fixed differences than among polymorphisms suggests positive selection.
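
The comparison can be sketched as a 2 × 2 table of counts. The counts below are hypothetical and only illustrate the logic; the function name is an assumption, and a real analysis would also test the table for significance (for example with Fisher's exact test).

```python
def mk_ratios(pn, ps, dn, ds):
    """Return (polymorphism ratio, divergence ratio, neutrality index).

    pn, ps: non-synonymous and synonymous polymorphisms within the species
    dn, ds: non-synonymous and synonymous fixed differences between species
    """
    poly_ratio = pn / ps
    div_ratio = dn / ds
    neutrality_index = poly_ratio / div_ratio
    return poly_ratio, div_ratio, neutrality_index


# Hypothetical counts, for illustration only.
poly_ratio, div_ratio, ni = mk_ratios(pn=2, ps=40, dn=10, ds=20)
print(f"Pn/Ps = {poly_ratio:.2f}, Dn/Ds = {div_ratio:.2f}, NI = {ni:.2f}")
```

Under neutrality both ratios are expected to be equal (neutrality index near 1). An index below 1, as in this made-up example, means an excess of non-synonymous fixed differences, which is the pattern expected under positive selection.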

Question 9

Q

What if there is no candidate locus for a phenotype?

Answer

A

Genome scanning involves examining the entire genome to identify regions under selection.
It uses sliding-window approaches and haplotype tests to detect selective sweeps and regions of extended haplotype homozygosity (EHH).

More relevant than MK & HKA
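
A sliding-window scan can be sketched as follows. This is an illustrative outline only: the per-SNP scores stand in for any statistic (for example a haplotype-based score), and the function name, positions, and values are hypothetical.

```python
def window_means(positions, scores, window_size, step):
    """Mean score in windows of window_size bp, moved along the sequence by step bp."""
    if not positions:
        return []
    results = []
    start, last = min(positions), max(positions)
    while start <= last:
        end = start + window_size
        in_window = [s for p, s in zip(positions, scores) if start <= p < end]
        if in_window:                      # skip windows without any SNP
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results


# Hypothetical SNP positions (bp) and per-SNP scores.
positions = [100, 150, 900, 950]
scores = [1.0, 3.0, 10.0, 20.0]
for start, end, mean in window_means(positions, scores, window_size=200, step=200):
    print(start, end, mean)
```

Windows whose mean lies far in the tail of the genome-wide distribution are the candidate regions; using a step smaller than the window size gives overlapping windows.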

Question 10

Q

What is Extended Haplotype Homozygosity (EHH)?

Answer

A

EHH measures the probability and decay of identity by descent (IBD) around a haplotype. It identifies regions where favored alleles are found within large shared haplotypes, indicating recent positive selection.

Definition: Measures the decay of homozygosity for a particular haplotype as you move away from a focal allele.

Interpretation: High EHH suggests strong, recent selection; rapid decay of EHH suggests older or weaker selection.
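
A minimal sketch of the calculation, assuming phased haplotypes given as equal-length strings: EHH at a given distance is the probability that two randomly chosen haplotypes carrying the focal (core) allele are identical over the whole interval from the core to that position. The function name and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH among carriers of core_allele over the interval between core_index and end_index."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their sequence over the interval and count identical pairs.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Toy data: core SNP at index 0, core allele "A".
haps = ["AAAA", "AAAA", "AAAT", "AACT", "TAAA"]
for end in range(4):
    print(end, ehh(haps, 0, "A", end))
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the shared haplotype; a favoured allele that rose quickly in frequency shows a slower decay than expected for its frequency.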

Question 11

Q

What are iHS and xpEHH?

Answer

A

iHS (integrated haplotype score) compares haplotype homozygosity around the ancestral and the derived allele within one population to evaluate which of the two is favoured by natural selection; it requires an outgroup for identifying the ancestral allele.

xpEHH (cross-population extended haplotype homozygosity) compares haplotype length between populations to detect selection that occurred in one population but not the other.

under selection = favoured by natural selection, can lead to increase in frequency

Question 12

Q

What is expected sequence variation in a population?

Answer

A

Expected number of differences between a pair of sequences randomly drawn from a population: 𝞱, the expected genetic diversity within a population, influenced by mutation rate and population size.

2 × 2Ne 𝞵 per site = 4Ne𝞵 = 𝞱

Large Ne: ↑ time to MRCA -> more mutations accumulate

Estimating 𝜃 (under the infinite sites model):

Average number of nucleotide differences:
represented by 𝜋
E(𝜋) = 𝜃

Number of segregating sites:
represented by S
E(S) = a𝜃
where a is a scaling factor dependent on the number of sampled sequences.

a is the sum of 1/i for i = 1 to n - 1, where n is the number of sampled sequences.
e.g. S = 5 (i.e. the sequences differ at 5 sites) in a sample of n = 5 sequences:
a = 1 + (1/2) + (1/3) + (1/4)
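
Both estimators can be computed directly from an alignment. The sketch below uses a made-up sample of five aligned sequences with five segregating sites, matching the sample size and S of the example above; the function names and the sequences are assumptions for illustration.

```python
from itertools import combinations


def harmonic_a(n):
    """Scaling factor a = 1 + 1/2 + ... + 1/(n - 1) for n sampled sequences."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pairwise_diversity(seqs):
    """Average number of differences over all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Hypothetical alignment: 5 sequences, 8 sites.
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TCGTACGT", "ACCTAGGT"]

s = segregating_sites(seqs)
a = harmonic_a(len(seqs))
pi = pairwise_diversity(seqs)
theta_s = s / a                      # estimate of theta from segregating sites
print(f"S = {s}, a = {a:.4f}, pi = {pi:.2f}, S/a = {theta_s:.2f}")
```

Under neutrality both 𝜋 and S/a estimate the same 𝜃, so a difference between them is the signal that Tajima's D measures. In this toy sample every variant is a singleton, so 𝜋 is smaller than S/a.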

(12 cards)