Molecular evolution L1-4 Flashcards

1
Q

What is molecular evolution? In what two disciplines does molecular evolution have its roots?

A

The study of how macromolecules (DNA, RNA and proteins) change over time. The goals are to establish how sequences are related, to understand how they evolve, and to reconstruct evolutionary history.

Population genetics, which provides the theoretical foundation for the study of evolutionary processes.

Molecular biology, which provides the empirical data.

2
Q

Why are we interested in molecular evolution?

A

Understanding the origin and diversity of life

Understanding the relationship between genotype and phenotype, e.g. what makes a modern human?

Biomedical implications: understanding diseases and, e.g., antibiotic resistance.

3
Q

What is the Hardy Weinberg principle and why is it so important?

A

The Hardy-Weinberg principle states that, in the absence of other evolutionary influences, allele frequencies (and the genotype frequencies p^2, 2pq and q^2) remain constant from generation to generation. This state is called Hardy-Weinberg equilibrium (HWE).

HWE is the null hypothesis of no evolution, so we can test for deviations from it. A deviation could indicate:
- Nonrandom/assortative mating
- Inbreeding
- Population structure
- Selection
- Genetic drift

4
Q

What are the seven underlying assumptions of HWE?

A

Diploid species
Sexual reproduction
Non-overlapping generations
Random mating
Infinite population size
Allele frequencies are equal in the sexes
No mutation, migration, selection or drift

5
Q

Explain how we can test if sequences are in HWE.

A

Let’s say we have 2 alleles A and a.

  1. Use the observed genotype frequencies (AA, Aa, aa) to get the allele frequencies p (of A) and q (of a):
    p = AA + Aa/2
    q = aa + Aa/2
    p + q = 1
    AA + Aa + aa = 1
  2. Use the allele frequencies to get the expected genotype frequencies after one generation of random mating:
    p^2 = AA (expected homozygotes)
    2pq = Aa (expected heterozygotes)
    q^2 = aa (expected homozygotes)
    p^2 + 2pq + q^2 = 1
  3. Use a chi^2 goodness-of-fit test to see whether the observed and expected genotype counts differ (expected count = expected frequency times sample size; with two alleles there is 1 degree of freedom).

A p-value < 0.05 means we can reject the null hypothesis of HWE.
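
A minimal Python sketch of the three steps, assuming a single biallelic locus and made-up genotype counts. The p-value uses the fact that a chi^2 variable with 1 degree of freedom has upper tail erfc(sqrt(x/2)), so only the standard library is needed; `hwe_chi_square` is a hypothetical helper name.

```python
import math


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for HWE at one biallelic locus.

    Assumes both alleles are present (otherwise an expected count is 0).
    Returns (chi2 statistic, p-value) with 1 degree of freedom:
    3 genotype classes - 1 - 1 estimated allele frequency.
    """
    n = n_AA + n_Aa + n_aa
    # Step 1: allele frequencies from genotype counts (2 alleles per individual)
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    # Step 2: expected genotype counts after one generation of random mating
    expected = [p**2 * n, 2 * p * q * n, q**2 * n]
    observed = [n_AA, n_Aa, n_aa]
    # Step 3: chi-square statistic over the three genotype classes
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = hwe_chi_square(50, 30, 20)  # hypothetical counts
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

With these counts there are far fewer heterozygotes than expected (30 vs 45.5), and the p-value is well below 0.05, so HWE would be rejected.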

6
Q

Why is it so difficult to remove rare alleles from a population through selection?

A

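Assuming the mutation is deleterious: if the allele is recessive, it can hide in heterozygotes, where it has no effect on the phenotype. When the allele is rare, almost all of its copies are in heterozygotes (2pq is much larger than q^2), so selection rarely acts on it and removing it becomes very slow. If the allele is dominant, it is expressed in heterozygotes and is purged quickly.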
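
A small Python check of the "hiding" argument, assuming HWE genotype frequencies; the function name is illustrative. The share of a-copies found in heterozygotes works out to p = 1 - q, so the rarer the allele, the more of it is hidden.

```python
def fraction_in_heterozygotes(q: float) -> float:
    """Share of all copies of allele a carried by Aa heterozygotes under HWE."""
    p = 1 - q
    copies_in_het = 2 * p * q   # genotype frequency 2pq, one copy of a each
    copies_in_hom = 2 * q**2    # genotype frequency q^2, two copies of a each
    return copies_in_het / (copies_in_het + copies_in_hom)


for q in (0.5, 0.1, 0.01):
    print(q, round(fraction_in_heterozygotes(q), 3))
```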

7
Q

How can dominance/recessiveness influence the expected response to selection?

A

Dominant beneficial – Increases in frequency quickly while rare, but takes a very long time to actually become fixed, because heterozygotes already have the high fitness. Selection cannot easily purge the remaining “bad” recessive alleles that hide in the heterozygotes.

Deleterious recessive – Hard for selection to remove, because it hides in heterozygotes.

Recessive beneficial – Increases in frequency much more slowly than a dominant beneficial allele while rare, but once common it reaches fixation faster, because only homozygotes have the increased fitness.

Overdominance (heterozygote advantage) – Increases and stabilizes at an intermediate frequency, where it remains stable over time.

Underdominance (heterozygote disadvantage) – Can be lost or fixed depending on the starting frequency. Since a new mutation starts at a low frequency, it is usually lost.
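
A deterministic sketch (infinite population, random mating, no drift) comparing a dominant and a recessive beneficial allele with the same hypothetical s = 0.05. `next_freq` applies the standard one-locus selection recursion q' = (q^2 w_aa + pq w_Aa) / w_bar; the names and numbers are assumptions for illustration.

```python
def next_freq(q: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """One generation of selection on allele a (frequency q), infinite random-mating population."""
    p = 1 - q
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return (q * q * w_aa + p * q * w_Aa) / w_bar


def generations_to(q: float, target: float, w_AA: float, w_Aa: float, w_aa: float) -> int:
    """Count generations until the frequency of a first reaches target (requires a to be favored)."""
    t = 0
    while q < target:
        q = next_freq(q, w_AA, w_Aa, w_aa)
        t += 1
    return t


s = 0.05  # hypothetical selection coefficient
dominant = (1.0, 1 + s, 1 + s)   # Aa already has the full advantage
recessive = (1.0, 1.0, 1 + s)    # only aa has the advantage
for name, w in (("dominant", dominant), ("recessive", recessive)):
    early = generations_to(0.01, 0.5, *w)
    late = generations_to(0.5, 0.99, *w)
    print(f"{name}: 0.01->0.5 in {early} gens, 0.5->0.99 in {late} gens")
```

The dominant allele needs far fewer generations to become common but many more to go from common to nearly fixed; the recessive allele shows the opposite pattern.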

8
Q

What is recombination?

A

Exchange of genetic material between homologous chromosomes, mainly during meiosis (more rarely in mitosis), which creates new combinations of alleles and increases genetic variation. Recombination breaks down linkage disequilibrium between loci, moving them towards linkage equilibrium.

9
Q

What is linkage equilibrium?

A

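Alleles at different loci are combined at random: the frequency of each haplotype equals the product of the frequencies of its alleles, so the linkage disequilibrium D = p_AB - p_A*p_B is 0. The opposite, linkage disequilibrium, is when alleles at different loci tend to be inherited together.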
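
A Python sketch of the linkage disequilibrium coefficient D for two biallelic loci, using made-up haplotype frequencies. It also applies the standard result that recombination at rate r reduces D by a factor (1 - r) each generation, which is how recombination pushes loci towards linkage equilibrium.

```python
def linkage_disequilibrium(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """D = p_AB - p_A * p_B for two biallelic loci, from haplotype frequencies."""
    p_A = p_AB + p_Ab   # frequency of allele A at locus 1
    p_B = p_AB + p_aB   # frequency of allele B at locus 2
    return p_AB - p_A * p_B


def d_after(d0: float, r: float, generations: int) -> float:
    """Recombination at rate r per generation shrinks D by a factor (1 - r) each generation."""
    return d0 * (1 - r) ** generations


d0 = linkage_disequilibrium(0.5, 0.1, 0.1, 0.3)   # hypothetical haplotype frequencies
print(d0)                                          # positive: A tends to occur with B
print(d_after(d0, 0.5, 10))                        # unlinked loci (r = 0.5): D is almost gone
print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))  # random association: D = 0
```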

10
Q

How can we measure genetic variation in sequence data? Give at least two example metrics used to measure genetic variation.

A

Nucleotide diversity
Segregating sites

11
Q

What are the forces of genetic variation and which is the main one?

A

Mutation is the main source of new variation.

Recombination creates new combinations of alleles (and can itself be mutagenic).

Migration (gene flow) brings in alleles from other populations.

Genetic drift (the change in allele frequency due solely to chance) reduces variation over time.

12
Q

What determines the fate of a mutation in a finite population? What are these two factors dependent on?

A

Genetic drift and selection. Both depend on the effective population size (Ne).

Selection is more effective in large populations, where drift is weaker. In small populations selection is more “blind”: mutations with small effects (roughly when Ne*s is much smaller than 1) are governed mainly by drift.

13
Q

What is effective population size vs census population size?

A

Effective population size (Ne) is a measure of how many individuals actually contribute to the next generation: the size of an idealized population that would experience the same amount of genetic drift as the real one. It is usually smaller than the census size. It is not a measure of genetic diversity, but it is correlated with it.

Census population size is the total number of individuals in a population.

14
Q

What is a common model to describe the fate of a mutation in a finite population? What are the underlying assumptions of this model?

A

The Wright-Fisher model. It models what will happen to allele frequencies in future generations, and it is mainly a model of genetic drift because it assumes that there is no selection.

It also assumes random mating, a finite and constant population size, discrete (non-overlapping) generations, a haploid population, and no mutation, migration or recombination.

15
Q

Discuss similarities and differences between the Hardy-Weinberg model and the Wright-Fisher model!

A

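Similarities: both are idealized null models that assume random mating, non-overlapping generations and no selection, mutation or migration.

Differences: Hardy-Weinberg assumes an infinite population size, so it does not allow for drift, and it assumes a diploid population. Wright-Fisher assumes a finite population, so allele frequencies change by drift, and it models a haploid population.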

16
Q

What is the role of population size in the evolution of neutral mutations?

How do we calculate how many neutral mutations per generation we will accumulate between two sequences and the probability of fixation?

A

Population size does not affect the substitution rate of neutral mutations.

2N*mu = the number of new neutral mutations that arise in the population per generation (2N gene copies, each mutating at rate mu).

The probability of fixation of a new neutral mutation is 1/2N. The two effects of population size cancel, so the number of neutral substitutions per generation equals the mutation rate: (2N*mu)(1/2N) = mu.

17
Q

Discuss the difference between a mutation and a substitution!

A

A mutation is a change in the DNA of an individual that creates a new allele. A (gene) substitution happens when a mutation spreads through the population and becomes fixed.

18
Q

What is the prediction of the molecular clock hypothesis? List and explain some factors that may cause deviation from the molecular clock.

A

The hypothesis is that the substitution rate of a given gene or protein is approximately constant over time and across lineages. So if we know the substitution rate and count the number of differences between two sequences, we can estimate when they diverged.

Deviations from the clock arise when substitution rates differ between lineages. This can be caused by different mutation rates (due to different DNA repair mechanisms, different generation times, and differences in metabolic rate, which is usually linked to generation time) and by different selection pressures (e.g. related to mating).
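
The dating step can be sketched with the standard relation T = K / (2r), where K is the number of substitutions per site between the two sequences and r is the substitution rate per site per year in each lineage (the factor 2 counts both lineages since the split). The numbers below are hypothetical.

```python
def divergence_time(k: float, rate: float) -> float:
    """T = K / (2r).

    k: substitutions per site between the two sequences
    rate: substitutions per site per year in EACH lineage (assumed constant: the clock)
    """
    return k / (2 * rate)


# Hypothetical numbers: 2% divergence, 1e-9 substitutions/site/year
print(divergence_time(0.02, 1e-9))  # about 1e7 years
```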

19
Q

What are models of sequence evolution? Why are they needed?

A

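They are models of how sequences change by substitution over time, which allow us to reconstruct the evolution of sequences. Simply counting differences underestimates the real amount of evolution, because some changes are hidden (e.g. multiple substitutions at the same site, such as A -> G -> A). Models like JC69 etc. correct for what we cannot see just by comparing the two sequences.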
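
A minimal sketch of the JC69 correction, d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites. The two sequences are made up; the helper names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites (aligned, equal-length sequences)."""
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(p: float) -> float:
    """JC69-corrected distance d = -3/4 * ln(1 - 4p/3): estimated substitutions per site."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated, JC69 distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGAAC")  # hypothetical sequences, 2 of 10 sites differ
print(p, round(jc69_distance(p), 4))       # corrected distance is larger than p
```

The corrected distance is always larger than p, and the gap grows as p approaches 0.75, where the formula breaks down (saturation).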

20
Q

What is functional constraint?

A

Functionally important regions are constrained against evolutionary change: they have a function they need to preserve, so purifying selection is strong and removes most changes in that region.

21
Q

What is the relative rate test? What is it used for?

A

A test of whether the molecular clock hypothesis holds for the data we are looking at. If two taxa in a clade have equal branch lengths to (are equally distant from) an outgroup, we assume that the molecular clock holds.

22
Q

What is genetic drift? Does it increase or decrease genetic variation?

A

Genetic drift is the change in allele frequencies due entirely to chance (random sampling). Together with natural selection, it is one of the two major forces that govern changes in allele frequencies.

It decreases genetic variation within populations (while increasing differences between populations), and it can contribute to speciation.

23
Q

What does the rate of approach to fixation/loss depend on?

A

The rate is determined by the effective population size: the smaller it is, the faster alleles drift to fixation or loss.
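
A sketch of the standard Wright-Fisher result that expected heterozygosity decays as H_t = H_0 (1 - 1/(2N))^t, assuming pure drift and N diploid individuals (read N as Ne for a real population). The starting values are hypothetical.

```python
def expected_heterozygosity(h0: float, n: int, t: int) -> float:
    """H_t = H_0 * (1 - 1/(2N))^t under pure drift in a population of N diploids."""
    return h0 * (1 - 1 / (2 * n)) ** t


for n in (10, 100, 1000):
    print(n, round(expected_heterozygosity(0.5, n, 20), 3))  # smaller N: faster loss
```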

24
Q

When is the result of genetic drift more drastic?

A

Drift acts faster and has more drastic results in smaller populations, so it is especially strong in small, endangered populations.

25
Q

What is the Wright-Fisher model? What are its assumptions?

A

It is a model that describes genetic drift in a finite population in discrete time (discrete, non-overlapping generations). It is mostly a model of genetic drift, since it assumes no selection, no mutation and no migration.

It also assumes random mating and haploid populations without sexes.

The Wright-Fisher model simulates a population in which individuals randomly mate to produce the next generation, and it tracks the changes in allele frequencies over successive generations to describe genetic drift.

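In a simulation, random sampling with replacement from the current generation is used to form the next one (this simulates random mating). If we follow several replicate populations that start with the same allele frequency, the frequencies no longer agree after a few generations; this random divergence is genetic drift.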
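
A minimal Python sketch of such a simulation, assuming a haploid population of a fixed number of gene copies and no selection, mutation or migration. `random.Random.choices` performs the sampling with replacement; the parameter values (32 copies, start at 0.5, 20 generations) are arbitrary choices for illustration.

```python
import random


def wright_fisher(n_copies: int, p0: float, generations: int, rng: random.Random) -> list[float]:
    """Frequency trajectory of allele A in a haploid Wright-Fisher population.

    No selection, mutation or migration; constant size of n_copies gene copies.
    """
    n_A = round(p0 * n_copies)
    pool = ["A"] * n_A + ["a"] * (n_copies - n_A)
    trajectory = [n_A / n_copies]
    for _ in range(generations):
        # Each offspring copy picks a parent copy at random, with replacement
        pool = rng.choices(pool, k=n_copies)
        trajectory.append(pool.count("A") / n_copies)
    return trajectory


rng = random.Random(2024)  # fixed seed so the run is reproducible
for replicate in range(5):
    traj = wright_fisher(n_copies=32, p0=0.5, generations=20, rng=rng)
    print(f"replicate {replicate}: final frequency of A = {traj[-1]:.2f}")
```

Replicates typically end at different frequencies, and any run that hits 0 or 1 stays there: loss and fixation are absorbing.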

26
Q

What is natural selection?

A

The process in nature by which individuals with genotypes best adapted to an environment have greater fitness (survivorship and reproduction) than less adapted individuals.

Natural selection can be:
- Negative/purifying, where deleterious mutations are purged.
- Neutral (no selection), where changes are due to chance events.
- Positive, where beneficial mutations are kept and increase in frequency.

27
Q

What is directional selection?

A

Selection that favors ONE extreme phenotype, causing the average phenotype in the population to change in one direction.

Reduces variation.

28
Q

What is stabilizing selection?

A

Favors phenotypes near the middle of the range of phenotypic variation, maintaining the average phenotype.

Reduces variation.

29
Q

What is disruptive selection?

A

Favors extreme phenotypes at both ends of the range of phenotypic variation.

Increases variation.

30
Q

What is balancing selection?

A

No single phenotype is favored in all populations of a species at all times.

Maintains variation.

31
Q

What is the selection coefficient?

A

The selection coefficient (s) measures the difference in fitness between genotypes, e.g. fitness 1 + s for a new genotype relative to fitness 1 for the original homozygote. It tells us whether an allele is going to increase or decrease in frequency compared to another allele.

If s is negative, the allele will decrease in frequency, and the homozygote of the other allele has higher fitness.

If s is positive, the allele will increase in frequency, and the homozygote of this allele has higher fitness than the original homozygote.

32
Q

What is the fate of a new allele dependent on? Explain how the fitness is described for the new genotypes depending on dominance (dominant, recessive, codominant, over dominant and underdominant).

A

When a new mutation arises, its fate in a population is determined by the selection coefficient and the degree of dominance. Let the original homozygote have fitness 1.

If the new allele is recessive, the heterozygote has the same fitness as the original homozygote (1), and the new homozygote has fitness 1 + s. Once the allele is common it goes to fixation relatively fast, because the homozygote form is needed for the high fitness.

If the new allele is dominant, the heterozygote and the new homozygote both have fitness 1 + s. It takes a very long time to get fixed (if ever), because the heterozygote is as good as the homozygote.

If there is codominance then original homozygote has fitness 1, heterozygote 1 + s and new homozygote 1 + 2s. The two homozygotes have different fitness values and the heterozygote is the average of the two.

If there is overdominance then the heterozygote has higher fitness than both homozygotes and fitness is 1 for original homozygote, 1+s1 for heterozygote and 1 + s2 for new homozygote where s1 > s2.

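If there is underdominance, the heterozygote has lower fitness than both homozygotes. Whether the allele goes to fixation or elimination depends on its initial frequency, because the equilibrium is unstable. For example, with fitnesses 1, 1 - s and 1 + s for the original homozygote, heterozygote and new homozygote, an initial frequency of 1/3 (0.333) is maintained in an infinite population, while any higher or lower starting frequency leads to fixation or elimination, respectively.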
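
A deterministic sketch (infinite population, random mating) of the overdominant and underdominant cases using the fitness notation above; the coefficients are hypothetical. For overdominance the stable frequency of the new allele is s1 / (2*s1 - s2); for the underdominance example (fitnesses 1, 1 - s, 1 + s) the unstable point is 1/3.

```python
def next_freq(q: float, w_orig: float, w_het: float, w_new: float) -> float:
    """One generation of selection; q is the frequency of the new allele.

    w_orig, w_het, w_new: fitness of original homozygote, heterozygote, new homozygote.
    """
    p = 1 - q
    w_bar = p * p * w_orig + 2 * p * q * w_het + q * q * w_new  # mean fitness
    return (q * q * w_new + p * q * w_het) / w_bar


def run(q: float, w_orig: float, w_het: float, w_new: float, generations: int = 5000) -> float:
    """Iterate the recursion and return the final frequency of the new allele."""
    for _ in range(generations):
        q = next_freq(q, w_orig, w_het, w_new)
    return q


s, s1, s2 = 0.1, 0.2, 0.1  # hypothetical coefficients
# Overdominance (1, 1 + s1, 1 + s2 with s1 > s2): both starts converge to s1 / (2*s1 - s2) = 2/3
print(run(0.01, 1, 1 + s1, 1 + s2), run(0.99, 1, 1 + s1, 1 + s2))
# Underdominance (1, 1 - s, 1 + s): start below 1/3 -> loss, start above 1/3 -> fixation
print(run(0.30, 1, 1 - s, 1 + s), run(0.36, 1, 1 - s, 1 + s))
```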

33
Q

What is the probability of fixation
of a new mutant dependent on?

A

The selection coefficient, the effective population size and the initial frequency.

Irrespective of whether the allele has a fitness advantage or disadvantage, allele frequency change is not entirely deterministic: genetic drift is always present in populations of finite size, adding randomness (stochasticity).

- A beneficial allele at an intermediate frequency of 0.5 will get fixed >50% of the time and lost <50% of the time.
- Conversely, a deleterious allele at an intermediate frequency of 0.5 will get lost >50% of the time and fixed <50% of the time.

How far above or below 50% depends on the selection coefficient (how good or bad the allele is) but also on the population size: at very small population sizes, genetic drift is the major evolutionary force driving allele frequency change (selection is inefficient).
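
These statements can be illustrated with Kimura's diffusion approximation u(p) = (1 - e^(-4Nsp)) / (1 - e^(-4Ns)), which assumes codominant fitnesses (1, 1 + s, 1 + 2s) and treats N as the effective population size. The parameter values are made up.

```python
import math


def fixation_probability(p: float, s: float, n: int) -> float:
    """Kimura's approximation u(p) = (1 - e^(-4Nsp)) / (1 - e^(-4Ns)).

    p: initial frequency, s: selection coefficient (codominant), n: effective population size.
    """
    if s == 0:
        return p  # neutral limit: fixation probability equals initial frequency
    return (1 - math.exp(-4 * n * s * p)) / (1 - math.exp(-4 * n * s))


for n in (10, 100):
    beneficial = fixation_probability(0.5, 0.01, n)
    deleterious = fixation_probability(0.5, -0.01, n)
    print(n, round(beneficial, 3), round(deleterious, 3))
```

With s = 0.01, the beneficial allele starting at 0.5 is fixed about 88% of the time when N = 100 but only about 55% of the time when N = 10, where drift dominates.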

34
Q

What is the probability of fixation of a new neutral allele?

A

The probability of fixation equals the allele's initial frequency: with s = 0, the equation for the fixation probability reduces to the frequency in the population.

A new mutation is a single copy among 2N gene copies, so its frequency is 1/2N and its probability of fixation is 1/2N.
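
A quick Monte Carlo check of the 1/2N result, assuming a Wright-Fisher population of N diploid individuals (2N gene copies) and no selection. The population size and number of replicates are arbitrary; with a fixed seed the estimate is reproducible and should land close to 1/(2N) = 0.05 for N = 10.

```python
import random


def neutral_mutant_fixes(n_copies: int, rng: random.Random) -> bool:
    """Follow one new neutral mutant (1 copy among n_copies) until it is lost or fixed."""
    count = 1
    while 0 < count < n_copies:
        p = count / n_copies
        # Wright-Fisher sampling: each offspring copy carries the mutant with probability p
        count = sum(rng.random() < p for _ in range(n_copies))
    return count == n_copies


def simulated_fixation_probability(n_diploid: int, replicates: int, seed: int = 1) -> float:
    """Fraction of replicate populations in which the new mutant becomes fixed."""
    rng = random.Random(seed)
    n_copies = 2 * n_diploid
    fixed = sum(neutral_mutant_fixes(n_copies, rng) for _ in range(replicates))
    return fixed / replicates


n = 10  # hypothetical population of 10 diploid individuals
print(simulated_fixation_probability(n, 20_000), "vs 1/(2N) =", 1 / (2 * n))
```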

35
Q

What is the probability of fixation for a new beneficial allele?

A

P(fixation) ≈ 2s (for a new mutant in a large population). For example, with s = 0.01 the result is 2%, which means that 98% of all new mutants with a selective advantage still get lost.

36
Q

What is the probability of fixation for a new allele with very small selection coefficient?

A

P(fixation) = 2s / (1 - e^(-4Ns)). Slightly deleterious mutants (s < 0) still have a small chance of getting fixed.

Smaller populations have a higher risk of fixing mutants with very small (deleterious) selection coefficients, so the genetic quality tends to decline in small populations.
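
A sketch of this formula in Python, with the neutral limit 1/(2N) handled separately (the formula is 0/0 at s = 0). The guard for very large 4N|s| with s < 0 only avoids a floating-point overflow; the values of s and N are hypothetical.

```python
import math


def p_fix_new_mutant(s: float, n: int) -> float:
    """P(fixation) ≈ 2s / (1 - e^(-4Ns)) for a single new mutant (initial frequency 1/(2N))."""
    if s == 0:
        return 1 / (2 * n)  # neutral limit of the formula
    x = -4 * n * s
    if x > 700:
        return 0.0  # strongly deleterious in a large population: effectively never fixed
    return 2 * s / (1 - math.exp(x))


for n in (100, 10_000):
    print(n, p_fix_new_mutant(0.01, n), p_fix_new_mutant(-0.001, n), 1 / (2 * n))
```

For s = -0.001, the fixation probability in N = 100 (about 0.004) is close to the neutral 1/(2N) = 0.005, whereas in N = 10 000 it is essentially zero.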

37
Q

What is fixation time and what does it depend on?

A

Fixation time is the number of generations it takes for a mutant allele to become fixed in a population.

For a neutral allele it depends on the initial frequency and the population size (a new neutral mutation that does become fixed takes on average about 4N generations).

For a beneficial allele it is also dependent on selection coefficient.

38
Q

Why does the rate of substitution for neutral mutations not depend on the population size?

A

The rate of substitution is the number of substitutions (fixations) per unit of time. To calculate it we need to know how many new mutations arise per generation and their probability of fixation.

2N*mu new neutral mutations arise in the population per generation (2N gene copies, each mutating at rate mu), and each has a probability of fixation of 1/2N. Therefore the rate of substitution is (2N*mu)(1/2N) = mu: the population size cancels out.
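
The cancellation can be checked numerically; the mutation rate below is a hypothetical value.

```python
def neutral_substitution_rate(n: int, mu: float) -> float:
    """Neutral substitutions per generation = new mutations per generation * P(fixation)."""
    new_mutations_per_generation = 2 * n * mu   # 2N gene copies, each mutating at rate mu
    fixation_probability = 1 / (2 * n)          # for each new neutral mutation
    return new_mutations_per_generation * fixation_probability


mu = 1e-8  # hypothetical mutation rate per site per generation
for n in (100, 10_000, 1_000_000):
    print(n, neutral_substitution_rate(n, mu))  # equal to mu (up to rounding) for every N
```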

39
Q

What was Peter Buri’s 1956 study of experimental evolution? Explain his conclusions

A

He studied genetic drift in small (finite) experimental populations of fruit flies (Drosophila).

He kept 107 populations in bottles for 20 generations, starting each with an allele frequency of 0.5, and followed how the frequencies changed over the generations. After 20 generations the allele frequencies had spread out randomly across the populations. This is the effect of genetic drift.

The main conclusions were that genetic drift decreases genetic variation within populations (the flies within a population became genetically similar), and many individual populations lost or fixed the allele within 20 generations: by the end, about half of the populations were fixed for one allele or the other. At the same time, genetic drift increases variation between populations (the 107 populations ended up different from each other).

40
Q

What is segregating sites vs nucleotide diversity?

A

They are two different ways to measure genetic variation.

The number of segregating sites is the number of positions in the alignment where the sequences are polymorphic.

Nucleotide diversity is the average number of nucleotide differences per site between two sequences, averaged over all pairs of sequences in the sample.

Assume that we have 4 sequences. The total number of pairwise comparisons is 4(4-1)/2 = 6. The sequences are 10 sites long.

Nucleotide diversity = (sum of the differences over all unique pairs of sequences / number of comparisons) / number of sites.
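
A Python sketch of both measures on four made-up sequences of 10 sites, matching the setup above (6 pairwise comparisons).

```python
from itertools import combinations


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns with more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def nucleotide_diversity(seqs: list[str]) -> float:
    """(Total differences over all unique pairs / number of pairs) / number of sites."""
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / len(pairs) / len(seqs[0])


seqs = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAC", "ACCTACGTTC"]  # hypothetical sample
print(segregating_sites(seqs), round(nucleotide_diversity(seqs), 4))
```

Here S = 2 (positions 3 and 9 vary), and the six pairs differ at 8 sites in total, so pi = (8 / 6) / 10 ≈ 0.133.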

41
Q

Why are we interested in the rate of nucleotide substitution?

A

It enables us to do molecular dating.

In order to characterize the evolution of a DNA sequence we need to know how fast it evolves.

42
Q

What are orthologs vs paralogs?

A

Orthologs are genes in different species that descend from the same gene in their common ancestor (they diverged through speciation); they often keep the same function.

Paralogs are genes that arose by gene duplication; over time they can become different in sequence and function.

43
Q

What are substitution models? Why do we need them? How can they differ in complexity?

A

Substitution models are probabilistic models of how sequences diverge through substitutions. We need them because, due to hidden events (multiple substitutions at the same site), simply counting differences underestimates how much sequences have changed over time. The corrections matter most when we compare sequences from distantly related species.

Each model gives the likelihood of observing this specific data under that model, so we can fit models of different complexity and find the one that best fits our data.

Their complexity is determined by how many parameters they have.

JC69 is the simplest one, with a single rate for substitutions between all nucleotide pairs (and equal nucleotide frequencies).

K80 is a bit more complex: it has different rates for transitions and transversions.

GTR allows a different rate for each nucleotide pair (6 rates) and also accounts for unequal nucleotide frequencies.

We use a likelihood ratio test to see whether the difference in likelihood between two nested models is significant.
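
A sketch of the test for two nested models that differ by one free parameter, such as JC69 vs K80 (K80 adds the transition/transversion ratio). The statistic 2(lnL_complex - lnL_simple) is compared with a chi^2 distribution with 1 degree of freedom; for models differing by more parameters, the degrees of freedom equal the difference in free parameters and this 1-df shortcut does not apply. The log-likelihood values are invented.

```python
import math


def likelihood_ratio_test_1df(lnl_simple: float, lnl_complex: float) -> tuple[float, float]:
    """LRT for nested models differing by ONE free parameter (e.g. JC69 vs K80).

    Returns (LR statistic, p-value from a chi-square with 1 df).
    """
    lr = 2 * (lnl_complex - lnl_simple)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2))  # chi-square(1 df) upper tail
    return lr, p_value


# Hypothetical log-likelihoods from fitting both models to the same alignment and tree
lr, p_value = likelihood_ratio_test_1df(lnl_simple=-2010.4, lnl_complex=-2003.1)
print(f"LR = {lr:.1f}, p = {p_value:.2g}")  # small p: the extra parameter is justified
```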

44
Q

What is the molecular clock hypothesis? What problem does it introduce?

A

The hypothesis is that substitution rates are approximately constant over time, so genetic differences accumulate like a clock and we should be able to do molecular dating.

The problem is that the substitution rate differs between proteins (each protein has its own, roughly constant, rate), and even for one protein substitutions accumulate randomly rather than exactly on schedule. The clock is therefore a bit “sloppy”. It is still useful, because we can describe how likely the number of substitutions is to fall within any given range of path lengths, and so define, e.g., a 95% confidence set.
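
One common way to make the "sloppy clock" concrete is to treat substitutions as a Poisson process (an assumption, not stated above): if a lineage is expected to accumulate a given number of substitutions, the actual count varies around it. The sketch computes an equal-tailed 95% range for a hypothetical expectation of 20 substitutions.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(K = k) for K ~ Poisson(mean), computed in log space for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_interval(mean: float, level: float = 0.95) -> tuple[int, int]:
    """Equal-tailed range [lo, hi] of substitution counts with coverage >= level."""
    alpha = (1 - level) / 2
    cdf, k, lo = 0.0, 0, None
    while True:
        cdf += poisson_pmf(k, mean)
        if lo is None and cdf > alpha:
            lo = k            # P(K < lo) <= alpha
        if cdf >= 1 - alpha:
            return lo, k      # P(K > k) <= alpha
        k += 1


# Hypothetical: a lineage expected to accumulate 20 substitutions (rate x time)
print(poisson_interval(20.0))
```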

45
Q

What is the relative rate test?

A

We need to justify the existence of a molecular clock before we use it to make inferences.

A test of whether the molecular clock hypothesis holds for the data we are looking at. If two taxa in a clade have equal branch lengths to (are equally distant from) an outgroup, we assume that the molecular clock holds.

For example, if we have a tree with taxa A, B and C, where A and B share a more recent common ancestor (C is the outgroup):

d = d(A,C) - d(B,C) ≈ 0 if the hypothesis of a molecular clock holds (in practice, we test whether d is significantly different from 0).
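
A minimal sketch of d for the A, B, C example, using uncorrected p-distances on made-up aligned sequences (a JC69-corrected distance could be used instead). A real relative rate test also needs the variance of d to decide whether it differs significantly from 0; that part is not shown.

```python
def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def relative_rate_difference(seq_a: str, seq_b: str, outgroup: str) -> float:
    """d = d(A,C) - d(B,C); close to 0 if lineages A and B evolved at the same rate."""
    return p_distance(seq_a, outgroup) - p_distance(seq_b, outgroup)


# Hypothetical aligned sequences; C is the outgroup
A = "ACGTACGTAC"
B = "ACGTACGTTC"
C = "TCGAACGTAC"
print(relative_rate_difference(A, B, C))  # negative: more change along the B lineage
```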

46
Q

What are the different explanations for variations in substitution rates among evolutionary lineages?

A
  • variation in the mutation rate
  • variation in selection and genetic drift.
47
Q

What can variations in the mutation rate between species be caused by?

A
  • Differences in DNA damage repair
  • Relative impact of mutagens
  • Generation times: species with shorter generation times copy their DNA more often (per unit of time) and this leads to more mutations.
  • Differences in the number of DNA replications needed to produce gametes. A higher number of replications gives a higher mutation rate.
  • Timing of DNA replication. Later-replicating regions accumulate more mutations.
  • Metabolic rate. A higher metabolic rate generates more intracellular mutagens (e.g. reactive oxygen species).
48
Q

What can cause the variation in selection that affects the substitution rate?

A

Functional constraints on important regions lead to purifying selection.

Protein-coding regions generally evolve more slowly (have a lower substitution rate) because of these constraints and purifying selection: the tolerance for mutations in those regions is low.

49
Q

What is the intensity of purifying selection determined by?

A

How intolerant a site or genomic region is towards mutation.

The functional constraint defines the range of nucleotides that are acceptable without affecting the function of the protein.

I.e. the stronger the functional constraint, the fewer nucleotide substitutions are accepted and the slower the evolution of the region becomes.

50
Q

Does sequence conservation always imply functional constraint?

A

No, not always. Conservation can also be caused by:
- random chance
- low mutation rate

There are also genes whose function lies in their diversity, like the genes coding for immune functions.

51
Q

What is codon usage bias?

A

We might expect all synonymous codons (codons that code for the same amino acid) to be used at the same frequency. However, this is not what we observe.

There seem to be preferred codons and this is called codon usage bias.
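
Codon usage bias is often quantified with relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons were used equally. A sketch for leucine's six codons on a made-up coding sequence:

```python
LEUCINE_CODONS = ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG")  # the six codons for leucine


def rscu(cds: str, synonymous: tuple[str, ...]) -> dict[str, float]:
    """Relative synonymous codon usage: observed count / count expected under equal usage.

    Assumes the amino acid occurs at least once in cds (read in frame from position 0).
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = {c: codons.count(c) for c in synonymous}
    expected = sum(counts.values()) / len(synonymous)
    return {c: n / expected for c, n in counts.items()}


print(rscu("CTGCTGCTGTTACTGCTC", LEUCINE_CODONS))  # hypothetical coding sequence
```

An RSCU of 1 means no preference; here CTG (4.0) is strongly preferred.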

52
Q

Give an example of what could be the cause of codon usage bias?

A

Codon usage bias seems to be driven by:
- selection for translational accuracy
- selection for translational efficiency

tRNAs are specific for certain codons, and some tRNAs are more abundant than others.

Translation is faster when a codon is read by an abundant tRNA, and there is less risk of translation errors (wrong amino acids being incorporated).