Molecular phylogenies 1 Flashcards

1
Q

The history of taxonomy and phylogeny

A

1735: Hierarchical tables based on morphological characteristics, by Linnaeus

1868: Ladder of nature by Ernst Haeckel

1859: Phylogeny in On the Origin of Species (Darwin)

Phylogeny of mathematics

1900s: The availability of DNA sequence data led to the modern era of phylogenetics.

2
Q

Homology

A

Characteristics or loci are homologous if they are similar because they descended from a common ancestor.

This is why DNA must be aligned so that homologous sequences can be compared.

3
Q

Molecular phylogenetics

A

Molecular phylogenetics compares DNA sequences to resolve the phylogeny of a species.

The evolutionary information in sequences is scrambled, fragmented, hidden or lost, so mathematical and statistical methods are used to recover it.

4
Q

Types of molecular phylogeny sequence comparisons

A

Orthologous: Sequences from different species to study speciation and extinction

Homologous: Sequences from the same species to look at population genetics

Paralogous: Sequences from the same genome to look at deletions and duplications.

5
Q

Benefits of molecular characters over morphological characters

A

Advantages
- Very common (every locus in the genome can be its own characteristic)
- Objective, easy to quantify
- Available when morphology is uninformative (e.g. micro-organisms which look similar)
- Cheap and fast
- Can be obtained without specialist training

Disadvantages
- Unavailable for extinct species
- Ancient DNA is the exception as DNA can be extracted from remains

6
Q

Example of a discovery following the development of molecular phylogeny

A

Life is divided into three domains (Bacteria, Archaea and Eukarya), not two.

7
Q

Types of mutations

A

Transition or transversion mutations
- Transitions are more common, as they occur between bases of similar shape (purine to purine, two rings, or pyrimidine to pyrimidine, one ring)
- Transversions are less likely to conserve the biochemical properties of the original amino acid.
An HIV study found that transversions had a much greater negative effect on relative fitness -> Lyons et al.
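
The transition/transversion distinction can be sketched in a few lines of Python. This is only an illustration: the function names `classify_substitution` and `count_ts_tv` are made up for this card, and the sequences are assumed to be aligned DNA of equal length.

```python
PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases
BASES = PURINES | PYRIMIDINES


def classify_substitution(a: str, b: str) -> str:
    """Classify one aligned pair of bases (hypothetical helper)."""
    a, b = a.upper(), b.upper()
    # Identical bases, gaps and ambiguity codes are not counted as substitutions.
    if a == b or a not in BASES or b not in BASES:
        return "none"
    # A transition stays within one class: purine<->purine or pyrimidine<->pyrimidine.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv
```

For example, `count_ts_tv("AACGT", "GATGA")` counts A->G and C->T as transitions and T->A as a transversion.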

silent/ synonymous mutations
non-synonymous mutations
insertions
deletions

8
Q

Overview of the process for analysing sequences

A

Molecular sequence

alignment

genetic distances

Evolutionary tree of genetic distance

Evolutionary tree of time

Analyses:
- population-level processes
- Species-level processes

9
Q

Sequence alignment

A
  • Sequences must be aligned so that positionally homologous sites can be compared.
  • During alignment, positional homologies are proposed for each site, inserting gaps where needed.
  • In analyses you have to set penalties for opening and extending gaps, which control how readily gaps are inserted (to avoid overfitting, the gap penalty should be higher than the mismatch penalty).
  • Penalties are also set for different kinds of sequence difference (e.g. more for a transversion than a transition).
  • The best alignment is chosen (the alignment with the lowest total cost).
  • Clustal is a common tool.
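
The idea of choosing the lowest-cost alignment can be sketched with a small dynamic-programming (Needleman-Wunsch style) aligner. This is a simplified illustration, not how Clustal works internally: it uses a single linear gap penalty rather than separate opening and extension penalties, and the cost values (transition 1, transversion 2, gap 3) are assumptions chosen only so that gaps cost more than mismatches.

```python
def substitution_cost(a: str, b: str, transition: int = 1, transversion: int = 2) -> int:
    """Cost of aligning base a with base b (assumes plain A/C/G/T input)."""
    if a == b:
        return 0
    purines = {"A", "G"}
    same_class = (a in purines) == (b in purines)
    return transition if same_class else transversion


def align(s: str, t: str, gap: int = 3) -> tuple[int, str, str]:
    """Return (total cost, aligned s, aligned t) for the lowest-cost global alignment."""
    n, m = len(s), len(t)
    # cost[i][j] = lowest cost of aligning s[:i] with t[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # match/mismatch
                cost[i - 1][j] + gap,  # gap in t
                cost[i][j - 1] + gap,  # gap in s
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))
```

With these costs, `align("ACGT", "AGT")` places a single gap opposite the C, because one gap (cost 3) is cheaper than any alternative for sequences of unequal length.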
10
Q

Genetic distances

A

Once the sequences are aligned, the genetic difference (distance) between them must be measured.

Hamming distance (as a proportion): p = (number of different nucleotides) / (sequence length)

But the observed mismatches undercount the true number of substitutions, because a site can change more than once or converge on the same base (the multiple hits problem). We can assume that:
- Low divergence: observed number close to actual
- High divergence: observed number smaller than actual

Nucleotide substitution models use this assumption to work out the actual number of substitutions -> distances between sequences

Simplest model: Jukes-Cantor model
- Assumes every type of substitution occurs at the same constant rate
- Each nucleotide is equally likely to change into any of the other three

Transition and transversion rates differ, and base frequencies can be unequal: HKY

Transition and transversion rates differ: Kimura 2-parameter model
- Two rate parameters -> alpha (transitions) and beta (transversions)
- Calculate P and Q -> the fraction of sites that differ by a transition and the fraction that differ by a transversion

All rates are different: 12-parameter model

A 13th parameter can be added to account for the change in mutation rate in GC-rich regions -> but because they make more assumptions, these models perform worse than simpler models
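
As a rough sketch of the calculations on this card, the snippet below computes the proportional Hamming distance p and the Kimura 2-parameter distance from P and Q, using the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The function names are illustrative, and the inputs are assumed to be aligned sequences of equal length.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def k80_distance(P: float, Q: float) -> float:
    """Kimura 2-parameter distance.

    P = fraction of sites differing by a transition,
    Q = fraction of sites differing by a transversion.
    """
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for the K80 correction")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Note that the corrected distance is always at least as large as the observed proportion P + Q, which reflects the multiple hits problem: some substitutions are hidden by later changes at the same site.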

11
Q

Jukes-Cantor model

A

The simplest model, which uses algebra to calculate genetic distance from the number of mismatches

The model makes many assumptions.
- Evolution at each site occurs at the same rate -> incorporating a gamma model of rate variation among sites increases accuracy greatly
- Nucleotide base frequencies are the same for all sequences
- Evolution at each site is independent
- The different types of mutations occur at the same rate.

Models can be made more sophisticated, and statistical models can be incorporated.

Different models can give very different estimates of genetic distance.
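
The algebra behind the Jukes-Cantor correction is the standard formula d = -3/4 ln(1 - 4p/3), where p is the observed proportion of mismatched sites. A minimal sketch (the function name `jc69_distance` is just a label for this card):

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor distance from the observed proportion of mismatches p."""
    # At p = 0.75 the sequences are as different as random sequences,
    # so the correction is undefined at or beyond that point.
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

At low divergence the correction is small (p = 0.1 gives about 0.107), but at high divergence it is large (p = 0.6 gives about 1.21 substitutions per site), which matches the assumption that the observed number falls further below the actual number as divergence grows.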

12
Q

Common phylogenetic methods

A

Algorithmic methods: Cluster algorithms are used to transform genetic distances into a tree
- Neighbour-joining trees
- UPGMA

Optimality methods: a score is defined for each candidate tree and the tree with the best score is selected.
- Maximum parsimony
- Maximum likelihood

13
Q

UPGMA

A

A matrix of genetic distances is made and the two closest taxa are clustered to create one node

The distance matrix is recalculated and the next closest pair is clustered.

This process is repeated until all taxa are joined in a single tree.

Limitations
- Assumes constant rate of substitution -> molecular clock hypothesis

Neighbour-joining trees can accommodate differences in rates -> branch lengths are proportional to the amount of change.
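
The UPGMA clustering loop described on this card can be sketched as follows. This is a minimal illustration that returns only the tree topology as nested tuples (branch heights are left out), and the input format, a dict keyed by `frozenset` pairs of taxon names, is an assumption made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon names
    dist:  dict mapping frozenset({a, b}) -> genetic distance between a and b
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of taxa inside
    d = dict(dist)                          # working copy of the distance matrix
    while len(clusters) > 1:
        # Step 1: find the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Step 2: recalculate distances to the new node as a size-weighted
        # average, so every original taxon contributes equally.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))
```

Because every merge averages distances as if all lineages changed at the same rate, the result is only reliable when the molecular clock assumption roughly holds.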

14
Q

Maximum parsimony

A

The tree which requires the fewest evolutionary changes to explain the observed sequences is the best tree.

This is determined by the parsimony score which is calculated for each character and summed.

Not suitable for fast-evolving or highly divergent sequences with many evolutionary changes -> small differences in score are unlikely to be significant

Parsimony score -> minimum number of evolutionary changes required to explain the observed characters.
- Score calculated separately for each character and then summed
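
One common way to compute the parsimony score for a given tree is Fitch's algorithm, sketched below. The card does not name a specific algorithm, so treat this as one possible illustration; the tree representation (nested tuples of taxon names, rooted and binary) and the function names are assumptions made for this example.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, number of changes) for one character.

    tree:   a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states: dict mapping taxon name -> observed state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: one evolutionary change is required here.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the per-character minimum number of changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )
```

To pick the best tree, the score is computed for each candidate topology and the one with the lowest total is kept. For instance, with sequences AAG, AAG, ACT, ACT for taxa A to D, the tree `(("A", "B"), ("C", "D"))` needs fewer changes than `(("A", "C"), ("B", "D"))`.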

15
Q

Maximum likelihood

A

The tree which is probabilistically most likely to have given rise to the observed sequences is the best tree.

Calculate P(seqs | T, B, Q), the probability of the sequences given:
- Tree topology (T)
- Branch lengths (B)
- Rate parameters of the substitution model (Q)

Slow and computationally intensive, and biased for small samples.
Requires a substitution model, which can introduce bias.

A probability is calculated for each tree and then the tree with the highest probability is chosen:
- Exhaustive search (Not possible when there are many tree options)
- Hill climbing
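
Hill climbing can be illustrated on the smallest possible case: two aligned sequences, where the only tree is a single branch and the only thing to optimise is its length d. The sketch below assumes the Jukes-Cantor model and made-up function names; real maximum likelihood software also searches over topologies and model parameters, which this example does not attempt.

```python
import math


def log_likelihood(d: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of two sequences separated by branch length d under Jukes-Cantor."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # a site shows the same base in both sequences
    p_diff = 0.25 * (0.25 - 0.25 * e)  # a site shows one specific pair of different bases
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def hill_climb(n_same: int, n_diff: int, start: float = 0.1,
               step: float = 0.05, tol: float = 1e-7) -> tuple[float, float]:
    """Return (branch length, log-likelihood) found by simple hill climbing."""
    d = start
    best = log_likelihood(d, n_same, n_diff)
    while step > tol:
        moved = False
        for candidate in (d + step, d - step):
            if candidate <= 0:
                continue  # branch lengths must stay positive
            score = log_likelihood(candidate, n_same, n_diff)
            if score > best:
                d, best, moved = candidate, score, True
                break
        if not moved:
            step /= 2  # no neighbour is better: search more finely
    return d, best
```

For 90 matching and 10 mismatched sites, the climb should settle close to the Jukes-Cantor distance for p = 0.1 (about 0.107), since in this two-sequence case the two estimates coincide. With many taxa the search space has many local peaks, which is why hill climbing does not guarantee the best tree.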

16
Q

Bayesian inference

A
  • Each tree has a probability given the data. We should consider the whole probability distribution, not just focus on the single most probable tree.

Similar to maximum likelihood, but the whole probability distribution of trees is considered, not just the most probable tree.