7 - Trees and distance methods Flashcards

1
Q

What are four different methods for inferring phylogeny?

A
  • Distance matrix methods: pairwise distances between all sequences in alignment
  • Parsimony-based methods
  • Maximum likelihood methods
  • Bayesian methods
2
Q

What are the two steps to distance matrix analysis of phylogeny?

A
  1. Calculate the pairwise distances between all species (from a trimmed multiple alignment) to build a distance matrix
  2. Infer the phylogenetic tree from the distance matrix using an algorithmic method (e.g. NJ, UPGMA) or an optimality criterion method (e.g. least squares)
3
Q

How do you calculate p (the Hamming distance per site)?

A

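p = (number of observed differences) / (number of aligned positions compared)

or

p = 1 - (proportion of identical sites) = 1 - identity

This is the observed proportion of differences. It underestimates the true amount of change because multiple substitutions at the same site are counted at most once, so a substitution model is used to convert p into a corrected tree distance (D).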
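
As an illustration of the calculation above, here is a minimal Python sketch. The function name `p_distance` and the two example sequences are made up for this card, and the sketch assumes the sequences are already aligned, equal in length and free of gaps.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the positions where the two sequences disagree.
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical example: 2 differing sites out of 10 aligned positions.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p)
```

Identity is then simply `1 - p`.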

4
Q

List the different models for calculating tree distance

A
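  • Jukes-Cantor one-parameter model (all substitutions occur at equal rates)
    $D = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$
  • Kimura 2-parameter model (transition rate not equal to transversion rate)
    $D = \frac{1}{2}\ln\left(\frac{1}{1 - 2P - Q}\right) + \frac{1}{4}\ln\left(\frac{1}{1 - 2Q}\right)$, where P and Q are the observed proportions of transitional and transversional differences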

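Either model can be combined with a gamma distribution of rates across sites, which describes the proportions of sites with slow, medium and fast substitution (evolution) rates.

The shape of the distribution is governed by the shape parameter, alpha (α): a small α (below 1) means strong rate variation among sites, while a large α means the sites evolve at nearly equal rates.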
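
The two correction formulas can be sketched in Python as follows. The function names `jukes_cantor` and `kimura_2p` are illustrative, not taken from any particular package, and the inputs are assumed to be observed proportions (p, or P for transitions and Q for transversions) that are small enough for the logarithms to be defined.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from the observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P: float, Q: float) -> float:
    """Kimura 2-parameter distance.

    P: observed proportion of transitional differences
    Q: observed proportion of transversional differences
    """
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


# The corrected distance is always at least as large as the observed p.
print(jukes_cantor(0.2))
# With P = p/3 and Q = 2p/3 (no transition bias) K2P reduces to Jukes-Cantor.
print(kimura_2p(0.1, 0.2), jukes_cantor(0.3))
```

The last line is a quick sanity check: when transitions make up one third of the differences, both models give the same distance.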

5
Q

What happens when you ignore among-site rate variation in finding tree distance (D)?

A

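UNDERestimation of the actual distance: fast-evolving sites saturate with multiple substitutions that a single-rate model fails to correct for.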
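
The size of the effect can be illustrated with the gamma-corrected form of the Jukes-Cantor distance, D = (3/4) α [(1 - 4p/3)^(-1/α) - 1]. The sketch below is only an illustration: the function names are made up, and α = 0.5 is an arbitrary value chosen to represent strong rate variation among sites.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance assuming every site evolves at the same rate."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


# Compare the two estimates for increasingly divergent sequence pairs.
for p in (0.1, 0.3, 0.5):
    print(p, round(jc_distance(p), 3), round(jc_gamma_distance(p, alpha=0.5), 3))
```

For every p the equal-rates estimate is the smaller of the two, and the gap widens as p grows; as α becomes very large the gamma-corrected value approaches the plain Jukes-Cantor distance.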

6
Q

Give the two classes of methods for inferring a tree from the distance matrix

A

Algorithmic methods

  • Unweighted pair group method with arithmetic mean (UPGMA)
  • Neighbour joining
  • BIONJ and WEIGHBOR

Optimality criterion-based methods

  • Minimum evolution
  • Fitch-Margoliash
  • Least-squares
7
Q

Describe the UPGMA method for tree reconstruction

A

Assumes the rate of evolution is constant in all organismal lineages, so that distance (D) is a linear function of time (T); that is, it assumes a molecular clock.

It assumes the distances are ultrametric, which they typically are not.

Starts by clustering the pair of taxa with the smallest distance and placing their branching point at half that distance. The distances from the new cluster to every other taxon are then averaged, the next smallest distance is found, and the process repeats until all taxa are joined.
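
The clustering procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `upgma`, the nested-tuple tree format and the four-taxon distance matrix are all invented for this card, and ties between equal distances are broken arbitrarily.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
    Returns a nested tuple (left_subtree, right_subtree, node_height).
    """
    # Active clusters: label -> (subtree, number of leaves)
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the pair of active clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # branching point at half the distance
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new = f"({a},{b})"
        # Distance to the merged cluster is the leaf-weighted arithmetic mean.
        for other in clusters:
            d[frozenset((new, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b, height), size_a + size_b)
    return next(iter(clusters.values()))[0]


# Hypothetical ultrametric distances for four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0, frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 6.0, frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 6.0, frozenset(("B", "D")): 6.0,
}
print(upgma(names, dist))
```

Because these example distances are ultrametric, the recovered node heights (1, 2 and 3) are exactly half of the corresponding pairwise distances; with non-ultrametric data the averaging step is where UPGMA starts to distort the tree.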

8
Q

How do distance matrix programs treat alignment columns that contain gaps?

A

They delete them.

The gap positions are those coded as ? or '-' in the alignment.
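
A minimal sketch of this column deletion, assuming the alignment is held as a list of equal-length strings and that gaps are coded as `-` or `?`. The function name `drop_gap_columns` and the toy alignment are hypothetical.

```python
def drop_gap_columns(alignment, gap_chars="-?"):
    """Remove every alignment column that contains a gap character."""
    # zip(*alignment) walks through the alignment column by column.
    kept = [
        column for column in zip(*alignment)
        if not any(ch in gap_chars for ch in column)
    ]
    if not kept:
        return ["" for _ in alignment]
    # Transpose the surviving columns back into rows (sequences).
    return ["".join(row) for row in zip(*kept)]


# Hypothetical alignment: columns 2 and 4 contain a gap and are dropped.
trimmed = drop_gap_columns(["ACG-T", "A-GTT", "ACGTT"])
print(trimmed)
```

Pairwise distances would then be computed on the trimmed sequences only.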

9
Q

List four problems with UPGMA

A
  • Assumes the data reflect an ultrametric tree
  • Tends to move more divergent sequences deeper into the tree (long branch attraction artefact)

LBA is one of the biggest pitfalls in molecular phylogeny.

10
Q

What is long branch attraction?

A

Long branch attraction (LBA) makes species appear more closely related in a phylogeny than they really are. It arises when lineages evolve FASTER than the rest, or acquire the same mutations or traits independently (convergent evolution); the resulting shared states can be misinterpreted as being shared due to common ancestry.

UPGMA is especially prone to this.

11
Q

Describe the neighbour-joining method of tree reconstruction

A

Unlike UPGMA, NJ does not require a molecular clock, only additive distances (i.e. the distance between any two taxa equals the sum of the branch lengths connecting them on the tree)

  • This allows rates to vary in different lineages
  • An algorithm that seeks out neighbours (pairs of sequences joined at a single internal node)
  • Starts with a star tree and sequentially pairs up taxa to minimize the total length implied by the tree (lowest score is best)
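
The pairing step can be illustrated with the standard NJ selection criterion, Q(i, j) = (n - 2) d(i, j) - Σ d(i, k) - Σ d(j, k), where the pair with the lowest Q is joined first. The sketch below covers only that first step, not the full algorithm; the function name `nj_first_pair` and the five-taxon matrix are example values chosen for illustration.

```python
from itertools import combinations


def nj_first_pair(names, matrix):
    """Return the first pair neighbour joining would join, with its Q score."""
    n = len(names)
    # Total distance from each taxon to all the others.
    totals = [sum(row) for row in matrix]

    def q(i, j):
        return (n - 2) * matrix[i][j] - totals[i] - totals[j]

    # The pair with the lowest Q minimizes the total length of the tree.
    i, j = min(combinations(range(n), 2), key=lambda pair: q(*pair))
    return names[i], names[j], q(i, j)


# Example distance matrix (symmetric, zero diagonal).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_pair(names, matrix))
```

In this example d and e have the smallest raw distance (3), yet a and b are joined first because their Q score is lower. UPGMA would have started with d and e, which shows how NJ corrects for lineages that sit far from everything else.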