Phylogenetic reconstruction Flashcards

1
Q

Cladistics vs Phenetics

A

Cladistics: groups taxa by shared derived characters (apomorphies)

Phenetics: groups taxa by overall numeric differences (distances)

2
Q

Criteria and methods in cladistics

A
  • Parsimony:
    1. Maximum parsimony (MP)
  • Probabilistic (likelihood-based):
    1. Maximum likelihood (ML)
    2. Bayesian inference (BI)
3
Q

Parsimony

A
  • Prefers the tree with the fewest steps
  • Convergence and reversals are expected to occur less often than synapomorphies
  • Minimizes homoplasy
  • # sp = #trees
4
Q

Advantages and disadvantages of parsimony

A

Advantages:

  • Simple method with an easily understood operation
  • Does not depend on an explicit model of evolution
  • Gives both trees and associated hypotheses of character evolution
  • Should give reliable results if the data are well structured

Disadvantages:

May give misleading results if homoplasy is common or concentrated in particular parts of the tree

5
Q

Character evolution

History vs model

Observed topology

A

History: Hypothesis about the evolution of a particular character or a phylogeny

Model: Specifies probability of change between the start and end of each branch

Observed topology: Different scenarios

6
Q

Assumptions about character evolution

BUT

A
  1. Unordered (Fitch Parsimony)
  2. Ordered (Wagner Parsimony)
  3. Irreversible (Camin-Sokal Parsimony)
  4. Dollo (Dollo parsimony): losing a derived state is more likely than regaining it

BUT: some distances (transformation costs) between states violate the triangle inequality
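
The unordered (Fitch) assumption can be illustrated with a short sketch that counts the minimum number of changes of one character on a fixed rooted binary tree. The nested-tuple tree encoding, the function name `fitch_score`, and the four-taxon example are assumptions made for illustration; they are not taken from any particular program.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one unordered character (Fitch).

    tree   : nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states : dict mapping each leaf name to its observed state
    """
    def visit(node):
        # A leaf contributes its observed state and no steps.
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_steps = visit(left)
        right_set, right_steps = visit(right)
        shared = left_set & right_set
        if shared:
            # The children agree on at least one state: no extra step.
            return shared, left_steps + right_steps
        # The children disagree: take the union and count one step.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


# Hypothetical binary character scored for four taxa.
states = {"A": "0", "B": "0", "C": "1", "D": "1"}
good = fitch_score((("A", "B"), ("C", "D")), states)  # 1 step
bad = fitch_score((("A", "C"), ("B", "D")), states)   # 2 steps
```

The first topology explains the character with a single change, while the second needs two, so parsimony prefers the first for this character.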

7
Q

Compound coding problems

A
  • Creates compound conditions
  • Each such condition might legitimately be considered its own character
8
Q

Multistate coding problems

A

Phylogenetic information can be lost to the tree search process

9
Q

Non-additive binary coding problem

A

Non-additive binary coding makes the absence token (usually 0) correspond to a ‘nonspecified other’ variable: the ‘0’ token becomes a catch-all for anything that isn’t scored as ‘1’

10
Q

Search for a Parsimony tree

A
  1. Exhaustive search (exact)
  2. Branch-and-bound search (exact)
  3. Heuristic search methods (not guaranteed to be exact, but hopefully find the optimum)
11
Q

Exhaustive search for the Parsimony tree

A
  • Adds 1 more taxon each time
  • Evaluates all possible trees
  • Absurd running time with more than about 10 taxa
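
To see why an exhaustive search becomes impractical so quickly, the sketch below counts unrooted, fully bifurcating trees. It assumes the standard counting argument: a tree with k - 1 taxa has 2k - 5 branches, and the k-th taxon can be attached to any of them. The function name is hypothetical.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labelled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # exactly one unrooted tree for 3 taxa
    for k in range(4, n_taxa + 1):
        # The k-th taxon can be attached to any of the 2k - 5 branches.
        count *= 2 * k - 5
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

With 10 taxa there are already 2,027,025 trees to score, and each additional taxon multiplies that number again.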
12
Q

Branch and bound search for the Parsimony tree

A

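Looks for shortcuts (follows only the most promising trees): a partial tree that is already longer than the best tree found so far is abandoned

  • Also time-consuming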
13
Q

Heuristic search for the Parsimony tree

A
  1. Create a starting tree
  2. Branch swapping (randomize the data set for every search):
     • Multiple random search replicates
  3. New starting point: keep rearranging until no tree with fewer steps is found
  4. Restart with a different order
14
Q

Create a starting tree

A
  • A greedy method
  • Start with a 3-taxon tree (most parsimonious)
  • Add taxa one at a time
  • Keep only the best tree found so far
  • No guarantee of optimality, but may provide a good starting point for the search
15
Q

Branch swapping:

A
  • Nearest-Neighbor Interchange (NNI)
  • Subtree Pruning and Regrafting (SPR): cutting & pasting different parts of the tree
  • Tree Bisection and Reconnection (TBR)
16
Q

Phenetic criterion

A

Distance methods

17
Q

Distance methods

Criterion

Advantages & Disadvantages

A

Minimum Evolution (ME)

The tree with the shortest sum of branch lengths

Advantages:
• Distances can be ‘corrected’ for unseen events.
• Usually faster than character-based methods.
• Can be used for some rate analyses.

• Often used at the beginning to check the alignments.

Disadvantages:
• Information lost when characters transformed to distances.
• Cannot be used for character analysis.

18
Q

Examples for distances (ME)

A
  • total number of differences
  • p (= uncorrected) distances
  • corrected distances following evolutionary models

p-distance = (total number of differences) / (total number of characters)
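
As a worked example of an uncorrected versus a corrected distance, the sketch below computes the p-distance for two aligned sequences and then applies the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3). The two sequences are invented, the function names are hypothetical, and gaps or ambiguity codes are not handled.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jc_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("JC correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
d = jc_distance(p)
print(round(p, 3), round(d, 3))
```

The corrected distance is larger than the p-distance because it also accounts for substitutions that are no longer visible in the alignment.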

19
Q

Distance methods tree reconstruction

A

Neighbor joining

20
Q

Why Maximum Likelihood

A
  • Multiple substitutions at the same site are not detectable by parsimony or by uncorrected distances
  • Evaluates the likelihood of every character state in a phylogenetic tree
21
Q

Parameters of Maximum Likelihood

A
  1. Substitution probabilities
  2. Base composition
22
Q

Substitution parameter in ML

A

Transitions are more frequent than transversions

23
Q

Base composition parameter in ML

A
  • Frequencies of the character states (A, C, G, T)
  • Vary significantly between organisms
24
Q

DNA substitutions models

A
  1. Jukes-Cantor (JC 1969)
  2. Kimura 2 parameters (K2P 1980)
  3. Felsenstein 1981 (F81)
  4. Hasegawa, Kishino, Yano (HKY 1985)
  5. General time reversible model (GTR 1990)
  6. Nasty Model
25
Q

Jukes-Cantor

DNA substitutions models

A
  • Most parsimonious (simplest) model
  • Assumes equal base frequencies (25% for each base)
  • Simple model (1 parameter)
26
Q

Kimura DNA substitutions model

A
  • α: transitions
  • β: transversions
  • α is different from β; otherwise the model reduces to JC
  • 2 parameters
27
Q

Felsenstein DNA substitutions model

A

Unequal base frequencies π

Substitutions equally likely

2 parameters

28
Q

HKY DNA substitutions model

A

Transversions and transitions with different substitution rates

3 parameters

29
Q

GTR DNA substitutions model

A

6 parameters

Takes into account that not all transversions are equally probable

α is different from β

All 6 pairs of substitutions have different rates
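
The relationship between these models can be sketched as one rate-matrix builder in which JC, K2P, F81 and HKY are special cases of the GTR form q(i, j) = r(i, j) × π(j). The function names, the dictionary layout and the numeric values below are illustrative assumptions, and the matrix is left unnormalized for simplicity.

```python
BASES = "ACGT"


def exchange_rates(alpha, beta):
    """Give transitions (A<->G, C<->T) the rate alpha and transversions beta."""
    transitions = {frozenset("AG"), frozenset("CT")}
    pairs = [frozenset((a, b)) for i, a in enumerate(BASES) for b in BASES[i + 1:]]
    return {pair: (alpha if pair in transitions else beta) for pair in pairs}


def rate_matrix(freqs, rates):
    """Build Q with q[i][j] = rate(i, j) * freq(j); each row sums to zero."""
    q = {}
    for i in BASES:
        row = {j: rates[frozenset((i, j))] * freqs[j] for j in BASES if j != i}
        row[i] = -sum(row.values())  # diagonal balances the row
        q[i] = row
    return q


equal = dict.fromkeys(BASES, 0.25)
skewed = {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3}  # made-up frequencies

jc = rate_matrix(equal, exchange_rates(1.0, 1.0))    # equal freqs, one rate
k2p = rate_matrix(equal, exchange_rates(2.0, 1.0))   # equal freqs, alpha != beta
f81 = rate_matrix(skewed, exchange_rates(1.0, 1.0))  # unequal freqs, one rate
hky = rate_matrix(skewed, exchange_rates(2.0, 1.0))  # unequal freqs, alpha != beta
```

A full GTR matrix would use the same `rate_matrix` function with six separately chosen exchange rates instead of the two produced by `exchange_rates`.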

30
Q

More to improve a DNA substitutions model

A
  • Proportion of invariant sites
  • Gamma distribution
31
Q

proportion of invariant sites (I)

(improve a model)

A

Sequences that evolve fast may show less divergence than sequences that evolve more slowly

  • Not all nucleotides evolve freely

GTR+I

32
Q

Gamma distributions (G)

Improve a model

A

Nucleotide sites vary at different rates: some vary more freely than others, so the rates are not equally distributed

  • Allows more than 2 categories (zero and non-zero rates)

GTR+I+G

33
Q

How to choose a model

A

Problem:
The more complex a model, the more computationally expensive it is.

But:
If a model generalizes too much, the inferred phylogeny can be wrong.

Therefore: choose the model that is significantly better than the others but does not require more parameters than necessary.

  • Run a model test
34
Q

Model test

A

hLRT (hierarchical likelihood ratio test) for nested models

  • Compares the likelihoods of the different models
  • Continues until there are no significant differences between the models
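
A single step of such a test can be sketched as follows. The example assumes two nested models that differ by exactly one free parameter (for instance JC versus K2P), so the statistic 2 × (lnL_complex - lnL_simple) is compared with a chi-square distribution with 1 degree of freedom. The log-likelihood values are invented, and the function name is hypothetical.

```python
import math


def likelihood_ratio_test(lnl_simple, lnl_complex):
    """LRT for nested models differing by one parameter (df = 1 assumed)."""
    statistic = max(2.0 * (lnl_complex - lnl_simple), 0.0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up log-likelihoods of the same data under two nested models.
statistic, p_value = likelihood_ratio_test(lnl_simple=-1234.5, lnl_complex=-1230.0)
print(statistic, round(p_value, 4))
```

If the p-value falls below the chosen threshold (commonly 0.05), the extra parameter is considered justified and the more complex model is then compared with the next one up.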
35
Q

Likelihood methods

A
  • Maximum likelihood
  • Bayesian inference
  • have an explicit probabilistic model
  • have statistical basis / support
  • search parameters for most likely answer
36
Q

Bayesian inference (BI) Posterior probability

A

A priori assumptions

The probability of the event of interest under certain conditions (conditional probability).

Likelihood and Prior probability

37
Q

Sampling Procedure Markov Chain Monte Carlo

A

1: The robot is programmed to walk a pre-defined number of steps (also called generations), e.g., 2,000,000.
2: The robot evaluates every step, which varies in length and direction:
- if the step is uphill (higher likelihood): it always takes the step
- if the step is downhill:
  1. the robot calculates the height ratio between the two positions
  2. it generates a random number between 0 and 1
  3. if the number is lower than the ratio, it takes the step; if the number is higher than the ratio, it stays in the same place
3: The robot evaluates the following step…
4: The position (tree topology) of e.g. every 100th step is sampled.
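
The robot's rules can be sketched as a minimal Metropolis sampler. To keep the example short, the robot walks along a one-dimensional line rather than through tree space, and the bell-shaped `hill` function stands in for the likelihood surface; both are simplifying assumptions, as are the function names and the chosen step size.

```python
import math
import random


def accept_step(current_height, proposed_height, rng):
    """Uphill steps are always taken; downhill steps with probability = ratio."""
    if proposed_height >= current_height:
        return True
    ratio = proposed_height / current_height
    return rng.random() < ratio


def walk(height, start, n_generations, sample_every, step_size, seed=1):
    """Run the robot and return the positions sampled every few generations."""
    rng = random.Random(seed)
    position = start
    samples = []
    for generation in range(1, n_generations + 1):
        # Propose a step of random length and direction.
        proposal = position + rng.uniform(-step_size, step_size)
        if accept_step(height(position), height(proposal), rng):
            position = proposal
        if generation % sample_every == 0:
            samples.append(position)
    return samples


def hill(x):
    """Toy landscape with a single peak at x = 0."""
    return math.exp(-0.5 * x * x)


samples = walk(hill, start=3.0, n_generations=20000, sample_every=100, step_size=1.0)
print(len(samples), sum(samples) / len(samples))
```

Because downhill steps are sometimes accepted, the robot spends time in each region in proportion to its height instead of simply climbing to the nearest peak.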

38
Q

Bootstrap values vs Posterior probabilities

A

Bootstrap:
An index of how strongly the given data support a group; not a true statistic.

(Split) Posterior Probabilities:

The probability of a split (group) given the data, the model and the priors.

39
Q

When to stop the MCMC robot

A
  • Multiple runs (time intensive)
  • Very long runs (time intensive)
  • Multiple Markov chains (robots) simultaneously (Metropolis-coupled Markov chain Monte Carlo = MCMCMC = MC3):
    - one chain runs as usual (cold chain)
    - the other chains can make larger steps (heated chains)
    - the chain with the highest probability at every step automatically becomes the cold chain and is sampled
40
Q

maximum likelihood vs Bayesian inference

A

ML

Statistical knowledge

No priors

Unpredictable running time

Branch support can take ages

Heuristic search: can get stuck in local optima

Bayes

No statistical knowledge

Priors

Running time: linear computational complexity

Branch support immediately

Convergence after burn-in

41
Q

Testing for Robustness of the phylogenetic tree

A
  • Bootstrap
  • Jackknife
  • Bremer supports (Decay index)
42
Q

Bootstrap

A

1) Characters are resampled with replacement
> many (100…1000…10,000…) bootstrap replicate data sets
2) A tree is reconstructed from each bootstrap replicate
3) Majority-rule consensus of all trees
> visualization of agreement in topologies
4) Majority-rule consensus indices
= measure of support for those groups
= bootstrap proportions (BPs)

  • Tells support, but not the quality of the tree
  • Can be wrong if the wrong kind of data was sampled
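
Steps 1 and 4 above can be sketched in a few lines. The alignment is a made-up dictionary of taxon names to sequences, the function names are hypothetical, and the tree reconstruction of step 2 is not implemented: the list `trees` simply stands in for the groups found in four replicate trees.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns (characters) with replacement."""
    n_chars = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}


def bootstrap_proportion(replicate_trees, group):
    """Fraction of replicate trees that contain the given group of taxa."""
    hits = sum(group in groups for groups in replicate_trees)
    return hits / len(replicate_trees)


alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}
replicate = bootstrap_replicate(alignment, random.Random(42))

# Placeholder for step 2: the groups recovered from each replicate tree.
trees = [{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")}]
bp = bootstrap_proportion(trees, frozenset("AB"))
print(replicate, bp)
```

Each replicate has the same number of characters as the original alignment, but some columns appear several times and others not at all.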
43
Q

Jackknife

A

• Jackknifing is very similar to bootstrapping
• differs only in the resampling strategy
• a proportion of the characters (e.g. 50%) is deleted
• Results summarized with a majority-rule consensus tree
• Majority rule indices = Jackknife Probabilities
• Jackknifing and bootstrapping tend to produce:
– broadly similar results
– similar interpretations

  • cutting-off characters
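
The jackknife resampling strategy can be sketched as deleting a random proportion of the columns without replacement. The alignment, the function name and the default of 50% are illustrative assumptions.

```python
import random


def jackknife_replicate(alignment, rng, delete_fraction=0.5):
    """Delete a random fraction of the characters; keep the rest in order."""
    n_chars = len(next(iter(alignment.values())))
    n_keep = n_chars - int(round(n_chars * delete_fraction))
    kept = sorted(rng.sample(range(n_chars), n_keep))
    return {taxon: "".join(seq[c] for c in kept) for taxon, seq in alignment.items()}


alignment = {"t1": "ACGTACGT", "t2": "ACGAACGA", "t3": "TCGAACGT"}
replicate = jackknife_replicate(alignment, random.Random(7))
print(replicate)
```

Unlike a bootstrap replicate, a jackknife replicate is shorter than the original data set and never contains the same character twice.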
44
Q

Bremer support (Decay index)

A
  • The number of extra steps it takes to collapse a group
  • Add additional steps to see whether the topology remains
  • The higher the number, higher the support
45
Q

How to measure Posterior probability?

A

Posterior probability = (likelihood (conditional probability of the data) × prior probability) / probability of the data under a specific model
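
For a toy case with only three candidate trees, the calculation can be written out directly. The likelihood values are invented, the prior is assumed to be flat, and the probability of the data is obtained by summing likelihood × prior over all three trees; the function name is hypothetical.

```python
def posterior_probabilities(likelihoods, priors):
    """Bayes' rule over a finite set of candidate trees."""
    joint = {tree: likelihoods[tree] * priors[tree] for tree in likelihoods}
    evidence = sum(joint.values())  # probability of the data
    return {tree: value / evidence for tree, value in joint.items()}


# Made-up likelihoods for the three rooted trees of taxa A, B and C.
likelihoods = {"((A,B),C)": 0.006, "((A,C),B)": 0.003, "((B,C),A)": 0.001}
priors = dict.fromkeys(likelihoods, 1 / 3)  # flat prior

posterior = posterior_probabilities(likelihoods, priors)
print(posterior)
```

With real data the sum in the denominator runs over far too many trees to compute directly, which is why the posterior is approximated by MCMC sampling.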

46
Q

How does the MCMC robot work?

A

Every step is called a generation
Cloud: Sampling a large amount of potential trees
Programmed to always go uphill; it goes downhill only under certain conditions

47
Q

PAUP indices

  • Ci
  • Hi
  • Ri
A

Consistency index Ci (Deals with apomorphies)

Ci = (minimum total sum of character changes expected) / (actual number of steps), multiplied by 100 when expressed as a percentage

Also useful to compare trees, to check the amount of homoplasies.

The higher the Ci, the better (it reflects how good the data are and how well the characters fit the tree)

With binary characters (0-1): each character is expected to change only once in the tree (parsimony)

CI=1 if there is no homoplasy

negatively correlated with the number of species sampled

Homoplasy index Hi: the amount of homoplasy = 1 - Ci

e.g. 1 - 0.85 = 0.15

Retention index

Ri = (max. number of steps - steps observed) / (max. number of steps - min. number of steps)

defined to be 0 for parsimony uninformative characters

RI=1 if the character fits perfectly

RI=0 if the tree fits the character as poorly as possible
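
The three indices can be computed directly from the minimum, maximum and observed number of steps. The step counts in the example are invented, the function names are hypothetical, and Ci is returned as a fraction between 0 and 1 rather than as a percentage.

```python
def consistency_index(min_steps, observed_steps):
    """Ci = minimum possible steps / steps observed on the tree."""
    return min_steps / observed_steps


def homoplasy_index(min_steps, observed_steps):
    """Hi = 1 - Ci."""
    return 1 - consistency_index(min_steps, observed_steps)


def retention_index(min_steps, max_steps, observed_steps):
    """Ri = (max - observed) / (max - min); 0 for uninformative characters."""
    if max_steps == min_steps:
        return 0.0  # parsimony-uninformative: defined to be 0
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Made-up totals: minimum 17 steps, maximum 30 steps, 20 steps on the tree.
ci = consistency_index(17, 20)
hi = homoplasy_index(17, 20)
ri = retention_index(17, 30, 20)
print(round(ci, 2), round(hi, 2), round(ri, 3))
```

Here Ci is 0.85 and Hi is 0.15, matching the worked example on this card.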