W2 L1 phylogenetic M Flashcards
What is phylogenetics?
- Estimating the evolutionary relationships among species (or genes, or both) from molecular sequences
How do we build a phylogenetic tree when some species are already extinct?
- Data are available only for extant species (genes)
- What happened in the evolutionary past must be inferred (estimated) statistically
Working with extinct species
- Fossils provide age estimates, useful for calibrating node ages
- Phylogenetic analysis and node age reconstruction
- Diversification analysis and ancestral state reconstruction
Phylogenetic terminology
Two species that are each other's closest relatives (descending from the same immediate common ancestor) are called sister species
Rooted and unrooted trees
Rooting gives directionality to a tree
* Without a root you cannot speak of ancestors, descendants, etc.
Stylistic vs. meaningful differences
In a phylogeny, branch length represents the amount of evolutionary change
Difference between a branch and a tree
Resolution of a tree
A polytomy is a node from which more than two branches (lineages) emerge
Hard vs. soft polytomy
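A hard polytomy reflects a real speciation event (one ancestral lineage splitting into three or more lineages at essentially the same time), whereas a soft polytomy reflects insufficient data to resolve the branching order
Steps in building a phylogeny
- Start from a sequence alignment: aligned positions are assumed to descend from a common ancestor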
Methods involved:
- Distance trees
- Counting changes: maximum parsimony
- Modeling evolution: maximum likelihood
- Searching tree space
- Bayesian inference
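Searching tree space is hard because the number of possible topologies grows explosively with the number of taxa. A minimal sketch using the standard count of unrooted, fully bifurcating trees, (2n-5)!! for n taxa (and (2n-3)!! for rooted trees); the function names are illustrative only.

```python
def num_unrooted_topologies(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating trees for n_taxa >= 3: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Double factorial: 1 * 3 * 5 * ... * (2n - 5)
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def num_rooted_topologies(n_taxa: int) -> int:
    """Rooted trees for n taxa: (2n - 3)!!, i.e. the unrooted count for n + 1 taxa."""
    return num_unrooted_topologies(n_taxa + 1)


for n in (4, 10, 20, 50):
    print(n, num_unrooted_topologies(n), num_rooted_topologies(n))
```

Already at 10 taxa there are about two million unrooted trees, which is why ML and parsimony rely on heuristic searches rather than evaluating every topology.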
Distance trees
- Conceptual framework:
- Convert character data to a distance matrix
- Use the distance matrix to construct a dendrogram
- Interpretation as evolutionary trees?
- Fast (no evaluation of many trees)
- UPGMA gives rooted, ultrametric trees (assumes a molecular clock)
- NJ (neighbor joining) gives unrooted trees
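The two-step distance framework can be sketched as follows, assuming a gapless alignment and using the uncorrected p-distance (proportion of differing sites) as the distance. Only UPGMA is shown, not NJ; the alignment and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def distance_matrix(alignment: dict) -> dict:
    """Step 1: convert character data into pairwise distances."""
    return {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(alignment, 2)}


def upgma(alignment: dict) -> str:
    """Step 2: average-linkage clustering into a rooted, ultrametric tree (Newick)."""
    dist = distance_matrix(alignment)
    # cluster name -> (Newick subtree, number of tips, height of its root)
    clusters = {name: (name, 1, 0.0) for name in alignment}
    while len(clusters) > 1:
        # Join the closest pair of clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        height = dist[frozenset((a, b))] / 2  # clock: both tips equally far from the node
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        merged = f"{a}|{b}"
        # Distance from the new cluster = size-weighted mean of its members' distances
        for c in clusters:
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        newick = f"({tree_a}:{height - h_a:.4g},{tree_b}:{height - h_b:.4g})"
        clusters[merged] = (newick, size_a + size_b, height)
    final_tree = next(iter(clusters.values()))[0]
    return final_tree + ";"


aln = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "ACGAACCTAA", "D": "TCGAACCTGA"}
print(upgma(aln))  # ((A:0.05,B:0.05):0.125,(C:0.1,D:0.1):0.075);
```

Every tip ends up at the same distance from the root (0.175 here), which is what "ultrametric" means and where the molecular-clock assumption enters.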
Counting changes: maximum parsimony
The simplest explanation requiring the fewest assumptions should be preferred over more complicated hypotheses.
Translated to phylogenetics:
The MP phylogeny is the one that requires the fewest evolutionary steps.
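One common way to count the minimum number of steps a given tree requires is Fitch's algorithm (not named in these notes). A minimal sketch, assuming a rooted binary tree written as nested tuples and a gapless alignment; the data and helper names are made up for illustration. A real MP analysis would repeat this scoring over many candidate topologies.

```python
def fitch_site(tree, states: dict):
    """Return (possible ancestral states, minimum changes) for one site.

    tree: a leaf name (str) or a tuple (left_subtree, right_subtree)
    states: leaf name -> observed character state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No state shared by the two children: one extra change is required here
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment: dict) -> int:
    """Total minimum number of changes summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        _, changes = fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})
        total += changes
    return total


alignment = {"A": "AAG", "B": "AAG", "C": "GTA", "D": "GTA"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_ab_cd, alignment))  # 3
print(parsimony_score(tree_ac_bd, alignment))  # 6
```

The MP criterion prefers ((A,B),(C,D)) here because it explains the data with fewer steps.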
Modeling evolution: maximum likelihood
A model describes mathematically how molecular sequences evolve
Why likelihood?
* Mathematically explicit: no hidden assumptions
* Results expressed in terms of probability
* Superior performance in phylogenetics
Markov models
Simple stochastic process in which the distribution of future states depends only on the present state
* Simplest case: binary character with states 0 and 1
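For the binary case the transition probabilities have a closed form. A minimal sketch, assuming a symmetric model where 0→1 and 1→0 happen at the same rate; it checks the Markov property numerically (evolving for 0.3 and then 0.7 time units gives the same result as evolving for 1.0 at once). The function names are illustrative.

```python
import math


def transition_matrix(rate: float, t: float) -> list:
    """P(t) for a symmetric two-state (0/1) Markov model.

    rate: instantaneous rate of switching 0 -> 1, assumed equal to the rate 1 -> 0
    Entry [i][j] is the probability of state j after time t, given state i now.
    """
    decay = math.exp(-2 * rate * t)
    same, diff = 0.5 + 0.5 * decay, 0.5 - 0.5 * decay
    return [[same, diff], [diff, same]]


def matmul(a: list, b: list) -> list:
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# The second stretch depends only on the state reached after the first one,
# so chaining P(0.3) and P(0.7) must reproduce P(1.0)
two_steps = matmul(transition_matrix(1.0, 0.3), transition_matrix(1.0, 0.7))
one_step = transition_matrix(1.0, 1.0)
print(two_steps)
print(one_step)
```

As t grows, every entry approaches 0.5: after a long time the starting state is forgotten.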
Why is maximum likelihood favored over other methods? Its models can account for:
- Different base frequencies
- Different substitution rates
- Reversions and multiple changes at the same site
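Basic elements of the General Time Reversible (GTR) model
- Time reversibility: the relative rate of change between two bases is the same in both directions (e.g., A→G equals G→A, before weighting by base frequencies)
- 5 free rate parameters instead of 12: the 12 directional rates collapse to 6 symmetric ones, and one is fixed as a reference because only relative rates are estimated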
Base frequencies: 4 base frequencies
- Only 3 free model parameters, because the frequencies must sum to 1
$F = [\pi_A, \pi_C, \pi_G, \pi_T]$
$\pi_T = 1 - (\pi_A + \pi_C + \pi_G)$
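A minimal sketch of how these pieces combine into the GTR instantaneous rate matrix Q, assuming the common parameterization q_xy = r_xy * π_y with symmetric exchangeabilities r_xy. All numeric values are hypothetical; π_T is derived from the sum-to-1 constraint, and one exchangeability (GT) is fixed to 1 as the reference.

```python
def gtr_rate_matrix(freqs: dict, exch: dict) -> dict:
    """Build the GTR rate matrix Q as a dict keyed by (from_base, to_base).

    freqs: base -> equilibrium frequency pi (4 values summing to 1)
    exch:  frozenset({x, y}) -> symmetric exchangeability r_xy (6 values)
    Off-diagonal q_xy = r_xy * pi_y; each diagonal entry makes its row sum to 0.
    """
    bases = "ACGT"
    q = {}
    for x in bases:
        for y in bases:
            if x != y:
                q[x, y] = exch[frozenset((x, y))] * freqs[y]
        q[x, x] = -sum(q[x, y] for y in bases if y != x)
    return q


# Hypothetical frequencies: 3 are free, pi_T follows from the constraint
freqs = {"A": 0.3, "C": 0.2, "G": 0.2}
freqs["T"] = 1 - (freqs["A"] + freqs["C"] + freqs["G"])
# Hypothetical exchangeabilities: GT fixed to 1 as reference, leaving 5 free
exch = {frozenset(pair): r for pair, r in
        {"AC": 1.0, "AG": 4.0, "AT": 0.5, "CG": 1.0, "CT": 4.0, "GT": 1.0}.items()}
q = gtr_rate_matrix(freqs, exch)

# Time reversibility (detailed balance): pi_x * q_xy equals pi_y * q_yx
for x, y in (("A", "G"), ("C", "T"), ("A", "T")):
    print(x, y, freqs[x] * q[x, y], freqs[y] * q[y, x])
```

Note that q_AG and q_GA themselves differ when π_A ≠ π_G; it is the exchangeability r_AG that is shared by both directions.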
Among-site rate variation
- Different sites evolve at different rates; the model can account for this (e.g., with a gamma distribution of rates across sites)
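A minimal sketch of how rate variation enters the likelihood, assuming a few equally probable rate categories: the site likelihood is averaged over categories, with branch lengths scaled by each category's relative rate. The category rates below are hypothetical (chosen to average 1); in practice they are often taken from a discretized gamma distribution with an estimated shape parameter. For brevity it uses the simple Jukes-Cantor (JC69) model for just two sequences; function names are illustrative.

```python
import math


def jc_pair_site_likelihood(same: bool, d: float) -> float:
    """JC69 likelihood of one site for two sequences separated by distance d.

    The first base has probability 1/4 (equal frequencies); the second base is
    identical with probability 1/4 + 3/4 e^(-4d/3), or a specific different base
    with probability 1/4 - 1/4 e^(-4d/3).
    """
    e = math.exp(-4 * d / 3)
    return 0.25 * (0.25 + 0.75 * e) if same else 0.25 * (0.25 - 0.25 * e)


def site_likelihood_with_rates(same: bool, d: float, rates: list) -> float:
    """Average the site likelihood over equally probable rate categories,
    scaling the distance by each category's relative rate."""
    return sum(jc_pair_site_likelihood(same, d * r) for r in rates) / len(rates)


RATES = [0.1, 0.5, 1.0, 2.4]  # hypothetical category rates, mean 1
d = 0.3
print(jc_pair_site_likelihood(False, d), site_likelihood_with_rates(False, d, RATES))
print(jc_pair_site_likelihood(True, d), site_likelihood_with_rates(True, d, RATES))
```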
Likelihood computation
The likelihood L is the probability of the data given the model.
* The data consist of an alignment of sequences.
* The model is a hypothesis of how the data were generated.
* It contains the topology, branch lengths and other model parameters (base frequencies, rate matrix, gamma shape, …)
* Different topologies are evaluated one at a time.
* Each time, branch lengths and other parameters are optimized.
* Each site has a likelihood; assuming sites evolve independently, the total likelihood is the product of all site likelihoods.
* The ML tree is the topology corresponding to the model that yielded the highest overall likelihood.
* Likelihoods are reported on a logarithmic scale (ln L) for mathematical convenience: the product of site likelihoods becomes a sum, which also avoids numerical underflow.
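The likelihood computation above can be sketched with Felsenstein's pruning algorithm, which computes each site's likelihood by working from the tips toward the root. To keep it short, this assumes the Jukes-Cantor (JC69) model (a special case of GTR with equal base frequencies and rates) instead of a full GTR+gamma model, a rooted four-taxon tree, and fixed branch lengths; a real ML program would also optimize branch lengths and parameters for each topology. The alignment, branch lengths and function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """JC69 probability that base i becomes base j along a branch of length t."""
    e = math.exp(-4 * t / 3)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def join(left, t_left: float, right, t_right: float) -> tuple:
    """Internal node: ((child, branch length), (child, branch length)); leaves are str."""
    return ((left, t_left), (right, t_right))


def partials(node, site: dict) -> dict:
    """Pruning step: P(data below node | base at node), for each base."""
    if isinstance(node, str):  # leaf: 1 for the observed base, 0 otherwise
        return {b: 1.0 if b == site[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        below = partials(child, site)
        for b in BASES:
            result[b] *= sum(jc_prob(b, c, t) * below[c] for c in BASES)
    return result


def log_likelihood(tree, alignment: dict) -> float:
    """ln L = sum over sites of ln(site likelihood), sites assumed independent."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        root = partials(tree, {name: seq[i] for name, seq in alignment.items()})
        total += math.log(sum(0.25 * root[b] for b in BASES))  # pi_b = 1/4 under JC69
    return total


aln = {"A": "AAGT", "B": "AAGT", "C": "GTAT", "D": "GTAT"}
# Two candidate topologies evaluated one at a time, with the same fixed branch lengths
tree_ab_cd = join(join("A", 0.1, "B", 0.1), 0.05, join("C", 0.1, "D", 0.1), 0.05)
tree_ac_bd = join(join("A", 0.1, "C", 0.1), 0.05, join("B", 0.1, "D", 0.1), 0.05)
print(log_likelihood(tree_ab_cd, aln), log_likelihood(tree_ac_bd, aln))

# Why logs: the raw product of 1000 small site likelihoods underflows to 0.0,
# while the sum of their logs stays finite
print(math.prod([0.01] * 1000), 1000 * math.log(0.01))
```

The topology with the higher ln L (here ((A,B),(C,D))) would be the ML choice among these two candidates.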