lecture 3 Flashcards
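cladograms vs. phylograms and chronograms
phylograms and chronograms have branch lengths that contain useful information (amount of change and time, respectively); cladograms show only branching order, so their branch lengths carry no information.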
phylogenetic methods
- goal: estimate the phylogeny.
- approach:
1. collect data (evidence)
2. align data (homology)
3. find the “best” tree (phylogenetic analysis)
types of data
morphological and molecular.
phylogenetic methods
distance-based methods, maximum parsimony, maximum likelihood, and Bayesian inference.
distance based methods steps
- step 1: convert data to a measure of genetic distance between each pair of sequences.
- step 2: calculate a tree using the table of genetic distances.
distance based methods process
- sequence alignment
- distances between sequences -> distance tables
- calculate unrooted tree
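a minimal sketch of the "distances between sequences -> distance tables" step, assuming the simplest possible measure (the proportion of aligned sites that differ, often called the p-distance). the function names and the toy alignment are made up for illustration, and the tree-building step itself is not shown.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites that differ, skipping sites with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (not real data).
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
table = distance_table(alignment)
for pair, dist in table.items():
    print(pair, dist)
```

once the table exists, the tree is calculated from these numbers alone, which is why the method is fast but no longer looks at the sequences themselves.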
distance based methods pros
very fast and can handle a large number of species.
distance based methods cons
replaces sequences with distances (site-level information is discarded), has problems when distances are tied, and uses no model of evolution.
maximum parsimony
find the tree that explains the observed data with the minimum number of changes, i.e. choose the tree with the fewest steps (substitutions).
maximum parsimony steps
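- step 1: propose a tree and compute its parsimony score (the fewest changes needed to explain the data on that tree).
- step 2: repeat step 1 until the tree with the minimum number of changes is found.
- changes: informative vs uninformative sites. only informative sites help choose between trees; uninformative sites require the same number of changes on every tree.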
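a minimal sketch of step 1, assuming the Fitch algorithm (one common way to count changes on a fixed tree; the card does not name a specific algorithm). trees are written as nested tuples, and the four-taxon alignment is a made-up example. with four taxa there are only three possible groupings, so step 2 is just trying all of them.

```python
from collections import Counter


def fitch(tree, states):
    """Return (possible_states, changes) for one site on a binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> character observed at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children agree: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all sites."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


def is_informative(column):
    """A site is parsimony-informative if at least two states
    each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


# Hypothetical toy alignment (not real data).
alignment = {"A": "AAGT", "B": "AAGC", "C": "CTGT", "D": "CTGC"}
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = {tree: parsimony_score(tree, alignment) for tree in trees}
best_tree = min(scores, key=scores.get)
print(best_tree, scores[best_tree])
```

for real data sets the number of possible trees is far too large to try them all, which is where the search in step 2 (and the risk of getting stuck) comes from.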
maximum parsimony pros
simple to understand.
maximum parsimony cons
easily trapped on a local optimum, no branch lengths, often too many equally parsimonious trees, and statistically inconsistent (under some conditions it converges on the wrong tree, and more data only increases support for the incorrect result).
bootstrap
calculating support for relationships. how strong is the signal in the data?
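- statistical procedure: random re-sampling of the data (alignment columns), with replacement.
- pseudoreplicates: the re-sampled data sets. a tree is estimated from each one, and support for a relationship is the proportion of those trees that contain it.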
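a minimal sketch of the re-sampling step, assuming the data are an alignment and the units being re-sampled are its columns (sites). the function names and the toy alignment are made up for illustration. estimating a tree from each pseudoreplicate is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate: draw columns at random, with replacement,
    until the new alignment is as long as the original."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in cols)
        for name, seq in alignment.items()
    }


def pseudoreplicates(alignment, n, seed=0):
    """Return n pseudoreplicate alignments (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical toy alignment (not real data).
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
reps = pseudoreplicates(alignment, 100, seed=1)
print(reps[0])
```

because columns are drawn with replacement, some sites appear several times in a pseudoreplicate and others not at all. relationships that survive this shuffling in most pseudoreplicates are the ones with strong signal.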
maximum likelihood
what is the probability of observing a set of data given a hypothesis? written as a conditional probability: likelihood = P(data | hypothesis), which here is P(data | tree).
- calculate the likelihood for each hypothesis. look for the hypothesis with the highest likelihood.
- propose new topology, branch lengths, model parameters -> calculate scores; repeat.
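a minimal sketch of "calculate the likelihood for each hypothesis, keep the highest", assuming a heavily simplified case: only two sequences (so there is one topology), the 1-rate model where all changes are equal (Jukes-Cantor), and the hypothesis being tested is just the branch length d in expected substitutions per site. the sequences and the grid of candidate values are made up for illustration.

```python
import math


def jc69_prob(x, y, d):
    """Probability that base x is seen as base y after branch length d
    under the 1-rate (Jukes-Cantor) model."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def log_likelihood(seq1, seq2, d):
    """log P(data | d): sum over sites of log(base frequency * change prob)."""
    return sum(
        math.log(0.25 * jc69_prob(x, y, d)) for x, y in zip(seq1, seq2)
    )


# Hypothetical aligned sequences (not real data); they differ at 2 of 20 sites.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGTACGAACGTTCGTACGT"

# Propose many candidate branch lengths, score each, keep the best.
candidates = [i / 1000 for i in range(1, 1001)]
best_d = max(candidates, key=lambda d: log_likelihood(seq1, seq2, d))
print(best_d, log_likelihood(seq1, seq2, best_d))
```

a full analysis does the same thing over topologies and model parameters as well as branch lengths, which is why it is much slower than this toy version.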
substitution models (max. likelihood)
model the rates of change among the four bases (A, C, G, T).
- 1 rate: all changes equal
- 2 rates: transitions vs. transversions
- 6 rates: a unique rate for each pair of bases (general time-reversible model, GTR)
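a minimal sketch of how the three schemes assign a relative rate to each of the six possible pairs of bases. the function names and rate values are made up for illustration, and base frequencies (which full models also include) are left out.

```python
from itertools import combinations

BASES = "ACGT"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def change_type(x, y):
    """Transition = purine<->purine or pyrimidine<->pyrimidine;
    transversion = purine<->pyrimidine."""
    if x == y:
        return "none"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def one_rate(rate=1.0):
    """1 rate: every pair of bases changes at the same rate."""
    return {pair: rate for pair in combinations(BASES, 2)}


def two_rate(transition, transversion):
    """2 rates: one rate for transitions, another for transversions."""
    return {
        (x, y): transition if change_type(x, y) == "transition" else transversion
        for x, y in combinations(BASES, 2)
    }


def six_rate(rates):
    """6 rates: one rate per pair, in the order AC, AG, AT, CG, CT, GT."""
    pairs = list(combinations(BASES, 2))
    if len(rates) != len(pairs):
        raise ValueError("expected exactly 6 rates")
    return dict(zip(pairs, rates))


# Hypothetical rate values, for illustration only.
print(one_rate())
print(two_rate(transition=2.0, transversion=1.0))
print(six_rate([1.0, 4.0, 0.8, 1.2, 4.5, 1.0]))
```

each pair appears once because the models are reversible: the rate for A <-> G applies in both directions.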