Lecture 12 - Phylogenetic Models Flashcards
is the rate of mutation constant across all branches of a tree
no
can multiple changes occur at the same site
yes
what factors can models of molecular evolution account for
- different substitution preferences (some nucleotide changes are more likely than others)
- different rates at different sequence positions
- different rates on different branches of the tree
what are some measures of evolutionary distance
- p-distance (fractional difference between aligned sequences)
- Poisson distance
what is the calculation of the p-distance
p = D/L
- D is the number of aligned positions that differ (observed changes)
- L is the number of aligned positions compared (alignment length)
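A minimal Python sketch of p = D/L, assuming two already-aligned sequences of equal length with no gaps (the function name `p_distance` is illustrative, not from the lecture):

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned positions that differ: p = D / L."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # D: number of aligned positions where the two sequences differ
    d = sum(1 for a, b in zip(seq1, seq2) if a != b)
    # L: number of aligned positions compared
    return d / len(seq1)


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2 (2 differences over 10 sites)
```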
what does the Poisson distance account for
multiple substitutions at the same site (multiple hits), which the p-distance undercounts
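what is the probability that an aligned position differs between two sequences [Poisson distance]
p = 1 - e^{-2rt}, where r is the substitution rate per site and t is the time since the two sequences diverged (2t because both lineages accumulate changes)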
what is the calculation for the Poisson distance (d_p)
d_p = -ln(1 - p), which equals 2rt, the expected number of substitutions per site separating the two sequences
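The sketch below checks numerically that d_p inverts p = 1 - e^{-2rt}; the values of r and t are hypothetical and chosen only for illustration. Because it corrects for multiple hits, d_p is never smaller than p.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d_p = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)


# Hypothetical rate r (per site per unit time) and divergence time t
r, t = 0.01, 20.0
p = 1 - math.exp(-2 * r * t)            # probability that a site differs
print(round(p, 4))                      # 0.3297
print(round(poisson_distance(p), 4))    # 0.4, i.e. 2rt
```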
what is the goal of nucleotide models
to accurately represent the process of nucleotide substitution across a set of sequences
what are the assumptions of the Jukes-Cantor Model
- all sites are independent
- rates of evolution are the same at all sites
- all substitutions are equally likely and occur at rate α
- all four nucleotides have equal frequencies (25% each)
what is the probability of a nucleotide not changing in one time step in the Jukes-Cantor model
1 - 3α (it can change to any of the 3 other nucleotides, each with probability α)
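A minimal sketch, assuming the discrete-time form of the Jukes-Cantor model in which α is the per-step probability of changing to each specific other nucleotide (so α can be at most 1/3). Repeating the step shows the probability of a site being identical after n steps decaying toward 1/4, matching the closed form 1/4 + 3/4(1 - 4α)^n for this matrix. The helper names are hypothetical.

```python
def jc_step_matrix(alpha: float) -> list[list[float]]:
    """One-step Jukes-Cantor probabilities: stay with 1 - 3*alpha, each other base with alpha."""
    if not 0 <= alpha <= 1 / 3:
        raise ValueError("alpha must be between 0 and 1/3")
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


alpha, n_steps = 0.01, 50
P = jc_step_matrix(alpha)  # bases ordered A, C, G, T
# Each row sums to 1: a base either stays or becomes one of the 3 others
assert all(abs(sum(row) - 1) < 1e-12 for row in P)

# Probability that A is still A after n_steps (change-and-revert paths included)
Pn = P
for _ in range(n_steps - 1):
    Pn = mat_mul(Pn, P)
closed_form = 0.25 + 0.75 * (1 - 4 * alpha) ** n_steps
print(round(Pn[0][0], 6), round(closed_form, 6))
```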
what does the Kimura Two-Parameter (K2P) model account for
different rates for transitions (α) and transversions (β)
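A minimal sketch, again assuming the discrete-time form: from any base there is one possible transition (A<->G or C<->T) with per-step probability α and two possible transversions, each with probability β, so the base stays unchanged with probability 1 - α - 2β. The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "ACGT"


def is_transition(x: str, y: str) -> bool:
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)


def k2p_step_matrix(alpha: float, beta: float) -> dict[tuple[str, str], float]:
    """One-step K2P probabilities: alpha per transition, beta per transversion."""
    probs = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                # each base has one transition partner and two transversion partners
                probs[(x, y)] = 1 - alpha - 2 * beta
            elif is_transition(x, y):
                probs[(x, y)] = alpha
            else:
                probs[(x, y)] = beta
    return probs


P = k2p_step_matrix(alpha=0.02, beta=0.005)
print(P[("A", "G")], P[("A", "C")], round(P[("A", "A")], 6))
```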
which occur at a lower rate: transitions or transversions
transversions
what does the HKY85 model account for
unequal nucleotide (base) composition, in addition to different rates for transitions and transversions
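HKY85 is usually written as an instantaneous rate matrix Q: the rate from base x to base y is κπ_y for a transition and π_y for a transversion, where π_y is the frequency of y and κ is the transition/transversion rate ratio. The sketch below builds Q under that standard parameterization; the symbols κ and π and the example frequencies are assumptions, not from the lecture. With all π_y = 1/4 it reduces to K2P.

```python
BASES = "ACGT"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def hky85_rate_matrix(kappa: float, pi: dict[str, float]) -> dict[tuple[str, str], float]:
    """Instantaneous HKY85 rates: q_xy = kappa*pi_y (transition) or pi_y (transversion)."""
    if abs(sum(pi.values()) - 1) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    q = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                continue
            transition = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
            # the rate into y scales with how common y is
            q[(x, y)] = (kappa if transition else 1.0) * pi[y]
        # the diagonal makes each row sum to zero
        q[(x, x)] = -sum(q[(x, y)] for y in BASES if y != x)
    return q


# Hypothetical AT-rich composition and kappa, for illustration only
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = hky85_rate_matrix(kappa=2.0, pi=pi)
print(round(Q[("A", "G")], 6), round(Q[("A", "C")], 6))  # transition vs transversion into a 20% base
```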
what does the Generalized Time-Reversible (GTR) model account for
unequal nucleotide composition and a separate rate for each of the six pairs of nucleotides (all transitions and transversions), with the process time-reversible
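A minimal sketch of the common GTR parameterization, assuming q_xy = r_xy·π_y with one symmetric exchangeability r_xy per unordered pair of bases. The rates and frequencies below are hypothetical. Time reversibility means detailed balance holds: π_x·q_xy = π_y·q_yx for every pair.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_rate_matrix(rates: dict[frozenset, float], pi: dict[str, float]) -> dict[tuple[str, str], float]:
    """GTR rates: q_xy = r_xy * pi_y, with one symmetric exchangeability per base pair."""
    q = {}
    for x in BASES:
        for y in BASES:
            if x != y:
                q[(x, y)] = rates[frozenset((x, y))] * pi[y]
        # the diagonal makes each row sum to zero
        q[(x, x)] = -sum(q[(x, y)] for y in BASES if y != x)
    return q


# Hypothetical values; pairs in order AC, AG, AT, CG, CT, GT (AG and CT are transitions)
pi = {"A": 0.25, "C": 0.3, "G": 0.25, "T": 0.2}
pair_rates = [1.0, 4.0, 0.8, 1.2, 5.0, 1.0]
rates = {frozenset(pair): r for pair, r in zip(combinations(BASES, 2), pair_rates)}
Q = gtr_rate_matrix(rates, pi)

# Detailed balance: the flow x -> y equals the flow y -> x at equilibrium
reversible = all(abs(pi[x] * Q[(x, y)] - pi[y] * Q[(y, x)]) < 1e-12 for x, y in combinations(BASES, 2))
print(reversible)  # True
```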
what can be used to correct for different rates at different positions
a Gamma distribution of rates across sites; a smaller shape parameter means more variation in rate among sites
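To show what the shape parameter does, this sketch draws per-site relative rates from a Gamma distribution with shape a and scale 1/a, so the mean rate is 1 and the variance is 1/a. Phylogenetics programs typically approximate the distribution with a few discrete rate categories rather than sampling; sampling here is only an illustration, and `sample_site_rates` is a hypothetical name.

```python
import random
import statistics


def sample_site_rates(shape: float, n_sites: int, seed: int = 1) -> list[float]:
    """Draw per-site relative rates from Gamma(shape, scale=1/shape), which has mean 1."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1 / shape) for _ in range(n_sites)]


for shape in (0.5, 5.0):
    rates = sample_site_rates(shape, 100_000)
    # mean stays near 1; variance is about 1/shape, so a small shape means strong heterogeneity
    print(shape, round(statistics.mean(rates), 2), round(statistics.variance(rates), 2))
```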
how are protein models commonly derived
from empirically derived amino acid substitution matrices (e.g. PAM, JTT, WAG, LG)
how does the number of parameters affect choosing a nucleotide model
too few parameters -> poor fit to the data, inaccurate estimates, and possible convergence on the wrong tree
too many parameters -> overfitting, which reduces statistical power (the ability to reject a hypothesis)
what is overfitting
using so many parameters that the model fits the random statistical variation (noise) in the data rather than the underlying evolutionary signal
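what are ModelTest and ProtTest
programs that compare candidate substitution models and select the one that best fits a dataset (ModelTest for nucleotide models, ProtTest for protein models)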
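what are the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) used for
measures of model quality that balance goodness of fit against the number of parameters
- AIC = 2k - 2lnL
- BIC = k ln(n) - 2lnL
- k is the number of free parameters, lnL the maximized log-likelihood, n the sample size (e.g. number of alignment sites)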
how does AIC/BIC inform which model to choose
the model with the lowest AIC/BIC is selected
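A minimal sketch of AIC/BIC model selection. The log-likelihoods are made up for illustration, not results from any dataset; k counts only the substitution-model parameters as conventionally counted (JC69 0, K2P 1, HKY85 4, GTR 8), on the assumption that all models share the same tree, so branch-length parameters add the same constant to every score.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 lnL."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical (lnL, k) pairs for a hypothetical 1000-site alignment
candidates = {"JC69": (-5230.0, 0), "K2P": (-5180.0, 1), "HKY85": (-5170.0, 4), "GTR": (-5168.0, 8)}
n_sites = 1000

# Lowest score wins
best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print(best_aic, best_bic)  # HKY85 K2P
```

The two criteria can disagree: BIC's per-parameter penalty, ln(n), exceeds AIC's penalty of 2 whenever n > e^2 (about 7.4), so BIC tends to favour simpler models, as it does with these invented numbers.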