Lecture 8 Flashcards
What happens if you have more than one “best tree”?
create a consensus tree
majority-rule consensus tree
includes only clades present in at least a specified proportion of the “best” trees (more than 50% for majority rule), with the % score shown at each node
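The clade-counting step behind a majority-rule consensus can be sketched as below. This is a minimal illustration, not any program's actual implementation: the representation of a tree as a set of clades (each clade a frozenset of taxon names) and the function name `majority_rule_clades` are assumptions made for the example.

```python
from collections import Counter

def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: fraction of trees containing it} for clades above threshold.

    Hypothetical representation: each tree is a set of clades,
    and each clade is a frozenset of taxon names.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}

# Three equally good trees for taxa A-D (toy data)
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
support = majority_rule_clades(trees)
for clade, frac in support.items():
    print(sorted(clade), f"{100 * frac:.0f}%")
```

Only clades found in more than half of the trees survive, and the stored fraction is the % score that would be written at the node.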
Confidence in phylogenetic inferences can be thought of in 2 ways:
1) determine whether there is meaningful signal in the dataset
2) assess confidence in particular clades/topological conclusions
detecting nonrandomness in a set:
look at extent to which characters within a matrix contradict each other
Permutation Tail Probability (PTP) Test
- uses any method that assigns a score to an individual tree (parsimony, ME, ML)
- compares the score of the optimal tree to the scores of optimal trees found with randomly permuted data sets
parsimony PTP test
if the shortest tree for the real data is shorter than all/nearly all trees from permuted data, the data have more phylogenetic structure than would be expected at random
permutation
character states of each character independently shuffled among taxa
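A rough sketch of the permutation step and the resulting tail probability follows. The tree search that would produce the tree lengths is not shown, and the function names `permute_matrix` and `ptp_value` are hypothetical. Counting the observed data set among the replicates, as done here, is one common convention and is an assumption of this sketch.

```python
import random

def permute_matrix(matrix, rng):
    """Shuffle each character (column) independently among taxa (rows)."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]

def ptp_value(observed_length, permuted_lengths):
    """Fraction of data sets (observed included) whose best tree is
    as short as or shorter than the observed best tree."""
    as_short = sum(1 for length in permuted_lengths if length <= observed_length)
    return (as_short + 1) / (len(permuted_lengths) + 1)

# Toy matrix: rows are taxa, columns are characters
matrix = [list("AAC"), list("AGC"), list("TGC")]
permuted = permute_matrix(matrix, random.Random(0))
```

Each column keeps the same states in the same frequencies; only their assignment to taxa changes, which destroys any congruence among characters.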
point estimate of phylogeny
parsimony, distance, and likelihood
decay index (Bremer Support)
- difference in tree length between the optimal tree and the best tree that lacks the clade in question
- higher number means stronger support
- for likelihood: difference in log-likelihood scores (ratios)
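The arithmetic for the parsimony version is a single subtraction, sketched here with made-up tree lengths. The function name `decay_index` is hypothetical.

```python
def decay_index(best_length, best_length_without_clade):
    """Extra steps needed before the clade is lost from the optimal tree."""
    return best_length_without_clade - best_length

# Toy numbers: optimal tree has 100 steps; the shortest tree
# lacking the clade has 104 steps.
print(decay_index(100, 104))
```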
bootstrap (general)
- assesses the chances of recovering a particular clade again if we were able to sample from a new set of characters
- simulates other possible datasets by randomly drawing from data
- informs on consistency of branching patterns
nonparametric bootstrap
- sampling with replacement (pseudoreplicate)
- a tree search is performed on each pseudoreplicate dataset and the resulting optimal tree(s) are saved
- proportion of bootstrap trees with a given clade is the score
- usually presented in a bootstrap consensus tree
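The resampling and scoring steps above can be sketched as follows. The tree search on each pseudoreplicate is deliberately left out, and the alignment layout (a dict of taxon name to sequence), the tree layout (a set of clades), and both function names are assumptions made for illustration.

```python
import random

def bootstrap_pseudoreplicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}

def clade_support(bootstrap_trees, clade):
    """Proportion of bootstrap trees (each a set of clades) containing the clade."""
    return sum(clade in tree for tree in bootstrap_trees) / len(bootstrap_trees)

alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
replicate = bootstrap_pseudoreplicate(alignment, random.Random(1))

# Pretend these came from tree searches on four pseudoreplicates
boot_trees = [{frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")}, {frozenset("AB")}]
ab_support = clade_support(boot_trees, frozenset("AB"))
```

Because columns are drawn with replacement, some sites appear more than once in a pseudoreplicate and others not at all.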
jackknife
same as bootstrap but sampling WITHOUT replacement
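For contrast with the bootstrap, a jackknife pseudoreplicate might be drawn as below. The 50% retention fraction and the name `jackknife_pseudoreplicate` are assumptions for this sketch, not values given in the notes.

```python
import random

def jackknife_pseudoreplicate(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of columns, each at most once (no replacement)."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = max(1, int(n_sites * keep_fraction))
    picks = sorted(rng.sample(range(n_sites), n_keep))
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}

alignment = {"A": "ACGTAC", "B": "ACGAAC", "C": "TCGAAT"}
jack = jackknife_pseudoreplicate(alignment, random.Random(2))
```

The pseudoreplicate is shorter than the original data set and no site is duplicated.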
parametric bootstrapping
- generates new data sets by simulating them with evolutionary model
- used mostly in ML analyses to test specific hypotheses, not clade support (e.g. controversial placement of sister clades)
- a random sequence conforming to the model's assumptions is placed at the base of the tree and allowed to evolve along the branches; repeat for all positions in the sequence and all branches in the tree
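The simulation step might look like the sketch below. The notes do not name a model, so Jukes-Cantor (JC69) is assumed here purely because it is the simplest; the tree encoding `(name, branch_length, children)` and all function names are likewise hypothetical.

```python
import math
import random

BASES = "ACGT"

def evolve_base(base, branch_length, rng):
    """Evolve one base along a branch under JC69 (assumed model)."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)
    if rng.random() < p_same:
        return base
    return rng.choice([b for b in BASES if b != base])

def simulate_site(tree, rng, base=None):
    """tree = (name, branch_length, children); returns {tip name: base}."""
    name, branch_length, children = tree
    if base is None:
        base = rng.choice(BASES)          # random state at the base of the tree
    else:
        base = evolve_base(base, branch_length, rng)
    if not children:
        return {name: base}
    tips = {}
    for child in children:
        tips.update(simulate_site(child, rng, base))
    return tips

def simulate_alignment(tree, n_sites, rng):
    """Repeat the single-site simulation for every position in the sequence."""
    sites = [simulate_site(tree, rng) for _ in range(n_sites)]
    return {tip: "".join(site[tip] for site in sites) for tip in sites[0]}

tree = ("root", 0.0, [("A", 0.1, []), ("B", 0.1, [])])
simulated = simulate_alignment(tree, 20, random.Random(0))
```

Each simulated alignment would then be analyzed like the real data, giving a null distribution for the hypothesis being tested.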
Bayesian posterior distribution
- Bayesian inference gives a distribution of trees, not a point estimate
- distribution has sample of trees ranked by prob that each is the true tree
Bayesian posterior probability (BPP)
- read from a majority-rule consensus of the sampled topologies
- prob that a clade is correct, assuming the model is correct
- clade-credibility values (0.0-1.0)
- can be sensitive to model misspecification, so use the most complex model; typically faster than an ML bootstrap analysis
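Given a posterior sample of trees, ranking the topologies and reading off clade credibility are both simple frequency counts, as sketched here. The sample is toy data, the tree representation (a frozenset of clades, so it can be counted) is an assumption, and no MCMC sampling or burn-in handling is shown.

```python
from collections import Counter

def rank_topologies(sampled_trees):
    """Topologies sorted by their frequency in the posterior sample."""
    counts = Counter(sampled_trees)
    n = len(sampled_trees)
    return [(tree, c / n) for tree, c in counts.most_common()]

def clade_credibility(sampled_trees, clade):
    """Fraction of sampled trees containing the clade (0.0-1.0)."""
    return sum(clade in tree for tree in sampled_trees) / len(sampled_trees)

t1 = frozenset({frozenset("AB"), frozenset("ABC")})
t2 = frozenset({frozenset("BC"), frozenset("ABC")})
sample = [t1, t1, t1, t2]

ranked = rank_topologies(sample)
ab = clade_credibility(sample, frozenset("AB"))
abc = clade_credibility(sample, frozenset("ABC"))
```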
Partition homogeneity test (PHT)
- evaluates data set conflicts
- randomly assigns characters to partitions many times and then conducts a phylogenetic analysis of random partitions
- pars, ME, or ML
- similar to PTP (smaller is better)
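The random reassignment of characters to partitions could be sketched as follows. Keeping the random partitions the same sizes as the original ones is an assumption of this sketch, and `random_partitions` is a hypothetical name; the phylogenetic analysis of each random partition is not shown.

```python
import random

def random_partitions(n_characters, sizes, rng):
    """Randomly reassign character indices to partitions of the given sizes."""
    indices = list(range(n_characters))
    rng.shuffle(indices)
    parts, start = [], 0
    for size in sizes:
        parts.append(sorted(indices[start:start + size]))
        start += size
    return parts

# Ten characters originally split into partitions of 4 and 6
parts = random_partitions(10, [4, 6], random.Random(0))
```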
Species tree
the relationships among spp, as contrasted with gene trees; most species-tree methods assume gene-to-gene discordance is due to incomplete lineage sorting
Minimizing deep coalescence (MDC)
- parsimony based
- search among spp trees to find topology that minimizes the total # of deep coalescence events
- shortcoming: fails to use branch length estimates from individual gene trees
Multi-species coalescent
- assumes gene coalescence always predates spp divergence
- likelihood of gene tree given a spp tree is function of gene data/spp tree, including branch lengths and widths
- L of a spp tree given set of genes is function of the summed gene likelihoods
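One way to read "summed gene likelihoods" is as a sum of per-gene log-likelihoods, which equals the log of their product if the genes are treated as independent. That reading, the independence assumption, and the function name are all assumptions of this sketch; the per-gene likelihood values are made up.

```python
import math

def species_tree_log_likelihood(gene_likelihoods):
    """Sum of per-gene log-likelihoods for one candidate species tree."""
    return sum(math.log(p) for p in gene_likelihoods)

# Toy per-gene likelihoods of two gene trees given a candidate species tree
log_l = species_tree_log_likelihood([0.5, 0.25])
```

Candidate species trees would be compared by this summed score, with higher values preferred.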
Bayesian concordance analysis (BCA)
- estimates the extent of gene discordance without assuming any one source
- uses a prior prob of gene-to-gene discordance to estimate the proportion of the genome for which any clade is true (concordance factor)
- scores on tree