W2 L3 Phylogeny P3 TH Flashcards

Question 1

Q

Multiple sequence alignment

Answer

A

Goal: determine character homology (which sites in different sequences descend from the same ancestral site)
* Practical: insert gaps so that homologous sites line up
(each gap is a hypothesis about site homology resulting from historical insertion-deletion events)
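Gap insertion can be illustrated with a minimal pairwise global alignment (Needleman-Wunsch). This is only a sketch: it aligns two sequences, not many, and the scoring values (match = 1, mismatch = -1, linear gap = -2) are arbitrary assumptions. Real multiple-alignment programs use progressive or iterative strategies, substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences with a linear gap penalty.

    Returns the optimal score and the two gapped sequences; every '-'
    stands for a hypothesised insertion or deletion event.
    Scoring values are illustrative assumptions, not recommended settings.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, aligned_a, aligned_b = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(aligned_a)
print(aligned_b)
```

The single "-" in the shorter sequence is the hypothesised indel that makes the remaining sites line up.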

Question 2

Q

Aligning protein-coding sequences

Answer

A

Various algorithms remove ambiguously aligned sites, e.g.
- SOAP: aligns using multiple settings and removes sites that are unstable across them
- Gblocks: scores for contiguous conserved positions, lack of gaps, conserved flanking positions
- BMGE: gap frequencies, entropy
- trimAl: gap frequencies, amino-acid similarity, consistency across alignments
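The tools above each use their own, more elaborate scoring. As a rough sketch of the two simplest criteria listed (gap frequency and entropy), the hypothetical function below drops alignment columns whose gap fraction or Shannon entropy exceeds a threshold. It is not the actual Gblocks, BMGE or trimAl algorithm, and both threshold values are assumptions chosen for the toy example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the non-gap characters in one alignment column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


def trim_columns(alignment, max_gap_frac=0.5, max_entropy=1.0):
    """Drop columns that are too gappy or too variable (hypothetical thresholds)."""
    n_seqs = len(alignment)
    keep = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        gap_frac = column.count("-") / n_seqs
        if gap_frac <= max_gap_frac and column_entropy(column) <= max_entropy:
            keep.append(i)
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy protein alignment (made up for illustration)
toy_alignment = ["MKV-LA", "MKVQLS", "MRV--W", "MKV-LT"]
print(trim_columns(toy_alignment))
```

Here the fourth column is removed for being mostly gaps and the last column for having four different residues (2 bits of entropy).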

Question 3

Q

Model selection

Answer

A

- All models are wrong, but some are useful; model selection is a trade-off between underfitting and overfitting
- Adding parameters to a model improves the fit, but also increases the variance of the parameter estimates

Question 4

Q

Aiming to make model fit

Answer

A

DNA sequences of extant species reflect the evolutionary processes that have acted on them
* Parameters of the model of sequence evolution specify in a statistical way how past changes have led to the present diversity of DNA sequences
* Not all genes and groups of organisms have had the same history
* It is therefore wise to evaluate many models and use one that fits the dataset in question well
* Aim: identify a model that yields a good trade-off between the fit of the data to the model and the number of parameters that need to be fitted
* Fit can be measured with the (maximised) log-likelihood
* Score many models (bonus for good fit, penalty for many parameters)
* Pick the one with best score
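A minimal sketch of such scoring, using the standard Akaike Information Criterion, AIC = 2k - 2 ln L, and Bayesian Information Criterion, BIC = k ln(n) - 2 ln L, where k is the number of free parameters, L the maximised likelihood and n the sample size (for alignments usually taken as the number of sites). Lower scores are better. The model names, log-likelihoods and parameter counts below are made up for illustration and do not come from any dataset.

```python
import math


def aic(lnl, k):
    """Akaike Information Criterion: fit bonus (-2 lnL) plus a penalty of 2 per parameter."""
    return 2 * k - 2 * lnl


def bic(lnl, k, n):
    """Bayesian Information Criterion: the penalty grows with sample size n."""
    return k * math.log(n) - 2 * lnl


# Hypothetical fitted models: name -> (maximised log-likelihood, free parameters)
models = {
    "model_A": (-5230.0, 20),
    "model_B": (-5190.0, 24),
    "model_C": (-5186.0, 33),
}
n_sites = 1000  # hypothetical alignment length

for name, (lnl, k) in models.items():
    print(f"{name}: lnL={lnl:.1f}  AIC={aic(lnl, k):.1f}  BIC={bic(lnl, k, n_sites):.1f}")

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_sites))
print("best by AIC:", best_aic, " best by BIC:", best_bic)
```

In this toy example model_C has the highest log-likelihood, but its extra parameters are penalised enough that model_B gets the best score under both criteria.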

Question 5

Q

Akaike Information Criterion (and related)

Question 6

Q

Other methods for model selection

Answer

A

- Decision theory: penalty for models that yield branch lengths deviating from those of other models in the comparison
- Bayesian model selection: pairwise model comparison with Bayes factors
- Hierarchical likelihood ratio tests (hLRT): much used in the past, but tend to select seriously over-parameterized models
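The building block of hLRT is a likelihood ratio test between two nested models: the statistic LR = 2(ln L1 - ln L0) is approximately chi-square distributed, with degrees of freedom equal to the difference in the number of free parameters. The sketch below performs a single such test (hLRT chains several in a fixed order). The chi-square tail probability is computed with a series expansion of the incomplete gamma function so that only the standard library is needed, and the log-likelihoods are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularised lower incomplete gamma function;
    adequate for the moderate x values typical of LR statistics.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LR statistic and its p-value for nested models."""
    lr = 2.0 * (lnl_alt - lnl_null)
    return lr, chi2_sf(lr, df)


# Hypothetical nested pair: the alternative model has one extra free parameter
lr, p = likelihood_ratio_test(lnl_null=-2500.0, lnl_alt=-2490.0, df=1)
print(f"LR = {lr:.1f}, p = {p:.2e}")
# The simpler (null) model would be rejected at the 5% level if p < 0.05
```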

Question 7

Q

Protein models

Answer

A

Empirical transition matrices
* These are estimated from large databases beforehand and implemented in software
* Many different matrices available, use model testing procedure to select suitable one
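Empirical protein models such as WAG or LG are typically distributed as a symmetric matrix of exchangeabilities s_ij plus equilibrium amino-acid frequencies pi_j. Software combines them into an instantaneous rate matrix Q with Q_ij = s_ij * pi_j off the diagonal, diagonal entries chosen so each row sums to zero, usually scaled to one expected substitution per unit time. A minimal sketch of that construction, using a made-up 3-state toy matrix instead of a real 20 x 20 one:

```python
def rate_matrix(exch, freqs):
    """Build a normalised time-reversible rate matrix Q from exchangeabilities and frequencies."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]  # rate of i -> j
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)  # rows sum to zero
    # Scale so the expected substitution rate at equilibrium is 1
    mu = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mu for x in row] for row in q]


# Hypothetical 3-state toy values (not taken from any published matrix)
exch = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]
freqs = [0.5, 0.3, 0.2]
Q = rate_matrix(exch, freqs)
for row in Q:
    print([round(x, 3) for x in row])
```

Because the exchangeabilities are symmetric, the resulting Q satisfies detailed balance (pi_i Q_ij = pi_j Q_ji), i.e. the model is time-reversible.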

Question 8

Q

Phylogenetic uncertainty

Answer

A

A phylogenetic tree is often seen as a point estimate, i.e. a single best result
* However, every result has a “standard deviation”
* Some relationships are better supported by the data than others
* Often several trees have nearly identical likelihoods
* It can thus be informative to obtain statistics indicating support for relationships

Question 9

Q

Bootstrapping

Answer

A

Resample the alignment columns (sites) with replacement until the original alignment length is reached
- Run an ML analysis on each bootstrap replicate
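A minimal sketch of the nonparametric bootstrap resampling step: columns of the alignment are drawn with replacement until each pseudo-replicate has the original length. The ML tree search on each replicate is assumed to be done by external phylogenetics software and is not shown; the toy alignment, seed and number of replicates are arbitrary.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-alignment built from columns sampled with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every sequence, so sites stay intact
    return ["".join(seq[c] for c in cols) for seq in alignment]


aln = ["ACGTACGT", "ACGTTCGT", "ACCTACGA"]  # toy alignment
rng = random.Random(1)  # fixed seed for reproducibility
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
print(replicates[0])
```

Branch support is then usually reported as the percentage of replicate trees that contain a given branch.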

Question 10

Q

Interpreting branch support

Question 11

Q

Outgroup rooting

Answer

A

Include one or more related taxa from outside the group of interest (the ingroup); the root is placed where the outgroup attaches to the ingroup
Outgroup choice is often treated as an afterthought, but should be part of the experimental design

Question 12

Q

Alternative rooting methods

Answer

A

* Molecular clock models (strict or relaxed)
* Midpoint rooting
* Non-reversible models
No need for outgroups: the tree is rooted automatically
Accuracy often (but not always) lower
If unsure, compare different methods and choose the one whose assumptions are least likely to be violated for the dataset being studied
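Midpoint rooting places the root halfway along the longest tip-to-tip (patristic) path in the tree, which implicitly assumes roughly clock-like evolution. A minimal sketch, assuming an unrooted tree stored as a hypothetical adjacency dictionary {node: {neighbour: branch_length}}; it returns the branch on which the root falls and the root's distance from the first node of that branch. Node names and branch lengths are invented.

```python
from itertools import combinations


def path_between(tree, start, goal):
    """Return the nodes on the unique path from start to goal (depth-first search)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nb in tree[node]:
            if nb not in path:  # a tree has no cycles, so this avoids backtracking
                stack.append((nb, path + [nb]))
    raise ValueError("nodes are not connected")


def path_length(tree, path):
    """Sum of branch lengths along a path."""
    return sum(tree[u][v] for u, v in zip(path, path[1:]))


def midpoint_root(tree):
    """Return (u, v, d): the root lies on branch u-v at distance d from u."""
    tips = [n for n, nbs in tree.items() if len(nbs) == 1]
    # Longest tip-to-tip path in the tree
    longest = max(
        (path_between(tree, a, b) for a, b in combinations(tips, 2)),
        key=lambda p: path_length(tree, p),
    )
    half = path_length(tree, longest) / 2
    walked = 0.0
    for u, v in zip(longest, longest[1:]):
        if walked + tree[u][v] >= half:
            return u, v, half - walked
        walked += tree[u][v]


# Hypothetical unrooted tree: tips A-D, internal nodes X and Y
tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"Y": 2.0},
    "D": {"Y": 6.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"C": 2.0, "D": 6.0, "X": 1.0},
}
print(midpoint_root(tree))
```

In this toy tree the longest path runs from B to D (length 9), so the root lands on the Y-D branch, 4.5 units from both B and D.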