W2 L3 Phylogeny P3 TH Flashcards

1
Q

Multiple sequence alignment

A

Goal: determine character (positional) homology
* Practical: insert gaps so that the sequences line up; the gaps are hypotheses about site homology resulting from historical insertion-deletion (indel) events
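A minimal sketch of how gap insertion can be scored, assuming a simple pairwise global alignment (Needleman-Wunsch) with a linear gap penalty. This is not a multiple-alignment method itself; progressive MSA programs build on pairwise alignments like this one. The function name and the match/mismatch/gap scores are illustrative choices, not values from the course.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment (linear gap penalty); returns two gapped strings."""
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Traceback from the bottom-right cell; '-' marks a hypothesised indel
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("GATTACA", "GATACA")` returns `("GATTACA", "GA-TACA")`: one gap is placed in the shorter sequence so the remaining sites line up.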

2
Q

Aligning protein-coding sequences

A

Various algorithms remove ambiguously aligned sites, e.g.
- SOAP: align under multiple settings and keep the sites that are stable across the resulting alignments
- Gblocks: scores for contiguous conserved positions, lack of gaps, conserved flanking positions
- BMGE: gap frequencies, entropy
- trimAl: gap frequencies, amino-acid similarity, consistency across alignments
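The simplest of the criteria listed (gap frequency per column) can be sketched as below. This is a deliberately simplified, hypothetical filter loosely in the spirit of a gap threshold; it is not the algorithm of SOAP, Gblocks, BMGE or trimAl, which combine several criteria. The function name and the 0.5 default threshold are assumptions.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose fraction of '-' characters exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    names = list(alignment)
    keep = []
    for col in range(lengths.pop()):
        column = [alignment[name][col] for name in names]
        gap_fraction = sum(ch == "-" for ch in column) / len(column)
        if gap_fraction <= max_gap_fraction:
            keep.append(col)
    # Rebuild every sequence from the retained columns only
    return {name: "".join(alignment[name][c] for c in keep) for name in names}
```

With `{"a": "AC-GT", "b": "A--GT", "c": "ACTG-"}`, only the third column (two of three characters are gaps) is removed.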

3
Q

Model selection

A

- All models are wrong, but some are useful: model choice is a trade-off between overfitting and underfitting
- Adding parameters to a model improves its fit to the data, but also increases the variance of the estimates

4
Q

Aiming to make model fit

A

DNA sequences of extant species reflect the evolutionary processes that have acted on them
* Parameters of the model of sequence evolution specify in a statistical way how past changes have led to the present diversity of DNA sequences
* Not all genes and groups of organisms have had the same history
* Smart to evaluate many models and use the one that best fits the dataset in question
* Aim: identify model that yields good trade-off between the fit of the data to the model and the number of parameters that need to be fitted
* Fit can be measured with the log-likelihood
* Score many models (bonus for good fit, penalty for many parameters)
* Pick the one with best score
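The scoring step can be sketched with the standard information criteria: AIC = -2 lnL + 2k, AICc = AIC + 2k(k+1)/(n-k-1) and BIC = -2 lnL + k ln(n), where k is the number of free parameters and n the sample size (often taken as the alignment length). Lower scores are better. The model log-likelihoods below are made-up numbers for illustration only; in practice they come from fitting each model in a phylogenetics program.

```python
import math

def aic(lnL, k):
    # fit bonus (-2 lnL) plus penalty of 2 per free parameter
    return -2.0 * lnL + 2.0 * k

def aicc(lnL, k, n):
    # small-sample correction of AIC
    return aic(lnL, k) + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(lnL, k, n):
    # penalty grows with ln(n), so BIC favours simpler models on large datasets
    return -2.0 * lnL + k * math.log(n)

def pick_best(models, n, criterion="AIC"):
    """models: dict name -> (lnL, k); returns the name with the lowest score."""
    scorers = {
        "AIC": lambda lnL, k: aic(lnL, k),
        "AICc": lambda lnL, k: aicc(lnL, k, n),
        "BIC": lambda lnL, k: bic(lnL, k, n),
    }
    score = scorers[criterion]
    return min(models, key=lambda name: score(*models[name]))

# Hypothetical fits (lnL, free substitution parameters) on a 1000-site alignment
models = {"JC": (-5000.0, 0), "HKY": (-4900.0, 4), "GTR": (-4898.0, 8)}
```

Here GTR has the highest likelihood, but its gain of 2 log-likelihood units does not pay for four extra parameters, so `pick_best(models, 1000)` selects HKY.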

5
Q

Akaike Information Criterion (and related)

A
6
Q

Other methods for model selection

A
  • Decision theory
    penalty for models that yield branch lengths deviating from those of the other models in the comparison
  • Bayesian model selection
    pairwise model comparison with Bayes factors
  • Hierarchical likelihood ratio tests (hLRT)
    much used in the past; prone to serious overparameterization
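A single step of a likelihood ratio test between two nested models can be sketched as follows: the statistic 2(lnL_complex - lnL_simple) is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters; an hLRT chains such tests through a hierarchy of models. The chi-square tail is computed here from a series for the regularized incomplete gamma function so the sketch needs only the standard library; the function names and example values are hypothetical. The chi-square reference is only approximate when a parameter sits on a boundary (e.g. a zero proportion of invariable sites).

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x)
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(lnL_simple, lnL_complex, extra_params):
    """Compare two nested models; returns (LR statistic, p-value)."""
    stat = 2.0 * (lnL_complex - lnL_simple)
    return stat, chi2_sf(stat, extra_params)
```

For example, `likelihood_ratio_test(-4900.0, -4898.0, 4)` gives a statistic of 4.0 and p ≈ 0.41, so the four extra parameters would not be accepted at the 5 % level.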
7
Q

Protein models

A

Empirical transition matrices
* These are estimated from large databases beforehand and implemented in software
* Many different matrices are available; use a model-testing procedure to select a suitable one
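Empirical protein models (e.g. WAG or LG) are typically distributed as a symmetric table of 20×20 exchangeabilities plus 20 amino-acid frequencies, from which software builds the rate matrix. A minimal sketch of that construction, assuming the usual reversible form Q_ij = s_ij·π_j with rows summing to zero and scaling to a mean rate of 1. The 4-state numbers below are invented toy values, not any published matrix.

```python
def rate_matrix(exchangeabilities, freqs):
    """Build a normalized reversible rate matrix Q from s_ij and pi_j."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchangeabilities[i][j] * freqs[j]
        # diagonal makes each row sum to zero
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    # scale so the expected number of substitutions per unit time is 1
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[value / mean_rate for value in row] for row in q]

# Hypothetical toy 4-state exchangeabilities and frequencies (not WAG/LG values)
S = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
PI = [0.1, 0.2, 0.3, 0.4]
Q = rate_matrix(S, PI)
```

Because s_ij is symmetric, π_i·Q_ij = π_j·Q_ji, so the model is time-reversible; a real protein model applies the same steps to 20 states.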

8
Q

Phylogenetic uncertainty

A

A phylogenetic tree is often presented as a point estimate, a single best result
* However, every result has a “standard deviation”
* Some relationships are better supported by the data than others
* There are often several trees with nearly identical likelihoods
* It can thus be informative to obtain statistics indicating support for individual relationships

9
Q

Bootstrapping

A

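Resample the alignment columns (sites) with replacement until a pseudo-replicate of the same length as the original alignment is reached
- Run an ML analysis on each bootstrap pseudo-replicate; the support for a clade is the percentage of replicate trees that contain it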
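The resampling and the support count can be sketched as below. The ML search on each pseudo-replicate is left to a phylogenetics program and is not shown; representing each replicate tree as a set of clades (frozensets of taxon names) is an assumption made to keep the sketch short, and the function names are hypothetical.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement up to the original length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    # the same resampled column indices are applied to every sequence
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

def clade_support(clade, replicate_trees):
    """Percentage of replicate trees (each a set of clades) containing clade."""
    hits = sum(clade in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)

rng = random.Random(1)
replicates = [bootstrap_replicate({"A": "ACGTAC", "B": "ACGTTC"}, rng) for _ in range(100)]
```

Sampling whole columns keeps the characters of one site together, which is what makes each pseudo-replicate a plausible alternative dataset of the same size.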

10
Q

Interpreting branch support

A
11
Q

Outgroup rooting

A

Include one or more related taxa from outside the group of interest (the ingroup); the root is placed on the branch connecting outgroup and ingroup
Outgroup choice is often treated as an afterthought, but should be part of the experimental design

12
Q

Alternative rooting methods

A
  • Molecular clock model (strict or relaxed)
  • Midpoint rooting
  • Non-reversible models
    No need for outgroups: tree is automatically rooted
    Accuracy often (but not always) lower
    If unsure, compare different methods and choose the one whose assumptions are least likely to be violated by the dataset being studied
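Midpoint rooting can be sketched as: find the two tips with the longest path between them and place the root halfway along that path. This sketch assumes the unrooted tree is stored as an adjacency dict of branch lengths (a hypothetical representation chosen for brevity) and returns the branch carrying the root plus the distance from its first node.

```python
from itertools import combinations

def tree_path(tree, start, end):
    """Nodes on the unique path from start to end in an unrooted tree."""
    stack, seen = [(start, [start])], {start}
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for neighbour in tree[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append((neighbour, path + [neighbour]))
    raise ValueError(f"{end!r} is not reachable from {start!r}")

def path_length(tree, path):
    return sum(tree[u][v] for u, v in zip(path, path[1:]))

def midpoint_root(tree):
    """Return (u, v, offset): the root lies on branch u-v, offset from u."""
    tips = [node for node, nbrs in tree.items() if len(nbrs) == 1]
    if len(tips) < 2:
        raise ValueError("need at least two tips")
    longest = max((tree_path(tree, a, b) for a, b in combinations(tips, 2)),
                  key=lambda p: path_length(tree, p))
    half = path_length(tree, longest) / 2.0
    walked = 0.0
    for u, v in zip(longest, longest[1:]):
        if walked + tree[u][v] >= half:   # midpoint falls on this branch
            return u, v, half - walked
        walked += tree[u][v]

# Toy unrooted tree: tips A, B, C, D; internal nodes X, Y
tree = {
    "A": {"X": 1.0}, "B": {"X": 2.0}, "C": {"Y": 6.0}, "D": {"Y": 1.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 0.5},
    "Y": {"X": 0.5, "C": 6.0, "D": 1.0},
}
```

Here the longest tip-to-tip path is B to C (8.5), so the root lands on branch Y-C, 1.75 from Y. The method implicitly assumes roughly clock-like rates: a single fast-evolving lineage can pull the midpoint to the wrong branch.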