lecture 3 Flashcards by lisa k

phylograms and chronograms have

branch lengths that contain useful information (amount of change and time, respectively); cladograms show only branching order.

phylogenetic methods

goal: need to estimate phylogeny.
approach:
1. collect data (evidence)
2. align data (homology)
3. find the “best” tree (phylogenetic analysis)

types of data

morphology and molecular.

phylogenetic methods

distance-based methods, maximum parsimony, maximum likelihood, and Bayesian inference.

distance based methods steps

step 1: convert data to a measure of genetic distance between each pair of sequences.
step 2: calculate a tree using the table of genetic distances.

distance based methods process

sequence alignment
distances between sequences -> distance tables
calculate unrooted tree
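
The first two steps of the process above can be sketched as follows. This is a minimal illustration, assuming the simplest distance measure (the proportion of differing sites, or p-distance); the alignment and the function names are hypothetical, and real analyses usually apply a model-based correction to the distances before building a tree.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_table(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (not from the lecture).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTG",
}
table = distance_table(alignment)
for pair, dist in table.items():
    print(pair, dist)
```

A tree-building algorithm such as neighbor joining would then take this table as its only input, which is why the sequences themselves are no longer used after this point.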

distance based methods pros

ultra-fast and can handle a large number of species.

distance based methods cons

replaces sequences with distances, has problems when distances are tied, and has no model of evolution.

maximum parsimony

find the tree that explains the observed data with the minimal number of changes, i.e. choose the tree with the fewest steps (substitutions).

maximum parsimony steps

step 1: propose a tree and compute its parsimony score (the smallest number of changes needed on that tree).
step 2: repeat step 1 until the tree with the minimum number of changes is found.
sites: informative vs uninformative (only parsimony-informative sites help to choose between trees)
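
Step 1 can be sketched with the Fitch algorithm, a standard way to count the minimum number of changes on a given tree. This is a sketch under assumptions: the four-taxon alignment, the nested-tuple tree encoding, and the function names are hypothetical and are not taken from the lecture.

```python
def fitch(node, states):
    """Return (state_set, changes) for one site on a rooted binary tree.

    node is either a leaf name (str) or a (left, right) tuple.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_changes = fitch(node[0], states)
    right_set, right_changes = fitch(node[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: one change is required at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical alignment; the second column is constant, so it is uninformative.
alignment = {"A": "AAGT", "B": "AACT", "C": "CAGG", "D": "CACG"}
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_score(t, alignment) for label, t in trees.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With only four taxa all three unrooted topologies can be scored directly; for larger data sets step 2 becomes a heuristic search.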

maximum parsimony pros

simple to understand.

maximum parsimony cons

easily trapped on a local optimum, no branch lengths, often too many equally parsimonious trees, and can be statistically inconsistent (e.g. with long-branch attraction it converges on an incorrect tree, and more data only increases support for the incorrect result).

bootstrap

calculating support for relationships. how strong is the signal in the data?

statistical procedure: random re-sampling of the data (alignment columns), with replacement.
each re-sampled data set is a pseudoreplicate.
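
A minimal sketch of how pseudoreplicates can be generated, assuming the data are an alignment and that whole columns are re-sampled. The alignment, the seed, and the function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Re-sample alignment columns with replacement (one pseudoreplicate)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


rng = random.Random(42)  # fixed seed so the sketch is repeatable
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTTCGT",
    "C": "ACCTTCGA",
}
pseudoreplicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(pseudoreplicates[0])
```

Each pseudoreplicate would then be analysed with the same tree-building method, and the support for a relationship is the fraction of pseudoreplicate trees that contain it.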

maximum likelihood

what is the probability of observing a set of data given a hypothesis? the equation of the conditional probability is: likelihood = probability (data | hypothesis) or P(data | tree).

calculate the likelihood for each hypothesis. look for the hypothesis with the highest likelihood.
propose new topology, branch lengths, model parameters -> calculate scores; repeat.
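
The "calculate scores; repeat" loop can be illustrated on the smallest possible problem: two sequences joined by a single branch. This sketch assumes the one-rate (Jukes-Cantor) model and tries a grid of candidate branch lengths instead of a real optimiser; the sequences and function name are hypothetical.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, t):
    """log P(data | branch length t) for two aligned sequences under
    the Jukes-Cantor model; t is in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site is unchanged
    p_diff = 0.25 - 0.25 * decay   # probability of one specific change
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the starting base.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


# Hypothetical sequences differing at 2 of 20 sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGT"

# Propose each candidate branch length, score it, keep the best.
candidates = [i / 1000 for i in range(1, 1001)]
best_t = max(candidates, key=lambda t: jc69_log_likelihood(seq_a, seq_b, t))
print(best_t)
```

For this simple case the answer can be checked against the closed-form Jukes-Cantor distance, -3/4 * ln(1 - 4p/3) with p = 0.1. A full analysis searches over topology and model parameters as well, not only a single branch length.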

substitution models (max. likelihood)

model the rates of change among nucleotide bases.

1 rate: all changes equal (e.g. the Jukes-Cantor model)
2 rates: transitions vs. transversions (e.g. the Kimura 2-parameter model)
6 rates: a unique rate for each pair of bases (general time-reversible model, GTR)
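
The three levels of complexity can be written as rate matrices. This is a sketch assuming equal base frequencies for all three models (GTR normally also estimates base frequencies); the numeric rates and helper names are hypothetical values chosen for illustration.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def rate_matrix(rate_for):
    """Build a 4x4 rate matrix as a dict; rate_for(x, y) gives the
    off-diagonal rate, and each diagonal makes its row sum to zero."""
    q = {}
    for x in BASES:
        for y in BASES:
            if x != y:
                q[x, y] = rate_for(x, y)
        q[x, x] = -sum(q[x, y] for y in BASES if y != x)
    return q


# 1 rate: every change is equally likely.
one_rate = rate_matrix(lambda x, y: 1.0)

# 2 rates: transitions occur kappa times faster than transversions.
kappa = 2.0  # hypothetical value
two_rate = rate_matrix(lambda x, y: kappa if (x, y) in TRANSITIONS else 1.0)

# 6 rates: one rate per unordered pair of bases (hypothetical values).
gtr_rates = {
    frozenset("AC"): 1.0, frozenset("AG"): 4.0, frozenset("AT"): 0.8,
    frozenset("CG"): 1.2, frozenset("CT"): 5.0, frozenset("GT"): 1.5,
}
six_rate = rate_matrix(lambda x, y: gtr_rates[frozenset((x, y))])

for x in BASES:
    print(x, [round(six_rate[x, y], 2) for y in BASES])
```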

maximum likelihood pros

statistically consistent (converges on the correct tree given sufficient data, provided the model of evolution is adequate).

maximum likelihood cons

slow, and relies on “hill climbing,” which makes it easy to be trapped on a local max.

hill climbing

the objective of maximum parsimony and maximum likelihood is to find the “maximum” (best-scoring) solution.
- propose new: topology (ML and MP), branch lengths (ML only), model parameters (ML only) -> calculate score; repeat.
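
A minimal sketch of hill climbing on a made-up one-dimensional "landscape", to show why the starting point matters. The scores and function names are hypothetical; in a real search each position would be a tree (plus branch lengths and model parameters for ML) and the neighbours would be small rearrangements of it.

```python
def hill_climb(score, start, neighbors):
    """Move to the best-scoring neighbour until no neighbour is better."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # no uphill move left: a (possibly local) max
        current = best


# Hypothetical landscape: a local max at index 2, the global max at index 6.
landscape = [1, 3, 5, 4, 2, 6, 9, 7]


def score(i):
    return landscape[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


from_left = hill_climb(score, 0, neighbors)
from_middle = hill_climb(score, 4, neighbors)
print(from_left, from_middle)
```

Starting from index 0 the search stops on the lower peak, while starting from index 4 it reaches the highest one, which is why searches are usually repeated from several starting points.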

is there really an optimal tree?

can’t enumerate all possible trees for 10+ species.
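
To see why enumeration fails, the number of unrooted binary trees for n labelled species is (2n - 5)!!, the product of the odd numbers up to 2n - 5. A short sketch (the function name is hypothetical):

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted binary trees for n_taxa labelled tips: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
# 4 taxa -> 3 trees, 5 taxa -> 15 trees, 10 taxa -> 2027025 trees
```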

Bayesian phylogenetics

instead of trying to find the “maximum” solution, we summarize the entire distribution using a simulation technique called Markov Chain Monte Carlo (MCMC).

MP and ML vs Bayesian

provide a single “point estimate” vs provides a probability distribution
require bootstrapping for support vs probabilities are provided directly
can get trapped on a local max vs can escape a local max
no uncertainty in estimates vs uncertainty is estimated for each parameter.

Bayes’ theorem

Pr(tree | data) = (Pr(data | tree) * Pr(tree)) / Pr(data)

posterior probability = (likelihood * prior probability) / marginal likelihood
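
A small worked example of the theorem, assuming only three candidate trees with equal prior probability. The likelihood values are hypothetical numbers chosen for illustration; with real data the denominator cannot be computed by a simple sum like this.

```python
# Hypothetical likelihoods Pr(data | tree) for three candidate trees.
likelihood = {"tree_1": 0.008, "tree_2": 0.002, "tree_3": 0.010}

# Equal prior probability for every tree.
prior = {tree: 1 / len(likelihood) for tree in likelihood}

# Pr(data): sum of likelihood * prior over all trees.
pr_data = sum(likelihood[t] * prior[t] for t in likelihood)

# Pr(tree | data) for each tree.
posterior = {t: likelihood[t] * prior[t] / pr_data for t in likelihood}

for tree, prob in posterior.items():
    print(tree, round(prob, 3))
```

Because the priors are equal here, the posterior probabilities are simply the likelihoods rescaled to sum to 1.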

marginal likelihood

summation over all trees and, for each tree, integration over all possible combinations of branch length and substitution model parameter values.

prior distribution

probability assumed before observing data.

likelihood

L = P(data | tree), same as one used in ML.

posterior distribution

combination of the prior and likelihood.

how do you put a prior on a phylogenetic tree?

give all trees equal prior probability.

Start at a random location and follow these two MCMC rules:

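1. if the proposed step takes you uphill, you automatically take the step.
2. if the proposed step takes you downhill, you take the step only some of the time, at random (with probability equal to the ratio of the new height to the current height).
- the resulting samples approximate the posterior distribution.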
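
The two rules can be sketched as a Metropolis sampler over a toy set of three trees. This is a sketch under assumptions: the target values are hypothetical, the proposal picks any tree uniformly at random, and the number of discarded initial samples is an arbitrary choice.

```python
import random
from collections import Counter


def metropolis(target, start, n_steps, rng):
    """Sample states in proportion to target[state] using the two rules."""
    states = list(target)
    current = start
    samples = []
    for _ in range(n_steps):
        proposal = rng.choice(states)  # symmetric proposal
        ratio = target[proposal] / target[current]
        # Rule 1: uphill (ratio >= 1) is always accepted.
        # Rule 2: downhill is accepted with probability equal to ratio.
        if ratio >= 1 or rng.random() < ratio:
            current = proposal
        samples.append(current)
    return samples


# Hypothetical "heights" of three trees (need not sum to 1).
target = {"tree_1": 0.4, "tree_2": 0.1, "tree_3": 0.5}
rng = random.Random(1)  # fixed seed so the sketch is repeatable
samples = metropolis(target, "tree_2", 20000, rng)

kept = samples[2000:]  # discard the first samples (arbitrary cut-off)
counts = Counter(kept)
freq = {tree: counts[tree] / len(kept) for tree in target}
print(freq)
```

The fraction of kept samples spent on each tree should approach that tree's share of the total height, so the visit frequencies serve as estimates of the probabilities.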

1. convergence - how long to run the MCMC analysis?

- are you sampling from the "global peak"?
- have you collected enough samples to summarize?

2. burn in - how many of the initial samples are useless?

- you begin at a random spot; when did you reach the "goal"?

tree space

each dot is a tree visited during the MCMC. the number of times a tree is visited is proportional to the posterior probability of the tree.

phylogenetic methods:

- distance: convert data to distance values and calculate the tree.
- maximum parsimony: the tree that explains the data with the least amount of evolutionary change.
- maximum likelihood: the tree with the highest probability of generating the observed data.
- Bayesian: a probability distribution of trees based on prior knowledge and current data.

lecture 3 Flashcards

(32 cards)