lecture 3 Flashcards

1
Q

cladograms, phylograms, and chronograms have

A

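branch lengths; they carry useful information in phylograms (amount of change) and chronograms (time), but are arbitrary in cladograms, which show only branching order.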

2
Q

phylogenetic methods

A
  • goal: need to estimate phylogeny.
  • approach:
    1. collect data (evidence)
    2. align data (homology)
    3. find the “best” tree (phylogenetic analysis)
3
Q

types of data

A

morphological and molecular.

4
Q

phylogenetic methods

A

distance-based methods, maximum parsimony, maximum likelihood, and Bayesian inference.

5
Q

distance based methods steps

A
  • step 1: convert data to a measure of genetic distance between each pair of sequences.
  • step 2: calculate a tree using the table of genetic distances.
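
Step 1 can be sketched as follows. This is a minimal illustration, assuming the simplest distance measure (the p-distance, i.e. the proportion of aligned sites that differ); the function names and the toy alignment are invented for the example.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def distance_table(alignment):
    """Map each pair of sequence names to its p-distance."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }

# Toy aligned sequences, invented for illustration only.
alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACCTAT",
}
table = distance_table(alignment)
for pair, d in table.items():
    print(pair, d)
```

The resulting table is the input to step 2.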
6
Q

distance based methods process

A
  1. sequence alignment
  2. distances between sequences -> distance tables
  3. calculate unrooted tree
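
The last step can be sketched with a simple clustering routine. Note the swap: this example uses UPGMA (average-linkage clustering), which is easier to show in a few lines but produces a rooted tree; an unrooted tree as on this card would more typically come from neighbor joining. The function name and the toy distances are assumptions made for illustration.

```python
def upgma(names, dist):
    """Cluster taxa by repeatedly joining the closest pair.

    dist maps frozenset({a, b}) -> distance. Returns a Newick string
    without branch lengths.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        new = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = size_a + size_b
    return next(iter(clusters)) + ";"

# Toy distance table, invented for illustration only.
raw = {
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 10, ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 10,
}
dist = {frozenset(pair): value for pair, value in raw.items()}
newick = upgma(["A", "B", "C", "D"], dist)
print(newick)
```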
7
Q

distance-based methods pros

A

ultra-fast and can handle a large number of species.

8
Q

distance based methods cons

A

replace sequences with distances (discarding site-level information), have problems when distances are tied, and use no model of evolution.

9
Q

maximum parsimony

A

find the tree that explains the observed data with the fewest changes, i.e. choose the tree that requires the fewest steps (substitutions).

10
Q

maximum parsimony steps

A
  • step 1: propose a tree and compute parsimony score (least number of changes for a given tree).
  • step 2: repeat step 1 until the tree with the minimum number of changes is found.
  • changes: informative vs uninformative
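
A sketch of these steps for a handful of candidate trees. Scoring here uses Fitch's algorithm, a standard way to count the minimum changes on a given tree; the tuple-based tree encoding, function names, and toy alignment are assumptions for illustration, and a real search proposes trees heuristically rather than listing them.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree is a leaf name or a (left, right) tuple; states maps leaf -> base.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the per-site minimum changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

def is_informative(column):
    """Informative site: at least two states, each present in two or more taxa."""
    counts = {}
    for base in column:
        counts[base] = counts.get(base, 0) + 1
    return sum(1 for c in counts.values() if c >= 2) >= 2

# Toy alignment, invented for illustration only.
alignment = {"A": "AACA", "B": "AATA", "C": "GACC", "D": "GTTC"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = {tree: parsimony_score(tree, alignment) for tree in candidates}
best = min(scores, key=scores.get)
informative = [is_informative(col) for col in zip(*alignment.values())]
```

Uninformative sites add the same number of changes to every tree, so only informative sites help choose between trees.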
11
Q

maximum parsimony pros

A

simple to understand.

12
Q

maximum parsimony cons

A

easily trapped on a local optimum, no branch lengths, often too many equally parsimonious trees, and can be statistically inconsistent (under some conditions it converges on the wrong tree, and more data only increases support for the incorrect result).

13
Q

bootstrap

A

calculating support for relationships. how strong is the signal in the data?

  • statistical procedure: random re-sampling of the data, with replacement.
  • pseudoreplicates
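
A minimal sketch of the resampling step. Each pseudoreplicate draws alignment columns at random, with replacement, until it has as many columns as the original. In a real analysis a tree is built from every pseudoreplicate; here `a_b_closest` is a hypothetical stand-in for "the tree groups A with B", and the toy alignment is invented.

```python
import random

def pseudoreplicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

def bootstrap_support(alignment, has_grouping, n_reps=200, seed=1):
    """Fraction of pseudoreplicates in which has_grouping(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(has_grouping(pseudoreplicate(alignment, rng)) for _ in range(n_reps))
    return hits / n_reps

def differences(a, b):
    return sum(x != y for x, y in zip(a, b))

def a_b_closest(aln):
    """Stand-in for tree building: are A and B strictly the closest pair?"""
    ab = differences(aln["A"], aln["B"])
    return ab < differences(aln["A"], aln["C"]) and ab < differences(aln["B"], aln["C"])

# Toy alignment, invented for illustration only.
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACCTAT"}
support = bootstrap_support(alignment, a_b_closest)
print(f"bootstrap support for (A,B): {support:.2f}")
```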
14
Q

maximum likelihood

A

what is the probability of observing a set of data given a hypothesis? the equation of the conditional probability is: likelihood = probability (data | hypothesis) or P(data | tree).

  • calculate the likelihood for each hypothesis. look for the hypothesis with the highest likelihood.
  • propose new topology, branch lengths, model parameters -> calculate scores; repeat.
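
A small sketch of "calculate the likelihood for each hypothesis" in the simplest possible case. Assumptions: only two sequences, so the only parameter is the branch length d between them; the one-rate (Jukes-Cantor) model; and a plain grid of candidate values instead of a real optimizer. The sequences are invented.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """log P(data | d) for two aligned sequences d substitutions per site apart."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # site shows the same base in both sequences
    p_diff = 0.25 - 0.25 * decay   # site shows one specific different base
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the assumed equal frequency of the starting base.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l

# Toy sequences differing at 2 of 20 sites, invented for illustration only.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGT"

# Each candidate branch length is one hypothesis; keep the highest likelihood.
candidates = [i / 1000 for i in range(1, 2001)]
best_d = max(candidates, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))
print(best_d)
```

With more than two sequences the same idea applies, but topology, every branch length, and the model parameters all have to be searched together.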
15
Q

substitutions model (max. likelihood)

A

model the rates of change among the four bases (nucleotides).

  • 1 rate: all changes equal
  • 2 rates: transitions vs. transversions
  • 6 rates: a unique rate for each type of change (general time-reversible model, GTR)
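
The three schemes can be written as rate matrices. This is a sketch only: the rate values are arbitrary, equal base frequencies are assumed throughout (a full GTR model also allows unequal frequencies), and `rate_matrix` is a name made up for the example.

```python
BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine

def rate_matrix(rate_of):
    """Build a 4x4 rate matrix; each diagonal entry makes its row sum to zero."""
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i, j] = rate_of(i, j)
        q[i, i] = -sum(q[i, j] for j in BASES if j != i)
    return q

# 1 rate: all changes equal.
one_rate = rate_matrix(lambda i, j: 1.0)

# 2 rates: transitions vs. transversions (values are arbitrary).
two_rate = rate_matrix(
    lambda i, j: 2.0 if frozenset((i, j)) in TRANSITIONS else 1.0
)

# 6 rates: one per unordered pair of bases (values are arbitrary).
six = {
    frozenset("AC"): 1.0, frozenset("AG"): 4.0, frozenset("AT"): 0.5,
    frozenset("CG"): 0.8, frozenset("CT"): 3.5, frozenset("GT"): 1.2,
}
six_rate = rate_matrix(lambda i, j: six[frozenset((i, j))])
```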
16
Q

maximum likelihood pros

A

statistically consistent (converges on the correct tree as the amount of data grows, provided the model is adequate).

17
Q

maximum likelihood cons

A

slow; relies on “hill climbing,” which makes it easy to get trapped on a local maximum.

18
Q

hill climbing

A

the objective of maximum parsimony and maximum likelihood is to find the “maximum” (best-scoring) solution.
  • propose new: topology (ML and MP), branch lengths (ML only), model parameters (ML only) -> calculate score; repeat.

19
Q

is there really an optimal tree?

A

in principle yes, but we can’t enumerate all possible trees for 10+ species, so searches are heuristic and may miss it.
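
The count behind this card: for n species there are (2n-5)!! unrooted and (2n-3)!! rooted bifurcating trees, where !! is the double factorial. A short sketch (function names are chosen for the example):

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n_species):
    """Number of unrooted bifurcating trees, for n_species >= 3."""
    return double_factorial(2 * n_species - 5)

def rooted_trees(n_species):
    """Number of rooted bifurcating trees, for n_species >= 2."""
    return double_factorial(2 * n_species - 3)

for n in (4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 species this gives 2,027,025 unrooted trees, and the count grows faster than exponentially from there.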

20
Q

Bayesian phylogenetics

A

instead of trying to find the “maximum” solution, we summarize the entire posterior distribution using a simulation technique called Markov chain Monte Carlo (MCMC).

21
Q

MP and ML vs Bayesian

A
  • provides a single “point estimation” vs provides a probability distribution
  • requires bootstrapping vs probabilities are provided
  • can get trapped on local max vs can escape local max
  • no uncertainty in estimates vs estimate uncertainty for each parameter.
22
Q

Bayes’ theorem

A

Pr(tree | data) = (Pr(data | tree) * Pr(tree)) / Pr(data)

posterior probability = (likelihood * prior probability) / marginal likelihood
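
A worked example of the theorem with three candidate trees. The likelihood values are invented, all trees get equal prior probability, and branch lengths and model parameters are ignored, so Pr(data) reduces to a plain sum over trees.

```python
def posterior(likelihoods, priors):
    """Apply Bayes' theorem to a finite set of hypotheses."""
    joint = {tree: likelihoods[tree] * priors[tree] for tree in likelihoods}
    marginal = sum(joint.values())  # Pr(data), summed over all trees
    return {tree: value / marginal for tree, value in joint.items()}

# Pr(data | tree): invented numbers for illustration only.
likelihoods = {"((A,B),C)": 0.006, "((A,C),B)": 0.003, "((B,C),A)": 0.001}
# Pr(tree): equal prior probability for every tree.
priors = {tree: 1 / 3 for tree in likelihoods}

post = posterior(likelihoods, priors)
for tree, p in post.items():
    print(tree, round(p, 3))
```

Dividing by Pr(data) is what makes the posterior probabilities sum to 1.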

23
Q

marginal likelihood

A

summation over all trees and, for each tree, integration over all possible combinations of branch length and substitution model parameter values.

24
Q

prior distribution

A

probability assumed before observing data.

25
Q

likelihood

A

L = P(data | tree), same as one used in ML.

26
Q

posterior distribution

A

combination of the prior and likelihood.

27
Q

how do you put a prior on a phylogenetic tree?

A

give all trees equal prior probability.

28
Q

Start at a random location and follow these two MCMC rules:

A
  1. if the proposed step takes you uphill, you automatically take the step.
  2. if the proposed step takes you downhill, you take the step at “random”: accept it with probability equal to the ratio of the new height to the current height.
    - approximates the posterior distribution.
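
The two rules can be sketched as a tiny Metropolis sampler. Assumptions: "tree space" is just three labelled trees with invented heights, a downhill step is accepted with probability new height / current height, and each proposal picks any tree uniformly. The first samples are discarded as burn-in.

```python
import random

def metropolis(heights, n_steps, seed=0):
    """Sample states in proportion to their heights (unnormalized posterior)."""
    rng = random.Random(seed)
    states = sorted(heights)
    current = rng.choice(states)          # start at a random location
    samples = []
    for _ in range(n_steps):
        proposed = rng.choice(states)     # symmetric proposal
        ratio = heights[proposed] / heights[current]
        # Rule 1: uphill (ratio >= 1) is always accepted.
        # Rule 2: downhill is accepted with probability equal to ratio.
        if ratio >= 1 or rng.random() < ratio:
            current = proposed
        samples.append(current)
    return samples

# Invented heights; the normalized posterior would be 0.6, 0.3, 0.1.
heights = {"tree1": 6.0, "tree2": 3.0, "tree3": 1.0}
samples = metropolis(heights, 20000)
kept = samples[2000:]                     # drop burn-in samples
freq = {tree: kept.count(tree) / len(kept) for tree in heights}
print(freq)
```

The visit frequencies approximate the posterior without ever computing the marginal likelihood.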
29
Q
  1. convergence - how long to run the MCMC analysis?
A
  • are you sampling from the “global peak”?
  • have you collected enough samples to summarize?

30
Q
  2. burn-in - how many of the initial samples are useless?
A
  • you begin at a random spot; when did you reach the “goal”?
31
Q

tree space

A

each dot is a tree visited during the MCMC. the number of times a tree is visited is proportional to the posterior probability of the tree.

32
Q

phylogenetic methods:

A
  • distance: convert data to distance values and calculate the tree.
  • maximum parsimony: the tree that explains the data with the least amount of evolutionary change.
  • maximum likelihood: the tree with the highest probability of generating the observed data.
  • Bayesian: a probability distribution of trees based on prior knowledge and current data.