W2 L1 phylogenetic M Flashcards

1
Q

What is phylogenetics?

A
  • Estimating the evolutionary relationships among species (or genes, or both) from molecular sequences
2
Q

How can a phylogenetic tree be built when some species are already extinct?

A
  • Data are available only for extant species (genes)
  • What happened in the evolutionary past must be inferred (estimated) statistically
3
Q

Working sequences of extinct species

A
  • Fossils give an estimated date, useful for calibration
  • Phylogenetic analysis and node age reconstruction
  • diversification analysis and ancestral state reconstruction
4
Q

Phylogenetic terminology

A

Two species that are each other's closest relatives are called sister species

5
Q

Rooted and unrooted trees

A

Rooting gives directionality to a tree
* Without a root you cannot speak of ancestors, descendants, etc.

6
Q

Stylistic vs. meaningful differences

A

In a phylogeny, branch length represents the amount of evolutionary change

7
Q

Difference between branch and tree

A
8
Q

Resolution of a tree

A

A polytomy is a node from which more than two lineages emerge

9
Q

Hard vs. soft polytomy

A

A hard polytomy reflects speciation into more than two lineages at the same time, while a soft polytomy arises when we do not have enough data to resolve the branching order

10
Q

Steps in building a phylogeny

A
  • Start from a sequence alignment, which rests on the idea that the sequences share a common ancestor
    Methods involved:
  • Distance trees
  • Counting changes: maximum parsimony
  • Modeling evolution: maximum likelihood
  • Searching tree space
  • Bayesian inference
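
A short sketch of why searching tree space is a step of its own: the number of possible topologies grows very quickly with the number of taxa. The function name `count_unrooted_trees` is only illustrative; it uses the standard count (2n − 5)!! for unrooted, fully bifurcating trees.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Already at 20 taxa the count is far too large to score every tree, which is why heuristic search is used.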
11
Q

Distance trees

A
  • Conceptual framework:
  • Convert character data to distance matrix
  • Use distance matrix to construct dendrogram
  • Interpretation as evolutionary trees?
  • Fast (no evaluation of many trees)
  • UPGMA gives rooted, ultrametric trees
    (assumes molecular clock)
  • NJ gives unrooted trees
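
The two steps above (character data to distance matrix, distance matrix to dendrogram) can be sketched roughly as follows. This is a minimal illustration, assuming a simple p-distance (proportion of differing sites) and a made-up four-taxon alignment; the names `p_distance`, `upgma` and `seqs` are not from the lecture.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, dist):
    """Cluster taxa with UPGMA; return (nested-tuple tree, root height).

    dist maps frozenset({x, y}) to the distance between x and y.
    """
    sizes = {n: 1 for n in names}  # cluster -> number of leaves
    d = dict(dist)
    height = 0.0
    while len(sizes) > 1:
        # Join the two closest clusters
        i, j = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((i, j))] / 2  # ultrametric: node sits halfway
        ni, nj = sizes.pop(i), sizes.pop(j)
        merged = (i, j)
        # Distance to the new cluster is the size-weighted average
        for k in sizes:
            d[frozenset((merged, k))] = (
                ni * d[frozenset((i, k))] + nj * d[frozenset((j, k))]
            ) / (ni + nj)
        sizes[merged] = ni + nj
    (tree,) = sizes
    return tree, height


# Hypothetical toy alignment
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTA"}
dist = {frozenset((x, y)): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)}
tree, height = upgma(list(seqs), dist)
print(tree, height)
```

Only one tree is built and no alternative topologies are scored, which is why distance methods are fast.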
12
Q

Counting changes, maximum parsimony

A

The simplest explanation requiring the fewest assumptions should be preferred over more complicated hypotheses.
Applied to phylogenetics:
The MP phylogeny is the one that requires the fewest evolutionary steps.
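
Counting evolutionary steps on a given tree can be sketched with Fitch's algorithm, one common way to do this for unordered characters. The alignment and the two candidate trees are made up for illustration, and `fitch_steps` and `parsimony_score` are illustrative names.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree is a leaf name or a 2-tuple of subtrees; states maps leaf -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        # Children can share a state: no extra change needed here
        return common, lsteps + rsteps
    # Children disagree: one change is required on this part of the tree
    return lset | rset, lsteps + rsteps + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {t: s[k] for t, s in alignment.items()})[1]
        for k in range(n_sites)
    )


# Hypothetical toy alignment and two candidate topologies
aln = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTA"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, aln), parsimony_score(tree2, aln))
```

The tree with the lower score would be preferred under maximum parsimony.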

13
Q

Modeling evolution: maximum likelihood

A

Model describes mathematically how molecular sequences evolve
Why likelihood?
* Mathematically explicit: no hidden assumptions
* Results expressed in terms of probability
* Superior performance in phylogenetics

14
Q

Markov models

A

Simple stochastic process in which the distribution of future states depends only on the present state
* Simplest case: binary character with states 0 and 1

15
Q

Why is maximum likelihood favored over other methods?

A
  • It can account for different base frequencies
  • It can account for different substitution rates
    - including reversions and multiple changes at the same site
16
Q

Basic elements of General Time Reversible (GTR) model

A
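  • We assume that the exchange rate between two bases is the same in both directions (change and reversal)
    - 6 exchange rates (5 free model parameters, one fixed as reference) instead of 12
    Base frequencies:
  • 4 base frequencies
  • 3 free model parameters, because the frequencies sum to 1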
    F = [πA πC πG πT]

    πT = 1 − (πA + πC + πG)
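
As a rough sketch, a GTR rate matrix can be assembled from the symmetric exchange rates and the base frequencies, with each off-diagonal rate equal to exchange rate times the frequency of the target base. All numeric values below are hypothetical, and `gtr_rate_matrix` is an illustrative name. The final check shows time reversibility: the flow from i to j equals the flow from j to i.

```python
BASES = "ACGT"


def gtr_rate_matrix(freqs, exch):
    """Build Q as a dict {(i, j): rate} from frequencies and exchange rates."""
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i, j] = exch[frozenset((i, j))] * freqs[j]
        # Diagonal is set so that every row sums to zero
        q[i, i] = -sum(q[i, j] for j in BASES if j != i)
    return q


# Three free frequencies; the fourth follows because they sum to 1
freqs = {"A": 0.3, "C": 0.2, "G": 0.2}
freqs["T"] = 1.0 - (freqs["A"] + freqs["C"] + freqs["G"])

# Six exchange rates, with G<->T fixed at 1 as the reference (5 free)
exch = {
    frozenset("AC"): 1.0, frozenset("AG"): 4.0, frozenset("AT"): 1.0,
    frozenset("CG"): 1.0, frozenset("CT"): 4.0, frozenset("GT"): 1.0,
}

Q = gtr_rate_matrix(freqs, exch)
reversible = all(
    abs(freqs[i] * Q[i, j] - freqs[j] * Q[j, i]) < 1e-12
    for i in BASES for j in BASES if i != j
)
print(reversible)
```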
17
Q

Among-site rate variation

A
  • Different sites evolve at different rates; the model can take this into account (e.g. with a gamma shape parameter)
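
A small sketch of the idea, assuming the simple Jukes-Cantor (JC69) model and four equally weighted rate categories with mean 1. The rate values are made up and are not a real discrete gamma. Each site is treated as evolving at one of the rates, and the expectation is averaged over categories.

```python
import math


def jc69_p_diff(d):
    """Expected proportion of differing sites under JC69 at distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


rates = [0.1, 0.5, 1.0, 2.4]  # hypothetical rate multipliers, mean 1
distance = 0.5                # hypothetical distance in substitutions per site

single_rate = jc69_p_diff(distance)
mixed = sum(jc69_p_diff(r * distance) for r in rates) / len(rates)
print(single_rate, mixed)
```

With rate variation, fewer differences are observed for the same true distance, so ignoring it tends to underestimate how much change occurred.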
18
Q

Likelihood computation

A

The likelihood L is the probability of the data given the model.
* The data consist of an alignment of sequences.
* The model is a hypothesis of how the data were generated.
* It contains the topology, branch lengths and other model parameters (base frequencies, rate matrix, gamma shape, …)
* Different topologies are evaluated one at a time.
* Each time, branch lengths and other parameters are optimized.
* Each site has a likelihood; the total likelihood is the product of all site likelihoods.
* The ML tree is the topology corresponding to the model that yielded the highest overall likelihood.
* Likelihoods are reported on a logarithmic scale for mathematical convenience: ln L
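
The computation above can be sketched under strong simplifications: the Jukes-Cantor (JC69) model stands in for a richer model such as GTR, and all branch lengths are fixed at 0.1 instead of being optimized. The alignment, the trees and the function names are illustrative only. Site likelihoods are multiplied by summing their logarithms.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability that base i has become base j after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(tree, site):
    """Return {base: P(data below this node | base at this node)}.

    tree is a leaf name or a tuple of (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {b: 1.0 if b == site[tree] else 0.0 for b in BASES}
    out = {b: 1.0 for b in BASES}
    for child, t in tree:
        below = partial(child, site)
        for b in BASES:
            out[b] *= sum(jc69_prob(b, c, t) * below[c] for c in BASES)
    return out


def log_likelihood(tree, alignment):
    """ln L: sum over sites of the log of each site likelihood."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {taxon: seq[k] for taxon, seq in alignment.items()}
        root = partial(tree, site)
        # Equal base frequencies (0.25 each) at the root under JC69
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total


# Hypothetical toy alignment and two candidate topologies
aln = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTA"}
t = 0.1  # fixed branch length (an assumption, not an optimized value)
tree1 = (((("A", t), ("B", t)), t), ((("C", t), ("D", t)), t))
tree2 = (((("A", t), ("C", t)), t), ((("B", t), ("D", t)), t))
lnl1, lnl2 = log_likelihood(tree1, aln), log_likelihood(tree2, aln)
print(lnl1, lnl2)
```

The topology with the higher ln L (the less negative value) would be kept; a real analysis would also optimize branch lengths and model parameters for each topology.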