Lecture 7 Flashcards

1
Q
  1. What is Phylogenetics?
A

• Phylogenetics is the study of the evolutionary history of living organisms using tree-like diagrams to represent pedigrees of these organisms.

2
Q
  1. What is phylogeny?
A

• The tree branching patterns representing the evolutionary divergence are referred to as phylogeny.

3
Q
  1. How do we study phylogenetics?
A
  • Fossil records
  • Molecular fossils
4
Q
  1. What are the different applications of phylogeny?
A
  • Tree of life: analyzing changes that have occurred in the evolution of different organisms
  • Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., ortholog and paralog detection)
  • Follow changes occurring in rapidly changing species (e.g., influenza virus)
5
Q
  1. How can we detect orthologs?
A
  • If we have a sequence and want to know its function, we align it with other sequences to see how similar they are
  • The first step in ortholog detection is to align the sequences and detect their similarities
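One simple way to quantify "similarity" between two sequences that have already been aligned is percent identity. The sketch below is illustrative only: the function name is hypothetical, it assumes gaps are written as '-', and it simply skips gapped columns. Real ortholog detection involves more than a single identity score.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # 7 of 8 ungapped columns match
```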
6
Q
  1. What are the major assumptions in phylogenetics?
A
  • The molecular sequences used in phylogenetic construction are homologous: they share a common origin and subsequently diverged through time.
  • Phylogenetic divergence is assumed to be bifurcating: a parent branch splits into two daughter branches at any given point.
  • Each position in a sequence evolves independently.
7
Q

Explain each part of the phylogenetic tree:

  1. Taxa
  2. Branches
  3. Root
  4. Internal node
A
8
Q

What is the difference between dichotomy and polytomy? Identify each in the picture.

A
  • Dichotomy: branches bifurcate on a tree → each ancestor divides and gives rise to two descendants.
  • Polytomy: a branch point has more than two descendants, resulting in a multifurcating node.
9
Q

What is the difference between internal nodes and external nodes?

A
  • External nodes: things under comparison; operational taxonomic units (OTUs)
  • Internal nodes: inferred (hypothetical) ancestral units; the goal is to group present-day units
10
Q

What is tree topology?

A

The branching pattern in a tree

11
Q

What are the different branching patterns seen in a tree?

A

Dichotomy

Polytomy

Unrooted phylogenetic tree

Rooted phylogenetic tree

12
Q

Explain the difference between unrooted and rooted phylogenetic trees

A

• Unrooted phylogenetic tree:

does not assume knowledge of a common ancestor, but only positions the taxa to show their relative relationships.

• Rooted tree:

all sequences under study have a common ancestor or root node from which a unique evolutionary path leads to all other nodes → molecular clock hypothesis

13
Q

What are the different forms of tree presentation?

A
  1. Phylogram
  2. Cladogram
14
Q

Explain the difference between a phylogram and a cladogram, and their advantages

A
  • Phylogram: the branch lengths represent the amount of evolutionary divergence. (scaled tree)
  • Advantage: shows both the evolutionary relationships and information about the relative divergence time of the branches.
  • Cladogram: the external taxa line up neatly in a row or column. (unscaled tree)
  • Branch lengths have no phylogenetic meaning; only the topology of the tree matters → shows the relative ordering of the taxa.
15
Q

What are some of the different types of phylogeny packages?

A
  1. MEGA (Molecular Evolutionary Genetics Analysis)
  2. POWER
  3. PHYLIP
16
Q
  1. What data is used to build trees?
A
  • Traditionally: morphological features (e.g., number of legs, beak shape, etc.)
  • Today: mostly molecular data (e.g., DNA and protein sequences) → molecular phylogenetics
17
Q
  1. What are the two categories into which phylogenetic data are classified?
A

• Can be classified into two categories:
  • Numerical data: distances between objects
    • e.g., distance(man, mouse) = 500, distance(man, chimp) = 100
    • Usually derived from sequence data
  • Discrete characters: each character has a finite number of states
    • e.g., number of legs = 1, 2, 4
    • e.g., DNA = {A, C, T, G}

18
Q
  1. What are the three methods used to reconstruct trees?
A
  • Distance methods: evolutionary distances are computed for all pairs of OTUs, and a tree is built whose path lengths between OTUs “match” these distances
  • Maximum parsimony (MP): choose tree that minimizes number of changes required to explain data
  • Maximum likelihood (ML): under a model of sequence evolution, find the tree which gives the highest likelihood of the observed data
19
Q

Explain Taxa

A

current day species or sequences at the tips of the branches

20
Q

Explain node

A

The connecting point where two adjacent branches join; it represents an inferred ancestor of extant taxa.

21
Q

Explain clade

A

Monophyletic group

A group of taxa descended from a single common ancestor

22
Q

Explain lineage

A

The branch path depicting an ancestor-descendant relationship on a tree

23
Q

Explain paraphyletic

A

A number of taxa share more than one closest common ancestor

24
Q

Explain molecular clock

A

An assumption by which molecular sequences evolve at constant rates, so that the amount of accumulated mutations is proportional to evolutionary time (i.e., evolutionary rates are assumed uniform across lineages).

25
Q

What are some programs that are used for phylogenetic relationships among organisms?

A
  1. Entrez
  2. Ribosomal Database Project
  3. Tree of Life
26
Q

How do you find the number of possible unrooted trees?

A

Given n OTUs, there are $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$ unrooted trees

27
Q

What is the number of possible rooted trees?

A

Given n OTUs, there are $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ rooted trees
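A small sketch (function names are hypothetical) that evaluates the unrooted and rooted tree-count formulas for bifurcating trees, to show how quickly the number of candidate topologies grows with n:

```python
from math import factorial


def num_unrooted_trees(n: int) -> int:
    """(2n-5)! / (2^(n-3) (n-3)!) unrooted bifurcating trees for n >= 3 OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n: int) -> int:
    """(2n-3)! / (2^(n-2) (n-2)!) rooted bifurcating trees for n >= 2 OTUs."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in range(3, 11):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For 10 OTUs this already gives 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustively searching all trees quickly becomes infeasible.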

28
Q

Explain Parsimony.

A
  • Find the tree that minimizes the number of changes needed to explain the data
  • The tree includes a hypothesis of the sequence at each of the internal nodes
29
Q

What is the problem with parsimony? Parsimony weakness?

A

Parsimony analysis implicitly assumes that the rates of change along all branches are similar

30
Q

What are the three steps of the distance methods?

A
  • Calculate pairwise distances (different distance measures, correction for multiple hits, correction for codon bias)
  • Make distance matrix (table of pairwise corrected distances)
  • Calculate tree from distance matrix
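The first two steps can be sketched as follows. This sketch assumes the simplest possible measure, the uncorrected p-distance (fraction of differing ungapped sites), and applies no multiple-hit or codon-bias correction; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of aligned, ungapped sites that differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Steps 1-2: compute pairwise distances and store them as a symmetric table."""
    dist = {}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d = p_distance(s1, s2)
        dist[(n1, n2)] = dist[(n2, n1)] = d
    for name in seqs:
        dist[(name, name)] = 0.0
    return dist


# Hypothetical toy alignment
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}
D = distance_matrix(alignment)
print(D[("A", "B")], D[("A", "C")], D[("B", "C")])
```

Step 3 (building the tree) takes this table as input; a correction such as Jukes-Cantor would be applied to each p-distance before that step.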
31
Q

How to calculate tree from a distance matrix?

A

Calculate tree from distance matrix:

  • using an optimality criterion (e.g., minimize the error between the distance matrix and the distances in the tree, as in the Cavalli-Sforza or Fitch-Margoliash criterion), or
  • algorithmic approaches (UPGMA or neighbor joining)

Unfortunately, the optimality-criterion approach leads to computationally intractable problems (e.g., enumerating all possible trees), so heuristic algorithms such as UPGMA and neighbor joining are widely used.

32
Q

Explain the heuristic distance method UPGMA

A
  • UPGMA (Unweighted Pair Group Method with Arithmetic mean)
    • Sequential clustering algorithm
    • Start with the most similar pair of OTUs
  • Build a composite OTU
    • Distances to this OTU are computed as arithmetic means
    • From the new group of OTUs, pick the pair with the highest similarity, and repeat
  • Average-linkage clustering
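A minimal UPGMA sketch, assuming a full symmetric distance matrix as input. The function name is hypothetical, ties are broken arbitrarily, and branch lengths follow the molecular-clock assumption (each merge node sits at height d/2), which is exactly the assumption criticized in the next card.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """UPGMA: repeatedly merge the closest pair of clusters (average linkage).

    Returns a rooted Newick string; each merge node is placed at height d/2.
    """
    # cluster id -> (Newick label, number of OTUs in cluster, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # pick the closest (most similar) pair of clusters
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (li, ni, hi), (lj, nj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2
        label = f"({li}:{h - hi:g},{lj}:{h - hj:g})"
        # distance to the new composite OTU = size-weighted arithmetic mean
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[k] = (ni * dik + nj * djk) / (ni + nj)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in new_d.items():
            d[(k, next_id)] = v  # k < next_id always holds
        clusters[next_id] = (label, ni + nj, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


M = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
print(upgma(["A", "B", "C", "D"], M))  # (D:3,(C:2,(A:1,B:1):1):1);
```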
33
Q

What are the weaknesses of UPGMA?

A

UPGMA assumes that the rates of evolution are the same among different lineages

In general, this method should not be used for phylogenetic tree reconstruction (unless this assumption holds)

Produces a rooted tree

34
Q

Explain the distance method: neighbor joining

A

Most widely used distance-based method for phylogenetic reconstruction

UPGMA illustrates that it is not enough to simply pick the closest neighbors

Key concept: consider averaged distances to other leaves as well

Produces an unrooted tree
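A minimal neighbor-joining sketch. It assumes the standard Saitou-Nei Q-criterion as usually presented in textbooks (the lecture's notation may differ); the function name is hypothetical, it needs at least three OTUs, and ties are broken by taking the first minimal pair. The R term is where "averaged distances to other leaves" enters. With non-additive matrices the branch-length formula can yield negative lengths.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Neighbor joining; returns an unrooted tree as a Newick string."""
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        r = len(nodes)
        # R[a]: total distance from a to every other current node
        R = {a: sum(d[a][b] for b in nodes) for a in nodes}
        pairs = [(a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]]
        # Q-criterion: raw distance corrected by each node's distances to all others
        i, j = min(pairs, key=lambda p: (r - 2) * d[p[0]][p[1]] - R[p[0]] - R[p[1]])
        li = d[i][j] / 2 + (R[i] - R[j]) / (2 * (r - 2))  # branch from i to new node
        lj = d[i][j] - li  # branch from j to new node
        u = f"({i}:{li:g},{j}:{lj:g})"  # new internal node, labelled by its subtree
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # three nodes left: join them at one central node (unrooted trifurcation)
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    return f"({x}:{lx:g},{y}:{d[x][y] - lx:g},{z}:{d[x][z] - lx:g});"


names = ["a", "b", "c", "d", "e"]
M = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, M))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

On this small additive toy matrix the path lengths in the returned tree reproduce every entry of the matrix, consistent with "if there is a tree that fits the matrix, it will find it."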

35
Q

Explain maximum likelihood

A

Maximum likelihood (ML) evaluates the probability that the chosen evolutionary model will have generated the observed sequences.

The idea is that a history with a higher probability of reaching the observed state is preferred to a history with a lower probability.

36
Q

What are the pros and cons of maximum likelihood?

A

Pro:

  • Often have lower variance than other methods
  • Workable even with very short sequences
  • Can evaluate different tree topologies
  • Use all the sequence information

Con:

  • Very CPU-intensive, extremely slow
  • The result depends on the model of evolution used
37
Q

Explain the phylogeny flowchart

A
38
Q

What is used for assessing reliability?

A

Bootstrap method
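The core of the bootstrap is resampling alignment columns with replacement. This is a minimal sketch with a hypothetical function name and a fixed seed for reproducibility; building a tree from each replicate and counting how often each clade reappears is left to whichever reconstruction method is being assessed.

```python
import random


def bootstrap_replicates(alignment: dict[str, str], n_reps: int, seed: int = 0):
    """Yield pseudo-alignments made by sampling columns with replacement."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_reps):
        # same number of columns as the original, drawn with replacement
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}


aln = {"x": "ACGTAC", "y": "ACGTTC"}
reps = list(bootstrap_replicates(aln, 5, seed=1))
for rep in reps:
    print(rep)
```

Typically a tree is built from each replicate, and the support for a clade is reported as the percentage of replicate trees that contain it.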

39
Q

What are the differences between the methods?

A

Maximum-likelihood (ML) and parsimony methods have models of evolution

Distance methods

  • frequently used as the basis for progressive and iterative types of MSA
  • Con: inability to efficiently use information about local high-variation regions that appear across multiple subtrees

Religious wars over which methods to use:

  • Most people now believe ML based methods are best:
    • most sensitive at large evolutionary distances; BUT also most time-consuming & depend on specific model of evolution used
  • Most commonly used packages contain software for all three methods; you may want to use more than one to have confidence in the resulting tree
40
Q

What program to use for parsimony?

A

DNApenny

Protpars

41
Q

What program to use for distance?

A

Compute distance measure using DNAdist

42
Q

What program to use for Neighbor?

A

Protdist

Neighbor joining NJ

UPGMA

43
Q

What program to use for Maximum Likelihood?

A

DNAml

44
Q

What are the steps for Neighbor joining NJ algorithm?

A
45
Q

Explain NJ performance

A

Works well in practice

If there is a tree that fits the matrix, it will find it

Can sometimes get trees with negative length edges (!)

46
Q

Explain computing distances between sequences: Jukes and Cantor model

A

Jukes & Cantor model:

  • Each position in DNA sequence is independent
  • Each position can mutate with the same probability to any other base
Correction to observed substitution rate (see notes):
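Assuming the correction referred to is the standard Jukes-Cantor one, the corrected distance is d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of differing sites. A minimal sketch (function name hypothetical):

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) ln(1 - (4/3) p)."""
    if not 0 <= p < 0.75:
        raise ValueError("correction undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


print(jukes_cantor_distance(0.1))  # slightly larger than the observed 0.1
```

The corrected value is always at least p, because it accounts for multiple substitutions at the same site that are invisible in the observed differences.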

47
Q

What are the two independence assumptions made by maximum likelihood?

A

Makes two independence assumptions:

Different sites evolve independently

Diverged sequences (or species) evolve independently after diverging

48
Q

What are the two computational problems with parsimony?

A

Two computational problems

  • (Easy) Given a particular tree, how do you find the minimum number of changes needed to explain the data? (Fitch)
  • (Hard) How do you search through all trees?
49
Q

Explain the idea of parsimony: Fitch’s algorithm

A

Idea: construct the set of possible nucleotides for each internal node, based on the possible assignments of its children
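A minimal sketch of the bottom-up (counting) pass of Fitch's algorithm on bifurcating trees. Trees are written as nested pairs of leaf names. The function names and toy data are hypothetical, and the top-down pass that assigns actual nucleotides to internal nodes is not shown.

```python
def fitch_site(tree, bases: dict[str, str]):
    """Bottom-up Fitch pass for one alignment column.

    tree: a leaf name (str) or a (left, right) pair of subtrees.
    bases: leaf name -> nucleotide at this column.
    Returns (candidate nucleotide set for this node, changes counted below it).
    """
    if isinstance(tree, str):
        return {bases[tree]}, 0
    left, right = tree
    s1, c1 = fitch_site(left, bases)
    s2, c2 = fitch_site(right, bases)
    if s1 & s2:
        return s1 & s2, c1 + c2  # children can agree: no extra change
    return s1 | s2, c1 + c2 + 1  # children disagree: one change needed


def parsimony_score(tree, seqs: dict[str, str]) -> int:
    """Minimum number of changes over all columns (sites treated independently)."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))


seqs = {"t1": "AAG", "t2": "AAT", "t3": "GCG", "t4": "GCT"}
tree1 = (("t1", "t2"), ("t3", "t4"))
tree2 = (("t1", "t3"), ("t2", "t4"))
print(parsimony_score(tree1, seqs), parsimony_score(tree2, seqs))  # 4 5
```

Parsimony would prefer tree1 here because it needs fewer changes; searching over all possible trees is the hard part.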