Lecture 7 Flashcards
- What is Phylogenetics?
• Phylogenetics is the study of the evolutionary history of living organisms using tree-like diagrams to represent pedigrees of these organisms.
- What is phylogeny?
• The tree branching patterns representing the evolutionary divergence are referred to as phylogeny.
- How do we study phylogenetics?
- Fossil records
- Molecular fossils
- What are the different applications of phylogeny
- Tree of life: Analyzing changes that have occurred in evolution of different organisms
- Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., detecting orthologs, and also paralogs)
- Follow changes occurring in rapidly changing species (e.g., influenza virus)
- How can we detect orthologs?
- If we have a sequence and want to know its function, we align it against other sequences to see how similar they are
- The first step in ortholog detection is to align the sequences and detect their similarities
- What are the major assumptions in phylogenetics?
- The molecular sequences used in phylogenetic construction are homologous: they share a common origin and subsequently diverged through time.
- Phylogenetic divergence is assumed to be bifurcating: a parent branch splits into 2 daughter branches at any given point.
- Each position in a sequence evolved independently.
Explain each part of the phylogeny tree
- taxa
- Branches
- Root
- Internal node
What is the difference between dichotomy and polytomy and show in the picture
- Dichotomy: branches bifurcate on a tree → each ancestor divides and gives rise to 2 descendants.
- Polytomy: a branch point has more than 2 descendants, resulting in a multifurcating node
What is the difference between internal nodes and external nodes
- External nodes: things under comparison; operational taxonomic units (OTUs)
- Internal nodes: ancestral units; hypothetical; goal is to group current day units
What is tree topology?
The branching pattern in a tree
What are the different branching patterns seen in a tree?
Dichotomy
Polytomy
Unrooted phylogenetic tree
Rooted phylogenetic tree
Explain the difference between unrooted and rooted phylogenetic tree
• Unrooted phylogenetic tree:
does not assume knowledge of a common ancestor, but only positions the taxa to show their relative relationships.
• Rooted tree:
all sequences under study have a common ancestor or root node from which a unique evolutionary path leads to all other nodes → molecular clock hypothesis
What are the different forms of tree presentation?
- Phylogram
- Cladogram
Explain the difference between phylogram and cladogram and their advantages
- Phylogram: the branch lengths represent the amount of evolutionary divergence. (scaled tree)
- Adv: showing both the evolutionary relationships and information about the relative divergence time of the branches.
- Cladogram: the external taxa line up neatly in a row or column. (unscaled tree)
- Branch lengths have no phylogenetic meaning, only the topology of the tree matters → shows the relative ordering of the taxa.
What are some of the different types of phylogeny packages?
- MEGA - molecular Evolutionary genetics analysis
- POWER
- PHYLIP
- What data is used to build trees?
- Traditionally: morphological features (e.g., number of legs, beak shape, etc.)
- Today: Mostly molecular data (e.g., DNA and protein sequences) → Molecular phylogenetics
- What are the two categories the data for phylogeny is classified into
• Can be classified into two categories:
• Numerical data
• Distance between objects
o e.g., distance (man, mouse)= 500,
o distance (man, chimp)= 100
o Usually derived from sequence data
• Discrete characters
• Each character has finite number of states
o e.g., number of legs = 1, 2, 4
o e.g., DNA = {A, C, T, G}
- What are the three methods used to reconstruct trees
- Distance methods: evolutionary distances are computed for all pairs of OTUs, and a tree is built in which the distances between OTUs “match” these distances
- Maximum parsimony (MP): choose tree that minimizes number of changes required to explain data
- Maximum likelihood (ML): under a model of sequence evolution, find the tree which gives the highest likelihood of the observed data
Explain Taxa
current day species or sequences at the tips of the branches
Explain node
The connecting point where 2 adjacent branches join; it represents an inferred ancestor of extant taxa.
Explain clade
Monophyletic group
A group of taxa descended from a single common ancestor
Explain lineage
The branch path depicting an ancestor-descendant relationship on a tree
Explain paraphyletic
A group of taxa that share more than one closest common ancestor, so they do not form a clade
Explain molecular clock
an assumption by which molecular sequences evolve at constant rates so that the amount of accumulated mutations is proportional to evolutionary time. (unify the evolutionary rates)
What are some programs that are used for phylogenetic relationships among organisms?
- Entrez
- Ribosomal database project
- Tree of life
How to find the number of possible unrooted trees?
Given n OTUs, there are (2n-5)! / (2^(n-3) (n-3)!) unrooted trees

What is the number of possible rooted trees?
Given n OTUs, there are (2n-3)! / (2^(n-2) (n-2)!) rooted trees
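To get a feel for how fast these counts grow, here is a minimal Python sketch. It assumes the standard closed-form counts for bifurcating trees on n labeled OTUs, (2n-5)! / (2^(n-3) (n-3)!) unrooted and (2n-3)! / (2^(n-2) (n-2)!) rooted. The function names are illustrative and not from the lecture.

```python
from math import factorial


def num_unrooted_trees(n: int) -> int:
    """Number of bifurcating unrooted trees on n labeled OTUs (n >= 3)."""
    if n < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def num_rooted_trees(n: int) -> int:
    """Number of bifurcating rooted trees on n labeled OTUs (n >= 2)."""
    if n < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Already at n = 10 there are about 2 million unrooted and about 34 million rooted trees, which is why exhaustive tree search quickly becomes impractical.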

Explain Parsimony.
- Find tree which minimizes number of changes needed to explain data
- tree includes hypothesis of sequence at each of the nodes
What is the problem with parsimony? Parsimony weakness?
Parsimony analysis implicitly assumes that rates of change along branches are similar
What are the three steps of distance methods?
- Calculate pairwise distances (different distance measures, correction for multiple hits, correction for codon bias)
- Make distance matrix (table of pairwise corrected distances)
- Calculate tree from distance matrix
How to calculate tree from a distance matrix?
Calculate tree from distance matrix:
- using optimality criterion (e.g.: smallest error between distance matrix and distances in tree such as Cavalli-Sforza criterion or Fitch-Margoliash criterion: minimize), or
- Algorithm approaches (UPGMA or neighbor joining)
Unfortunately, both optimality criteria lead to computationally intractable problems (e.g., enumerating all possible trees), so heuristic algorithms are used in practice
Explain the heuristic distance method UPGMA
- UPGMA (Unweighted Pair Group Method with Arithmetic mean)
- Sequential clustering algorithm
- Start with things most similar
- Build a composite OTU
- Distances to this OTU are computed as arithmetic means
- From new group of OTUs, pick pair with highest similarity etc.
- Average-linkage clustering
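The clustering steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it returns only the rooted topology as nested tuples (no branch lengths), and the function name, input format, and toy distances are all assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns the rooted topology as nested tuples.
    """
    size = {name: 1 for name in names}  # cluster -> number of OTUs in it
    d = dict(dist)                      # working copy of the distances
    while len(size) > 1:
        # Start with the two most similar clusters (smallest distance).
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)                 # the composite OTU
        # Distance from the composite OTU to each remaining cluster is the
        # arithmetic mean over all member OTUs (size-weighted average).
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
    return next(iter(size))


# Toy distances (hypothetical values chosen for illustration).
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
toy_dist = {frozenset(k): v for k, v in pairs.items()}
toy_tree = upgma(["A", "B", "C", "D"], toy_dist)
print(toy_tree)
```

In the toy data A and B are joined first, then C joins the (A, B) cluster, and D is attached last at the root.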
What are the weaknesses of UPGMA?
UPGMA assumes that the rates of evolution are the same among different lineages
In general, this method should not be used for phylogenetic tree reconstruction (unless we can make the assumption)
Produces a rooted tree
Explain the distance method: neighbor joining
Most widely-used distance based method for phylogenetic reconstruction
UPGMA illustrated that it is not enough to just pick closest neighbors
Key concept: consider averaged distances to other leaves as well
Produces an unrooted tree
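To illustrate the key concept, here is a minimal Python sketch of the pair-selection step only. It assumes the standard neighbor-joining criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other leaves; this formula and the names used are not from the flashcards. The toy distances come from a hypothetical tree in which A and B are true neighbors but B sits on a long branch, so the closest pair (A, C) is not a pair of neighbors.

```python
def nj_pick_pair(names, D):
    """Return the pair of leaves minimizing the NJ criterion Q, and its Q value.

    names: list of leaf labels.
    D:     dict of dicts with symmetric pairwise distances, D[i][j].
    """
    n = len(names)
    # r[i]: total distance from leaf i to every other leaf.
    r = {i: sum(D[i][j] for j in names if j != i) for i in names}
    best_pair, best_q = None, None
    for x in range(n):
        for y in range(x + 1, n):
            i, j = names[x], names[y]
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical distances from a tree ((A,B),(C,D)) with long branches to B and D.
leaves = ["A", "B", "C", "D"]
pairs = {("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5}
D = {i: {} for i in leaves}
for (i, j), v in pairs.items():
    D[i][j] = D[j][i] = v

closest = min(pairs, key=pairs.get)   # what a naive "closest pair" rule picks
chosen, q_value = nj_pick_pair(leaves, D)
print(closest, chosen, q_value)
```

Here the naive closest pair is (A, C), while the Q criterion selects (A, B), a pair that really are neighbors in the underlying tree.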
Explain maximum likelihood
Maximum likelihood (ML) evaluates the probability that the chosen evolutionary model will have generated the observed sequences.
The idea is that a history with a higher probability of reaching the observed state is preferred to a history with a lower probability
What are the pros and cons of maximum likelihood?
Pro:
- often has lower variance than other methods
- workable even with very short sequences
- evaluates different tree topologies
- uses all the sequence information
Con:
- very CPU intensive: extremely slow
- the result is dependent on the model of evolution used
Explain the phylogeny flowchart
What is used for assessing reliability?
Bootstrap method
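The flashcard only names the method, so the following is a minimal Python sketch of one common form, the nonparametric bootstrap, under the assumption that this is what the lecture means: alignment columns are resampled with replacement to make pseudo-replicate alignments, a tree is rebuilt from each, and the fraction of replicates supporting a branch is taken as its support. Only the resampling step is shown; the function name and toy alignment are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng:       a random.Random instance, so results can be made reproducible.
    Returns a pseudo-replicate alignment of the same size.
    """
    length = len(next(iter(alignment.values())))
    # The same sampled columns are used for every sequence, so each
    # resampled column is an intact column of the original alignment.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


toy_alignment = {"s1": "ACGTAC", "s2": "ACGAAC", "s3": "TCGAAT"}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))
print(replicate)
```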
What are the differences in methods
Maximum-likelihood (ML) and parsimony methods have models of evolution
Distance methods
- frequently used as the basis for progressive and iterative types of MSA
- Con: inability to efficiently use information about local high-variation regions that appear across multiple subtrees
Religious wars over which methods to use:
- Most people now believe ML based methods are best:
- most sensitive at large evolutionary distances; BUT also most time-consuming & depend on specific model of evolution used
- Most commonly used packages contain software for all three methods: may want to use more than 1 to have confidence in built tree
What program to use for parsimony?
DNApenny
Protpars
What program to use for distance?
Compute the distance measure using DNAdist (DNA) or Protdist (protein)
What does the Neighbor program do?
Builds a tree from the distance matrix using:
Neighbor joining (NJ)
UPGMA
What program to use for Maximum Likelihood?
DNAml
What are the steps for Neighbor joining NJ algorithm?
Explain NJ performance
Works well in practice
If there is a tree that fits the matrix, it will find it
Can sometimes get trees with negative length edges (!)
Explain computing distances between sequences: Jukes and Cantor model
Jukes & Cantor model:
- Each position in DNA sequence is independent
- Each position can mutate with the same probability to any other base
Correction to observed substitution rate (see notes):
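The formula itself is not reproduced on this card. As a hedged sketch, the standard Jukes-Cantor correction is d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites and d is the estimated number of substitutions per site; the assumption here is that this is the correction the notes refer to. The function names are illustrative.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    # The correction is only defined for p < 3/4, the saturation level
    # expected for random DNA sequences under this model.
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that the raw count cannot see.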
What are the 2 independence assumptions made by maximum likelihood?
Makes 2 independence assumptions
Different sites evolve independently
Diverged sequences (or species) evolve independently after diverging
What are the two computational problems with parsimony?
Two computational problems
- (Easy) Given a particular tree, how do you find the minimum number of changes needed to explain the data? (Fitch)
- (Hard) How do you search through all trees?
Explain the idea of parsimony: Fitch’s algorithm
Idea: construct set of possible nucleotides for internal nodes, based on possible assignments of children
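A minimal Python sketch of this idea for a single site, assuming a rooted bifurcating tree written as nested tuples with leaf names as strings. At each internal node the candidate set is the intersection of the children's sets if it is non-empty; otherwise it is their union and one change is counted. The tree encoding, names, and toy data are assumptions made for the example.

```python
def fitch(tree, leaf_state):
    """Bottom-up pass of Fitch's algorithm for one site.

    tree:       a leaf name (str) or a tuple (left_subtree, right_subtree).
    leaf_state: dict mapping leaf name -> observed nucleotide.
    Returns (set of possible nucleotides at this node, minimum changes below it).
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, leaf_state)
    right_set, right_changes = fitch(right, leaf_state)
    common = left_set & right_set
    if common:
        # Children agree on at least one nucleotide: no extra change needed.
        return common, left_changes + right_changes
    # Children disagree: keep all candidates and count one change.
    return left_set | right_set, left_changes + right_changes + 1


toy_tree = (("human", "chimp"), ("mouse", "rat"))
site = {"human": "A", "chimp": "C", "mouse": "G", "rat": "C"}
root_set, min_changes = fitch(toy_tree, site)
print(root_set, min_changes)
```

For this toy site the root set is {'C'} and two changes are needed. Summing the change counts over all sites gives the parsimony score of the tree.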