Phylogenetics Flashcards

1
Q

What is a phylogenetic tree and what are its elements?

A
  • Is a kind of graph used to model the evolutionary history of a group of either sequences or organisms
  • Phylogeny = evolutionary tree = the actual pattern of historical relationships
  • Tree consists of nodes connected by branches (=links = edges)
  • Shape of tree = topology
2
Q

Explain the components of a tree (clade, length of branches, root, outgroup)

A
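  • Clade (e.g., a haplogroup) - a group made up of an ancestor and all of its descendants
  • Length of branches corresponds to amount of change - e.g., longer branches represent more mutations
  • Root = most recent common ancestor (MRCA) of all sequences or taxa in the tree
  • Outgroup - a taxon we have selected that we know falls outside the group of interest (the ingroup)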
3
Q

What is a rooted tree?

A

A tree with a root which is the most recent common ancestor (MRCA) of the sequences or species in the tree
- Root is oldest part of tree - all other nodes descend from it - gives directionality
- So in rooted trees we can talk about ancestors and descendants, or ancestral and derived character states

4
Q

What is an outgroup?

A

A taxon which we know diverged before the MRCA of the group under consideration (the ingroup)

5
Q

What is a clade?

A

A monophyletic group of sequences or organisms
- A clade includes a single ancestor and all of its descendants
- Some named groups are not clades and are termed ‘paraphyletic’ - e.g., reptiles (the clade would have to include birds) and apes (the clade would have to include humans)

6
Q

What are monophyletic and paraphyletic groups?

A
  • Monophyletic: taxonomic group that contains all descendants of the MRCA of that group - the group is defined by shared, derived features
  • Paraphyletic: taxonomic group that contains some, but not all, descendants of the MRCA of that group - the grouping is typically based on ancestral features that some descendants have since lost or modified
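
The monophyly test can be sketched in a few lines of Python. This is only an illustration under assumptions: the tree is a hypothetical, simplified topology written as nested tuples, and the names `TREE`, `leaf_set`, `all_clades` and `is_monophyletic` are made up for this example. A group counts as monophyletic here if it matches the full set of leaves below some node.

```python
# Hypothetical toy tree as nested tuples; leaves are strings.
# The topology is a simplified illustration in which birds sit inside the reptiles.
TREE = ("mammals", (("lizards", "snakes"), ("crocodiles", "birds")))


def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def all_clades(node):
    """Return the leaf set of every node (every clade) in the tree."""
    clades = {leaf_set(node)}
    if not isinstance(node, str):
        for child in node:
            clades |= all_clades(child)
    return clades


def is_monophyletic(tree, group):
    """A group is monophyletic if it matches the leaf set of some node."""
    return frozenset(group) in all_clades(tree)


print(is_monophyletic(TREE, {"crocodiles", "birds"}))
print(is_monophyletic(TREE, {"lizards", "snakes", "crocodiles"}))
print(is_monophyletic(TREE, {"lizards", "snakes", "crocodiles", "birds"}))
```

In this toy tree the second group (reptiles without birds) fails the test because it leaves out one descendant of its MRCA, which is what makes it paraphyletic.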
7
Q

How can you do phylogenetic reconstruction and what is the exception?

A
  • Phylogenetic reconstruction entails making best estimate of evolutionary historical relationships among entities - using available modern data
  • With exception of ‘molecular fossils’ and ancient preserved DNA, we do not have direct access to information about the past, but make inferences from molecules and other characteristics of extant taxa
8
Q

What two separate functions are required for phylogenetics?

A
  • Making phylogenies
  • Choosing which are ‘best’ among large number of possible phylogenies
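
To get a feel for how large the "large number of possible phylogenies" is, the sketch below counts strictly bifurcating topologies using the standard double-factorial formulas: (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa. The function names are made up for this example.

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n_taxa >= 3: (2n-5)!!"""
    return double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa):
    """Number of rooted bifurcating topologies for n_taxa >= 2: (2n-3)!!"""
    return double_factorial(2 * n_taxa - 3)


for n in (4, 10, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Even 10 taxa already give about two million unrooted topologies, which is why choosing among trees is treated as a separate problem from generating them.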
9
Q

What types of data can we use for phylogenetics?

A

Discrete characters:
- Independent variables whose possible values are collections of mutually exclusive character states
- e.g., nucleotide position n in DNA region x
- Can be qualitative or quantitative

Distances or similarities:
- Complex multi-variable dataset of differences among taxa combined into a composite measure, expressed as a single value
- An n x n matrix of pairwise distances among the n taxa is made
- More computationally efficient
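
As a rough sketch of how discrete characters are collapsed into distances, the code below computes the proportion of differing sites (the p-distance) for every pair of aligned sequences and stores the results in an n x n matrix. The alignment is made up and the function names are assumptions for this example; real analyses would usually also handle gaps and ambiguous bases.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(sequences):
    """Build a symmetric n x n matrix (list of lists) of p-distances."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy alignment (made-up sequences, no gaps).
ALIGNMENT = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTC", "TCGAACGATC"]
for row in distance_matrix(ALIGNMENT):
    print(row)
```

Each pair of sequences is reduced to a single number, so information about which sites differ is discarded. That is the trade-off behind the computational efficiency.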

10
Q

What assumptions are made for discrete data?

A
  • Independence - i.e. that a character’s state is not affected by that of another character
  • Homology - i.e. that the characters in the taxa being analysed are genuinely equivalent in evolutionary terms
11
Q

What are the two main phylogenetic methods?

A

Optimality criteria methods: apply a definition of the preferred tree - candidate trees are scored against the optimality criterion and ranked to find the best tree
- Maximum parsimony (MP)
- Maximum likelihood
- Bayesian (MCMC sampling)

Algorithmic: set of instructions for how to go about making a phylogenetic tree - rolls together how to make trees and the definition of a preferred tree
- Clustering algorithms
- UPGMA
- Neighbour-joining (NJ)

12
Q

How do you rank trees with the two phylogenetic methods?

A

Criterion-based:
- Rank in order of fit to the criteria and choose best
- Can quantitatively say how much better one topology is compared to an alternative

Algorithmic:
- There may be many trees that are equally likely, but a simple algorithmic method (like NJ) will still give just one tree as its result
- So it is hard to say how much better one topology is than another
- Can use resampling statistics (e.g., bootstrapping) to indicate to what extent the topology is supported by the data - but these do not overcome this weakness

13
Q

What are the two different algorithms of finding the best tree with optimality criterion methods?

A

Exact algorithms:
- ‘Guarantee’ to find the optimal tree by using an exhaustive search to find the best score
- Or use a branch and bound search - eliminates parts of the search space that contain only suboptimal solutions
- But if there are too many trees - will take a very long time and lots of computational power

Heuristic algorithms:
- Approximate or quick and dirty methods to attempt to find optimal tree for method of choice - but cannot guarantee
- Often use ‘hill-climbing’ methods - but local peak may not be the highest mountain (optimal tree) - so is repeated many times

14
Q

What is homoplasy?

A

Multiple mutational hits on the same site result in alleles that are identical in state, but not by descent
- Causes the observed rate of divergence to decline over time (saturation) rather than follow a linear ‘molecular clock’

15
Q

What model allows for the possibility of homoplasy?

A

Substitution model:
- Can be used to try and allow for the effect of homoplasy and estimate the true evolutionary distance from the observed distance
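
As one concrete illustration, the sketch below applies the Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p), which converts an observed proportion of differing sites p into an estimated number of substitutions per site. Choosing Jukes-Cantor here is an assumption made for simplicity, and other substitution models use different corrections. The function name is made up for this example.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion p of differing sites.

    d = -(3/4) * ln(1 - (4/3) * p); undefined once p reaches 0.75 (saturation).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.2, 0.4, 0.6):
    print(p, round(jc69_distance(p), 3))
```

The corrected distance is always at least as large as the observed one, and the gap widens as p grows, reflecting the increasing number of hidden multiple hits.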

16
Q

How does UPGMA cluster analysis work and what assumption does it make?

A

The most similar pair of taxa is found - i.e. the two taxa separated by the smallest genetic distance - and joined into a cluster; distances from the new cluster to the remaining taxa are then averaged and the process is repeated until all taxa are joined
- Makes the assumption of ultrametricity - i.e. that for any three taxa, two of the three pairwise distances are equal and at least as large as the third - which amounts to assuming that rates of evolution do not vary among taxa
- But this assumption is nearly ALWAYS violated
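
The clustering loop can be sketched as follows. This is a minimal illustration, not a production implementation: the distances are made up (and deliberately ultrametric), the data layout (a dict keyed by `frozenset` pairs) and the names `upgma`, `NAMES` and `DIST` are assumptions for this example, and each cluster is returned as a nested tuple `(left, right, height)`.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon names.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a nested tuple (left, right, height) for the root, where height
    is half the distance between the two clusters that were joined.
    """
    dist = dict(dist)                      # work on a copy
    size = {name: 1 for name in names}     # number of taxa in each cluster
    clusters = list(names)

    while len(clusters) > 1:
        # Step 1: find the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b, dist[frozenset((a, b))] / 2)

        # Step 2: the distance from the new cluster to every other cluster is
        # the size-weighted average of the distances from its two members.
        for other in clusters:
            if other not in (a, b):
                d = (size[a] * dist[frozenset((a, other))]
                     + size[b] * dist[frozenset((b, other))]) / (size[a] + size[b])
                dist[frozenset((merged, other))] = d

        size[merged] = size[a] + size[b]
        clusters = [c for c in clusters if c not in (a, b)] + [merged]

    return clusters[0]


# Hypothetical, ultrametric toy distances among four taxa.
NAMES = ["A", "B", "C", "D"]
DIST = {
    frozenset(pair): d
    for pair, d in {
        ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
        ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
    }.items()
}
print(upgma(NAMES, DIST))
```

Because these toy distances satisfy the three-point condition, the heights come out consistent with a clock-like tree. With real data where rates differ among lineages, the same procedure can group taxa incorrectly.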

17
Q

How does neighbour joining cluster algorithm work?

A

Is like cluster analysis but allows for variation in evolutionary rates along different branches
- Doesn’t require the assumption of ultrametricity
- ‘Multiple hits’ at the same nucleotide sites still need to be allowed for, by correcting the input distances with a substitution model
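
The key difference from simple clustering is the pair-selection step. The sketch below computes the standard neighbour-joining criterion, Q(i, j) = (n - 2) d(i, j) - Σ d(i, k) - Σ d(j, k), and picks the pair with the smallest Q. It covers only this first step, not the full algorithm. The distances are toy values and the names `nj_pick_pair`, `TAXA` and `D` are assumptions for this example.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair of taxa neighbour joining would join first, plus all Q values.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(names)
    row_sum = {
        i: sum(dist[frozenset((i, k))] for k in names if k != i) for i in names
    }
    q = {
        (i, j): (n - 2) * dist[frozenset((i, j))] - row_sum[i] - row_sum[j]
        for i, j in combinations(names, 2)
    }
    best = min(q, key=q.get)
    return best, q


# Toy distances among five taxa.
TAXA = ["a", "b", "c", "d", "e"]
D = {
    frozenset(pair): d
    for pair, d in {
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    }.items()
}
best, q = nj_pick_pair(TAXA, D)
print(best, q[best])
```

In this example the smallest raw distance is between d and e, yet the criterion selects a and b, because Q also accounts for how far each taxon is from all the others. This is how rate variation among branches is accommodated.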

18
Q

What is bootstrapping?

A

Resampling of data to get a feel for how well the topology is supported by the data
- i.e. data are resampled many times and the tree is reconstructed each time - asking whether the same tree would have been generated if some of the sequence information had not been obtained, or if not all of the taxa had been sampled
- Bootstrap values give the percentage of resampled replicates that include a clade observed in the original tree - 100% means the clade appears in every replicate
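
A minimal sketch of the two steps, assuming the usual non-parametric bootstrap in which alignment columns (sites) are resampled with replacement. Tree building for each replicate is left out, and the clade sets passed to `clade_support` are written by hand purely for illustration. All names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    alignment: list of equal-length sequences (strings), one per taxon.
    rng: a random.Random instance, so that results are reproducible.
    Returns a pseudo-replicate alignment of the same dimensions.
    """
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def clade_support(replicate_clades, clade):
    """Percentage of replicates whose tree contains the given clade.

    replicate_clades: one set of clades (frozensets of taxon names) per replicate tree.
    """
    hits = sum(1 for clades in replicate_clades if frozenset(clade) in clades)
    return 100 * hits / len(replicate_clades)


# Hypothetical toy alignment (made-up sequences).
ALN = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTC"]
print(bootstrap_alignment(ALN, random.Random(1)))

# Hand-written clade sets standing in for four replicate trees.
REPLICATES = [{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")}]
print(clade_support(REPLICATES, {"A", "B"}))
```

In a real analysis each pseudo-replicate would be passed to the same tree-building method used for the original data, and the clades of the resulting trees would feed into the support calculation.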

19
Q

What are the problems with bootstrapping?

A
  • Bootstrap proportions aren’t easily interpretable
  • Give no indication of how good the data are, only of how well the tree fits the data
  • They don’t relate to an evolutionary model - they measure how well the tree is supported by the data rather than how good the tree is
20
Q

Compare the algorithmic and optimality criterion based approaches

A

Algorithmic:
- Combines the definition of the ‘best tree’ and how to generate the tree topology

Optimality criterion:
- Separate definition of the best tree and generation of topologies
- Topologies can be quantitatively compared against the definition, and ranked based on a score as to how well they fit the definition

21
Q

What are parsimony approaches?

A

Type of optimality criteria method
- Works directly with character data
- Makes the assumption that the trees requiring the smallest number of character state changes to explain the data are preferred
- So - trees needing fewer changes are more parsimonious than trees needing more complex series of changes
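
Counting the minimum number of changes on a given tree can be sketched with Fitch's algorithm, a standard way to score unordered characters on a rooted binary tree. The source does not name a specific algorithm, so using Fitch here is an assumption, as are the toy alignment and the names `fitch`, `parsimony_score`, `SEQS`, `TREE_1` and `TREE_2`.

```python
def fitch(node, states):
    """Fitch's algorithm for one character on a rooted binary tree.

    node:   a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping each leaf name to its observed character state.
    Returns (possible states at this node, minimum number of changes below it).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: at least one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch score over all sites of an alignment (dict: taxon -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical toy alignment and two candidate topologies.
SEQS = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(TREE_1, SEQS), parsimony_score(TREE_2, SEQS))
```

Here the first topology needs 3 changes and the second needs 5, so parsimony prefers the first. The score is a property of the topology and the data alone, with no branch lengths or substitution model involved.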

22
Q

What are the positives/problems with parsimony?

A
  • Parsimony methods do not use an explicit model of evolutionary change, so parsimony doesn’t make complicated assumptions
  • By contrast, a complicated evolutionary model comes with assumptions that your data may violate, which can cause problems
23
Q

How do the maximum likelihood (ML) and Bayesian methods work?

A

These methods specify a model of molecular evolution and build this into the best estimate of the tree
- ML looks for the tree that maximises likelihood of observing the data, given the tree and the model
- Bayesian - seeks the tree that maximises the probability of the tree, given the data and the model

24
Q

What are the advantages of ML and Bayesian methods over parsimony?

A
  • Major advantage - they use information on branch lengths to determine on which branches certain molecular changes are most likely to have occurred
  • Parsimony methods do not use this information
25
Q

What is the objective and the 3 elements of a ML model?

A

Objective: to infer the evolutionary history that is most consistent with a set of observed data - evaluate a hypothesis about evolutionary history

3 Elements:
1. Data = observed DNA (or protein) sequences
2. Unknowns = branching order and branch lengths of the evolutionary tree
3. Mutation model = model that accounts for the conversion of one sequence to another

  • Find the unknowns (i.e. the tree topology and branch lengths) that maximise the probability of observing the data - and rank topologies based on ML scores
26
Q

What simple evolution model can be used for ML approaches?

A

Jukes-Cantor model
- Any nucleotide can change to any other nucleotide with equal probability
- Therefore - the expected number of mutations is proportional to the length of the branch and the substitution rate
- So, we assume the rate is the same throughout the tree
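
To show how such a model plugs into a likelihood calculation, the sketch below uses the standard Jukes-Cantor transition probabilities for a branch of length d (expected substitutions per site) and finds, by a simple grid search, the branch length that maximises the likelihood of two aligned sequences. Reducing the problem to two sequences, using a grid search, and all names and sequences are simplifying assumptions for this example.

```python
import math


def jc69_prob(same, d):
    """Jukes-Cantor probability that a site ends in the same nucleotide (same=True)
    or in one particular different nucleotide (same=False) after branch length d."""
    decay = math.exp(-4 * d / 3)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pair_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by total branch length d."""
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the nucleotide at one end.
        log_l += math.log(0.25 * jc69_prob(a == b, d))
    return log_l


def ml_distance(seq_a, seq_b, step=0.001, max_d=3.0):
    """Grid search for the branch length that maximises the likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: pair_log_likelihood(seq_a, seq_b, d))


# Hypothetical toy sequences differing at 2 of 10 sites.
print(ml_distance("ACGTACGTAC", "ACGAACGTTC"))
```

For a full tree the same idea applies, but the likelihood has to be summed over the unknown ancestral states at internal nodes and maximised over topology as well as branch lengths. That is where the computational cost comes from.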

27
Q

What are the general advantages/disadvantages of ML/Bayesian methods?

A

Advantages:
- Consistent
- Lower variance than other methods - less affected by sampling error
- Tends to out-perform parsimony and additive distance methods even for short sequences under a broad range of evolutionary models

Disadvantages:
- Problems getting the model right
- Very computationally demanding - Bayesian methods are often the method of choice because they are more tractable computationally
- Critical that you use appropriate sequence evolution model

28
Q

Describe the case study for the origin of cetaceans

A

Origins of cetaceans: McGowan et al., 2009
- Time calibrated phylogenies allowed links to be made between diversification events and driving external factors such as environmental change - allows you to be able to tell what changes in the past caused pattern of evolution
- Time scale is inferred from the amount of change along branches, together with a molecular clock to estimate how long those changes would take to occur

Found that:
- Oxygen isotopes from ice and sediment cores were used to estimate ancient temperatures and ecological productivity - to reconstruct environmental variation
- Split of Odontoceti (toothed whales) and Mysticeti (baleen whales) was close to the onset of global glaciation 34 Mya - suggests polar ice sheets created new niches for filter-feeding whales

29
Q

Describe the case study for myoglobins in sea mammals

A

Myoglobin - an oxygen-carrying protein - occurs at higher concentrations and has a higher surface charge in diving mammals - this allows more myoglobin to be packed into muscles, increasing oxygen storage capacity - a physiological adaptation contributing to the evolution of diving ability in marine mammals
- Mirceta et al., 2013
- Caused by convergent evolution - can see that the increase in surface charge evolved independently in each lineage of aquatic mammal - separate selection pressures in each lineage
- Can then infer the position of marine mammal ancestor species in the phylogeny and estimate what their myoglobin properties must have been - estimate ancient protein coding sequences
- Found 3 key mutations from protein coding sequences in Basilosaurus
- This can even be used to estimate maximum dive times of long-extinct species based on ancient protein coding sequences
- Shows a powerful way to reconstruct phenotypes of ancestral lineages - and how they are likely to have changed over time - even more powerful when combined with paleoenvironmental data on the environmental pressures that would have selected for those phenotypes