Phylogenetics Flashcards
What is a phylogenetic tree and what are its elements?
- A kind of graph used to model the evolutionary history of a group of sequences or organisms
- Phylogeny = evolutionary tree = the actual pattern of historical relationships
- Tree consists of nodes connected by branches (=links = edges)
- Shape of tree = topology
Explain the components of a tree (clade, length of branches, root, outgroup)
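- Clade (haplogroup) - a group made up of an ancestor and all of its descendants
- Length of branches corresponds to amount of change - e.g., a longer branch means more mutations
- Root = most recent common ancestor (MRCA) of everything in the tree
- Outgroup - a taxon we have selected that we know will fall outside the group of interest (the ingroup)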
What is a rooted tree?
A tree with a root which is the most recent common ancestor (MRCA) of the sequences or species in the tree
- Root is oldest part of tree - all other nodes descend from it - gives directionality
- So in rooted trees we can talk about ancestors and descendants, or ancestral and derived character states
What is an outgroup?
A taxon which we know diverged before the MRCA of the group under consideration (the ingroup)
What is a clade?
A monophyletic group of sequences or organisms
- Groups which include all descendants of a single ancestor
- Some groups are not clades - termed ‘paraphyletic’ - e.g., reptiles (clade should include birds) and apes (clade should include humans)
What are monophyletic and paraphyletic groups?
- Monophyletic: taxonomic group that contains all descendants of the MRCA of that group - the group is defined by shared, derived features
- Paraphyletic: taxonomic group that contains some, but not all, descendants of the MRCA of that group - the grouping is based on a characteristic that is not shared by all descendants of that ancestor
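A minimal Python sketch of the monophyly test described above: a group is a clade only if it matches the full set of tips below some node. The tree is a simplified, illustrative topology, and the names `TREE`, `leaves_under` and `is_monophyletic` are made up for this example.

```python
# Illustrative toy tree (simplified): node -> list of children.
# Nodes that never appear as keys are tips (leaves).
TREE = {
    "amniotes": ["mammals", "sauropsids"],
    "sauropsids": ["lizards", "archosaurs"],
    "archosaurs": ["crocodiles", "birds"],
}


def leaves_under(tree, node):
    """Return the set of tips that descend from `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= leaves_under(tree, child)
    return tips


def is_monophyletic(tree, root, group):
    """True if `group` is exactly the set of tips below some node."""
    group = set(group)
    stack = [root]
    while stack:
        node = stack.pop()
        if leaves_under(tree, node) == group:
            return True
        stack.extend(tree.get(node, []))
    return False


# 'Reptiles' without birds leaves out one descendant of the shared ancestor.
print(is_monophyletic(TREE, "amniotes", {"lizards", "crocodiles"}))
print(is_monophyletic(TREE, "amniotes", {"lizards", "crocodiles", "birds"}))
```

In this toy tree the first group is paraphyletic (birds are missing) and the second is a clade.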
How can you do phylogenetic reconstruction and what is the exception?
- Phylogenetic reconstruction entails making best estimate of evolutionary historical relationships among entities - using available modern data
- With exception of ‘molecular fossils’ and ancient preserved DNA, we do not have direct access to information about the past, but make inferences from molecules and other characteristics of extant taxa
What two separate functions are required for phylogenetics?
- Making phylogenies
- Choosing which are ‘best’ among large number of possible phylogenies
What types of data can we use for phylogenetics?
Discrete characters:
- Independent variables whose possible values are collections of mutually exclusive character states
- e.g., the nucleotide at position n in DNA region x
- Can be qualitative or quantitative
Distances or similarities:
- Complex multi-variable dataset of differences among taxa combined into a composite measure, expressed as a single value
- n x n matrix is made
- More computationally efficient
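A small Python sketch of how discrete characters can be collapsed into an n x n distance matrix. It uses the proportion of differing sites (p-distance) as the composite measure, which is one common choice and an assumption here; the sequences and function names are invented for illustration.

```python
def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def distance_matrix(seqs):
    """Build an n x n matrix of pairwise p-distances (rows follow dict order)."""
    names = list(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in names] for i in names]


# Made-up aligned sequences for three taxa.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
matrix = distance_matrix(seqs)
for name, row in zip(seqs, matrix):
    print(name, row)
```

Each pair of taxa is reduced to a single number, so the per-site information is no longer available once the matrix is built.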
What assumptions are made for discrete data?
- Independence - i.e. that a character's state is not affected by that of another character
- Homology - i.e. that the characters in the taxa being analysed are genuinely equivalent in evolutionary terms
What are the two main phylogenetic methods?
Optimality criteria methods: application of a definition of the preferred tree - scores each candidate tree against the optimality criterion, ranks the trees and picks the best
- Maximum parsimony (MP)
- Maximum likelihood
- Bayesian (MCMC sampling)
Algorithmic: set of instructions for how to go about making a phylogenetic tree - rolls together how to make trees and the definition of a preferred tree
- Clustering algorithms
- UPGMA
- Neighbour-joining (NJ)
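A rough Python sketch of a clustering algorithm in the UPGMA style: repeatedly merge the closest pair of clusters and average their distances to everything else, weighted by cluster size. It returns only the topology as a Newick-like string (no branch lengths); the function name, input format and distances are assumptions made for this example.

```python
def upgma(names, dist):
    """Cluster taxa by average linkage; return a Newick-like topology string.

    `dist` maps pairs of taxon names, e.g. ("A", "B"), to distances.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of tips
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = sorted(min(d, key=d.get))
        merged = f"({a},{b})"
        new_d = {}
        for other in sizes:
            if other in (a, b):
                continue
            da = d[frozenset((a, other))]
            db = d[frozenset((b, other))]
            # Size-weighted average = mean over all tip-to-tip distances.
            new_d[frozenset((merged, other))] = (
                sizes[a] * da + sizes[b] * db
            ) / (sizes[a] + sizes[b])
        # Carry over distances that do not involve the merged clusters.
        for pair, value in d.items():
            if a not in pair and b not in pair:
                new_d[pair] = value
        d = new_d
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Made-up distances between four taxa.
dist = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], dist))
```

The procedure returns exactly one tree and never scores alternatives, which is the sense in which tree building and the definition of the preferred tree are rolled together.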
How do you rank trees with the two phylogenetic methods?
Criterion-based:
- Rank in order of fit to the criterion and choose best
- Can quantitatively say how much better one topology is compared to an alternative
Algorithmic:
- Many trees may fit the data equally well, yet a simple algorithmic method (like NJ) will still give just one tree as result
- So is hard to say how much better one topology is over another
- Can use resampling statistics (e.g., bootstrapping) to indicate to what extent the topology is supported by the data - but this does not overcome the weakness
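A short Python sketch of the resampling step in bootstrapping: alignment columns are drawn with replacement to make a pseudo-replicate of the same length. The function name, the toy alignment and the seeded random generator are assumptions for illustration.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    length = len(next(iter(seqs.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in seqs.items()}


# Made-up alignment; a fixed seed keeps the example repeatable.
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
rng = random.Random(0)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(3)]
for rep in replicates:
    print(rep)
```

The tree-building method is then rerun on each replicate, and the share of replicates that recover a given clade is reported as its bootstrap support.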
What are the two different types of algorithm for finding the best tree with optimality criterion methods?
Exact algorithms:
- ‘Guarantee’ to find optimal tree by using exhaustive search to find best score
- Or use branch and bound search - eliminates parts of the search space that contain only suboptimal solutions
- But if too many trees - will take a very long time and lots of computational power
Heuristic algorithms:
- Approximate or quick and dirty methods to attempt to find optimal tree for method of choice - but cannot guarantee
- Often use ‘hill-climbing’ methods - but local peak may not be the highest mountain (optimal tree) - so is repeated many times
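To see why exact searches soon become impractical, here is a short Python sketch that counts the possible strictly bifurcating trees for n taxa, using the standard results (2n-5)!! for unrooted and (2n-3)!! for rooted trees. The function names are made up for this example.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 5 + 1, 2):
        count *= k
    return count


def num_rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 3 taxa: (2n-3)!!."""
    return num_unrooted_trees(n) * (2 * n - 3)


for n in (4, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Ten taxa already give over two million unrooted trees, which is why heuristic searches are used for larger datasets.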
What is homoplasy?
Multiple mutational hits on the same site result in alleles identical in state, but not by descent
- Causes the observed divergence to accumulate more slowly over time (saturation) rather than follow a linear ‘molecular clock’
What model allows for the possibility of homoplasy?
Substitution model:
- Can be used to try and allow for the effect of homoplasy and estimate the true evolutionary distance from the observed distance
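As one concrete example of a substitution model, the Jukes-Cantor (JC69) correction estimates the evolutionary distance d from the observed proportion of differing sites p as d = -(3/4) ln(1 - (4/3)p). JC69 is chosen here only for illustration (it assumes equal base frequencies and equal substitution rates); the flashcard does not name a specific model, and the function name is made up.

```python
import math


def jukes_cantor_distance(p):
    """Estimate substitutions per site from observed p-distance under JC69."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - (4 / 3) * p)


for p in (0.05, 0.1, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as the observed one, and the gap widens as p grows, reflecting the hidden multiple hits.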