Lecture 3 – Molecular phylogenetics Flashcards
what is phylogenetics?
reconstructing patterns of shared ancestry between organisms, either among or within a species
what is taxonomy
describing, naming, and classifying species- the organisation of organisms based on phylogenetic/other info
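orthologous sequences
homologous sequences from different species, separated by a speciation event- used to look at speciation, extinction etc
homologous sequences
sequences that share a common ancestor- copies sampled from within the same species can be used to look at population genetics
paralogous sequences
different genes within the same genome, related by gene duplication- can be used to look at gene duplication, deletion etc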
advantages of using molecular characters rather than morphological ones
-they are more objective and therefore easier to quantify
-available even when morphology is uninformative, e.g. for microorganisms
-cheap and fast
-don’t require specialist training to obtain
disadvantage of using molecular characters rather than morphological ones
usually can’t be obtained from extinct species, because DNA degrades over time- leaves gaps in phylogenies etc
different types of SNP (single nucleotide substitution)
transition- purine to purine (A<->G) or pyrimidine to pyrimidine (C<->T)
transversion- purine to pyrimidine or vice versa
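The transition/transversion split can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture; the function name `classify_substitution` is made up for the example, and only the four DNA bases are handled.

```python
# Purines and pyrimidines among the four DNA bases.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify a single-base difference as a transition or a transversion."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("expected one of A, C, G, T")
    if a == b:
        return "identical"
    # Same chemical class on both sides means a transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```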
what is sequence alignment
alignment of multiple genetic sequences based on positional homology- the idea that sites placed in the same column descend from the same position in a common ancestor
how computer programs are useful in sequence alignment
there are many possible alignments for any 2 sequences, so algorithms are useful to score each alignment and find the one most likely to be correct
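One way to see how an algorithm scores alignments is a small global-alignment (Needleman-Wunsch style) sketch. This is only an illustration: the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, and programs such as Clustal and MUSCLE use far more elaborate schemes.

```python
def global_alignment_score(s: str, t: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of s and t by dynamic programming."""
    # prev[j] holds the best score for aligning the processed prefix of s with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] against an empty prefix of t
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            gap_in_t = prev[j] + gap      # s[i-1] aligned to a gap
            gap_in_s = curr[j - 1] + gap  # t[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_t, gap_in_s))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Out of every possible way of lining up the two sequences, the dynamic programme keeps only the best score for each pair of prefixes, so it never has to list the alignments one by one.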
things that can complicate alignment
long indels, a lot of genetic diversity
examples of programs used for alignment
Clustal and MUSCLE
what is p-distance
proportion of aligned sites at which two sequences differ (mismatched sites / sites compared)- very simple measure of genetic difference
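As a minimal sketch, p-distance can be computed directly from two aligned sequences. The function name and the choice to skip columns containing a gap (`-`) are assumptions made for this example.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    mismatches = sum(a != b for a, b in pairs)
    return mismatches / len(pairs)


print(p_distance("ACGTACGT", "ACGTACGA"))  # 1 mismatch over 8 sites
```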
what is the multiple hits problem
when the observed genetic difference is high, the actual number of changes is probably higher- sites where there have been multiple substitutions (or a change back to the original base) are counted once or not at all
things that help solve the multiple hits problem
generating nucleotide substitution models, so you can project the observed distance onto the likely actual distance
tools useful in nucleotide substitution models
Jukes-Cantor model- the simplest model, which assumes all nucleotide substitutions occur at the same rate and all four bases are equally frequent
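Under the Jukes-Cantor model the correction from observed to estimated distance has a closed form, d = -(3/4) ln(1 - 4p/3), where p is the p-distance. The sketch below applies it; the function name is chosen just for this example.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site from an observed p-distance (JC69)."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences look random and the log is undefined.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.1, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as p grows, which is the multiple hits problem in numbers.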
amino acid substitution models- useful things
JTT (Jones-Taylor-Thornton) matrix- an empirical matrix based on the actual frequencies of amino acid substitutions, with rates obtained from a large survey of protein variation
assumptions within JTT matrix
evolution at each site occurs at the same rate
nucleotide base composition is always the same for all species
evolution at each site is independent- can’t really avoid this one, but sometimes it isn’t true, e.g. if there are secondary nucleic acid structures
how can among-site variation be accounted for?
gamma distribution model- the rate at each site is drawn from a gamma distribution, which models the heterogeneity in site evolution fairly accurately and gives a more accurate estimate of the amount of change- genetic distances tend to be higher using these models
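For the Jukes-Cantor model, a gamma-corrected distance also has a closed form, d = (3/4) a [(1 - 4p/3)^(-1/a) - 1], where a is the gamma shape parameter. The sketch below compares it with the uncorrected version; the shape value 0.5 is an arbitrary choice for illustration, not a value from the lecture.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance with equal rates at every site."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates among sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = 0.1
print(round(jc_distance(p), 4))             # equal rates
print(round(jc_gamma_distance(p, 0.5), 4))  # strong rate heterogeneity
```

Small values of the shape parameter mean strong rate variation among sites and a larger correction; as the shape parameter grows, the result approaches the plain Jukes-Cantor distance.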
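what are bootstrap values on a phylogenetic tree
the percentage of bootstrap replicates (datasets made by resampling alignment columns with replacement) in which a given clade is recovered- a measure of phylogenetic uncertainty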
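The resampling step behind bootstrap values can be sketched as below. This only shows how a pseudo-replicate alignment is made and how support is counted; building a tree from each replicate is left out. The representation of a tree as a set of clades (each a `frozenset` of taxon names) is an assumption made for the example.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def clade_support(replicate_trees, clade):
    """Percentage of replicate trees (sets of clades) that contain the clade."""
    clade = frozenset(clade)
    hits = sum(clade in tree for tree in replicate_trees)
    return 100 * hits / len(replicate_trees)


alignment = ["ACGT", "ACGA", "TCGA"]
print(bootstrap_replicate(alignment, random.Random(0)))
```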
rooted vs unrooted tree
rooted has an evolutionary direction (from the root, the common ancestor, to the tips), and only horizontal lines represent genetic distance
unrooted tree- no direction, and all lines represent genetic distance
algorithmic methods- how it works, example
genetic distances for each pair are ‘clustered’- e.g. neighbour-joining
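A sketch of the first step of neighbour-joining: from a pairwise distance matrix, compute Q(i, j) = (n - 2) d(i, j) - sum of row i - sum of row j, and join the pair with the smallest Q. The distance values below are made up for illustration, and the full algorithm (updating the matrix and repeating) is not shown.

```python
def nj_pair_to_join(dist):
    """Return (i, j, q) for the pair with the lowest neighbour-joining Q value."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2], best[0]


# Hypothetical distances between five taxa (indices 0 to 4).
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pair_to_join(distances))
```

In this matrix the smallest raw distance is between taxa 3 and 4, but the lowest Q belongs to taxa 0 and 1, because Q also accounts for how far each taxon is from all the others.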
optimality methods-how it works, example
a score is assigned to each possible tree based on the data, and an optimisation algorithm searches for the tree with the best score. maximum parsimony, maximum likelihood, bayesian inference
statistical methods- how it works, example
probability for each possible tree- more of a formal statistical problem. maximum likelihood, bayesian inference
maximum parsimony tree- principle
tree which requires the fewest evolutionary changes is the best one, fast but not good for high divergence
maximum likelihood tree- principle
finds the tree (and branch lengths) under which the observed sequences are most probable, using nucleotide substitution models
bayesian inference- principle
estimates the posterior probability distribution over trees, rather than a single best tree- similar to max likelihood, but combines the likelihood with prior probabilities
what is a parsimony score
minimum number of evolutionary changes required to explain observed characters- the scores can be added together on a tree
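One standard way to count the minimum number of changes on a fixed tree is Fitch's algorithm, sketched below for a rooted binary tree written as nested tuples. The tree encoding, taxon names and sequences are all made up for the example; per-site scores are summed to give the tree's total.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):  # a tip: its state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree, so no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree, so one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site minimum change counts over the whole alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"t1": "AAG", "t2": "AAG", "t3": "ATG", "t4": "ATC"}
print(parsimony_score((("t1", "t2"), ("t3", "t4")), alignment))
print(parsimony_score((("t1", "t3"), ("t2", "t4")), alignment))
```

The first tree needs fewer changes than the second, so maximum parsimony would prefer it.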
what is a ‘hill climbing’ method?
searches through trees using trial and error, but doesn’t check through all trees- just ones that may get closer to the optimum
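The idea can be sketched with a generic hill-climbing loop. To keep it short, the "trees" here are just positions in a made-up list of scores; in a real tree search the neighbours would be rearranged trees and the score a parsimony or likelihood value. All names below are hypothetical.

```python
def hill_climb(start, neighbours, score):
    """Move to the best-scoring neighbour until no neighbour is an improvement."""
    current, current_score = start, score(start)
    while True:
        best_candidate, best_score = None, current_score
        for candidate in neighbours(current):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_candidate, best_score = candidate, candidate_score
        if best_candidate is None:
            return current, current_score  # no neighbour is better: stop
        current, current_score = best_candidate, best_score


# Toy score landscape: index 4 is the global optimum, index 1 a local one.
landscape = [1, 3, 2, 5, 9, 4]


def index_neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


def index_score(i):
    return landscape[i]


print(hill_climb(0, index_neighbours, index_score))
print(hill_climb(3, index_neighbours, index_score))
```

Starting from position 0 the search stops at the local optimum, while starting from position 3 it reaches the global one. This is the cost of not checking every tree, and one reason searches are often repeated from several starting points.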