Molecular evolution Flashcards
How to compare AA sequences
count the number of AA sites that differ between the homologous sequences of two species / total AA in that sequence
Can also make a phylogenetic tree from AA substitutions to estimate the divergence time between species
Example of comparison of AA sequences
Hb alpha chain
human to horse, 18 different sites
human to shark, 79 different sites.
make a matrix
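The pairwise comparison above can be sketched in a few lines of Python. This is only an illustration: the three sequences are short invented fragments (not real Hb alpha chains), and the function names are hypothetical.

```python
from itertools import combinations

def count_differences(seq_a, seq_b):
    """Count aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def difference_matrix(seqs):
    """Return {(name1, name2): number of differing sites} for every pair."""
    return {
        (n1, n2): count_differences(seqs[n1], seqs[n2])
        for n1, n2 in combinations(seqs, 2)
    }

# Toy aligned fragments, invented for illustration only.
toy = {
    "human": "VLSPADKTNV",
    "horse": "VLSAADKTNV",
    "shark": "MLSAAEKTAI",
}
matrix = difference_matrix(toy)
for pair, diffs in matrix.items():
    print(pair, diffs)
```

As in the Hb example, the more distantly related pair (human to shark) shows more differing sites than the closer pair (human to horse).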
what did Linus Pauling find?
Two-time Nobel Prize winner
1962 (with Emile Zuckerkandl) - roughly linear relationship between the number of AA differences between species' proteins and time since divergence, estimated from fossil record - molecular clock hypothesis
molecular clock hypothesis
rate of evolutionary change of any specified protein was approximately constant over time and over different lineages
what does evolutionary rate equal?
rate of nucleotide substitutions
how to calculate nucleotide substitution rate
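r = K/2T
r = no. substitutions per site per year; K = evolutionary distance between the 2 sequences (substitutions per site); T = time since the two sequences diverged.
The factor of 2 is there because both lineages have been accumulating substitutions for time T.
a simple (parsimonious) process is assumed (each nt changed only once)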
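A minimal worked example of r = K/2T. The values of K and T below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def substitution_rate(k, t_years):
    """Return r = K / (2T), in substitutions per site per year.

    k       -- evolutionary distance between the two sequences
               (substitutions per site)
    t_years -- time since the two sequences diverged, in years
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    # Both lineages evolve independently for t_years, hence the factor 2.
    return k / (2 * t_years)

# Hypothetical numbers: K = 0.2 substitutions per site, T = 100 million years.
r = substitution_rate(k=0.2, t_years=100e6)
print(r)  # substitutions per site per year
```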
how to find evolutionary distance?
P proportion of NT which have changed
define synonymous and non synonymous substitution
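synonymous - a substitution which leaves the encoded AA unchanged. likely to be retained. possible since the genetic code is degenerate (most AA are specified by more than one codon).
Non synonymous - a substitution which changes the encoded AA. less likely to be retained because it causes a biological change in the organism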
what is the Ka/Ks ratio
Ks = synonymous sub rate, Ka = non syn sub rate. ratio is (unlikely changes)/(likely changes), so usually <1, rarely >1
how do you know if positive selection has occurred
Ka/Ks>1
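A small sketch of how the Ka/Ks ratio is read. The function name and the tolerance are assumptions made for this illustration; in practice a statistical test is needed before claiming selection, and the "neutral" label for a ratio near 1 is the standard interpretation rather than something stated on the card.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Interpret a Ka/Ks ratio.

    ka -- non-synonymous substitution rate
    ks -- synonymous substitution rate (must be > 0)
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio > 1 + tol:
        return "positive selection"
    if ratio < 1 - tol:
        return "purifying selection"
    return "neutral"

# Hypothetical rates for illustration.
print(classify_selection(0.1, 0.5))
print(classify_selection(0.6, 0.3))
```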
example of genes positively selected for
immune system genes
what does population genetics study
change in allele frequencies of a population over time
who came up with the neutral theory of molecular evolution?
motoo kimura 1968
very high rate of NT substitutions must mean that most mutations are neutral with respect to NS, and that most substitutions are caused by random drift of alleles that are selectively neutral.
Darwinian - survival of the fittest
Kimurian - survival of the luckiest
why may some regions of genes show less substitution than others?
functional constraint
e.g. Hb - the c chain shows a higher substitution rate than other domains, because it is a non-functional region
what is the opposite of random mating
assortative mating
examples of genes with high functional constraint
Hb
histones
what is an OTU
operational taxonomic unit
level of taxonomic separation used
how to calculate p distance
what is it
Number of mis-matched sites/total no. sites
Simplest estimate of evolutionary distance.
length of branches in an evolutionary tree fits the p distance of OTU pairs
what is multiple substitution
when subs occurred multiple times at the same site.
how to calculate distance for AA seqs considering multiple substitution
Poisson correction (PC) distance = -ln(1 - p)
Kimura's distance = -ln(1 - p - 0.2p^2)
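The p distance and the two corrections, d = -ln(1 - p) and d = -ln(1 - p - 0.2p^2), can be compared directly. This is a sketch: the function names are hypothetical, and the alignment length of 141 sites is an assumption used for the example (the notes give only the 18 differing sites between human and horse).

```python
import math

def p_distance(n_mismatched, n_sites):
    """Proportion of aligned sites that differ."""
    if n_sites <= 0 or not 0 <= n_mismatched <= n_sites:
        raise ValueError("need 0 <= n_mismatched <= n_sites and n_sites > 0")
    return n_mismatched / n_sites

def poisson_distance(p):
    """Poisson-corrected distance; math.log raises ValueError if p >= 1."""
    return -math.log(1 - p)

def kimura_protein_distance(p):
    """Kimura's empirical correction for amino acid sequences."""
    return -math.log(1 - p - 0.2 * p ** 2)

# 18 differing sites over an assumed alignment of 141 sites.
p = p_distance(18, 141)
pc = poisson_distance(p)
kd = kimura_protein_distance(p)
print(p, pc, kd)
```

Both corrected distances are larger than p, because they add back an estimate of the substitutions hidden by multiple changes at the same site.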
what is Jukes Cantor model
JC model
model for distance estimation for nucleotides.
assumes equal rates of transitions and transversions. 1 parameter method.
d = -(3/4) ln(1 - (4/3)p), where p = proportion of sites that differ
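The Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3)p), is short enough to write out. A minimal sketch, with a hypothetical function name and an example value of p chosen for illustration:

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the proportion p of differing nt sites.

    The formula is only defined for 0 <= p < 0.75; at p = 0.75 the two
    sequences are as different as random sequences would be.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)

# Example: 30% of sites differ.
d = jukes_cantor_distance(0.3)
print(d)
```

The corrected distance is always at least as large as p, and the gap widens as p grows.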
which are transversions and transitions?
TRANSITIONS: purine to purine (A-G) or pyrimidine to pyrimidine (C-T)
TRANSVERSIONS: purine to pyrimidine or vice versa (A-C, A-T, G-C, G-T)
what is Kimura's model for NT distance estimation
2 parameter method
gives transitions and transversions separate rates, since transitions are more likely than transversions
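A sketch of the 2 parameter calculation. It first counts the proportion of transitional differences (P) and transversional differences (Q), then applies the standard Kimura 2-parameter formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The function names and toy sequences are invented for illustration, and the code assumes gap-free sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def ts_tv_fractions(seq_a, seq_b):
    """Return (P, Q): proportions of transition and transversion sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    n = len(seq_a)
    return transitions / n, transversions / n

def k2p_distance(p_ts, q_tv):
    """Kimura 2-parameter distance from transition/transversion fractions."""
    inner = (1 - 2 * p_ts - q_tv) * math.sqrt(1 - 2 * q_tv)
    return -0.5 * math.log(inner)

# Toy sequences: two transitions and one transversion over 10 sites.
P, Q = ts_tv_fractions("AAAACCCCGG", "GAAACCCTGT")
d = k2p_distance(P, Q)
print(P, Q, d)
```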
who made the first phylogenetic tree
Ernst Haeckel
define (for phylogenetic tree): root, leaves, branch length, branch, internal node
root - common ancestor
leaves - current species
branch length - time
branch - relationship between species
internal node - hypothesised ancestor
difference between rooted and unrooted tree
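rooted tree has a common ancestor (root), so it shows the direction of evolution; unrooted tree only shows relationships between OTUs
rooted tree needs an outgroup for a reference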
what is a molecular phylogeny tree
Phylogenetic tree inferred by nucleotide sequences and/or amino acid sequences
ortholog meaning
sequences diverged due to speciation
paralog meaning
sequences diverged due to gene duplication
what is advantageous about molecular phylogeny tree?
no expertise of morphology needed
universal criteria
3 methods for constructing phylogenetic tree
- distance methods - compute evolutionary distances for all pairs of OTUs and construct a tree which fits them.
- max parsimony method - choose the tree which minimises the no. of changes required to explain the data
- max likelihood - choose the tree which shows the highest likelihood under a given evolution model.
what is UPGMA
unweighted pair group method with arithmetic mean
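distance between composite OTUs is an average of the distances between their members
uses sequential clustering algorithm. start with things most similar and build composite OTU. see slide 19
- make matrix of distances between all sequences
- join the pair with the smallest distance into a composite OTU
- recalculate distances from the composite OTU to every other OTU as averages
- repeat until all OTUs are joined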
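The clustering steps above can be sketched as follows. This is an illustrative implementation, not the version from the slides: the function name, the nested-tuple tree representation and the toy distances are all assumptions.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names -- list of OTU names
    dist  -- dict {(name_i, name_j): distance} covering every pair once
    Returns (tree, height): tree is a nested tuple of names, and
    height maps each cluster to its node height (half the joining distance).
    """
    d = {}
    for (a, b), value in dist.items():
        d[(a, b)] = d[(b, a)] = value
    size = {name: 1 for name in names}        # cluster -> number of leaves
    height = {name: 0.0 for name in names}
    while len(size) > 1:
        # Join the two clusters with the smallest distance.
        a, b = min(combinations(size, 2), key=lambda pair: d[pair])
        new = (a, b)
        height[new] = d[(a, b)] / 2
        # Distance to every other cluster is the size-weighted average,
        # which equals the mean over all pairs of original OTUs.
        for c in size:
            if c not in (a, b):
                avg = (size[a] * d[(a, c)] + size[b] * d[(b, c)]) / (size[a] + size[b])
                d[(new, c)] = d[(c, new)] = avg
        size[new] = size[a] + size[b]
        del size[a], size[b]
    (root,) = size
    return root, height

# Toy distances, invented for illustration (they fit a constant-rate clock).
toy = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
tree, height = upgma(["A", "B", "C", "D"], toy)
print(tree, height[tree])
```

Because the toy distances are consistent with equal rates in all lineages, UPGMA recovers the expected tree; with unequal rates the same procedure can join the wrong pair.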
weakness of UPGMA method
assumes evolutionary rates are the same among different lineages -> if the rates differ among lineages, the tree can be wrong.
produces a rooted tree
what is the NJ method
neighbour joining
most used method
produces an unrooted tree
takes into account the averaged distances to all other leaves (neighbours) as well, not only the distance between the pair being joined
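A sketch of the step that distinguishes NJ from simply joining the closest pair. It uses the standard NJ criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) and joins the pair with the smallest Q. Only this pair-selection step is shown, not a full tree builder; the function name and the toy distances are assumptions for illustration.

```python
from itertools import combinations

def nj_pick_pair(names, dist):
    """Return the pair of OTUs that neighbour joining would join first.

    names -- list of OTU names
    dist  -- dict {frozenset({i, j}): distance} for every pair
    """
    n = len(names)

    def d(i, j):
        return dist[frozenset((i, j))]

    # Sum of distances from each OTU to all the others.
    total = {i: sum(d(i, j) for j in names if j != i) for i in names}

    def q(pair):
        i, j = pair
        return (n - 2) * d(i, j) - total[i] - total[j]

    return min(combinations(names, 2), key=q)

# Toy distance matrix for five OTUs, for illustration only.
raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(pair): value for pair, value in raw.items()}
names = ["a", "b", "c", "d", "e"]

closest = min(raw, key=raw.get)       # pair with the smallest raw distance
joined = nj_pick_pair(names, dist)    # pair chosen by the NJ criterion
print(closest, joined)
```

In this toy matrix the smallest raw distance is between d and e, but NJ joins a and b first because both are far from everything else. This is how NJ copes with lineages evolving at different rates.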
Maximum parsimony method
minimises the no. of changes needed to explain the data
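One common way to count the changes a given tree requires is Fitch's algorithm, sketched below for rooted binary trees written as nested tuples. This only scores candidate trees; it does not search tree space. The function names, trees and four toy sequences are invented for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes) for one site.

    tree   -- leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping leaf name to its character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # Children disagree: one substitution is required on this node.
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Total number of changes the tree needs across all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

# Toy alignment, invented for illustration.
aln = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
score1 = parsimony_score(tree1, aln)
score2 = parsimony_score(tree2, aln)
print(score1, score2)
```

The tree with the lower score (here the one grouping A with B and C with D) is the one maximum parsimony would prefer.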
how is tree root determined?
- at the midpoint of the longest route between 2 OTUs
- add data of reference gene or protein to original, reconstruct tree. root is placed in the branch connecting the reference and the others.
- use other biological evidence such as gene duplication
how to check reliability of phylo tree?
- use bootstrap confidence value
- randomly choose N sites from the alignment, with replacement (N = no. of sites in the original alignment) - create pseudo alignment
- make a phylogenetic tree of this pseudo alignment
- repeat these steps hundreds of times
- count how many of the resulting topologies support the original one
- divide by total number of topologies made.
- gives bootstrap value
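The resampling steps above can be sketched as follows. To keep the example short, the "tree" is reduced to a single question: which pair of OTUs is closest (the first pair a distance method would join)? That is a deliberate simplification, not a full bootstrap of tree topologies. The function names, seed and toy alignment are assumptions.

```python
import random

def count_diffs(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def closest_pair(alignment):
    """Pair of OTUs with the fewest differences (ties: first pair found)."""
    names = sorted(alignment)
    pairs = [(n1, n2) for i, n1 in enumerate(names) for n2 in names[i + 1:]]
    return min(pairs, key=lambda p: count_diffs(alignment[p[0]], alignment[p[1]]))

def bootstrap_support(alignment, replicates=500, seed=0):
    """Fraction of pseudo alignments that give the same closest pair."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    original = closest_pair(alignment)
    hits = 0
    for _ in range(replicates):
        # Sample N columns with replacement to build a pseudo alignment.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {name: "".join(seq[c] for c in cols)
                  for name, seq in alignment.items()}
        if closest_pair(pseudo) == original:
            hits += 1
    return hits / replicates

# Toy alignment, invented for illustration.
toy = {"w": "ACGTACGTAC", "x": "ACGTACGAAC", "y": "ACTTACGATC", "z": "TCTTGCGATC"}
print(closest_pair(toy), bootstrap_support(toy))
```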
steps in phylogenetic tree analysis
- collect sequences
- multiple sequence alignment
- calculate distance
- construction of NJ tree (use ClustalX)
- display of tree image (use NJplot)
what is the purpose of phylogenetic tree analysis?
know the phylogenetic relationship of the species
know the evolutionary background
know the orthologous genes of the particular gene
3 outcomes of gene duplication
pseudogene - non-functionalisation
modified function - sub-functionalisation
new function - neo-functionalisation
how are gene duplicates preserved?
by Neo-functionalization
gaining a new function
what is the 2R hypothesis?
vertebrates went through at least one (probably two) rounds of whole genome duplication
how many genes are under positive selection?
5% of all gene families
what is molecular drive
operates independently of genetic drift and NS.
the genetic composition of a population changes through mechanisms such as gene conversion, because certain gene variants are replicated more often even though they give no biological advantage.
what is gene recruitment
a gene gets used for a new function in evolution, so the gene has a pleiotropic effect