Phylogenetics Final Flashcards
systematics
the inference of phylogenies, the genealogy of species, focus on the species tree (reconstructing lineage history)
coalescence
point of common ancestry of two alleles
they both come from the same parental allele
lines of descent in diploid sex pop
the most recent generation is at the top; time goes back toward the bottom
the graph is a section of the genome
each pair is the two gene copies of one diploid individual
the circles are alleles of the gene
Assumptions of diploid sex pop allele descent
- equal probabilities of alleles being passed from one gen to the next (no selection, random mating)
- population size is constant over time
- alleles (from gen t) are drawn randomly with replacement from the previous generation (t-1)
probability of coalescence
given N diploid individuals in each generation, the probability that 2 alleles coalesce in the previous generation (t-1) is 1/(2N).
this is just the chance that two random draws with replacement from the 2N parental allele copies pick the same copy.
time to coalescence
the time to coalescence for two alleles is geometrically distributed with a mean of E(t) = 2N generations, where N is the number of diploid individuals in a generation.
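A minimal sketch of the two cards above, assuming the idealized population described in the assumptions card. The function names and the population size N = 50 are made up for illustration.

```python
from fractions import Fraction

def coalescence_probability(n_diploid):
    """Chance that two alleles pick the same parental copy among 2N copies."""
    return Fraction(1, 2 * n_diploid)

def prob_coalesce_at(t, n_diploid):
    """Geometric pmf: no coalescence for t-1 generations, then coalescence."""
    p = coalescence_probability(n_diploid)
    return (1 - p) ** (t - 1) * p

def expected_time(n_diploid):
    """The mean of a geometric distribution with success probability p is 1/p."""
    return 1 / coalescence_probability(n_diploid)

N = 50  # hypothetical population size, chosen only for illustration
print(coalescence_probability(N))  # 1/100
print(expected_time(N))            # 100
```

Doubling N halves the per-generation probability and doubles the expected wait, which is why coalescence is slower in larger populations.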
coalescence is slower in…
larger populations compared to smaller populations
expected time to coalescence for many alleles
approximately 4N generations (the expectation approaches 4N as the number of sampled alleles grows)
properties of coalescent trees in a constant population
- coalescence is rapid while there are many alleles (lineages) and slows as the number of remaining lineages n decreases
- have long trunks and short terminal branches
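The 4N figure and the tree shape can be checked with the standard large-population approximation: while k lineages remain, the expected wait for the next coalescence is 4N / (k(k-1)) generations. This is a sketch under that approximation; the function names and the values N = 1000 and n = 20 are illustrative assumptions.

```python
def expected_interval(k, n_diploid):
    """Expected generations spent with k lineages: 4N / (k(k-1))."""
    return 4 * n_diploid / (k * (k - 1))

def expected_tmrca(n_alleles, n_diploid):
    """Sum the intervals from n lineages down to 2; equals 4N(1 - 1/n)."""
    return sum(expected_interval(k, n_diploid)
               for k in range(2, n_alleles + 1))

N, n = 1000, 20  # hypothetical values for illustration
total = expected_tmrca(n, N)
trunk = expected_interval(2, N)  # time spent with only two lineages left
print(round(total), round(trunk), round(trunk / total, 2))
```

With these values the total is about 3800 generations and the final two-lineage interval is 2000, more than half of the total. That last slow step is the long trunk; the many quick early coalescences are the short terminal branches.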
recombination causes
different genes in the same individuals to have different gene tree topologies
recombination in two ways
- independent assortment: each chromosome pair segregates independently in meiosis, so genes on different chromosomes are inherited in new combinations
- crossing over: two homologous chromosomes swap segments, which is how recombination can happen within a single chromosome
recombinational gene
a block of adjacent nucleotides that share the same gene tree
this is the ideal locus for phylogenetic inference
branching or splitting
the subdivision of an ancestral population by barriers to gene flow
population tree
a tree that contains many branching populations. all of the gene trees are embedded in this tree
reasons gene trees don't match pop tree
- deep coalescence
deep coalescence
also incomplete lineage sorting
alleles fail to coalesce in the population just before their species split (looking back in time) and instead coalesce earlier, in a deeper ancestral population, so ancestral polymorphism is carried through the split.
ancestral polymorphism
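the presence of two or more alleles at a locus in the ancestral population; when it persists through a split, descendant species can inherit different ancestral alleles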
pop tree length and width
length = generations (time)
width = effective population size (idealized population with same coal. props)
combining both:
coalescent unit: 1 unit = 2N generations, the expected time to coalescence for two alleles
long and narrow, coal more likely
short and wide, coal less likely
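A small sketch of how length and width combine, assuming the usual continuous-time approximation in which two lineages sharing a branch of T coalescent units fail to coalesce with probability e^(-T). The function names and the example branch values are hypothetical.

```python
import math

def coalescent_units(generations, n_effective):
    """Branch length in coalescent units: generations / (2N)."""
    return generations / (2 * n_effective)

def prob_no_coalescence(generations, n_effective):
    """Probability that two lineages stay distinct along the whole branch."""
    return math.exp(-coalescent_units(generations, n_effective))

# Hypothetical branches, for illustration only.
long_narrow = prob_no_coalescence(generations=10_000, n_effective=1_000)
short_wide = prob_no_coalescence(generations=1_000, n_effective=10_000)
print(round(long_narrow, 3), round(short_wide, 3))
```

The long, narrow branch is 5 coalescent units and leaves the two lineages uncoalesced with probability about 0.007; the short, wide branch is 0.05 units and leaves them uncoalesced with probability about 0.95.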
deep coalescence frequency of gene trees
expectation under deep coalescence: the major (most frequent) gene tree topology matches the population tree, and the minor (less frequent) topologies are randomly discordant and equal in frequency.
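For the simplest case, a rooted three-taxon population tree with one allele per species, this expectation has a well-known closed form: with an internal branch of T coalescent units, the matching topology has probability 1 - (2/3)e^(-T) and each of the two discordant topologies has probability (1/3)e^(-T). The sketch below just evaluates that formula; the function name is an assumption.

```python
import math

def three_taxon_gene_tree_probs(internal_branch):
    """Return (P(concordant), P(each discordant topology)).

    internal_branch is the internal branch length in coalescent units.
    """
    miss = math.exp(-internal_branch)  # lineages fail to coalesce on the branch
    concordant = 1 - (2 / 3) * miss
    each_discordant = miss / 3
    return concordant, each_discordant

for T in (0.0, 0.5, 1.0, 3.0):
    concordant, discordant = three_taxon_gene_tree_probs(T)
    print(T, round(concordant, 3), round(discordant, 3))
```

At T = 0 all three topologies are equally likely (1/3 each); at T = 1 the concordant topology is at about 0.75 and each discordant one at about 0.12, always equal to each other.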
what is a phylogeny debate
- phylogeny as a cloud: phylogeny is a statistical distribution with a central tendency but variance because of the diversity of gene trees that are all included
anomaly zone
where the most probable gene tree topology does not match the population tree
in a pectinate pop tree, if the internal branches are very short, the lineages of all 4 taxa often fail to coalesce until they are together in the root ancestral population (looking back in time); the 3 symmetric gene tree topologies are then each more probable than the gene tree matching the pectinate pop tree
this is because a pectinate gene tree can arise from only one sequence of coalescences, while a symmetric gene tree can arise from 2 (in a rooted 4-taxon tree)
pectinate tree (unbalanced)
tree with each taxon being individually sister to the rest of the tree
only one possible sequence of coalescence
symmetric tree (balanced)
the clades split equally
has two possible sequences of coalescence:
- one sister group coalesces and then the other
- vice versa
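The one-versus-two count can be checked by brute force. This sketch assumes all four lineages sit in one randomly mating ancestral population, so at every step each pair of lineages is equally likely to coalesce next; topologies are stored as nested frozensets so that the order of events is forgotten. All names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def histories(lineages):
    """Yield (topology, probability) for every ordered sequence of coalescences."""
    if len(lineages) == 1:
        yield lineages[0], Fraction(1)
        return
    pairs = list(combinations(range(len(lineages)), 2))
    for i, j in pairs:
        merged = frozenset([lineages[i], lineages[j]])
        rest = [x for k, x in enumerate(lineages) if k not in (i, j)]
        for topology, p in histories(rest + [merged]):
            yield topology, p * Fraction(1, len(pairs))

def is_symmetric(topology):
    """Four taxa only: both children of the root are two-leaf clades."""
    return all(isinstance(child, frozenset) for child in topology)

totals = Counter()
for topology, p in histories(["A", "B", "C", "D"]):
    totals[topology] += p

symmetric = {p for t, p in totals.items() if is_symmetric(t)}
pectinate = {p for t, p in totals.items() if not is_symmetric(t)}
print(len(totals), symmetric, pectinate)
```

There are 18 equally likely sequences spread over 15 rooted topologies: each of the 3 symmetric topologies gets 2 sequences (probability 1/9) and each of the 12 pectinate topologies gets 1 (probability 1/18).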
anomalous gene tree (AGT)
a gene tree topology that is more probable than the gene tree topology matching the pop tree
Total evidence phylogeny estimate approach
concatenation approach
combine all of the gene data in a single matrix (therefore all the gene trees)
make a single tree from all genes to map the alleles
- results well-resolved
- assumption: all genes share same history which is unlikely
- if the pop tree has one or more anomaly zones, the concatenated estimate can be wrong
- can incorrectly support branches
Multi-species coalescent approach
inferring a population tree based on a few genes
assumption: all gene tree discordance is due to deep coalescence
3 approaches within:
1. parsimony: minimize deep coalescences
2. full-likelihood co-estimation of pop and gene trees
3. approximate or summary methods
Multiple-species coalescent Parsimony
- count minimum number of deep coalescent events on species tree required to explain gene trees
- search among candidate species trees to lower the number
- drawback: no estimates of branch length or width
- inconsistent when there is an anomaly zone
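A rough sketch of the counting step, assuming rooted trees written as nested tuples with one allele per species. The cost of a species-tree branch is taken as the number of gene lineages leaving it (looking back in time) minus one, and the "search" is just a minimum over a hand-written list of candidate trees. The names and the toy gene trees are made up.

```python
def clades(tree):
    """Return the leaf set (frozenset) of every node in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def deep_coalescences(species_tree, gene_tree):
    """Sum of extra gene lineages over all species-tree branches."""
    gene_clades = clades(gene_tree)
    cost = 0
    for branch in clades(species_tree):
        inside = [g for g in gene_clades if g <= branch]
        # Maximal gene clades inside the branch = lineages leaving it.
        maximal = [g for g in inside if not any(g < h for h in inside)]
        cost += len(maximal) - 1
    return cost

gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
scores = {s: sum(deep_coalescences(s, g) for g in gene_trees) for s in candidates}
best = min(scores, key=scores.get)
print(scores[best], best)  # 1 (('A', 'B'), 'C')
```

The score says nothing about how long or wide any branch is, which is the drawback noted above.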
Likelihood multiple-species coalescent
given a candidate population tree, what is the likelihood of the gene data
measure coalescent times in expected mutations per site (generations x mu, where mu = mutation rate per site per generation)
measure population sizes as theta (expected mutations per site)
theta = 4N mu
2 sequences coalesce at rate 2/theta, with expected coalescence time theta/2
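A quick consistency check on these units: the expected coalescence time of 2N generations, multiplied by mu, should equal theta/2. The values of N and mu below are hypothetical and chosen only for illustration.

```python
import math

def theta(n_effective, mu):
    """Population size parameter: theta = 4 * N * mu."""
    return 4 * n_effective * mu

def coalescence_rate(theta_value):
    """Rate at which 2 sequences coalesce, per unit of mutation-scaled time."""
    return 2 / theta_value

def expected_coalescence_time(theta_value):
    """Expected coalescence time of 2 sequences, in mutations per site."""
    return theta_value / 2

N, mu = 10_000, 1e-8  # hypothetical values
th = theta(N, mu)
print(th, coalescence_rate(th), expected_coalescence_time(th))
# 2N generations times mu gives the same time as theta / 2
print(math.isclose(expected_coalescence_time(th), 2 * N * mu))  # True
```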
Pr(G|S) = probability of a gene tree given the population tree
Single branch probability depends on parameters
number of alleles exiting and entering that branch, times of coalescent events, branch length and width
Probability that a gene evolved in population tree
Pr(Xi|S) = integral over gene trees Gi of Pr(Gi|S) x Pr(Xi|Gi)
Pr(Gi|S) = prob of gene tree embedded in pop tree
Pr(Xi|Gi) = prob of sequence data given gene tree
so use summary methods instead:
unrooted quartets have only one tree shape (no pectinate/symmetric distinction), so with a large number of gene trees the quartet topology matching the pop tree will have the highest frequency
ASTRAL
takes the gene trees and decomposes them into quartets
quartets heuristically reassembled into the optimal population tree
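The decomposition step can be sketched as below. This is only a tally of quartet topologies across gene trees, not ASTRAL's actual algorithm or its reassembly search; the nested-tuple tree format, the function names, and the three toy gene trees are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def clades(tree):
    """Return the leaf set (frozenset) of every node in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def quartet_topology(tree, four):
    """Unrooted split of four taxa displayed by the tree, e.g. 'AB|CD'."""
    four = frozenset(four)
    for clade in clades(tree):
        side = clade & four
        if len(side) == 2:  # this clade separates two taxa from the other two
            halves = ["".join(sorted(side)), "".join(sorted(four - side))]
            return "|".join(sorted(halves))
    return None  # unresolved for these four taxa

def quartet_support(gene_trees, taxa):
    """For each set of four taxa, count the splits seen across gene trees."""
    return {four: Counter(quartet_topology(t, four) for t in gene_trees)
            for four in combinations(sorted(taxa), 4)}

gene_trees = [
    ((("A", "B"), "C"), "D"),   # pectinate
    (("A", "B"), ("C", "D")),   # symmetric
    ((("A", "C"), "B"), "D"),   # discordant
]
support = quartet_support(gene_trees, "ABCD")
print(support[("A", "B", "C", "D")].most_common(1))  # [('AB|CD', 2)]
```

The pectinate and symmetric rooted trees both reduce to the same unrooted quartet AB|CD, which is why the rooted-shape problem does not arise at the quartet level.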
Host-associate paradigm in 3 different contexts
trees within a tree
1. gene tree within a species tree
2. parasite cospeciating with its host
3. organisms diverging resulting from geological events
the associate tracks the host
Paralogy
gene duplication:
- one gene starts then is duplicated and each gene copy diverges
- genes may be lost resulting in incomplete complements of paralogs
may cause a species to show up multiple times in the phylogeny in different clades, one per paralog
if unrecognized, can lead to discordance among gene trees and with the pop tree
Orthologs
genes in different organisms that descend from a single gene in their common ancestor through speciation; they often serve the same function and sit at the same locus
inferring phylogeny depends on accurately identifying orthologs, hard when duplications and losses are pervasive
Paralogs
genes related by duplication, often with diverged functions and at different loci
Horizontal gene transfer
genetic transfer without mating
can happen between close or sometimes very distant relatives
(common with host and parasite)
happens rarely: results in an alternate gene tree topology that appears at very low frequency
Introgression
hybridization (mating between different, closely related species) followed by back-crossing (hybrids mating with one of the parent species)
expect a primary concordance tree, a secondary concordance tree, and additional very poorly supported trees
Hybrid speciation
formation of a new species with equal genetic contribution from two closely related parent species
expect 50% of loci of the new species to resolve as sister group to one parent species and the other 50% sister group to the other parent species
will result in two primary concordance trees that have equal CF
- .50 and .50 all the way down the tree to the common ancestor of the two species that made the hybrid
How to assess congruence between host/associate trees?
- are tree topologies more similar than expected by chance
- measure congruence in ages of associated clades
- likelihood that they evolved on the same tree?
- do the combined host and associate data matrices fail to reject a common tree
- fit associate tree with host tree, accounting for some incongruence
Processes causing incongruence in host/associate trees
- duplication (speciation) and incomplete sorting of associated lineages
- host switching and/or host range expansion
- unequal and/or different rates of molecular evo in host and assoc.
- horizontal gene transfer