Phylogenetics Final Flashcards

1
Q

systematics

A

the inference of phylogenies, the genealogy of species, focus on the species tree (reconstructing lineage history)

2
Q

coalescence

A

point of common ancestry of two alleles
they both come from the same parental allele

3
Q

lines of descent in diploid sex pop

A

the most recent generation is at the top, going back in time toward the bottom
the graph is a section of the genome
the pairs are individual genes
the circles are alleles of the gene

4
Q

Assumptions of diploid sex pop allele descent

A
  • equal probabilities of alleles being passed from one gen to the next (no selection, random mating)
  • population size is constant over time
  • alleles (from gen t) are drawn randomly with replacement from the previous generation (t-1)
5
Q

probability of coalescence

A

given N diploid individuals in each generation, the probability that 2 alleles coalesce in the previous generation (t-1) is 1/(2N).
it is simply the probability of drawing the same parental allele twice when sampling at random with replacement from 2N alleles.

6
Q

time to coalescence

A

the time to coalescence for two alleles is geometrically distributed with a mean of E(t) = 2N generations. N is the number of diploid individuals in a generation.
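A minimal simulation sketch of this result, assuming the idealized model from the earlier cards (constant size, alleles drawn at random with replacement). The function names, replicate count and seed are illustrative choices, not course material.

```python
import random


def simulate_coalescence_time(n_diploid, rng):
    """Generations back until two alleles pick the same parental copy."""
    n_copies = 2 * n_diploid  # 2N allele copies per generation
    generations = 0
    while True:
        generations += 1
        # Each lineage draws its parent copy at random, with replacement,
        # so the two draws match with probability 1/(2N).
        if rng.randrange(n_copies) == rng.randrange(n_copies):
            return generations


def mean_coalescence_time(n_diploid, replicates=20000, seed=1):
    """Average simulated coalescence time over many replicates."""
    rng = random.Random(seed)
    total = sum(simulate_coalescence_time(n_diploid, rng)
                for _ in range(replicates))
    return total / replicates
```

With N = 10 the simulated mean should land close to 2N = 20 generations.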

7
Q

coalescence is slower in…

A

larger populations compared to smaller populations

8
Q

expected time to coalescence for many alleles

A

4N generations
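The 4N figure is the large-sample limit of a standard coalescent result that is not written on the card: while k lineages remain, the expected wait until the next coalescence is 4N / (k(k-1)) generations, and summing from k = n down to k = 2 gives 4N(1 - 1/n). A short sketch, with hypothetical function names:

```python
def expected_wait(k_lineages, n_diploid):
    """Expected generations until the next coalescence among k lineages."""
    return 4 * n_diploid / (k_lineages * (k_lineages - 1))


def expected_tmrca(n_alleles, n_diploid):
    """Expected generations until all sampled alleles share one ancestor."""
    return sum(expected_wait(k, n_diploid)
               for k in range(2, n_alleles + 1))
```

The final step (k = 2) alone takes 2N generations, half of the limiting total, which is one way to see why trees from a constant population have long trunks.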

9
Q

properties of coalescent trees in a constant population

A
  • coalescence is rapid with many alleles, decreases over time as n decreases
  • have long trunks and short terminal branches
10
Q

recombination causes

A

different genes in the same individuals to have different gene tree topologies

11
Q

recombination in two ways

A
  1. independent assortment of individual chromosomes (the single strand) allows chromosomes to pair with those from the other pair
  2. crossing over between two chromosomes allows parts of the chromosomes to swap locations. this is how recombination can happen within a single chromosome.
12
Q

recombinational gene

A

a block of adjacent nucleotides that share the same gene tree

this is the ideal locus for phylogenetic inference

13
Q

branching or splitting

A

the subdivision of an ancestral population by barriers to gene flow

14
Q

population tree

A

a tree that contains many branching populations. all of the gene trees are embedded in this tree

15
Q

reasons gene trees don't match pop tree

A
  • deep coalescence
16
Q

deep coalescence

A

also called incomplete lineage sorting (ILS)
alleles fail to coalesce at their species tree point and instead have coalesced earlier in time, before the ancestral polymorphism.

17
Q

ancestral polymorphism

A

genetic variation (two or more alleles at a locus) that was already present in the ancestral population and persisted through the split of the species

18
Q

pop tree length and width

A

length = generations (time)
width = effective population size (idealized population with same coal. props)
combining both:
1 coalescent unit = 2N generations, the expected time to coalescence for two alleles

long and narrow, coal more likely
short and wide, coal less likely

19
Q

deep coalescence frequency of gene trees

A

the major (most frequent) topology of gene trees will match the population tree, and the minor (less frequent) topologies are randomly discordant and equal in frequency.
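For the simplest case of three taxa, the standard coalescent result makes these frequencies explicit. The formula below is the textbook three-taxon result rather than something stated on the card: with an internal branch of length T coalescent units, the matching topology has probability 1 - (2/3)e^(-T) and each of the two discordant topologies has probability (1/3)e^(-T). The function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t_coalescent_units):
    """Gene tree topology probabilities for a three-taxon population tree."""
    # Probability that the two sister lineages fail to coalesce
    # within the internal branch (deep coalescence).
    p_deep = math.exp(-t_coalescent_units)
    return {
        "matches_pop_tree": 1.0 - (2.0 / 3.0) * p_deep,
        "discordant_1": p_deep / 3.0,
        "discordant_2": p_deep / 3.0,
    }
```

At T = 0 all three topologies are equally likely (1/3 each); as T grows, the matching topology dominates and the two minor topologies shrink together.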

20
Q

what is a phylogeny debate

A
  • phylogeny as a cloud: phylogeny is a statistical distribution with a central tendency but variance because of the diversity of gene trees that are all included
21
Q

anomaly zone

A

where the population tree does not match the most probable gene tree

in a pectinate tree, if the internal branches are very short, all 4 taxa coalesce before the first split, and the 3 symmetric gene tree possibilities are then more probable than the pectinate gene tree

this is because a pectinate tree has only one possible sequence of coalescence while a symmetric tree has 2 possibilities (in a rooted 4-taxon tree)

22
Q

pectinate tree (unbalanced)

A

tree with each taxon being individually sister to the rest of the tree
only one possible sequence of coalescence

23
Q

symmetric tree (balanced)

A

the clades split equally
has two possible sequences of coalescence:
- one sister group coalesces and then the other
- vice versa

24
Q

anomalous gene tree (AGT)

A

a gene tree that is more probable than the pop tree

25
Total evidence phylogeny estimate approach
concatenation approach: combine all of the gene data (and therefore all the gene trees) in a single matrix and make a single tree from all genes on which to map the alleles
  • results are well-resolved
  • assumption: all genes share the same history, which is unlikely
  • if the pop tree has one or more anomaly zones, the inferred tree can be wrong
  • can incorrectly support branches
26
Multi-species coalescent approach
inferring a population tree based on a few genes
assumption: all gene tree discordance is due to deep coalescence
3 approaches within:
  1. parsimony: minimize deep coalescences
  2. full-likelihood co-estimation of pop and gene trees
  3. approximate or summary methods
27
Multiple-species coalescent Parsimony
  1. count the minimum number of deep coalescent events on the species tree required to explain the gene trees
  2. search among candidate species trees to lower that number
drawbacks:
  • no estimates of branch length or width
  • inconsistent when there is an anomaly zone
28
Likelihood multiple-species coalescent
given a candidate population tree, what is the likelihood of the gene data?
  • coalescent times are measured in mutation units, using μ (mu), the mutation rate per site per generation
  • population sizes are measured in mutations per site (θ, theta): θ = 4Nμ
  • 2 sequences coalesce at rate 2/θ, with expected coalescence time θ/2
  • Pr(G|S) = probability of a gene tree given the population tree
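The relationships on this card can be checked with a few lines of arithmetic. This is only a sketch of the formulas as written above; the function names and the example values of N and μ are made up for illustration.

```python
def theta(n_diploid, mu):
    """Population size in mutation units: theta = 4 * N * mu."""
    return 4 * n_diploid * mu


def pairwise_coalescence_rate(theta_value):
    """Rate at which two sequences coalesce, per unit of mutation time."""
    return 2 / theta_value


def expected_pairwise_time(theta_value):
    """Expected coalescence time of two sequences, in mutations per site."""
    return theta_value / 2


# Hypothetical values: N = 10,000 diploids, mu = 1e-8 per site per generation.
example_theta = theta(10_000, 1e-8)
```

The expected time θ/2 is the same 2N generations from the two-allele card, rescaled by μ: 2N × μ = θ/2.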
29
Single branch probability depends on parameters
number of alleles exiting and entering that branch, times of coalescent events, branch length and width
30
Probability that a gene evolved in population tree
Pr(Xi|S) = ∫ Pr(Xi|Gi) × Pr(Gi|S) dGi
  • Pr(Gi|S) = probability of the gene tree embedded in the pop tree
  • Pr(Xi|Gi) = probability of the sequence data given the gene tree
do summary methods instead:
  • unrooted quartets all have the same topology shape (no pectinate/symmetric distinction)
  • with a large number of gene trees, the pop tree topology will have the highest probability
31
ASTRAL
takes the gene trees and decomposes them into quartets; the quartets are then heuristically reassembled into the optimal population tree
32
Host-associate paradigm in 3 different contexts
trees within a tree:
  1. gene tree within a species tree
  2. parasite cospeciating with its host
  3. organisms diverging as a result of geological events
in each case the associate tracks the host
33
Paralogy
gene duplication:
  • one gene is duplicated and each gene copy then diverges
  • genes may be lost, resulting in incomplete complements of paralogs
  • may cause a species to show up multiple times in the phylogeny in different clades, one per paralog
  • if unrecognized, can lead to discordance among gene trees and with the pop tree
34
Orthologs
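genes in different organisms that are related by speciation (descended from the same gene in the common ancestor); they typically serve the same function and are at the same locus
inferring phylogeny depends on accurately identifying orthologs, which is hard when duplications and losses are pervasive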
35
Paralogs
genes related by duplication, having different functions and at different loci
36
Horizontal gene transfer
genetic transfer without mating
  • can happen between close or sometimes very distant relatives (common with host and parasite)
  • happens rarely: will result in an alternate topology that is very unlikely
37
Introgression
hybridization (different, closely related species mating) followed by back-crossing (hybrids mating with one of the parental species)
results in a primary concordance tree with a secondary concordance tree and additional, very poorly supported trees
38
Hybrid speciation
formation of a new species from equal contributions of genes from two parents of different (closely related) species
  • expect 50% of the loci of the new species to resolve as sister group to one parent species and the other 50% as sister group to the other parent species
  • results in two primary concordance trees with equal CF (0.50 and 0.50) all the way down the tree to the common ancestor of the two species that made the hybrid
39
How to assess congruence between host/associate trees?
  • are the tree topologies more similar than expected by chance?
  • measure congruence in the ages of associated clades
  • what is the likelihood that they evolved on the same tree?
  • do data matrices with both fail to reject a common tree?
  • fit the associate tree to the host tree, accounting for some incongruence
40
Processes causing incongruence in host/associate trees
  • duplication (speciation) and incomplete sorting of associated lineages
  • host switching and/or host range expansion
  • unequal and/or different rates of molecular evolution in host and associate
  • horizontal gene transfer
41
Reconciliation analysis
parsimony method to find the minimum number of events causing incongruence between host and associate trees
  • the relative costs of duplication, sorting (loss) and host switching are taken into account
  • branch lengths are not
42
vicariance
the geographical separation of a population, typically by a physical barrier resulting in two species
43
cladistic or vicariance biogeography
reconstructing geographic history from species cladograms
  • assumption: areas have a treelike history of successive fragmentation events
  • disjunction: the occurrence of related taxa in widely separated regions (due to vicariance)
44
Biological species concept
(BSC) an interbreeding community of populations that is reproductively isolated from all other communities by its physical properties (incompatibility of parents, sterility of the hybrids, or both)
  • Darwin was not a fan: he said fertility could vary among individuals and should not define species
45
Genotypic/phenotypic cluster concept
species are defined by identifying phenotypic or genotypic clusters of individuals that are more similar to each other and less similar to others
Mallet said in the 1990s (going back to Darwin): there is strong support if varieties exist in close proximity for a long time without combining
  • geographic proximity
  • genetic clusters (based on close genetic distances)
  • close phylogenetic distances
46
Species defined by pattern
  • groups of individuals retain ecological/morphological distinctions in sympatry (living very close together)
  • genotypic clusters (being similar to each other and different from others)
47
Species defined by process
  • reproductive isolation
  • hybridization (not working? sterile?)
  • natural selection
  • inherited variation
  • biological species concept (BSC)
48
Phylogenetic species concept
monophyly of a species taxon: all descendants of a common ancestor at the species level
used to be defined as the terminal splits that all trees agree upon; Baum 2009 relaxes this by defining exclusivity
49
Exclusivity
species are exclusive groups:
  • a set of contemporaneous (existing at the same time) organisms that form a clade for more of the genome than any conflicting grouping does (they have a higher concordance factor, CF)
  • allows for taxa with concordance factors <50%
50
Species as taxa
  • products of history/evolution
  • defined only by the past
  • "phylogenetic" species concept
51
Species as functional units
  • participants in evolution
  • predictive about the future
  • trait-based species concept
  • biological species concept
52
Naming a clade as species is semisubjective
based on semisubjective criteria:
  • biological significance
  • utility
  • predictive power
  • robustness
  • precedent
53
Explain my own meaning of the word species
be creative but consistent
54
Causes of discordance
  1. incomplete lineage sorting (deep coalescence)
  2. duplication and extinction of gene copies (paralogy disruption)
  3. gene flow
     • horizontal gene transfer
     • hybridization/introgression
55
Reticulate evolution
formation of a species through the partial merging of two ancestor lineages
56
Primary concordance tree
a tree composed of clades that have higher concordance factors than any alternative clade (they show up more often)
57
Concordance factors
the proportion of the genome for which a given clade is true (shown at the nodes or branches)
estimated separately for the sample of genes you sequenced and for the genome as a whole
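A minimal sketch of the sample-wide version of this calculation: count the fraction of sampled gene trees that contain each clade. Representing each gene tree as a set of clades (each clade a frozenset of taxon names) is an assumption made for illustration, as are the example trees; real tools also extrapolate to the whole genome, which this sketch does not attempt.

```python
from collections import Counter


def concordance_factors(gene_tree_clades):
    """Fraction of sampled gene trees in which each clade appears."""
    counts = Counter()
    for clades in gene_tree_clades:
        counts.update(set(clades))  # count each clade once per gene tree
    n_trees = len(gene_tree_clades)
    return {clade: n / n_trees for clade, n in counts.items()}


# Hypothetical sample of four gene trees for taxa A, B, C.
GENE_TREES = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("BC"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
]
CF = concordance_factors(GENE_TREES)
```

Here clade AB has the highest CF among the conflicting clades (AB, BC, AC), so it is the one that would appear on the primary concordance tree.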
58
How to calculate tree frequencies
learn in OH
59
D-statistic test process
a way to tell whether discordance is due to incomplete lineage sorting (ILS) or introgression; involves measuring the proportions of two-state SNPs that have ABBA or BABA patterns
  1. assume a primary (correct) topology on a 4-taxon pectinate tree in which P1 and P2 are sister
  2. create two alternative topologies of the in-group branches: BABA = P1 and P3 are sister; ABBA = P2 and P3 are sister
  3. if p(ABBA) = p(BABA), the discordance is due to ILS; if they are not equal, it is due to introgression
calculate the D-statistic from the formula; assess significance by bootstrap
60
D- statistic from a matrix
species are rows and nucleotides are columns
read down the columns of nucleotides and see which follow the ABBA and BABA patterns
then use the D-statistic formula to calculate D
61
D-statistic formula
D = (#ABBA - #BABA) / (#ABBA + #BABA)
values near 0 support ILS; values approaching 1 or -1 support introgression
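A minimal sketch of reading an alignment column by column and applying the formula. It assumes four aligned sequences in the order P1, P2, P3, outgroup, treats the outgroup state as ancestral ("A"), and skips any column that is not two-state. The function names and the toy sequences are invented for illustration, and no bootstrap is included.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns across aligned sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only two-state (biallelic) columns
        if c == o:
            continue  # P3 must carry the derived state
        if a == o and b == c:
            abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the derived state
    return abba, baba


def d_statistic(abba, baba):
    """D = (#ABBA - #BABA) / (#ABBA + #BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / total


# Toy alignment: columns 1 and 3 are ABBA, column 2 is BABA, column 4 is neither.
ABBA_COUNT, BABA_COUNT = count_abba_baba("ATAC", "TATC", "TTTG", "AAAC")
```

On this toy alignment the counts are 2 ABBA and 1 BABA, so D = 1/3; with so few sites the value says nothing by itself, which is why significance is assessed by bootstrap.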
62
Two classes of characters
discrete and continuous
63
Testing hypotheses of character evo with history vs. models
  • History: questions concerning specific ancestors (species evolve before other species)
  • Models: questions concerning general trends (do bilaterally symmetric flowers evolve from radially symmetric ones?)
64
Chronicle vs. Narrative
Chronicle: how did a trait evolve?
Narrative: why did it evolve?
65
Exaptation
a trait whose evolutionary origin has no relation to its current utility (wings are this in penguins, no longer used for flying but for swimming)
66
Steps of testing hypotheses of adaptation on a phylogeny
  1. infer the tree (parsimony or likelihood)
  2. score the traits of interest (character states must be homologous, i.e. share a common ancestor)
  3. score selective regimes
  4. reconstruct the history of character changes and of selective regimes
  5. assess current utility relative to the ancestral state (and performance): measure the fitness difference, and compare performance in the focal clade to a sister group that has the same regime and a different state
example: are red flowers an adaptation for bird pollination?
shifts in character and regime are less frequent than branching events
67
Selective regime
all abiotic and biotic factors that determine how natural selection will act on character variation
68
Markov models to test categorical trait evolution
Mk2 models test the joint evolution of two binary traits (character and regime)
  1. consider all possible combinations of states in a 4x4 matrix
  2. assume only one trait can change at a time: transition rates requiring 2 changes = 0
  3. have a hypothesis (L1 = likelihood of the data and tree under the hypothesis) and a null (L0)
  4. the hypothesis is supported if the rate is greater in the predicted direction and the likelihood ratio test is significant: 2(logL1 - logL0)
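A small sketch of the likelihood ratio test in step 4. It assumes the hypothesis model has exactly one more free parameter than the null, so the statistic is compared to a chi-square distribution with 1 degree of freedom; that degree-of-freedom choice and the function names are assumptions for illustration, and a different parameter count would need a different reference distribution.

```python
import math


def lrt_statistic(log_l0, log_l1):
    """Likelihood ratio statistic: 2 * (logL1 - logL0)."""
    return 2.0 * (log_l1 - log_l0)


def p_value_one_df(statistic):
    """Upper-tail chi-square p-value for 1 degree of freedom (assumed)."""
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
```

For example, log-likelihoods of -105 under the null and -100 under the hypothesis give a statistic of 10, well past the usual 3.84 cutoff for 1 degree of freedom at the 0.05 level.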
69
High diversification comes from
higher rates of speciation and/or lower rates of extinction
70
2 predictions of competing hypotheses for greater diversity
  1. the trait confers "species selection" = higher net diversification
  2. the rate of change to state 1 is greater than the reverse = higher diversity
71
BISSE
(BiSSE) binary-state speciation-extinction model
parameters:
  • q01 and q10: rates of state change
  • λ (lambda): the rate of speciation
  • μ (mu): the rate of extinction (loss)
72
Phylogenetic covariance
the time (in branch length) during which two taxa have been evolving together
  • if it is low, the species are fairly independent of each other and have been evolving separately for longer
  • if it is high, the species are expected to be similar, so they can't be treated as independent data points
73
Brownian motion
a stochastic process (occurring randomly over a period of time) in which a trait takes many random steps
  • clades can act as traits and drift as a group
  • traits land in different places in phylogenetic space depending on when the lineages split from each other
  • if they split early, their positions are variable; if they split late, they are expected to be near each other
74
Phylogenetic independent contrasts
(PIC) a statistical tool to address the non-independence of trait values among related taxa
  • reduces N observations to N-1 rescaled contrasts
  • X1 = (A*v2 + B*v1) / (v1 + v2) is the new value at the node between tips A and B (with branch lengths v1 and v2): a weighted average of the tips
  • keep repeating this process until only one X value is left; this X is the estimated ancestral state for the whole tree under Brownian motion
  • the weights downweight the taxa on longer branches so they have less influence
  • the contrasts all have an expected mean of 0
  • plots of contrasts no longer show correlations that are due only to shared ancestry
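A compact sketch of the pruning procedure described on this card, following the usual Felsenstein-style recipe. The nested-tuple tree format (a tip is `(trait value, branch length)`, an internal node is `(left, right, branch length)`), the example tree and all names are assumptions for illustration. Each contrast is divided by the square root of the summed branch lengths, and the branch below each reconstructed node is lengthened by v1*v2/(v1+v2) to reflect uncertainty in the node value.

```python
import math


def _prune(node, contrasts):
    """Return (node value, adjusted branch length); collect contrasts."""
    if len(node) == 2:  # tip: (trait value, branch length)
        return node
    left, right, length = node
    xa, va = _prune(left, contrasts)
    xb, vb = _prune(right, contrasts)
    # Standardized contrast between the two sister lineages.
    contrasts.append((xa - xb) / math.sqrt(va + vb))
    # Weighted average: the tip on the longer branch gets less weight.
    x_node = (xa * vb + xb * va) / (va + vb)
    # Lengthen the branch below this node to carry estimation error.
    v_node = length + va * vb / (va + vb)
    return x_node, v_node


def independent_contrasts(tree):
    """Return (list of N-1 standardized contrasts, root state estimate)."""
    contrasts = []
    root_value, _ = _prune(tree, contrasts)
    return contrasts, root_value


# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=5.
EXAMPLE_TREE = (((1.0, 1.0), (3.0, 1.0), 1.0), (5.0, 2.0), 0.0)
CONTRASTS, ROOT_STATE = independent_contrasts(EXAMPLE_TREE)
```

Three tips yield two contrasts, and the single value left at the root is the Brownian-motion estimate of the ancestral state.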
75
A contrast is...
the difference between two tip states (sister states)
76
the normalized contrast
corrects for differences in variance between pairs of sister groups that arise from their different covariances. it does this by dividing the contrast by the square root of the summed lengths of the branches connecting the two sister taxa.
77
Under Brownian motion, the contrast is:
expected to be 0, because the two offspring of the ancestor are expected to vary randomly around the same mean
78
Generalized least squares regressions (GLS)
a method of regression for fitting a model in which the data are correlated
it allows:
  1. unequal variances
  2. nonindependence
has two parameters:
  • â (a-hat) = the phylogenetic mean, the ancestral state estimated under Brownian motion
  • σ² = the phylogenetic variance, the rate of character evolution under Brownian motion
79
covariance matrix (C) set up
  • diagonals hold the variances
  • off-diagonals hold the covariances
weighting by C has the effect of downweighting the more strongly covarying data
the variance can also be calculated and is normalized by C
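A minimal sketch of building C from a tree, under the Brownian-motion idea that two tips covary in proportion to the branch length they share between the root and their most recent common ancestor (the diagonal is each tip's full root-to-tip length). The `child -> (parent, branch length)` dictionary format, the example tree and the function names are assumptions for illustration; the entries would still need to be scaled by σ² to be actual covariances.

```python
def root_path(tip, parent):
    """Set of branches (named by their child node) from a tip to the root."""
    branches = set()
    node = tip
    while node in parent:
        branches.add(node)
        node = parent[node][0]
    return branches


def covariance_matrix(tips, parent):
    """Shared root-to-tip branch length for every pair of tips."""
    paths = {tip: root_path(tip, parent) for tip in tips}
    return [[sum(parent[b][1] for b in paths[i] & paths[j]) for j in tips]
            for i in tips]


# Hypothetical tree ((A:1, B:1):2, C:3).
TREE = {"A": ("AB", 1.0), "B": ("AB", 1.0),
        "AB": ("root", 2.0), "C": ("root", 3.0)}
C_MATRIX = covariance_matrix(["A", "B", "C"], TREE)
```

In this example A and B share 2 of their 3 units of history, so they covary strongly, while C shares nothing with either and can be treated as independent of them.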
80
You can also use GLS to test
rate-shift hypotheses at different points in the tree
81
Phylogenetic evidence for viral transfer
being in a common clade or nearby clade
82
Processes that influence shape of virus phylogenies
  1. directional selection
  2. spatial structure and spread dynamics
  3. changes in population size
83
Serial samples, "tip-dated" tree
advantages viral phylogenies have over ordinary ones:
  1. known sample dates calibrate the clock accurately
  2. yields improved estimates of the nucleotide substitution rate
samples are not all at the tips; some sit further back toward the root, because of the fast generation time
84
Tree balance in viruses
unbalanced: recurrent ongoing selection, rapid spatial spread
  • selection will drive evolution in a certain clade and cause other clades to drop off
balanced: lack of directional selection
  • no preference for certain clades
85
Spatial structure
structured host pop = phylogenetic clades align with geographic position
unstructured = all mixed
86
Contemporaneous samples, ultrametric tree
ultrametric = all of the tips align and are the same distance from the root
molecular clock estimates of node ages are increasingly uncertain toward the root
contemporaneous = occurring at the same time
87
Exponential inc in pop makes a tree with...
shorter trunks and longer tips
coalescence happens faster in a smaller population; the tips correspond to the largest population, so they have the longest branches
88
Constant population size leads to a tree with...
long trunk and short tips
the number of alleles is greatest at the tips; going back in time toward the root, only 2 remain. finding your one other allele to coalesce with in a big constant population is harder than finding any one of many alleles in the same population.
89
Species diversity
same thing as species richness: the count of how many different species are in an area
90
Functional diversity
(trait diversity) how much ecological variation among the species in an area, traits used as a proxy
91
Phylogenetic diversity
how much phylogenetic history is represented by the species in an area
  • measured on a tree as the total branch length connecting the species you are considering
  • the root node may be included or not (likely not), so just connect the species to each other
  • PD can vary while species richness is constant, because the species may be further apart or closer together on the tree
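A minimal sketch of the "connect the species to each other" version of PD, which leaves out the branches between their common ancestor and the root. The `child -> (parent, branch length)` dictionary, the example tree and the function names are assumptions for illustration.

```python
def path_to_root(tip, parent):
    """Set of branches (named by their child node) from a tip to the root."""
    branches = set()
    node = tip
    while node in parent:
        branches.add(node)
        node = parent[node][0]
    return branches


def phylogenetic_diversity(community, parent):
    """Total length of the branches connecting the community's species."""
    paths = [path_to_root(tip, parent) for tip in community]
    # Branches shared by every species lie above their common ancestor,
    # so dropping them excludes the path back to the root.
    connecting = set.union(*paths) - set.intersection(*paths)
    return sum(parent[branch][1] for branch in connecting)


# Hypothetical tree (((A:1, B:1):1, C:2):1, D:3).
PD_TREE = {"A": ("AB", 1.0), "B": ("AB", 1.0), "AB": ("ABC", 1.0),
           "C": ("ABC", 2.0), "ABC": ("root", 1.0), "D": ("root", 3.0)}
```

On this tree the two-species communities {A, B}, {A, C} and {A, D} all have the same richness but PD values of 2, 4 and 6, which illustrates the last bullet on the card.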
92
Phylogenetic clustering
that a community (species in area or a niche) is filled by species in the same clade
93
If traits aren't available for diversity...
use phylogenetic diversity as a proxy because closely related species tend to be similar in their traits
94
Phylogenetic overdispersion
that communities are made up of species from different clades
95
Mean nearest taxon distance (MNTD)
the average distance from each tip in the tree to its closest relative (the whole closest clade)
measure from one species to any of the species in the closest descendant clade, then average all of those distances to get the MNTD
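A minimal sketch of the averaging step, assuming the pairwise tree distances between species are already known. The distance table below is hypothetical (it corresponds to a made-up four-species tree), and the function name is illustrative.

```python
def mntd(community, dist):
    """Mean, over species, of the distance to the nearest other species."""
    nearest = []
    for species in community:
        nearest.append(min(dist[frozenset((species, other))]
                           for other in community if other != species))
    return sum(nearest) / len(nearest)


# Hypothetical pairwise tree distances (sum of branch lengths between tips).
DIST = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "D")): 6.0,
    frozenset(("C", "D")): 6.0,
}
```

A community of close relatives such as {A, B, C} gets a lower MNTD than a more spread-out one such as {A, C, D}, so low values point toward phylogenetic clustering.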
96
Evolutionary distinctiveness (ED)
how different the species is from everything else; more isolated = higher ED
  • for a single branch: ED = length of the branch divided by the number of descendant species coming off of that branch
  • for a species: ED = the sum of the branch ED scores along the path from the tip to the root
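A minimal sketch of the two-step calculation on this card: share each branch equally among its descendant species, then sum those shares from tip to root. The `child -> (parent, branch length)` dictionary, the example tree and the function name are assumptions for illustration.

```python
def evolutionary_distinctiveness(tips, parent):
    """ED per species: sum of (branch length / descendant tips) to the root."""
    # Count how many tips descend from each branch.
    n_descendants = {}
    for tip in tips:
        node = tip
        while node in parent:
            n_descendants[node] = n_descendants.get(node, 0) + 1
            node = parent[node][0]
    # Sum each tip's share of every branch on its path to the root.
    scores = {}
    for tip in tips:
        score, node = 0.0, tip
        while node in parent:
            score += parent[node][1] / n_descendants[node]
            node = parent[node][0]
        scores[tip] = score
    return scores


# Hypothetical tree (((A:1, B:1):1, C:2):1, D:3).
ED_TREE = {"A": ("AB", 1.0), "B": ("AB", 1.0), "AB": ("ABC", 1.0),
           "C": ("ABC", 2.0), "ABC": ("root", 1.0), "D": ("root", 3.0)}
ED = evolutionary_distinctiveness(["A", "B", "C", "D"], ED_TREE)
```

D sits alone on a long branch and gets the highest score, and the scores across all species add up to the total branch length of the tree.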
97
Global extinction risk
(GE) made by the IUCN: a ranking of species by their extinction risk, based on a few factors
98
EDGE
ED + GE: takes into account the uniqueness of the species and its global extinction risk to make one score that can be used to prioritize protection
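The card gives the idea as "ED + GE". A commonly used formulation, which is an assumption here and not taken from the card, puts both terms on a log scale: EDGE = ln(1 + ED) + GE × ln(2), with GE coded 0 to 4 from Least Concern up to Critically Endangered. A sketch under that assumption, with a hypothetical function name:

```python
import math


def edge_score(ed, ge):
    """EDGE = ln(1 + ED) + GE * ln(2); assumed formulation, GE coded 0-4."""
    return math.log(1.0 + ed) + ge * math.log(2.0)
```

Under this formulation each step up in threat category adds ln(2) to the score, the same gain as doubling 1 + ED.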
99
Traits tracking habitat or phylogeny more
traits tracking habitat more:
  • clustered on the phylogeny by their traits
  • more convergent evolution: traits came from different ancestors
traits tracking phylogeny more:
  • overdispersed by trait
  • evolution is conserved: traits come from common ancestors
100
Phylogenetic clustering or overdispersion matrix
  • conserved traits, trait clustering: phylogenetic clustering
  • convergent traits, trait clustering: phylogenetic overdispersion
  • conserved traits, trait overdispersion: phylogenetic overdispersion
  • convergent traits, trait overdispersion: phylogenetic clustering or random dispersion
101
Baum's species criteria
  1. biological significance
  2. utility
  3. predictive power
  4. robustness
  5. precedent
semi-subjective because these criteria might conflict with each other