Phylogenetics Final Flashcards

1
Q

systematics

A

the inference of phylogenies, the genealogy of species, focus on the species tree (reconstructing lineage history)

2
Q

coalescence

A

point of common ancestry of two alleles
they both come from the same parental allele

3
Q

lines of descent in diploid sex pop

A

the most recent generation is at the top; moving down the graph goes back in time
the graph shows one section of the genome
each pair is the two gene copies carried by one diploid individual
the circles are alleles of the gene

4
Q

Assumptions of diploid sex pop allele descent

A
  • equal probabilities of alleles being passed from one gen to the next (no selection, random mating)
  • population size is constant over time
  • alleles (from gen t) are drawn randomly with replacement from the previous generation (t-1)
5
Q

probability of coalescence

A

given N diploid individuals in each generation, the probability that 2 alleles coalesce in the previous generation (t-1) is 1/(2N).
this is just the probability of drawing the same one of the 2N parental alleles twice when sampling with replacement.

6
Q

time to coalescence

A

the time to coalescence for two alleles is geometrically distributed with a mean of E(t) = 2N generations, where N is the number of diploid individuals in a generation.
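
A minimal sketch to check the 2N result numerically, assuming the idealized population described in the cards above. The function names, the population size of 50 and the seed are illustrative choices, not part of the course material.

```python
import random

def coalescence_probability(n_diploid):
    """Chance that two alleles copy the same one of the 2N parental alleles."""
    return 1.0 / (2 * n_diploid)

def expected_coalescence_time(n_diploid):
    """Mean of a geometric distribution with success probability 1/(2N)."""
    return 2.0 * n_diploid

def simulate_coalescence_time(n_diploid, rng):
    """Step back one generation at a time until the two lineages coalesce."""
    p = coalescence_probability(n_diploid)
    generations = 1
    while rng.random() >= p:
        generations += 1
    return generations

rng = random.Random(1)  # fixed seed so the run is repeatable
times = [simulate_coalescence_time(50, rng) for _ in range(20000)]
mean_time = sum(times) / len(times)
print(expected_coalescence_time(50), round(mean_time, 1))
```

The simulated mean should land close to 2N = 100 generations; doubling N doubles the expected wait.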

7
Q

coalescence is slower in…

A

larger populations compared to smaller populations

8
Q

expected time to coalescence for many alleles

A

approaches 4N generations as the number of sampled alleles grows
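
A short sketch of where the 4N figure comes from, assuming the standard coalescent result that while k lineages remain the expected wait for the next coalescence is 4N / (k(k-1)) generations. The function names are illustrative.

```python
def expected_waiting_time(n_diploid, k):
    """Expected generations until k lineages become k - 1."""
    return 4.0 * n_diploid / (k * (k - 1))

def expected_tmrca(n_diploid, n_alleles):
    """Sum the waits from n_alleles lineages down to the last pair."""
    return sum(expected_waiting_time(n_diploid, k)
               for k in range(2, n_alleles + 1))

for n in (2, 10, 100):
    print(n, expected_tmrca(1000, n))
```

The sum equals 4N(1 - 1/n), so it starts at 2N for two alleles and approaches 4N for large samples. The final pair alone accounts for 2N of it, which is why these trees have long trunks.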

9
Q

properties of coalescent trees in a constant population

A
  • coalescence is rapid while there are many alleles and slows going back in time as the number of remaining lineages n decreases
  • trees have long trunks and short terminal branches
10
Q

recombination causes

A

different genes in the same individuals to have different gene tree topologies

11
Q

recombination in two ways

A
  1. independent assortment: each chromosome of a pair is passed on independently of the other pairs, so chromosomes from different pairs end up in new combinations
  2. crossing over: two homologous chromosomes swap segments. This is how recombination can happen within a single chromosome.
12
Q

recombinational gene

A

a block of adjacent nucleotides that share the same gene tree

this is the ideal locus for phylogenetic inference

13
Q

branching or splitting

A

the subdivision of an ancestral population by barriers to gene flow

14
Q

population tree

A

a tree that contains many branching populations. all of the gene trees are embedded in this tree

15
Q

reasons gene trees don't match pop tree

A
  • deep coalescence
16
Q

deep coalescence

A

also called incomplete lineage sorting
alleles fail to coalesce in the ancestral population just before their species split and instead coalesce deeper (earlier) in time, so the ancestral polymorphism persists through the split.

17
Q

ancestral polymorphism

A

multiple alleles of a gene that were already present in the ancestral population at the time of the split and were carried into the descendant species

18
Q

pop tree length and width

A

length = generations (time)
width = effective population size (the size of an idealized population with the same coalescent properties)
combining both:
1 coalescent unit = 2N generations, the expected time to coalescence for two alleles

long and narrow branch: coalescence more likely
short and wide branch: coalescence less likely

19
Q

deep coalescence frequency of gene trees

A

the major (most frequent) topology of gene trees will match the population tree, and the minor (less frequent) topologies are randomly discordant and equal in frequency.
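
A hedged sketch of this pattern for the simplest case, three taxa. It assumes the standard three-taxon coalescent result, which is not stated in the cards: with an internal branch of T coalescent units (T = generations / 2N), the matching topology has probability 1 - (2/3)e^(-T) and each of the two discordant topologies has probability (1/3)e^(-T). The function name and example numbers are illustrative.

```python
import math

def three_taxon_gene_tree_probs(generations, n_diploid):
    """Gene tree topology probabilities for a rooted three-taxon pop tree."""
    t = generations / (2.0 * n_diploid)  # internal branch in coalescent units
    minor = math.exp(-t) / 3.0           # each discordant topology
    return {
        "concordant": 1.0 - 2.0 * minor,
        "discordant_1": minor,
        "discordant_2": minor,
    }

short_wide = three_taxon_gene_tree_probs(generations=100, n_diploid=1000)
long_narrow = three_taxon_gene_tree_probs(generations=10000, n_diploid=1000)
print(short_wide)
print(long_narrow)
```

A short, wide branch leaves the three topologies nearly equal in frequency, while a long, narrow branch makes almost every gene tree match the population tree.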

20
Q

what is a phylogeny debate

A
  • phylogeny as a cloud: phylogeny is a statistical distribution with a central tendency but variance because of the diversity of gene trees that are all included
21
Q

anomaly zone

A

where the population tree does not match the most probable gene tree

in a pectinate tree, if the internal branches are very short, all 4 lineages can fail to coalesce until the root population, deeper than the first split; then each of the 3 symmetric gene trees is more probable than the pectinate gene tree

this is because a pectinate topology has only one possible order of coalescences while a symmetric topology has 2 (in a rooted 4-taxon tree)

22
Q

pectinate tree (unbalanced)

A

tree with each taxon being individually sister to the rest of the tree
only one possible sequence of coalescence

23
Q

symmetric tree (balanced)

A

the clades split equally
has two possible sequences of coalescence:
- one sister group coalesces and then the other
- vice versa

24
Q

anomalous gene tree (AGT)

A

a gene tree topology that is more probable than the topology matching the pop tree

25
Q

Total evidence phylogeny estimate approach

A

concatenation approach
combine all of the gene data in a single matrix (therefore all the gene trees)
make a single tree from all genes to map the alleles
- results well-resolved
- assumption: all genes share same history which is unlikely
- if pop tree has one or more anomaly zones, the inferred tree can be wrong
- can incorrectly support branches

26
Q

Multi-species coalescent approach

A

inferring a population tree based on a few genes
assumption: all gene tree discordance is due to deep coalescence
3 approaches within:
1. parsimony: minimize deep coalescences
2. full-likelihood co-estimation of pop and gene trees
3. approximate or summary methods

27
Q

Multiple-species coalescent Parsimony

A
  1. count minimum number of deep coalescent events on species tree required to explain gene trees
  2. search among candidate species trees to lower the number

drawback= no estimates of branch length or width
inconsistent when there is anomaly zone

28
Q

Likelihood multiple-species coalescent

A

given a candidate population tree, what is the likelihood of the gene data

mu = mutation rate per site per generation
measure coalescent times (branch lengths) in expected mutations per site: generations x mu
measure population sizes as theta = 4N mu

2 sequences coalesce at rate 2/theta, with mean coalescence time theta/2

Pr( G|S) = probability of a gene tree given the population tree

29
Q

Single branch probability depends on parameters

A

number of alleles exiting and entering that branch, times of coalescent events, branch length and width

30
Q

Probability that a gene evolved in population tree

A

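Pr(Xi|S) = integral over all gene trees Gi of Pr(Gi|S) x Pr(Xi|Gi)
- Pr(Gi|S): probability of the gene tree embedded in the pop tree
- Pr(Xi|Gi): probability of the sequence data given the gene tree

this is expensive to compute, so use summary methods instead:
unrooted quartets have only one tree shape (no pectinate/symmetric distinction), so with a large number of gene trees the quartet topology matching the pop tree will have the highest probability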

31
Q

ASTRAL

A

takes the gene trees and decomposes them into quartets
quartets heuristically reassembled into the optimal population tree

32
Q

Host-associate paradigm in 3 different contexts

A

trees within a tree
1. gene tree within a species tree
2. parasite cospeciating with its host
3. organisms diverging resulting from geological events
the associate tracks the host

33
Q

Paralogy

A

gene duplication :
- one gene starts then is duplicated and each gene copy diverges
- genes may be lost resulting in incomplete complements of paralogs

may cause a species to show up multiple times in phyl in diff clades, one per paralog

if unrecognized, can lead to discordance among gene trees and to pop tree

34
Q

Orthologs

A

genes in different species that are related by speciation (descended from a single gene in the common ancestor); they often keep the same function and the same locus
inferring phylogeny depends on accurately identifying orthologs, which is hard when duplications and losses are pervasive

35
Q

Paralogs

A

genes related by duplication; they often have different functions and sit at different loci

36
Q

Horizontal gene transfer

A

genetic transfer without mating
can happen between close or sometimes very distant relatives
(common with host and parasite)

happens rarely: will result in alternate topology that is very unlikely

37
Q

Introgression

A

hybridization (mating between different, closely related species) followed by back-crossing (hybrids mating with one of the parent species)

expect a primary concordance tree with a secondary concordance tree and additional very poorly supported trees

38
Q

Hybrid speciation

A

formation of new species from equal contribution of genes from either parent from different species (closely related)

expect 50% of loci of the new species to resolve as sister group to one parent species and the other 50% sister group to the other parent species

will result in two primary concordance trees that have equal CF
- .50 and .50 all the way down the tree to the common ancestor of the two species that made the hybrid

39
Q

How to assess congruence between host/associate trees?

A
  • are tree topologies more similar than expected by chance
  • measure congruence in ages of associated clades
  • likelihood that they evolved on the same tree?
  • do data matrices with both fail to reject common tree
  • fit associate tree with host tree, accounting for some incongruence
40
Q

Processes causing incongruence in host/associate trees

A
  • duplication (speciation) and incomplete sorting of associated lineages
  • host switching and/or host range expansion
  • unequal and/or different rates of molecular evo in host and assoc.
  • horizontal gene transfer
41
Q

Reconciliation analysis

A

parsimony method to find the minimum number of events causing incongruence between host and associate trees
relative costs of duplication, sorting (loss) and host switching are taken into account
branch lengths are not

42
Q

vicariance

A

the geographical separation of a population, typically by a physical barrier resulting in two species

43
Q

cladistic or vicariance biogeography

A

reconstructing geographic history from species cladograms
- assumption: areas have a treelike history of successive fragmentation events
- disjunction: occurrence of related taxa in widely separated regions (due to vicariance)

44
Q

Biological species concept

A

(BSC)
an interbreeding community of populations that is reproductively isolated from all other communities by its physiological properties (incompatibility of parents, sterility of the hybrids, or both)
- Darwin was not a fan: he said fertility can vary between individuals and should not define species

45
Q

Genotypic/phenotypic cluster concept

A

species are defined by identifying phenotypic or genotypic clusters of individuals who are more similar to each other, less similar to others

Mallet said in the 1990s (going back to Darwin):
strong support if varieties exist in close proximity for a long time without combining
- geographic proximity, genetic clusters (based on close genetic distances), close phylogenetic distances

46
Q

Species defined by pattern

A
  • groups of indivs retain ecological/morph distinctions in sympatry (living very close)
  • genotypic clusters (being similar to each other and different from others)
47
Q

Species defined by process

A
  • reproductive isolation
    -hybridization (not working? sterile?)
    -natural selection
    -inherited variation
    -biological species concept (BSC)
48
Q

Phylogenetic species concept

A

monophyly of a species taxon: all descendants of a common ancestor at the species level

used to be defined as terminal splits that all trees agree upon; Baum 2009 relaxes this by defining exclusivity

49
Q

Exclusivity

A

species are exclusive groups:
- a set of contemporaneous (existing at the same time) organisms that form a clade for more of the genome than any conflicting grouping (have a higher concordance factor, CF)
- allows for taxa with concordance factors <50%

50
Q

Species as taxa

A
  • products of history/evolution
    -defined only by the past
    -“phylogenetic” species concept
51
Q

Species as functional units

A
  • participants in evolution
  • predictive about the future
    -trait-based species concept
    -biological species concept
52
Q

Naming a clade as species is semisubjective

A

based on semisubjective criteria:
- biological significance
-utility
-predictive power
-robustness
-precedent

53
Q

Explain my own meaning of the word species

A

be creative but consistent

54
Q

Causes of discordance

A
  1. incomplete lineage sorting (deep coalescence)
  2. duplication and extinction of gene copies (paralogy disruption)
  3. gene flow
    - horizontal gene transfer
    - hybridization/introgression
55
Q

Reticulate evolution

A

formation of a species through the partial merging of two ancestor lineages

56
Q

Primary concordance tree

A

a tree composed of clades with higher concordance factors than any conflicting alternative clade (the clades that show up most)

57
Q

Concordance factors

A

the proportion of the genome for which a given clade is true (shown at the nodes or branches)

estimated separately for the sample of genes you sequenced and for the genome as a whole
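
A naive sketch of a sample-wide concordance factor: the fraction of sampled gene trees that contain a clade. It assumes each gene tree is already known without error and is written as a set of clades; the function name and the four example trees are made up for illustration.

```python
def concordance_factor(clade, gene_trees):
    """Fraction of gene trees in which the given clade appears."""
    clade = frozenset(clade)
    hits = sum(1 for tree in gene_trees if clade in tree)
    return hits / len(gene_trees)

# Each gene tree is the set of clades it contains (taxa A, B, C).
gene_trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("BC"), frozenset("ABC")},
]

cf_ab = concordance_factor("AB", gene_trees)
cf_bc = concordance_factor("BC", gene_trees)
print(cf_ab, cf_bc)
```

Here clade AB has the higher CF, so it would go on the primary concordance tree, while BC is a conflicting minority clade.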

58
Q

How to calculate tree frequencies

A

learn in OH

59
Q

D-statistic test process

A

a way to tell whether discordance is due to incomplete lineage sorting (ILS) or introgression
involves measuring the proportions of two-state SNPs that have ABBA or BABA patterns

  1. there is a primary, assumed-correct topology (P1 and P2 are sister) on a 4-taxon pectinate tree
  2. create two alternative topologies of the in-group branches
    - BABA: P1 and P3 are sister
    - ABBA: P2 and P3 are sister
  3. if p(ABBA) = p(BABA), the discordance is due to ILS; if they are not equal, it is due to introgression

calculate the D-statistic from the formula; assess significance by bootstrap

60
Q

D- statistic from a matrix

A

species are rows and nucleotides are columns
read the columns of nucleotides and see which follow the ABBA and BABA patterns
then use the d-statistic formula to calculate D

61
Q

D-statistic formula

A

D = (#ABBA - #BABA) / (#ABBA + #BABA)

values near 0 support ILS, values approaching 1 or -1 support introgression
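
A small sketch of the counting and the formula, assuming the usual row order P1, P2, P3, outgroup and treating the outgroup state as A (ancestral). The function name and the six-site alignment are invented for illustration; a real test would also need the bootstrap step.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites column by column and return (abba, baba, D)."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep only two-state sites
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived state
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites in the alignment")
    return abba, baba, (abba - baba) / (abba + baba)

# Rows are species, columns are nucleotide sites.
p1 = "ACGTAA"
p2 = "GTATAC"
p3 = "GTGTCC"
outgroup = "ACATAA"

abba, baba, d = d_statistic(p1, p2, p3, outgroup)
print(abba, baba, d)
```

In this toy alignment three sites are ABBA and one is BABA, giving D = 0.5, an excess of sites grouping P2 with P3.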

62
Q

Two classes of characters

A

discrete and continuous

63
Q

Testing hypotheses of character evo with history vs. models

A
  • History: questions concerning specific ancestors (species evolve before other species)
  • Models: questions concerning general trends (do bilaterally symmetric flowers evolve from radially symmetric?)
64
Q

Chronicle vs. Narrative

A

Chronicle: how did a trait evolve
Narrative: why did it evolve?

65
Q

Exaptation

A

a trait whose evolutionary origin has no relation to its current utility
(wings are this in penguins, no longer used for flying but for swimming)

66
Q

Steps of testing hypotheses of adaptation on a phylogeny

A
  1. infer the tree (parsimony or likelihood)
  2. score the traits of interest (character states must be homologous, share common ancestor)
  3. score selective regimes
  4. reconstruct history of character changes and of selective regimes
  5. assess current utility relative to ancestral state (and performance) - measure fitness difference, compare performance in focal clade to sister group that has same regime and diff state

(example: are red flowers an adaptation for bird pollination?)

shifts in character and regime are less frequent than branching events

67
Q

Selective regime

A

all abiotic and biotic factors that determine how natural selection will act on character variation

68
Q

Markov models to test categorical trait evolution

A

Mk2 models
testing the joint evolution of two binary states (character and regime)

  1. consider all possible combinations of states in a matrix (4x4)
  2. assume: only one trait can change at a time, transition rates with 2 changes = 0
  3. have a hypothesis (L1= likelihood of data and tree with hyp) and null (L0)
  4. hypothesis is supported if the rate is greater in the predicted direction and the likelihood ratio test is significant: 2(logL1 - logL0)
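
A hedged sketch of the likelihood ratio test in the last step. It assumes the hypothesis model has exactly one more free parameter than the null, so the statistic 2(logL1 - logL0) is compared against a chi-square distribution with 1 degree of freedom. The log-likelihood values are made up.

```python
import math

def likelihood_ratio_test(log_l0, log_l1):
    """Return (statistic, p-value) for nested models differing by 1 parameter."""
    statistic = 2.0 * (log_l1 - log_l0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value

statistic, p_value = likelihood_ratio_test(log_l0=-52.3, log_l1=-48.1)
print(round(statistic, 2), round(p_value, 4))
```

With more than one extra parameter the degrees of freedom change and this shortcut for the p-value no longer applies.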
69
Q

High diversification comes from

A

higher rates of speciation and/or lower rates of extinction

70
Q

2 predictions of competing hypotheses for greater diversity

A
  1. the trait confers “species selection” = higher net diversification
  2. if rate of change to state 1 is greater than the reverse = higher diversity
71
Q

BISSE

A

binary-state speciation-extinction model
parameters:
q01 and q10 - rates of state change (0 to 1 and 1 to 0)
lambda0 and lambda1 - rates of speciation in each state
mu0 and mu1 - rates of extinction in each state

72
Q

Phylogenetic covariance

A

the time (in branch length) during which two taxa have been evolving together

if it is low, the species are fairly independent of each other and have been evolving separately for longer

if high, the species are expected to be similar, so you can't treat them as independent data points

73
Q

Brownian motion

A

a stochastic process (occurring randomly within a period of time) where a trait takes lots of random steps
- clades can act as traits and drift as a group

the traits will land in different places in phylogenetic space depending on when the lineages split from each other

if they split early, their positions are expected to vary more
if they split late, they are expected to be near each other

74
Q

Phylogenetic independent contrasts

A

PIC
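a statistical tool to address the phylogenetic dependence of trait values among related species
reduces N observations to N-1 rescaled contrasts

X1 = (A*v2 + B*v1) / (v1 + v2), where v1 and v2 are the branch lengths leading to tips A and B. X1 is the new value at the node between A and B: the weighted average of the tips
keep doing this process down the tree until there is only one X value left.
This X = the estimated ancestral state for the whole tree under Brownian motion

The weights downweight the taxa on longer branches (more time to vary, so less informative about the node), so they have less influence
the contrasts all have an expected mean of 0
the contrasts are independent of each other, so they can be plotted and regressed without phylogenetic correlation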
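
A minimal sketch of the contrast procedure for one trait, following the usual Felsenstein-style recursion as an assumption: the node value is the branch-length-weighted average (A*v2 + B*v1) / (v1 + v2), each contrast is divided by the square root of v1 + v2, and the branch below each node is lengthened by v1*v2 / (v1 + v2). The nested-tuple tree format and the trait values are invented for illustration.

```python
import math

def pic(node):
    """Return (node value, extra branch length, list of standardized contrasts).

    A tip is a number; an internal node is (left, v_left, right, v_right).
    """
    if not isinstance(node, tuple):
        return float(node), 0.0, []
    left, v1, right, v2 = node
    a, extra1, contrasts1 = pic(left)
    b, extra2, contrasts2 = pic(right)
    v1 += extra1  # lengthen branches to account for uncertainty at the nodes
    v2 += extra2
    contrast = (a - b) / math.sqrt(v1 + v2)
    value = (a * v2 + b * v1) / (v1 + v2)  # weighted average of the two sides
    extra = v1 * v2 / (v1 + v2)
    return value, extra, contrasts1 + contrasts2 + [contrast]

# ((A:1, B:1):1, C:2) with trait values A = 10, B = 12, C = 20
tree = ((10, 1.0, 12, 1.0), 1.0, 20, 2.0)
root_value, _, contrasts = pic(tree)
print(root_value, contrasts)
```

Three tips give two contrasts, and the value left at the root is the estimated ancestral state under Brownian motion.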

75
Q

A contrast is…

A

the difference between two tip states (sister states)

76
Q

the normalized contrast

A

corrects for differences in variance between pairs of sister groups caused by their different branch lengths.
it does this by dividing the contrast by the square root of the summed length of the branches connecting the two sister taxa (its expected standard deviation).

77
Q

Under Brownian motion, the contrast is:

A

expected to be 0, because the two offspring of the ancestor are expected to vary randomly around the same mean

78
Q

Generalized least squares regressions (GLS)

A

a method of regression for fitting a model in which data are correlated
it allows
1. unequal variances
2. nonindependence
has two parameters:
a hat = the phylogenetic mean, the ancestral state estimated under brownian
σ2 = phylogenetic variance, rate of character evo under brown

79
Q

covariance matrix (C) set up

A

diagonals have variances (root-to-tip branch length of each taxon)
non-diagonals have covariances (branch length shared by each pair of taxa)

also has the effect of downweighting the more covarying data

variance can also be calculated and is normalized by C
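
A small sketch of building C from a tree, assuming each tip is described by the branches on its path from the root (a made-up representation; branch names and lengths are illustrative). Shared branch length gives the covariance and the full path length gives the variance.

```python
def covariance_matrix(paths):
    """Build C: entry (i, j) is the branch length shared by tips i and j."""
    names = sorted(paths)
    matrix = []
    for i in names:
        row = []
        for j in names:
            shared = set(paths[i]) & set(paths[j])
            row.append(sum(length for _, length in shared))
        matrix.append(row)
    return names, matrix

# Tree ((A:1, B:1)AB:1, C:2): each path lists (branch name, length) from the root.
paths = {
    "A": [("AB", 1.0), ("A", 1.0)],
    "B": [("AB", 1.0), ("B", 1.0)],
    "C": [("C", 2.0)],
}

names, c_matrix = covariance_matrix(paths)
for name, row in zip(names, c_matrix):
    print(name, row)
```

A and B share one unit of history, so they covary, while C shares nothing with either and is independent of both.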

80
Q

You can also use GLS to test

A

rate-shift hypotheses at different points in the tree

81
Q

Phylogenetic evidence for viral transfer

A

being in a common clade or nearby clade

82
Q

Processes that influence shape of virus phylogenies

A
  1. directional selection
  2. spatial structure and spread dynamics
  3. changes in population size
83
Q

Serial samples, “tip-dated” tree

A

advantages viral trees have over typical species trees:
1. known dates of samples calibrate the clock accurately
2. yields improved estimates of the nucleotide substitution rate

the samples do not all line up at the present; older samples end further back towards the root, which is possible because of the fast generation time

84
Q

Tree balance in viruses

A

unbalanced: recurrent ongoing selection, rapid spatial spread
- selection will drive evolution in a certain clade and cause other clades to drop off

balanced: lack of directional selection
- no preference for certain clades

85
Q

Spatial structure

A

structured host pop = phylogenetic clades aligning with geographic position
unstructured = all mixed

86
Q

Contemporaneous samples, ultrametric tree

A

ultrametric = all of the tips align and are the same distance from the root
the molecular clock estimates of node ages are increasingly uncertain toward the root
contemporaneous = occurring at the same time

87
Q

Exponential inc in pop makes a tree with…

A

shorter trunks and longer tips

coalescence happens faster in a smaller population; going back in time the population shrinks, so coalescences bunch up near the root, while the tips sit in the largest population and have the longest branches

88
Q

Constant population size leads to a tree with…

A

long trunk and short tips

the number of alleles is greatest at the tips and, going back in time, falls to only 2 near the root. In a big constant population, each of the last 2 lineages has only one partner to coalesce with, which takes much longer than any one of many lineages finding a partner in the same population.

89
Q

Species diversity

A

same thing as species richness

the count of how many different species are in an area

90
Q

Functional diversity

A

(trait diversity)
how much ecological variation among the species in an area, traits used as a proxy

91
Q

Phylogenetic diversity

A

how much phylogenetic history is represented by the species in an area

measured on a tree by the total branch length connecting the species you are considering.
the root node may be included or not (likely not), so just connect the species to each other

the PD can vary when species richness is constant because the species can be further apart or closer together on the tree

92
Q

Phylogenetic clustering

A

that a community (species in area or a niche) is filled by species in the same clade

93
Q

If traits aren’t available for diversity…

A

use phylogenetic diversity as a proxy

because closely related species tend to be similar in their traits

94
Q

Phylogenetic overdispersion

A

that communities are made up of species from different clades

95
Q

Mean nearest taxon distance (MNTD)

A

the average distance from each tip in the tree to its closest relative (the whole closest clade)

measured from one species to any of the species in the closest descendant clade

then average all of those distances to get the MNTD
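
A short sketch of the averaging, assuming pairwise tip-to-tip distances along the tree have already been computed. The distances below are invented for a toy four-species tree.

```python
def mntd(distances, community):
    """Mean distance from each species to its nearest relative in the community."""
    nearest = []
    for species in community:
        nearest.append(min(distances[frozenset((species, other))]
                           for other in community if other != species))
    return sum(nearest) / len(nearest)

# Pairwise distances on the tree ((A:1, B:1):1, (C:0.5, D:0.5):1.5)
distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 1.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("A", "D")): 4.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("B", "D")): 4.0,
}

mntd_all = mntd(distances, ["A", "B", "C", "D"])
mntd_three = mntd(distances, ["A", "B", "C"])
print(mntd_all, mntd_three)
```

Dropping D raises the value because C then has no close relative left in the community.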

96
Q

Evolutionary distinctiveness (ED)

A

how different the species is from everything else, more isolated = higher ED

for a single branch: ED = length of branch divided by the number of descendant species coming off of that branch

for a species: ED = the sum of the branch ED scores along the path from the tip to the root
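
A minimal sketch of the two ED steps, assuming each species is given as the list of branches from the root to its tip (an illustrative representation with made-up lengths).

```python
from collections import Counter

def evolutionary_distinctiveness(paths):
    """ED per species: each branch length is split among its descendant tips."""
    # A branch's descendant count is the number of tip paths that pass through it.
    descendants = Counter(branch for path in paths.values() for branch in path)
    return {
        tip: sum(length / descendants[(name, length)] for name, length in path)
        for tip, path in paths.items()
    }

# Tree ((A:1, B:1)AB:1, C:2)
paths = {
    "A": [("AB", 1.0), ("A", 1.0)],
    "B": [("AB", 1.0), ("B", 1.0)],
    "C": [("C", 2.0)],
}

ed = evolutionary_distinctiveness(paths)
print(ed)
```

C sits alone on a long branch and gets the highest score, and the scores add up to the total branch length of the tree.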

97
Q

Global extinction risk

A

GE
Made by the IUCN, ranking species on their extinction risk based on a few factors

98
Q

EDGE

A

ED + GE
takes into account the uniqueness of the species and the global extinction risk to make one score that can be used for protection

99
Q

Traits tracking habitat or phylogeny more

A

traits tracking habitat more:
-clustered on the phylogeny by their traits
-more convergent evolution, traits came from different ancestors

traits tracking phylo more:
- overdispersed by trait
-evolution is conserved, from common ancestors

100
Q

Phylogenetic clustering or overdispersion matrix

A

conserved traits, trait clustering: phylogenetic clustering

convergent traits, trait clustering: phylogenetic overdispersion

conserved traits, trait overdispersion: phylogenetic overdispersion

convergent traits, trait overdispersion: phylogenetic clustering or random dispersion

101
Q

Baum’s species criteria

A
  1. biological significance
  2. utility
  3. predictive power
  4. robustness
  5. precedent

semi-subjective because these criteria might conflict with each other
