Phylogenetics Finals Flashcards
Differences between Gene Trees and Species Trees
GT: histories of loci within the genome
ST: reconstructing lineage history
Define Coalescence
Point of common ancestry of 2 alleles
Assumptions of allele coalescence
- Equal probabilities of alleles being passed
- Population size is constant over time
- Alleles in generation “t” are drawn randomly with replacement from alleles from the previous generation (t-1)
Probability of coalescence
Given N diploid individuals in each generation, the probability that 2 alleles selected randomly in generation t coalesce (come from the same parental allele) in the preceding generation t-1 is 1/(2N)
What is the probability that two alleles coalesced in the previous generation?
1/(2N).
Coalescence occurs at a _______rate when the population is large. Why?
Slower. With more individuals in a population there are more allele copies (2N), so any two alleles are less likely to trace back to the same parental allele in a given generation.
The expected time to coalescence of 2 alleles
2N generations.
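A minimal sketch of where 2N comes from, assuming the model on the cards above (constant size, random sampling with replacement). If two alleles coalesce with probability 1/(2N) in each generation, the waiting time is geometric with mean 2N. The function names are hypothetical, chosen for illustration only.

```python
import random

def prob_coalesce_at(t, n_diploid):
    """Probability that two alleles first share a parent copy exactly t generations back."""
    p = 1.0 / (2 * n_diploid)
    return (1 - p) ** (t - 1) * p

def expected_coalescence_time(n_diploid):
    """Mean of a geometric waiting time with success probability 1/(2N)."""
    p = 1.0 / (2 * n_diploid)
    return 1.0 / p

def simulate_coalescence_time(n_diploid, rng):
    """Walk back one generation at a time; each lineage picks a parent copy at random."""
    copies = 2 * n_diploid
    t = 1
    while rng.randrange(copies) != rng.randrange(copies):
        t += 1
    return t

rng = random.Random(0)
times = [simulate_coalescence_time(10, rng) for _ in range(20000)]
mean_time = sum(times) / len(times)  # should land close to 2N = 20
```

Doubling N halves the per-generation probability and doubles the expected wait, which is the "larger population, slower coalescence" card in numbers.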
Why, moving from tip to stem, does allele coalescence take longer than 2N generations?
As populations increase in size, any two allele copies become less likely to share a parent copy in a given generation. Therefore, the larger the population, the slower the coalescence time (CT).
What causes coalescent histories of different genes sampled from the same individuals to have different topologies?
Recombination.
Recombination lets different loci trace back through different ancestors, so each gene's tree might show a slight deviation from the next.
Genes of different chromosomes are likely to have ________ histories.
Discordant
What are recombinational genes?
A block of adjacent nucleotides sharing the same gene tree. This is the ideal locus for phylogenetic inference. In practice, functional genes are assumed to be recombinational genes.
Why does deep coalescence kind of ruin species trees?
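Because the alleles of two species typically coalesce before (deeper than) the LCA of the two species. The allele lineage arises first in the ancestral population and is then carried into the two sister species, so the gene tree need not match the species tree.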
Coalescences are more likely in: (narrow and long) or (short and wide) trees? Why?
Narrow and long. Narrow denotes smaller populations, therefore there is an increased likelihood of shared alleles
Anomaly Zone. What is it?
If both internal branches are sufficiently short, then coalescence is not likely until back near the base node. This means that the most probable gene tree does not match the population (species) tree.
Approaches to avoid gene tree discordance (there are 4)
- Majority rule consensus
- Concatenate all sequences (assumes all genes have same history)
- Use multi-species coalescent (assumes discordance is due to deep coalescence)
- Bayesian concordance analysis (assumes genome proportions for which individual clades are true)
Concatenation
laying gene elements end-to-end, creating a supermatrix
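A minimal sketch of building a supermatrix, assuming each gene is already aligned and stored as a dictionary from taxon name to sequence. The function name and the choice to pad missing taxa with gaps are assumptions made for illustration.

```python
def concatenate(alignments, gap="-"):
    """Lay per-gene alignments end-to-end into one supermatrix (taxon -> sequence)."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene must be aligned to equal length")
        (length,) = lengths
        for taxon in taxa:
            # A taxon missing from this gene is padded with gap characters.
            supermatrix[taxon] += aln.get(taxon, gap * length)
    return supermatrix

gene1 = {"A": "ACGT", "B": "ACGA"}
gene2 = {"A": "GG", "B": "GC", "C": "GT"}
matrix = concatenate([gene1, gene2])
```

Analyzing the result as one locus is what builds in the assumption that all genes share the same history.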
How to minimize DC using parsimony
- Count the minimum number of deep coalescence (DC) events in a tree
- Search among candidate trees to minimize that count
Drawback: only topology is estimated
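The two steps above can be sketched as follows, assuming rooted trees written as nested tuples, one allele per species, and gene-tree tips named after their species. Deep coalescences are counted as extra lineages: on each species-tree branch, the number of gene lineages leaving the branch minus one. All function names are hypothetical.

```python
def leaf_set(tree):
    """Set of tip names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)

def lineages_leaving(gene_tree, species):
    """Gene lineages still uncoalesced at the top of the branch subtending `species`."""
    if leaf_set(gene_tree) <= species:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_leaving(child, species) for child in gene_tree)

def deep_coalescence_cost(gene_tree, species_tree):
    """Step 1: count extra lineages summed over all species-tree branches."""
    return sum(lineages_leaving(gene_tree, leaf_set(s)) - 1
               for s in subtrees(species_tree))

def best_species_tree(gene_trees, candidates):
    """Step 2: pick the candidate with the lowest total cost."""
    return min(candidates,
               key=lambda st: sum(deep_coalescence_cost(g, st) for g in gene_trees))

AB, AC, BC = (("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")
genes = [AB, AB, AC]
best = best_species_tree(genes, [AB, AC, BC])
```

The result is a topology only, with no branch lengths or population sizes, which is the drawback noted on the card.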
What is a population tree
Branch lengths (time in generations), and branch widths (ancestral population size)
In gene trees built from sequence data, branch lengths are measured in ________
mutations per site
Causes of gene tree discordance (3)
- deep coalescence
- paralogy (gene duplication and loss)
- horizontal gene transfer
What does “trees within trees” mean
Gene trees within population trees
define Orthologs
SAME. genes in different species that share the same locus and function
define Paralogs
DIFFERENT. genes related by duplication. positioned at DIFFERENT loci, different functions
Explain horizontal gene transfer
Transfer effected WITHOUT mating.
What can cause incongruence between hosts and parasites (3)
- Duplication (speciation)
- Host switching or range expansion
- Unequal (heterogeneous) rates of molecular evolution
How to assess congruence? (5)
- compare tree topologies
- Use independent calibrations of host and parasite molecular clocks
- Measure homogeneity of data
- Does the analysis fail to pick 1 tree?
- Reconciliation analysis (host tree fits with parasite tree)
Define (generally) “Species”
Basic unit of biological diversity
Ernst Mayr emphasized _____ when defining species
Interbreeding populations isolated from other communities
Biological Species concept
populations of organisms connected by gene flow and separated from other species by reproductive barriers
Genotypic concept
Species based on identifying phenotypic or genotypic clusters of individuals (i.e., things more similar to each other, and less to others).
Issues: morphs? sex differences?
Mallet: pattern vs process
Pattern: retain differences in sympatry
Process: reproduction, hybrids, natural selection
Phylogenetic species concept:
shared derived characters define species
Three species concepts
Biological, Genotypic, phylogenetic
Baum harps on groups as being _______
exclusive. Things that form a clade from more of the genome than any other conflicting set.
Baum. Taxa vs functional units
Taxa: products of history or evolution
Function: trait-based, biological species concept
Define Discordance
variance in the histories tracked by different genes for a set of individuals
Recognizing HGT
gene trees where an organism shows a gene that is not associated with its related group
Mark of hybridization (genetically)
A mix of genetic blocks (about 50/50 distribution)
Introgression
Back crossing from hybrids. The bits of foreign genes from the hybrid get introduced into one of the initial genomes
Primary concordance tree
Composed of clades that have higher concordance factors than any alternative clade
Concordance factors are estimated for:
(there are two)
- A sample of genes
- genome as a whole
How to determine by looking at a figure if it’s ILS or Gene Flow?
If ILS, then the primary concordance tree gives the true result, and the remaining values are split evenly between the two other possibilities (80% true, then 10%, 10%). If gene flow, the two minority values are unequal.
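A minimal sketch of sample-wide concordance factors, assuming rooted gene trees written as nested tuples. The concordance factor of a clade is taken here as the proportion of sampled gene trees that contain it. The function names and the ten example trees are made up for illustration.

```python
from collections import Counter

def collect_clades(tree, found):
    """Add the tip set of every internal node to `found`; return this node's tip set."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(tips)
    return tips

def concordance_factors(gene_trees):
    """Proportion of gene trees that contain each clade."""
    counts = Counter()
    for tree in gene_trees:
        found = set()
        all_tips = collect_clades(tree, found)
        found.discard(all_tips)  # the clade of all tips is in every tree, so skip it
        counts.update(found)
    return {clade: n / len(gene_trees) for clade, n in counts.items()}

AB, AC, BC = (("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")
trees = [AB] * 8 + [AC] + [BC]
cf = concordance_factors(trees)
```

Here clade {A, B} gets 0.8 and the two alternatives get 0.1 each, the symmetric pattern the card associates with ILS.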
What pattern do we expect with hybridization speciations?
2 co-primary trees
D-Statistics. What are they and why do we use them?
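With three ingroup taxa and one outgroup, the two discordant site patterns (ABBA and BABA) are expected to be equally common under ILS alone. D-statistics measure the excess of one pattern over the other, which is used to test for introgression.
D-statistic formula
D = (C(ABBA) - C(BABA)) / (C(ABBA) + C(BABA))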
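A minimal sketch of the calculation, assuming four aligned sequences ordered as P1, P2, P3, and outgroup O (these labels are an assumption, not from the card). ABBA sites are those where P2 and P3 share the non-outgroup allele, BABA sites are those where P1 and P3 share it.

```python
def abba_baba_counts(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns across four aligned sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # only sites with exactly two alleles are informative
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    return abba, baba

def d_statistic(abba, baba):
    """D = (C(ABBA) - C(BABA)) / (C(ABBA) + C(BABA))."""
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / (abba + baba)

# Toy alignment: sites 1, 3, 5 are ABBA; site 2 is BABA; site 4 is invariant.
counts = abba_baba_counts("ATACG", "TAGCC", "TTGCC", "AAACG")
d = d_statistic(*counts)
```

Real analyses also need a significance test (for example a block jackknife), which this sketch leaves out.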
D stat does not allow testing for what?
Testing between sister groups
D-statistics approaching 0 mean ______ and approaching 1 mean ______
0 - ILS
1 - Introgression
Two classes of characters
Discrete: categorical
Continuous: measurements
Chronicle vs Narrative
Chron: how
Narrative: why
Steps in testing hypotheses of adaptation on a phylogeny (6)
- infer a tree
- score traits
- score selective regimes
- reconstruct history of character changes
- reconstruct history of selective regimes
- assess current utility
Define selective regime
Abiotic and biotic factors that determine how natural selection will act on a character
Dollo’s Law
concept of irreversibility
Describe ancestral state reconstruction
Tracing back characters (features) through the tree to predict the ancestor’s state of that given trait
Phylogenetic covariance
If traits evolve completely at random, then the expected covariance between two taxa is set by the phylo tree: it is the time the two taxa have been evolving together (their shared branch length from the root).
If phylo covariance is high, then_____
the species will be very similar
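A minimal sketch of phylogenetic covariance as shared evolutionary time, assuming Brownian motion with rate 1 and a tree encoded as (label, branch_length, children) tuples. The encoding and the function name are hypothetical.

```python
def bm_covariance(tree):
    """Map each tip pair to the root-to-MRCA distance (time spent evolving together)."""
    cov = {}

    def visit(node, depth):
        label, length, children = node
        depth += length
        if not children:
            cov[(label, label)] = depth  # variance of a tip = its root-to-tip distance
            return [label]
        groups = [visit(child, depth) for child in children]
        # Tips in different child groups last shared a lineage at this node.
        for i, left in enumerate(groups):
            for right in groups[i + 1:]:
                for a in left:
                    for b in right:
                        cov[(a, b)] = cov[(b, a)] = depth
        return [tip for group in groups for tip in group]

    visit(tree, 0.0)
    return cov

# ((A:1, B:1):2, C:3) with a zero-length root branch.
TREE = (None, 0.0, [(None, 2.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 3.0, [])])
cov = bm_covariance(TREE)
```

A and B share 2 of their 3 time units, so their covariance is high and they are expected to be similar. A and C share none, so their covariance is 0.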
In Brownian motion, where would you expect two branches to eventually end up?
Where they split (the expected value stays at the ancestral state). Closer relatives = higher covariance
Define contrasts
Difference between two tip states. First between sisters. Expected value of 0
Normalized contrast
Divided by the square root of the summed lengths of the branches they are on. EXAMPLE (A-B)/sqrt(v1+v2)
Weighted Averaging
X1 = (Av2 + Bv1)/(v1+v2)
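A minimal sketch of the weighted average, assuming A and B are the tip values of two sisters and v1 and v2 are the lengths of their branches. Each tip is weighted by the other tip's branch length, so the tip on the shorter branch pulls the ancestral estimate toward itself. Function names are hypothetical.

```python
def weighted_ancestor(a, b, v1, v2):
    """X1 = (A*v2 + B*v1) / (v1 + v2)."""
    return (a * v2 + b * v1) / (v1 + v2)

def raw_contrast(a, b):
    """Difference between the two sister tip states."""
    return a - b

# A sits on a short branch (v1 = 1), B on a long one (v2 = 3).
x1 = weighted_ancestor(10.0, 20.0, 1.0, 3.0)
contrast = raw_contrast(10.0, 20.0)
```

With these numbers the estimate is 12.5, closer to A than to B. This is the same as weighting each tip by the inverse of its branch length.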
Know what inherent statistical issues there are with contrasts, and how normalized contrasts solve them
Normalizing helps reduce the greater covariance between species with deeper ancestral nodes
GLS allows for _____ and ______ unlike ordinary least squares
- Unequal variances
- Nonindependence
Brownian motion assumes:
1. Close individuals have ___ trait values
2. High variance between _______ related taxa
- similar
- distantly
Know how to test a rate shift hypothesis
What do independent contrasts accomplish? (3)
- reduces autocorrelated results
- Normalization reduces distances between taxa
- Reconstruct the ancestral state
What evidence could there be that the dentist passed HIV to his patients? (forensic epidemiology)
All strains contained within a single clade
Processes that influence the shape of virus phylogenies: (3)
1. directional selection
2. spatial structure
3. changes in population size
What does a balanced tree indicate?
A lack of directional selection (weak selection)
What do the following entail in the context of viral population trees?
- Long trunk, short tips
- Short trunk, long tips
- constant pop size
- Exponentially increasing pop size
Species diversity
measures how many species are in a given area
Functional diversity
ecological variation present among the species in an area (using traits as a proxy for function)
Phylogenetic diversity
how much phylogenetic history is represented by the species present
Phylogenetic Cluster vs Overdispersion
Cluster: traits are confined to clades
Over: traits observed are scattered across the tree
Phylo clustering is treated as evidence for…
ecological filtering
Phylo overdispersion is treated as evidence for…
competitive exclusion. (closely related species are not in the same habitats since they would battle against each other)
How we can calculate phylogenetic diversity
total length of the minimum spanning path connecting a subset of species on a tree (so total branches present). Only in the species present
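A minimal sketch of this calculation, assuming a tree encoded as (label, branch_length, children) tuples (a hypothetical encoding). A branch counts only if some, but not all, of the species present sit below it, which keeps exactly the branches on the paths connecting them. Note this convention leaves out the path from their common ancestor down to the root, which some definitions include.

```python
def phylogenetic_diversity(tree, present):
    """Total length of the branches connecting the species in `present`."""
    present = set(present)
    total = 0.0

    def visit(node):
        nonlocal total
        label, length, children = node
        if not children:
            below = 1 if label in present else 0
        else:
            below = sum(visit(child) for child in children)
        # Branch is on a connecting path if it separates some present species from others.
        if 0 < below < len(present):
            total += length
        return below

    visit(tree)
    return total

# ((A:1, B:1):2, C:3) with a zero-length root branch.
TREE = (None, 0.0, [(None, 2.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 3.0, [])])
pd_sisters = phylogenetic_diversity(TREE, {"A", "B"})
pd_distant = phylogenetic_diversity(TREE, {"A", "C"})
```

Two close sisters (A, B) capture 2 units of history, while two distant species (A, C) capture 6, even though species diversity is 2 in both cases.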
What is evolutionary distinctiveness?
ED used for single species or a branch. For a single branch: length of the branch divided by the number of descendants.
For single species: sum of branch scores along the path to the root
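A minimal sketch of both scores, assuming a tree encoded as (label, branch_length, children) tuples (a hypothetical encoding). Each branch scores its length divided by its number of descendant tips, and a species sums those scores along its path to the root.

```python
def count_tips(node):
    """Number of tips descended from a node (a tip counts itself)."""
    _label, _length, children = node
    return 1 if not children else sum(count_tips(child) for child in children)

def evolutionary_distinctiveness(tree):
    """ED per species: sum of (branch length / descendants) from tip to root."""
    scores = {}

    def visit(node, inherited):
        label, length, children = node
        here = inherited + length / count_tips(node)  # single-branch score added on
        if not children:
            scores[label] = here
        for child in children:
            visit(child, here)

    visit(tree, 0.0)
    return scores

# ((A:1, B:1):2, C:3) with a zero-length root branch.
TREE = (None, 0.0, [(None, 2.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 3.0, [])])
ed = evolutionary_distinctiveness(TREE)
```

A and B split the shared 2-unit branch between them (ED of 2 each), while C keeps its whole 3-unit branch. The scores sum to the total tree length.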
EDGE. what is it?
ED + GE (Global extinction risk)