Phylogeography Flashcards
Why might you use a network rather than a tree?
Phylogenetic methods assume all parts of the sequence have the same evolutionary history, and that there is only one correct topology - but this is not always true, e.g., when:
Reticulate evolution:
- Recombination between sequences or hybridization between species
Ambiguous data:
- May not be clear which pathways are correct
- Character conflicts - conflicting phylogenetic signals at different character sites
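One simple way to spot a character conflict between two binary sites is the four-gamete test: if all four combinations (00, 01, 10, 11) occur in the sample, the two sites cannot both fit a single tree without homoplasy or recombination. A minimal Python sketch, assuming haplotypes are coded as equal-length strings of 0/1 (the function name and toy data are illustrative, not from the course material):

```python
from itertools import combinations


def conflicting_site_pairs(haplotypes):
    """Return pairs of site indices that fail the four-gamete test."""
    n_sites = len(haplotypes[0])
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Distinct two-site state combinations seen across the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # 00, 01, 10 and 11 are all present
            conflicts.append((i, j))
    return conflicts


# Hypothetical toy data: sites 0 and 1 conflict; site 2 fits with both.
toy = ["000", "010", "100", "111"]
print(conflicting_site_pairs(toy))  # prints [(0, 1)]
```

A conflicting pair is exactly what a network draws as a reticulation instead of forcing one resolution.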
What do networks display, and how?
Networks give visual representation of complex data
- Provide haplotype frequency information - size of taxon circles
- Orientation of network is irrelevant
- Lines indicate links between haplotypes via a particular mutational pathway; cross-lines on a link indicate the number of mutations along that connection
- Networks summarise many of the most likely pathways in a single diagram
- Network includes loops or ‘reticulations’ (cycles) - which display alternative evolutionary pathways
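The two quantities a network diagram draws (circle size and cross-lines) can be computed directly from aligned sequences. A rough Python sketch, assuming the sequences are already aligned to equal length; the sample and function names are made up for illustration:

```python
from collections import Counter


def haplotype_frequencies(sequences):
    """Collapse identical sequences into haplotypes and count each one."""
    return Counter(sequences)


def mutational_steps(a, b):
    """Number of differing sites between two aligned haplotypes."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical aligned sample (not real data).
sample = ["ACGT", "ACGT", "ACGA", "TCGA", "ACGT"]
freqs = haplotype_frequencies(sample)
for hap, count in freqs.most_common():
    print(hap, count)  # the count would set the circle size

# Differences between two haplotypes = cross-lines if they were linked directly.
print(mutational_steps("ACGT", "TCGA"))  # prints 2
```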
What are some of the possible reasons for the reticulations/conflicts?
- Homoplasy
- Recombination
- Sequence errors
- Contamination (more than one sequence)
What different network algorithms are used?
- Distance-preserving networks (compatibility networks, or split diagrams)
- Median and reduced-median (RM) networks
- Median-joining (MJ) and minimum-spanning networks
How do Median networks work?
- Assumption: all variable sites are single nucleotide polymorphisms (SNPs) with only two variants - i.e. each site can take only one of two states
- Input: binary character data
- Frequency of sequence types also used
- Uses parsimony to calculate networks
- Median-joining networks can accept multi-state data
- Median network is made by calculating ‘median vectors’ for all triplets of taxa in the data - i.e. explicitly reconstruct the likely ancestral haplotype in each case
- Median can be thought of as the overall consensus sequence of the binary haplotypes in the triplet. If it is not present in the observed data, it is added into the network at the appropriate location
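For binary data, the median of a triplet is just the majority state at each site. A minimal Python sketch of that one step, assuming haplotypes are equal-length strings of 0/1 (names and data are hypothetical; a full median network repeats this over all triplets and then links the nodes):

```python
def median_vector(a, b, c):
    """Site-by-site majority state of three aligned binary haplotypes."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("haplotypes must have the same length")
    return "".join(
        "1" if (x, y, z).count("1") >= 2 else "0"
        for x, y, z in zip(a, b, c)
    )


# Hypothetical observed haplotypes.
observed = {"1100", "0110", "0101"}
m = median_vector("1100", "0110", "0101")
status = "already observed" if m in observed else "added as an inferred node"
print(m, status)  # prints: 0100 added as an inferred node
```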
What are reduced-median networks and what factors are used for them?
- Median networks include all maximum parsimony (MP) trees and many sub-optimal (perhaps correct) trees
- So - network reduction removes some of the less likely pathways - by pinpointing obvious parallel mutations
Factors:
- Weight of the character - i.e. number of mutations supporting it, possibly taking into account their rates - as in MP
- Frequency of haplotypes - older haplotypes in populations tend to be more common
- If you don't have any of this additional information, it is difficult to simplify the network
How was this haplotype network used for mosquitos in the Galapagos?
Network showed:
- Each of the hypothesised mutational events, showing the order of mutations, with branch lengths proportional to the number of mutations (additive)
- Frequency of each haplotype
- Hypothesised ancestral haplotypes (nodes)
- Found many mutations separating mosquitos on the Galapagos from those in the Americas - so they are likely native to the Galapagos
- Can then use the mutation rate to estimate the divergence time between Galapagos and continental haplotypes - likely 200,000-250,000 years ago - well outside the range of any human-associated process
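The dating step can be sketched with the usual strict-clock relation T = d / (2 x rate), where d is the proportion of sites that differ between the two lineages and the factor of 2 reflects that both lineages accumulate mutations after the split. The Python below is only an illustration of the arithmetic: the numbers are placeholders, not values from the mosquito study.

```python
def divergence_time_years(differences, sites, rate_per_site_per_year):
    """Estimate time since two lineages split under a strict molecular clock."""
    d = differences / sites  # proportion of sites that differ
    # Mutations accumulate along both lineages, hence the factor of 2.
    return d / (2 * rate_per_site_per_year)


# Placeholder inputs for illustration only (assumed, not from the source).
t = divergence_time_years(differences=10, sites=1000, rate_per_site_per_year=1e-8)
print(round(t))
```

This ignores multiple hits at the same site and rate variation, so it is best read as a first approximation.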
What do networks allow us to do?
- Indicate plausible haplogroups
- Predict ancestral haplotypes
- Highlight regions where homoplasy has occurred
- Show which sites mutate frequently
- Show the location of a consensus sequence
- Show whether recombination may have occurred
- Point to sequencing errors
How can we use networks to understand more about spatio-temporal patterns of evolution in relation to the sequence data?
4 components:
1. Gene tree/network - relationships between groups/pops
2. Diversity of clusters of lineages - can be converted to time depth
3. Geographic distribution of lineages - patterns of historical dispersal and geographic origins
4. Hypothesis testing - supported by archaeological and palaeoenvironmental data
Why is mtDNA commonly used for phylogeographic studies?
- Lack of recombination
- HVS-1 was used a lot - but it has now become quicker to sequence the complete mtDNA genome - which gives higher resolution and more precise age estimates with fewer samples
Where is the root of mtDNA?
- Root in Africa
- Lineages then diversify in Eurasia
- Regionally specific mtDNA haplogroups evolve
- Coalescence time ~200,000 years
- Time of founder effect: ~60,000-80,000 years ago
How is mtDNA haplogroup B4 distributed in Southeast Asia and the Pacific?
- The Polynesian Motif (a sub-lineage of haplogroup B4) is found almost exclusively to the east of the Wallace Line; B4 as a whole is more widespread in East and Southeast Asia
How can we estimate time depth and divergence time with phylogenetics - the ‘molecular clock’?
- Assume that most new mutations are neutral
Model-free:
- Incorporates tree/network topology
- Based on sequence evolution model
- Applies molecular clock (expected number of mutations per unit time)
- Time depth of branches or age of common ancestor of clades estimated from number of accumulated mutations
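One widely used estimator of this kind is the rho statistic: the average number of mutations separating the sampled sequences from the ancestral node of a clade, multiplied by a clock rate. Whether the card refers to rho specifically is an assumption. A minimal Python sketch with made-up inputs (the 5,000 years per mutation is a placeholder, not a published rate):

```python
def rho(steps_from_ancestor, counts):
    """Mean number of mutations between sampled sequences and the ancestral node.

    steps_from_ancestor[i] is the mutational distance of haplotype i from the
    ancestral node in the network; counts[i] is how often it was sampled.
    """
    total = sum(counts)
    return sum(s * c for s, c in zip(steps_from_ancestor, counts)) / total


def clade_age_years(rho_value, years_per_mutation):
    """Convert rho into a time depth with a strict molecular clock."""
    return rho_value * years_per_mutation


# Hypothetical clade: 5 copies of the ancestral type, 3 at one step, 2 at two steps.
r = rho([0, 1, 2], [5, 3, 2])
print(r)
print(clade_age_years(r, years_per_mutation=5000))  # placeholder clock rate
```

The network supplies both inputs: the step counts come from the links and the counts from the haplotype frequencies.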
Model-based:
- Incorporates tree/network topology + sequence evolution model
- Incorporates mutation rates
- In addition fits coalescent models for different demographic scenarios - e.g., constant sized v. population expansions at different time points
- Different demographic scenarios give different coalescent predictions for branch lengths
- Test which scenarios are most compatible with the observed data
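As an illustration of what a "coalescent prediction" looks like, the standard constant-size result says that with k lineages in a haploid population of size N, the expected waiting time to the next coalescence is 2N / (k(k-1)) generations. A small Python sketch of that baseline scenario only (function names and the values of n and N are assumptions for illustration):

```python
def expected_interval_times(n_samples, pop_size):
    """Expected generations spent with k = n, ..., 2 lineages (constant size, haploid)."""
    return {k: 2 * pop_size / (k * (k - 1)) for k in range(n_samples, 1, -1)}


def expected_tmrca(n_samples, pop_size):
    """Expected time to the most recent common ancestor, in generations."""
    return sum(expected_interval_times(n_samples, pop_size).values())


# Hypothetical values: 5 sampled sequences, effective size 1000.
for k, t in expected_interval_times(5, 1000).items():
    print(k, round(t, 1))
print(round(expected_tmrca(5, 1000), 1))  # equals 2N(1 - 1/n)
```

Under constant size the deepest interval (k = 2) is the longest; an expansion pushes most coalescences back to when the population was small, giving a more star-like genealogy, and this difference is what the model comparison exploits.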
What are the two hypotheses for Polynesia?
- ‘Out of Taiwan’ - agricultural dispersal of proto-Austronesians from South China/Taiwan from ~4.5 ka
- Post glacial - Sea-level rise: development of maritime technology in the ‘voyaging corridor’ as the coastlines became more accessible ~6 ka - more fitting with observed data
What did the dating of the origins of dog domestication tell us?
Evidence suggests that most contemporary domestic dog populations are related to domestication events that involved Eurasian canids
- Thalmann et al., 2013