Phylogeography Flashcards

1
Q

Why might you use a network rather than a tree?

A

Phylogenetic tree methods assume that all parts of a sequence share the same evolutionary history and that there is a single correct topology, but this is not always true, e.g. when there is:

Reticulate evolution:
- Recombination between sequences or hybridization between species
Ambiguous data:
- It may not be clear which mutational pathways are correct
- Character conflicts - conflicting phylogenetic signals at different character sites
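Character conflict between two binary sites can be checked with the four-gamete test, a standard compatibility check that the card itself does not name. The sketch below assumes aligned haplotypes coded as strings of '0'/'1' and the infinite-sites model; `four_gamete_conflicts` is a hypothetical helper name. If all four combinations of states appear at a pair of sites, no single tree can explain both sites without recombination or a repeated (homoplastic) mutation, which is the kind of conflict a network displays as a reticulation.

```python
from itertools import combinations


def four_gamete_conflicts(haplotypes):
    """Return pairs of site indices whose states conflict (four-gamete test).

    Assumes equal-length haplotypes coded as '0'/'1' strings and the
    infinite-sites model: a compatible pair of sites shows at most three of
    the four combinations '00', '01', '10', '11'.
    """
    n_sites = len(haplotypes[0])
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct state combinations ("gametes") at sites i and j
        gametes = {h[i] + h[j] for h in haplotypes}
        if len(gametes) == 4:
            # All four present: recombination or homoplasy is required
            conflicts.append((i, j))
    return conflicts
```

For example, `four_gamete_conflicts(["000", "100", "110", "011"])` flags only the site pair `(0, 1)`.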

2
Q

What do networks show, and how do they display sequences?

A

Networks give a visual representation of complex data:
- Haplotype frequency is shown by the size of the taxon circles
- The orientation of the network is irrelevant
- Lines link haplotypes via a particular mutational pathway; cross lines (hatch marks) on a line indicate the number of mutations along that connection
- Networks summarise many of the most likely pathways in a single diagram
- Networks can include loops or ‘reticulations’ (cycles), which display alternative evolutionary pathways
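The two quantities a haplotype network encodes visually, haplotype frequency (circle size) and the number of mutations on each connection (cross lines), can be computed directly from an alignment. A minimal sketch, assuming gap-free, equal-length sequences and counting one mutation per differing site (so multiple hits at the same site are ignored); `haplotype_summary` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def haplotype_summary(sequences):
    """Collapse aligned sequences into haplotypes and count pairwise mutations.

    Assumes gap-free, equal-length sequences. Returns (frequencies, differences):
    frequencies maps each haplotype to its count (circle size in a network);
    differences maps each sorted haplotype pair to the number of differing
    sites (the cross lines on a connection).
    """
    freqs = Counter(sequences)
    haps = sorted(freqs)
    diffs = {}
    for a, b in combinations(haps, 2):
        # Hamming distance: one mutation assumed per differing site
        diffs[(a, b)] = sum(x != y for x, y in zip(a, b))
    return freqs, diffs
```

With `haplotype_summary(["ACGT", "ACGT", "ACGA", "TCGA"])`, haplotype `ACGT` has frequency 2 and is separated from `TCGA` by 2 mutations.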

3
Q

What are some of the possible reasons for the reticulations/conflicts?

A
  • Homoplasy
  • Recombination
  • Sequence errors
  • Contamination (more than one sequence)
4
Q

What different network algorithms are used?

A
  • Distance-preserving networks (compatibility networks, or split diagrams)
  • Median and reduced-median (RM) networks
    - Median-joining (MJ) and minimum-spanning networks
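A minimum-spanning network is built from minimum spanning trees that connect all haplotypes with the fewest total mutations. The sketch below computes just one such tree with Prim's algorithm over pairwise mutation counts; it assumes aligned, gap-free haplotypes, and the helper names are hypothetical. A full minimum-spanning network would also keep equally short alternative links (the union of all minimum spanning trees), and median-joining additionally inserts inferred median vectors; neither is implemented here.

```python
def hamming(a, b):
    """Number of differing sites between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """One minimum spanning tree via Prim's algorithm (illustrative sketch).

    Assumes aligned, gap-free haplotypes. Returns (a, b, mutations) edges.
    Ties are broken by input order, so only one of possibly several
    equally short trees is returned.
    """
    haps = list(dict.fromkeys(haplotypes))  # drop duplicates, keep order
    if not haps:
        return []
    connected = [haps[0]]
    remaining = haps[1:]
    edges = []
    while remaining:
        # Cheapest link from any connected haplotype to any unconnected one
        a, b = min(
            ((u, v) for u in connected for v in remaining),
            key=lambda pair: hamming(pair[0], pair[1]),
        )
        edges.append((a, b, hamming(a, b)))
        connected.append(b)
        remaining.remove(b)
    return edges
```

For `["AAAA", "AAAT", "AATT", "TAAA"]` this returns three one-mutation edges linking `AAAA` to `AAAT` and `TAAA`, and `AAAT` to `AATT`.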
5
Q

How do median networks work?

A
  • Assumption: every variable site is a single nucleotide polymorphism (SNP) with only two states, i.e. each site can carry only one of two variants
  • Input: binary character data
  • The frequency of each sequence type is also used
  • Uses parsimony to calculate the network
  • Median-joining networks, by contrast, can accept multistate data
  • A median network is built by calculating ‘median vectors’ for every triplet of taxa in the data, i.e. explicitly reconstructing the likely ancestral haplotype in each case
  • The median can be thought of as the majority-rule consensus of the three binary haplotypes in the triplet. If it is not present in the observed data, it is added to the network at the appropriate location
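For binary data, the median of three haplotypes is the site-by-site majority state, which is why it behaves like a consensus of the triplet. A minimal sketch, assuming haplotypes coded as '0'/'1' strings of equal length; `median_vector` and `add_medians` are hypothetical names. It performs only one pass over the observed triplets; a full median network repeats the step on the enlarged set until no new medians appear.

```python
from itertools import combinations


def median_vector(a, b, c):
    """Site-by-site majority state of three binary haplotypes."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("haplotypes must be aligned to the same length")
    return "".join(
        "1" if (x, y, z).count("1") >= 2 else "0"
        for x, y, z in zip(a, b, c)
    )


def add_medians(haplotypes):
    """One pass: return medians of observed triplets that were not observed.

    These are the inferred (unsampled) ancestral haplotypes that a median
    network would add as extra nodes.
    """
    observed = set(haplotypes)
    new = set()
    for a, b, c in combinations(sorted(observed), 3):
        m = median_vector(a, b, c)
        if m not in observed:
            new.add(m)
    return new
```

Here `1100`, `1010` and `0110` each differ from the inferred median `1110` by one mutation, so the median is added as an unobserved ancestral node joining all three.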
6
Q

What are reduced-median networks, and what factors are used to reduce them?

A
  • Median networks include all maximum parsimony (MP) trees and many sub-optimal (but perhaps correct) trees
  • Network reduction therefore removes some of the less likely pathways by pinpointing obvious parallel mutations
  Factors:
  • Character weight, i.e. the number of mutations supporting it, possibly taking their rates into account, as in MP
  • Haplotype frequency: older haplotypes tend to be more common in populations
  • Without this additional information, it is difficult to simplify the network
7
Q

How was a haplotype network used for mosquitoes in the Galapagos?

A

Network showed:
- Each hypothesised mutational event, showing the order of mutations, with branch lengths proportional to the number of mutations (additive)
- Frequency of each haplotype
- Hypothesised ancestral haplotypes (nodes)
- Many mutations separate mosquitoes on the Galapagos from those in the Americas, so they are likely native to the Galapagos
- The mutation rate can then be used to estimate the divergence time between Galapagos and continental haplotypes: likely ~200,000-250,000 years ago, well outside the range of a possible human-associated process
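The divergence-time step rests on simple strict-clock arithmetic: since two lineages accumulate mutations independently after they split, the per-site distance between them is about twice the rate times the elapsed time, giving T = d / (2μ). The sketch assumes a strict clock, uses the uncorrected proportion of differing sites (no correction for multiple hits) and ignores ancestral polymorphism; the function name and the numbers in the usage example are hypothetical, not values from the mosquito study.

```python
def divergence_time(differences, sites, rate_per_site_per_year):
    """Strict-clock divergence time between two lineages (illustrative sketch).

    Both lineages accumulate mutations after the split, so the expected
    per-site distance is d = 2 * mu * T, which gives T = d / (2 * mu).
    Uses the uncorrected p-distance and ignores ancestral polymorphism.
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    d = differences / sites  # proportion of differing sites
    return d / (2 * rate_per_site_per_year)
```

With a hypothetical 10 differences over 1,000 sites and a rate of 2e-8 substitutions per site per year, `divergence_time(10, 1000, 2e-8)` gives roughly 250,000 years.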

8
Q

What do networks allow us to do?

A
  • Indicate plausible haplogroups
  • Predict ancestral haplotypes
  • Highlight regions where homoplasy has occurred
  • Show which sites mutate frequently
  • Show the location of a consensus sequence
  • Show whether recombination may have occurred
  • Point to sequencing errors
9
Q

How can we use networks to understand spatiotemporal patterns of evolution from sequence data?

A

4 components:
1. Gene tree/network - relationships between groups/populations
2. Diversity of clusters of lineages - can be converted to time depth
3. Geographic distribution of lineages - patterns of historical dispersal and geographic origins
4. Hypothesis testing - supported by archaeological and paleoenvironmental data

10
Q

Why is mtDNA commonly used for phylogeographic studies?

A
  • Lack of recombination
  • HVS-1 (hypervariable segment I) was widely used, but sequencing the complete mitogenome is now quick, and gives higher resolution and more precise age estimates with fewer samples
11
Q

Where is the root of the human mtDNA tree?

A
  • Root in Africa
  • Lineages that left Africa diversified in Eurasia
  • Regionally specific mtDNA haplogroups evolved
  • Coalescence time: ~200,000 years
  • Time of the out-of-Africa founder effect: ~60,000-80,000 years ago
12
Q

How is mtDNA haplogroup B4 distributed in Southeast Asia and the Pacific?

A
  • The B4 sub-lineage known as the ‘Polynesian motif’ is largely restricted to the east of the Wallace Line
13
Q

How can we estimate time depth and divergence time with phylogenetics using a ‘molecular clock’?

A
  • Assume that most new mutations are neutral

Model-free:
- Incorporates tree/network topology
- Based on sequence evolution model
- Applies molecular clock (expected number of mutations per unit time)
- Time depth of branches or age of common ancestor of clades estimated from number of accumulated mutations

Model-based:
- Incorporates tree/network topology + sequence evolution model
- Incorporates mutation rates
- In addition, fits coalescent models for different demographic scenarios, e.g. constant population size vs. population expansions at different time points
- Different demographic scenarios give different coalescent predictions for branch lengths
- Tests which scenarios are most compatible with the observed data
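One widely used model-free estimator in phylogeography is the rho statistic: the mean number of mutations between each sampled sequence and the founder (root) haplotype of a clade, converted to years with a calibrated time per mutation. The card does not name a specific estimator, so treat this as an illustrative sketch; it assumes the per-sequence mutation counts have already been read off the network and that a calibrated years-per-mutation value is available. `rho_age` and the numbers in the usage example are hypothetical.

```python
def rho_age(mutations_from_root, years_per_mutation):
    """Clade age from the rho statistic (illustrative sketch).

    mutations_from_root: number of mutations separating each sampled
    sequence from the clade's root haplotype, read off the network.
    years_per_mutation: assumed clock calibration (time per mutation).
    Returns (rho, age_in_years).
    """
    if not mutations_from_root:
        raise ValueError("need at least one sampled sequence")
    # rho: mean number of accumulated mutations since the common ancestor
    rho = sum(mutations_from_root) / len(mutations_from_root)
    return rho, rho * years_per_mutation
```

For example, `rho_age([0, 1, 1, 2], 5000)` returns `(1.0, 5000.0)`. Estimating the uncertainty of rho needs the full tree structure and is omitted here.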

14
Q

What are the two hypotheses for the settlement of Polynesia?

A
  • ‘Out of Taiwan’ - agricultural dispersal of proto-Austronesians from South China/Taiwan from ~4.5 ka (~4,500 years ago)
  • Post-glacial sea-level rise - development of maritime technology in the ‘voyaging corridor’ as coastlines became more accessible ~6 ka; a better fit with the observed data
15
Q

What did the dating of the origins of dog domestication tell us?

A

Evidence suggests that most contemporary domestic dog populations are related to domestication events that involved Eurasian canids
- Thalmann et al., 2013

16
Q

What was found about paleoenvironmental processes leaving genetic signatures in snails in Queensland?

A
  • Snail populations became isolated in rainforest refugia
  • When the rainforest expanded again, the snails expanded out of these refugia
  • The pattern of isolation in refugia left a deep imprint on the structure of genetic lineages in contemporary populations, illustrated by mapping the structure of the tree onto the distribution of the refugia
17
Q

Give two examples of how phylogeography can be used to reconstruct the epidemiological history of epidemics

A

HIV history:
- HIV had its origins before 1920 in chimpanzee hunters in Cameroon, before being amplified in Kinshasa
- Around 1960 - rail links promoted the spread of the virus to mining areas in southeastern Congo and beyond

COVID-19:
- Real-time genetic epidemiology information, used to trace patterns of introduction and transmission events