Lecture 10 Flashcards
The coalescent model can be derived as --- of several ---.
a limiting distribution, population genetic models.
What is the common assumption in coalescent models?
That the underlying population dynamics are deterministic, unlike in birth-death processes.
What are Coalescent models often used for?
They are often used as the basis for phylodynamic inference of population size and dynamics.
In the Wright-Fisher (WF) process, generations are ---.
discrete
Each generation in the WF process consists of ——.
N individuals
How do the individuals in the offspring population choose their parent?
uniformly at random from N parents.
How are the number of offspring of a parent distributed?
Binomially
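A minimal sketch of one WF generation, assuming a haploid population of N individuals. The function names and the seed are illustrative choices, not from the lecture. Each offspring picks a parent uniformly at random, so the number of offspring per parent is Binomial(N, 1/N) and the counts always sum to N.

```python
import random
from collections import Counter

def wf_generation(n_individuals, rng):
    """For each of the N offspring, pick a parent index uniformly at random."""
    return [rng.randrange(n_individuals) for _ in range(n_individuals)]

def offspring_counts(parents, n_individuals):
    """Count how many offspring each parent received (zero for childless parents)."""
    counts = Counter(parents)
    return [counts.get(i, 0) for i in range(n_individuals)]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
parents = wf_generation(10, rng)
counts = offspring_counts(parents, 10)
print(counts, sum(counts))
```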
If there are multiple copies of a gene present in an individual, how can we account for this?
Ploidy can be taken into account by multiplying N by the number of copies of the gene that each individual carries.
For a diploid organism, the number of copies of a gene in the population is —-?
2N
Generations between internal nodes are related to —-.
population size
When we sample a two-individual phylogeny, what is the main question that we're asking?
What is the probability of the individuals sharing a parent?
P_coal for two individuals = ?
1*1/N
What is the probability of coalescence in generation i - m?
p(m) = (1 - p_coal)^(m-1) * p_coal = (1 - 1/N)^(m-1) * 1/N. The first term covers the m-1 generations in which the two lineages fail to coalesce; the second term is the coalescence that finally happens in generation i - m.
For large N, the probability distribution becomes ---.
exponential
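A small numerical check of the two cards above, as a sketch: the geometric probability p(m) is compared with its exponential approximation exp(-m/N)/N. The function names and the values N = 1000, m = 500 are arbitrary illustrative choices.

```python
import math

def p_coal_generation(m, n):
    """Probability that two lineages first share a parent exactly m generations back."""
    p = 1.0 / n
    return (1.0 - p) ** (m - 1) * p

def exp_approx(m, n):
    """Exponential approximation to the geometric probability, good when n is large."""
    return math.exp(-m / n) / n

n = 1000
exact = p_coal_generation(500, n)
approx = exp_approx(500, n)
print(exact, approx, abs(exact - approx) / exact)
```

Summing m * p(m) over m gives a mean of N generations, which matches the average time to a common ancestor once it is converted to calendar time.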
If g is the calendar time of a generation, what is the calendar time span of m generations?
g*m
What is the probability density function for the coalescence time of two lineages?
p(Δt) = 1/(gN) * e^(-Δt/(gN))
What is the average time to go back to find a common ancestor?
g*N
In the large N limit, the time to coalescence is ?
exponentially distributed with mean gN
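The step from the discrete WF probability to the exponential density can be written out as follows. This is a sketch of the standard argument, where Δt = g m is the calendar time spanned by m generations and N is assumed large.

```latex
\begin{align*}
p(m) &= \left(1 - \tfrac{1}{N}\right)^{m-1} \frac{1}{N}
      \;\approx\; \frac{1}{N}\, e^{-m/N} \qquad (N \gg 1) \\
p(\Delta t) &= \frac{1}{gN}\, e^{-\Delta t/(gN)} \qquad (\Delta t = g\,m) \\
\mathbb{E}[\Delta t] &= \int_0^\infty \Delta t \,\frac{1}{gN}\, e^{-\Delta t/(gN)} \, d\Delta t = gN
\end{align*}
```

The factor 1/g in the second line comes from converting a probability per generation into a density per unit of calendar time.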
How can we generalise p_coal to k samples?
(k choose 2) * 1/N (valid when k is much smaller than N)
Kingman's coalescent is a ----time --- which produces ---.
continuous, Markov process, sampled time trees
In KC, the process runs --- in time.
backwards
In what sense is KC a Markov process?
If we start the tree simulation and move back in time, the only thing we need to know is the number of extant lineages; nothing else about the history matters (the memorylessness property).
In what sense is KC equivalent to WF?
It is equivalent to the sampled trees produced by the WF model when N is much larger than the number of samples.
Times between the coalescence events are drawn from a --- distribution with rate parameter ---.
exponential, (k choose 2) * 1/(Ng)
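A minimal sketch of how the waiting times of Kingman's coalescent could be simulated, moving backwards in time from n samples to a single lineage. The function name, the argument `ng` (the product N*g) and the seed are illustrative assumptions. Only the times are drawn here; a full simulator would also pick which pair of lineages merges at each event.

```python
import random
from math import comb

def simulate_coalescent_times(n_samples, ng, rng):
    """Draw the waiting times while k = n, n-1, ..., 2 lineages remain."""
    waits = []
    for k in range(n_samples, 1, -1):
        rate = comb(k, 2) / ng  # total coalescence rate among k lineages
        waits.append(rng.expovariate(rate))
    return waits

rng = random.Random(0)  # fixed seed for reproducibility
waits = simulate_coalescent_times(5, 1.0, rng)
print(waits, sum(waits))  # sum(waits) is the height of the tree
```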
Under the coalescent model, the average time required for n lineages to coalesce into 1 is ---, which can be simplified to --- for large n. n is ---.
2Ng(1 - 1/n) (slide 17), 2Ng, the number of leaves in the coalescent tree.
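The average tree height follows from adding up the mean waiting times Ng / (k choose 2) for k = n, ..., 2. A short sketch (the function name is an illustrative choice) that evaluates this sum and compares it with the closed form 2Ng(1 - 1/n):

```python
from math import comb

def expected_tmrca(n, ng):
    """Sum of the mean waiting times Ng / (k choose 2) for k = 2..n lineages."""
    return sum(ng / comb(k, 2) for k in range(2, n + 1))

for n in (2, 10, 100, 1000):
    print(n, expected_tmrca(n, 1.0), 2.0 * (1.0 - 1.0 / n))
```

The sum telescopes because 1 / (k choose 2) = 2 * (1/(k-1) - 1/k), which is why the result approaches 2Ng as n grows.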
What is the probability of a coalescent tree given Ng?
slide 17
What does each term represent in the coalescent tree probability?
The exponential (nested) term is the probability that the k lineages go the entire duration Δt without any pair finding a common ancestor; the (k choose 2) factor arises because k lineages contain that many pairs. The other term, 1/(Ng), is the probability density that one particular pair of lineages coalesces at the end of the interval.
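A sketch of how the two terms described above combine into a log-probability of a tree, assuming the standard form in which each coalescent interval contributes exp(-(k choose 2) * Δt / (Ng)) * 1/(Ng). The input format, a list of (k, Δt) pairs, and the function name are assumptions made for illustration; check the exact expression against slide 17.

```python
import math
from math import comb

def coalescent_log_likelihood(intervals, ng):
    """Log-probability of a coalescent tree given Ng.

    intervals: list of (k, dt) pairs, meaning k lineages were present for a
    duration dt that ended in a coalescence event.
    """
    logp = 0.0
    for k, dt in intervals:
        logp += -comb(k, 2) * dt / ng  # no coalescence among k lineages during dt
        logp += -math.log(ng)          # density that one particular pair coalesces
    return logp

# Hypothetical three-leaf tree: 3 lineages for 0.5 time units, then 2 for 1.0.
print(coalescent_log_likelihood([(3, 0.5), (2, 1.0)], ng=2.0))
```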
Why would results from a coalescent model be different from a real tree?
1- The results could be biased because the real dynamics differ from WF dynamics.
2- Real populations are structured, whereas the WF population is assumed to be completely homogeneous: any two individuals have an equal chance of sharing a parent, which is clearly not true for many populations.
Inferred population size is referred to as the —-?
effective population size
The inferred population size is the size of a --- population. Does this share any similarity with the real population?
WF. Yes, there is some statistical similarity; however, care should be taken when drawing conclusions from effective population sizes.
The coalescent distribution is derived as —– of the WF process.
limit
Which other population processes have the coalescent as a limit? How is the limit applied?
The Cannings model: a generalisation of WF
The Moran model: overlapping generations, fixed population size
Stochastic logistic models: continuous time, population fluctuations
What is the robustness of the coalescent?
The coalescent distribution persists in the face of many departures from the WF model; this is sometimes termed the robustness of the coalescent.
What are the 3 general assumptions of the coalescent?
1- samples are members of a population that is at demographic equilibrium.
2- Number of samples is small compared to the total population size
3- population is well mixed and samples are drawn uniformly at random.
What does each assumption of the coalescent justify?
Demographic equilibrium: the use of a fixed or slowly varying population size.
Small sample size compared to the total population size: neglecting more than 2 lineages coalescing in the same generation.
Well mixed: the coalescence rate being equal for every pair of sampled lineages. Population structure violates this assumption.
The probability of a phylogenetic tree can be calculated via?
The rate of coalescence, which is proportional to 1/N(t), the inverse of the population size as a function of time.
When N(t) is larger we have a —- coalescence rate and —– branches.
slower, longer
In parametric population dynamics inference, the rate of the exponential distribution can/can't change through time; this is accounted for by having an --- in the function.
can, integral
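A sketch of where the integral enters, assuming the pairwise coalescence rate at time t is 1/(g * N(t)): the probability of no coalescence over an interval is the exponential of minus the integrated rate. The function names, the midpoint-rule integration and the exponential-growth example are illustrative assumptions, not the lecture's implementation.

```python
import math
from math import comb

def integrated_rate(k, pop_size, g, t_start, t_end, steps=10000):
    """Midpoint-rule integral of (k choose 2) / (g * N(t)) over [t_start, t_end]."""
    h = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps):
        t = t_start + (i + 0.5) * h
        total += comb(k, 2) / (g * pop_size(t)) * h
    return total

def log_density_next_coalescence(k, pop_size, g, t_start, t_end):
    """Log density that one particular pair among k lineages coalesces at t_end,
    with no coalescence between t_start and t_end."""
    return -integrated_rate(k, pop_size, g, t_start, t_end) - math.log(g * pop_size(t_end))

# Hypothetical demographic scenario: N(t) shrinks going back in time (t = time before present).
growing = lambda t: 100.0 * math.exp(-0.1 * t)
print(log_density_next_coalescence(2, growing, 1.0, 0.0, 10.0))
```

With a constant N(t) the integral reduces to (k choose 2) * Δt / (Ng), which recovers the constant-size coalescent.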
Parametric population dynamics inference leads to a likelihood that ---, therefore we can --- different --- scenarios for a given ---.
incorporates the growth rate term of the population size N(t); compare and test, demographic, tree.
How does non parametric population dynamics inference work?
Assume a population size that takes a distinct constant value in each interval between coalescent events. We can then obtain a separate ML estimate of the population size for each interval. The resulting estimate of the population size function is the skyline plot.
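A sketch of the per-interval ML estimate, assuming each interval with k lineages and duration Δt has density proportional to (1/θ) * exp(-(k choose 2) * Δt / θ) with θ = Ng. Setting the derivative of the log density to zero gives θ = (k choose 2) * Δt. The function name and the input format are illustrative assumptions.

```python
from math import comb

def classic_skyline(intervals):
    """ML estimate of N*g for each coalescent interval.

    intervals: list of (k, dt) pairs, k lineages present for duration dt.
    """
    return [comb(k, 2) * dt for k, dt in intervals]

# Hypothetical intervals from a four-leaf tree, ordered from the present backwards.
print(classic_skyline([(4, 0.1), (3, 0.5), (2, 2.0)]))
```

Each estimate rests on a single waiting time, so the resulting step function is noisy.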
Can we develop coalescent distributions which approximate the probability density of sampled phylogenies generated by a birth-death process? If yes, how?
Yes.
1- Assume the ODE solution on slide 30, including the population size at present, is correct for the linear birth-death process. Births occur at time t with overall rate BI(t).
2- Every birth is a potential coalescence between sampled lineages.
3- The probability that a birth involves a pair of sampled lineages is (k choose 2) / (I(t) choose 2).
4- We then approximate the coalescence rate.
5- We can use the coalescence rate to compute an approximation to the probability of the tree given the B,S,T.
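Steps 1 to 4 can be sketched as a single rate function, assuming that the coalescence rate is the overall birth rate B * I(t) times the fraction of pairs in the population that are pairs of sampled lineages, (k choose 2) / (I(t) choose 2). The names `birth_rate` and `infected` are illustrative stand-ins for B and I(t), and I(t) would come from the deterministic ODE solution.

```python
def bd_coalescent_rate(k, birth_rate, infected):
    """Approximate coalescence rate among k sampled lineages at a time when the
    deterministic population size is `infected` (may be non-integer)."""
    pairs_sampled = k * (k - 1) / 2.0              # k choose 2
    pairs_total = infected * (infected - 1) / 2.0  # I(t) choose 2
    return birth_rate * infected * pairs_sampled / pairs_total

# Hypothetical values: 2 sampled lineages, B = 1.0, I(t) = 101.
print(bd_coalescent_rate(2, 1.0, 101.0))
```

For large I(t) this is close to (k choose 2) * 2B / I(t), so I(t) / (2B) plays the role that Ng plays in the constant-size coalescent.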
What does the quality of approximation depend on?
It depends on how well the birth-death population dynamics are approximated by the deterministic ODE solution. This approximation can perform very poorly when the population size is small, as it is at the start of an epidemic.
What are the parameters of the birth-death (BD) vs the coalescent (C) model?
BD: transmission rate, removal rates, sampling rates/proportions
C: effective population size (not the actual population size)
BD models ---, while the coalescent assumes ---.
the sampling process (sampling times/locations are data), that the number of sampled lineages is small (k ≪ N)
What are the advantages and disadvantages of BD models?
+: 1- account for stochastic variability in population dynamics
2- generally easier interpretation of parameters
3- use information about sampling
-: 1- sensitive to un-modeled changes in sampling fractions
2- difficult to extend to complex population models
What are the advantages and disadvantages of C models?
+: 1- Generally fast likelihood calculations
2- easy to extend to complex population dynamics
3- naturally account for incomplete sampling
-: 1- sensitive to uncertainty in population dynamics at high sampling
2- sensitive to hidden population structure and nonrandom sampling
Both BD and C models —– relate a population’s —- to its ——-, their difference is in their —-.
probabilistically, demography, phylogenetic history, parameterisation
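Under the WF model, how many generations do we have to go back before we find the common ancestor of a pair of genes sampled from a haploid population of size N?
N generations on average: the probability of coalescence in each generation is 1/N, so the waiting time is geometric with mean N.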
Suppose you had a tree inferred using present-day samples from a population that experienced a severe bottleneck in its recent past. How and why would this bottleneck likely affect our ability to infer ancestral population dynamics?
The probability of coalescence becomes very large at the bottleneck, so lineages coalesce rapidly there and it becomes very difficult to see past it. Beyond this point almost no coalescent events are left, so we cannot infer anything from what is effectively a tree with a single lineage.
Imagine spreading a WF population across islands in an archipelago, so that movement between the islands is restricted but within each island the population is well mixed. Qualitatively, how would you expect this population structure to influence estimates of the effective population size?
If there is no movement between the islands whatsoever, we get 3 independent trees, one per island.
If there is still some slow movement: within each island, the tree and the timing between branching events are set by the island's population size,
while between the islands the coalescence rate depends on the slow movement rate, so those branches are going to be longer, which tends to inflate the estimated effective population size. See slide 3 of lecture 11 for the tree.