Module 3 Flashcards by Erin Merritt

What is Principal Component Analysis (PCA)?

Reduces a large data set to a smaller set of variables while retaining its main patterns and trends
Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables
These combinations are done in such a way that the principal components are uncorrelated and most of the information within the initial variables is compressed into the first components.
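
A minimal sketch of PCA on genotype data, assuming an individuals-by-SNPs matrix of 0/1/2 allele counts. It centers each SNP and uses NumPy's SVD; the function name `pca` and the toy data are illustrative, not from any particular genetics package. Real pipelines usually also standardize SNPs and prune for LD.

```python
import numpy as np

def pca(genotypes: np.ndarray, n_components: int = 2):
    """PCA of an individuals-by-SNPs matrix of 0/1/2 allele counts, via SVD.

    Returns (scores, explained_variance_ratio) for the first n_components PCs.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                            # center each SNP (column)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # individuals projected onto the PCs
    var = S ** 2                                      # variance captured by each PC (up to a constant)
    return scores, var[:n_components] / var.sum()

# Toy data (simulated, not real): two groups with very different allele frequencies.
rng = np.random.default_rng(0)
G = np.vstack([rng.binomial(2, 0.1, size=(20, 200)),
               rng.binomial(2, 0.9, size=(20, 200))])
scores, ratio = pca(G)
print(ratio)  # PC1 should capture most of the variance
```

The PC scores are uncorrelated with one another, and PC1 here separates the two simulated groups, which is the "compressed into the first components" idea above.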

PCA vs population genetic modeling

PCA is a descriptive tool
Statistical modelling allows us to tease apart how different processes are shaping the data
e.g., the effects of recombination-rate variation, the distribution of fitness effects (DFE), etc.

Aspects of population history

Population sizes through time, Changes in population size, Population splits, Migration

Why demographic inference is important

A large fraction of mutations are effectively neutral and hence evolve under genetic drift
The majority of newly arising mutations that affect fitness are deleterious
Natural populations have undergone complex demographic histories. The combined effects of population size changes, structure, and migration all shape patterns of within-species variation
The efficacy of both selection and recombination is mediated by the effective population size.

Coalescent theory

Considers the genealogical history of genes in populations
Uses DNA sequence data to make inferences about population size, genetic structure, and evolutionary processes
The coalescent process runs backward in time
An analytical approximation of neutral processes, and thus extremely fast for simulation

Coalescent processes

Coalescent events happen rapidly when there are many lineages
Coalescent events happen much more slowly when there are few lineages
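
A minimal sketch of why this happens under the standard neutral (Kingman) coalescent: with k lineages there are k(k-1)/2 pairs that could coalesce, so the waiting time to the next event is exponential with that rate, measured in units of 2N generations for a diploid population of size N. The function names are illustrative.

```python
import random

def expected_wait(k: int) -> float:
    """Expected time (units of 2N generations) until the next coalescence when k lineages remain."""
    return 2.0 / (k * (k - 1))   # rate k(k-1)/2: one per pair of lineages

def simulate_waits(n: int, rng: random.Random) -> list[float]:
    """One neutral genealogy of n sampled lineages: waiting times while n, n-1, ..., 2 lineages remain."""
    waits = []
    for k in range(n, 1, -1):
        waits.append(rng.expovariate(k * (k - 1) / 2))  # exponential waiting time
    return waits

for k in (10, 5, 2):
    print(k, round(expected_wait(k), 3))
```

With these units the last two lineages take on average 1 unit (2N generations) to coalesce, versus about 0.022 units when 10 lineages remain, so the deep part of a genealogy is dominated by the few-lineage phase.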

Coalescent vs diffusion approximation

Diffusion approximation tracks allele frequency changes through time
Coalescent theory focuses on tracing the genealogical history of sampled gene lineages backward in time
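
For contrast with the backward-in-time coalescent, here is a minimal forward-in-time sketch. It simulates the discrete neutral Wright-Fisher model, whose continuous limit is what the diffusion approximation describes; it does not solve the diffusion equation itself. The function name and parameters are illustrative.

```python
import random

def wright_fisher(p0: float, two_n: int, generations: int, rng: random.Random) -> list[float]:
    """Neutral Wright-Fisher allele-frequency trajectory, forward in time.

    Each generation the 2N gene copies are a binomial sample from the previous frequency.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        copies = sum(rng.random() < p for _ in range(two_n))  # binomial(2N, p) draw
        p = copies / two_n
        freqs.append(p)
    return freqs

traj = wright_fisher(p0=0.3, two_n=50, generations=100, rng=random.Random(42))
print(traj[-1])
```

The diffusion view follows the frequency of one allele through time like this; the coalescent instead starts from the sample and traces its ancestry backward.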

An example demographic inference pipeline

STEP 1: Mask coding regions and regions linked to coding regions
STEP 2: Identify the number of populations in your dataset, via software such as STRUCTURE or ADMIXTURE
STEP 3: Identify a set of demographic models to use for inference
STEP 4: Run dadi to infer which demographic model fits our observed SFS best
STEP 5: Run dadi to infer the best-fitting parameters for our best-fitting model (this time assessing log-likelihood only)
STEP 6: Simulate the best-fitting model with the best-fitting parameters using a coalescent-based simulator, and compare the simulated SFS with the observed SFS
STEP 7: Assess how realistic the model and parameters are in the context of the population in question
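
Steps 4 to 6 all compare an observed site frequency spectrum (SFS) with model expectations. A minimal sketch of both sides, assuming derived alleles have already been polarized (e.g., with an outgroup) and using the standard neutral expectation E[SFS_i] = θ/i with Watterson's θ. This does not use dadi itself; the function names and counts are hypothetical.

```python
def unfolded_sfs(derived_counts: list[int], n: int) -> list[int]:
    """sfs[i] = number of SNPs at which i of the n sampled chromosomes carry the derived allele."""
    sfs = [0] * (n + 1)          # indices 0 and n are monomorphic in the sample
    for c in derived_counts:
        if not 0 <= c <= n:
            raise ValueError(f"derived count {c} outside 0..{n}")
        sfs[c] += 1
    return sfs

def neutral_expected_sfs(num_segregating: int, n: int) -> list[float]:
    """Standard neutral expectation E[sfs[i]] = theta / i, with theta from Watterson's estimator."""
    a_n = sum(1 / i for i in range(1, n))
    theta_w = num_segregating / a_n
    return [0.0] + [theta_w / i for i in range(1, n)] + [0.0]

obs = unfolded_sfs([1, 1, 2, 3, 1, 4], n=5)     # hypothetical derived-allele counts
exp = neutral_expected_sfs(num_segregating=6, n=5)
print(obs[1:-1], [round(e, 2) for e in exp[1:-1]])
```

Demographic models (bottlenecks, growth, splits, migration) change the expected shape of this spectrum, which is what dadi and fastsimcoal2 exploit when fitting models.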

Assessing the best fit, using the log-likelihood and the Akaike information criterion (AIC)

The log-likelihood tells us how probable the observed data are under the model and its parameters (higher is better)
The AIC assesses the relative amount of information lost by a given model…
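
A minimal sketch of both quantities for SFS-based model comparison, assuming a Poisson composite likelihood (each SFS bin treated as an independent Poisson count, a common choice for SFS methods) and AIC = 2k - 2 ln L, where k is the number of free parameters. The model spectra below are hypothetical numbers, not fitted output, and every expected count must be positive.

```python
import math

def poisson_log_likelihood(observed: list[int], expected: list[float]) -> float:
    """Composite log-likelihood: each SFS bin treated as an independent Poisson count."""
    ll = 0.0
    for obs, exp in zip(observed, expected):
        ll += obs * math.log(exp) - exp - math.lgamma(obs + 1)   # log Poisson pmf
    return ll

def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: lower values indicate less information lost."""
    return 2 * num_params - 2 * log_likelihood

observed = [30, 14, 9, 8]                    # hypothetical SFS bins
model_a = ([29.0, 14.5, 9.7, 7.3], 1)        # (expected SFS, number of free parameters)
model_b = ([30.2, 13.8, 9.1, 7.9], 3)
for name, (expected, k) in {"A": model_a, "B": model_b}.items():
    ll = poisson_log_likelihood(observed, expected)
    print(name, round(ll, 2), round(aic(ll, k), 2))
```

A more complex model always has at least as high a maximum likelihood; the 2k term in AIC is what penalizes the extra parameters.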

FSC2 - fastsimcoal2

Demographic inference
Likelihood-based
SFS-based
Compares a variety of demographic models by how well each fits the observed SFS, using the (composite) likelihood, with expected SFS approximated by coalescent simulation

The challenges of ancient DNA (aDNA) analysis

Deamination of cytosine (causes artificial C-to-T transitions in the sequence data)
Most of the DNA is not from the sample you want (e.g., it is rather from microbes that colonized the bone sample after death)
Human contamination (anthropologists, lab techs, etc.)
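
A minimal sketch of one common damage diagnostic: deamination-induced C-to-T mismatches tend to be concentrated near the 5' ends of ancient reads, so tabulating them by read position gives a damage profile. This assumes reads are already aligned to the reference without gaps and given as (reference, read) string pairs; the function name and input format are hypothetical.

```python
def c_to_t_by_position(pairs: list[tuple[str, str]], max_pos: int = 10) -> list[float]:
    """Fraction of reference C sites read as T, by distance from the read's 5' end.

    `pairs` holds (reference, read) strings of equal length, already aligned without gaps.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref, read)):
            if pos >= max_pos:
                break
            if r == "C":
                c_total[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]

print(c_to_t_by_position([("CCA", "TCA"), ("CAC", "TAC")], max_pos=3))  # → [1.0, 0.0, 0.0]
```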

The primary result here is that Neanderthals are more closely related to…

Modern non-African populations (i.e., non-African populations are less diverged from Neanderthals)

Recurrent Positive Selection

Multiple selective sweeps
Using divergence data

dN/dS

Nonsynonymous substitutions per nonsynonymous site / synonymous substitutions per synonymous site
A dN/dS value > 1 is evidence for recurrent positive selection (multiple selective sweeps).
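
A minimal sketch of the ratio itself, assuming the numbers of nonsynonymous and synonymous sites and differences have already been counted (e.g., by a codon-counting method such as Nei-Gojobori, not shown) and applying the Jukes-Cantor correction for multiple hits. The counts are hypothetical.

```python
import math

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits; requires p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """Corrected divergence per nonsynonymous site divided by divergence per synonymous site."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    return d_n / d_s

# Hypothetical counts for one gene aligned between two species.
print(round(dn_ds(nonsyn_diffs=12, nonsyn_sites=750, syn_diffs=15, syn_sites=250), 3))
```

Normalizing by the number of sites matters because most sites in a coding sequence are nonsynonymous; raw counts of differences would be misleading.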

Recurrent positive selection things to worry about and how to check

Alignment errors: check the quality of the sequence alignment
Relaxed constraint: check for premature stop codons
Duplicated genes: check for duplicates
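
A minimal sketch of the premature-stop-codon check, assuming the standard genetic code and an in-frame coding sequence that starts at its first base. A gene that has lost function (e.g., a pseudogene) is under relaxed constraint and can show elevated dN/dS without positive selection. The function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def premature_stop_positions(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons before the final codon."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

print(premature_stop_positions("ATGAAATAGCCCTGA"))  # → [2]
```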

Incomplete Selective Sweep

A sweep that hasn’t yet reached fixation in a population

Patterns of variation

High LD
Mutations at intermediate frequency
Long haplotypes
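
A minimal sketch of one LD measure, r² between two biallelic loci, assuming phased haplotypes coded as (allele at locus A, allele at locus B) with 0/1 values. This is a generic illustration, not a specific sweep-scan statistic; the function name is hypothetical.

```python
def r_squared(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 between two biallelic loci from phased haplotypes coded (allele_at_A, allele_at_B) as 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0            # undefined if a locus is monomorphic

print(r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))  # → 1.0
```

During an incomplete sweep, variants hitchhiking on the sweeping haplotype stay associated with it, which pushes r² up across a long stretch around the selected site.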

Incomplete selective sweep things to worry about and how to check

Things to worry about:
Bottleneck
Population structure
Low recombination, which could result in high LD/long haplotypes
Balancing selection
SNP ascertainment bias
Meiotic drive
How to check:
Estimate a population history from neutral data
Estimate a recombination map

Complete selective sweep things to worry about

Bottleneck
Structure
Allele surfing
Low recombination

Allele surfing

Alleles that happen to be common at the "edge" of an expanding range are more likely to be passed on to the next generation of the expanding population as it spreads (e.g., "expands westward", or in any one direction). An allele that by chance became common at the front is "surfing" the expansion through the generations
A "new colonization" type of model; it does not fit well for colonized populations where there could be admixture
Acts like recurrent bottlenecks in one spatial direction
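
A minimal sketch of allele surfing as recurrent bottlenecks in one spatial direction, assuming a toy serial-founder model (not a model from the source): each new deme at the front is founded by a small number of gene copies drawn from the previous edge deme, and growth after founding is assumed not to change the frequency. All names and parameter values are hypothetical.

```python
import random

def serial_founder_expansion(p0: float, n_demes: int, founders: int,
                             rng: random.Random) -> list[float]:
    """Allele frequency in each successive deme along a 1-D range expansion.

    Each new deme is founded by `founders` gene copies sampled from the previous edge deme.
    """
    freqs = [p0]
    p = p0
    for _ in range(n_demes - 1):
        k = sum(rng.random() < p for _ in range(founders))  # founder sample (serial bottleneck)
        p = k / founders
        freqs.append(p)
    return freqs

rng = random.Random(3)
front = [serial_founder_expansion(0.1, 50, 5, rng)[-1] for _ in range(1000)]
print(sum(f == 1.0 for f in front) / len(front))  # share of runs where the rare allele "surfed" to fixation
```

An allele starting at 10% in the source ends up fixed at the expansion front in roughly 10% of runs, purely by drift at the edge, which can mimic a completed sweep.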

Balancing selection

Selection that maintains variation rather than eliminating it
Heterozygote advantage
Temporally-varying selection
Frequency-dependent selection
- Negative frequency dependence: individuals carrying the rarer allele have a fitness advantage
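
A minimal sketch of heterozygote advantage maintaining variation, assuming random mating, an infinite population, and fitnesses AA = 1 - s, Aa = 1, aa = 1 - t. Under these assumptions the frequency of A converges to the stable equilibrium t/(s + t) from either side, instead of being fixed or lost. Frequency-dependent and temporally-varying selection are not modeled here.

```python
def next_freq(p: float, s: float, t: float) -> float:
    """One generation of selection on allele A with heterozygote advantage.

    Fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (random mating, infinite population).
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa   # mean fitness
    return p * (p * w_AA + q * w_Aa) / w_bar                 # p * (marginal fitness of A) / w_bar

def run(p0: float, s: float, t: float, generations: int) -> float:
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p

s, t = 0.1, 0.3
print(t / (s + t), run(0.01, s, t, 2000), run(0.99, s, t, 2000))
```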

Soft Selective Sweep

At the end of the sweep, at least two haplotypes carry the beneficial allele
The sweep preserves haplotype variation rather than eliminating it

Selection On Common Standing Variation model

An allele already segregating as neutral or nearly neutral becomes beneficial
Possible in humans
Favored by those who think soft selective sweeps are frequent

Selection On Recurrent Identical Beneficial Mutations model

The same beneficial mutation arises independently two or more times at the same site
Very unlikely in humans
Plausible in organisms with very high mutation rates (e.g., HIV)

False Discovery Rate (FDR)

~5% for a soft sweep and ~0% for a hard sweep
Under neutrality, there are hardly any soft selective sweeps, despite what has been reported
The bias against detecting hard sweeps was also not accounted for
No correction for multiple testing was attempted
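
For reference, a minimal sketch of one standard way to control the FDR (the expected fraction of false positives among all reported discoveries) across many tests: the Benjamini-Hochberg step-up procedure. This is a swapped-in, generic illustration, not the method of any study discussed on this card; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Which tests to reject while controlling the FDR at `alpha` (Benjamini-Hochberg step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices from smallest to largest p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank          # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True    # reject every test up to that rank
    return rejected

print(benjamini_hochberg([0.01, 0.2, 0.03, 0.5]))  # → [True, False, False, False]
```

Because it is a step-up procedure, a test can be rejected even if it misses its own threshold, as long as a larger p-value further down the ranking passes.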