Module 3 Flashcards

1
Q

What is Principal Component Analysis (PCA)?

A
  • Simplifies a large data set into a smaller set of variables while still preserving significant patterns and trends
  • Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables
  • These combinations are done in such a way that the principal components are uncorrelated and most of the information within the initial variables is compressed into the first components.
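A minimal sketch of the idea in the bullets above, assuming NumPy is available and that samples are in rows and variables in columns. The function name `pca` and the toy data are illustrative only, not taken from the course material.

```python
import numpy as np

def pca(data, n_components):
    """Return (scores, components, explained_variance_ratio) for row-wise samples."""
    centered = data - data.mean(axis=0)        # center each variable
    cov = np.cov(centered, rowvar=False)       # covariance between variables
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # each column is a linear combination of the variables
    scores = centered @ components             # coordinates of samples on the components
    ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, ratio

# Toy data (made up): two strongly correlated variables plus one independent variable.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200), rng.normal(size=200)])
scores, components, ratio = pca(data, 2)
```

The columns of `scores` are uncorrelated, and `ratio` shows that most of the variance ends up in the first component.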
2
Q

PCA vs population genetic modeling

A
  • PCA is a descriptive tool
  • Statistical modelling allows us to tease apart how different processes are shaping the data
  • Ex. recombination variation effects, DFE, etc.
3
Q

Aspects of population history

A

Population sizes through time, Changes in population size, Population splits, Migration

4
Q

Why demographic inference is important

A
  • A large fraction of mutations are effectively neutral and hence evolve under genetic drift
  • The majority of newly arising mutations that affect fitness are deleterious
  • Natural populations have undergone complex demographic histories. The combined effects of population size changes, structure, and migration all shape patterns of within-species variation
  • The efficacy of both mutation and recombination is mediated by the effective population size.
5
Q

Coalescent theory

A
  • Considers the genealogical history of genes in populations
  • Uses DNA sequence data to make inferences about population size, genetic structure, and evolutionary processes
  • Coalescent processes are backward in time
  • Analytical approximation of neutral processes, thus extremely fast for simulation purposes.
6
Q

Coalescent processes

A
  • Coalescent events happen rapidly when there are many lineages
  • Coalescent events happen much more slowly when there are few lineages
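The two bullets above can be made concrete with a small sketch. It assumes the standard Kingman coalescent with constant population size and time measured in units of 2N generations (an assumption, not stated on the card), where the waiting time with k lineages is exponential with rate k(k-1)/2. Function names are illustrative.

```python
import numpy as np

def expected_waiting_times(n_samples):
    """E[T_k] = 2 / (k (k - 1)) for k = n, ..., 2, in units of 2N generations."""
    return {k: 2.0 / (k * (k - 1)) for k in range(n_samples, 1, -1)}

def simulate_tmrca(n_samples, rng):
    """Draw one time to the most recent common ancestor by summing waiting times."""
    total = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0              # number of lineage pairs that can coalesce
        total += rng.exponential(1.0 / rate)  # NumPy takes the mean (1 / rate) as the scale
    return total

waits = expected_waiting_times(10)
rng = np.random.default_rng(1)
mean_tmrca = np.mean([simulate_tmrca(10, rng) for _ in range(20000)])
```

With 10 lineages the expected wait is 1/45 of a time unit, while the final two lineages take a full unit on average, so the last coalescence dominates the depth of the tree.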
7
Q

Coalescent vs diffusion approximation

A
  • Diffusion approximation tracks allele frequency changes through time
  • Coalescent theory focuses on tracing the genealogical history of sampled gene lineages backward in time
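To illustrate the forward-in-time view, here is a sketch of a Wright-Fisher allele frequency trajectory, the discrete process that the diffusion approximates. It assumes a neutral allele in a diploid population of constant size. The function name and parameter values are hypothetical.

```python
import numpy as np

def wright_fisher_trajectory(n_diploid, p0, generations, rng):
    """Track one allele's frequency forward in time under pure genetic drift."""
    copies = 2 * n_diploid                     # number of gene copies in the population
    p = p0
    freqs = [p]
    for _ in range(generations):
        p = rng.binomial(copies, p) / copies   # binomial resampling of the next generation
        freqs.append(p)
    return np.array(freqs)

rng = np.random.default_rng(2)
trajectory = wright_fisher_trajectory(n_diploid=100, p0=0.5, generations=50, rng=rng)
```

This follows the whole population's allele frequency forward through time, whereas a coalescent simulation only follows the ancestry of the sampled lineages backward.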
8
Q

An example demographic inference pipeline

A

STEP 1: Mask coding regions, and regions linked to coding regions
STEP 2: Identify the number of populations in your dataset, via software such as STRUCTURE or ADMIXTURE
STEP 3: Identify a set of demographic models to use for inference
STEP 4: Run dadi to infer which demographic model fits our observed SFS best
STEP 5: Run dadi to infer the best-fitting parameters for our best-fitting model (this time assessing log-likelihood only)
STEP 6: Simulate the best-fitting model with best-fitting parameters using a coalescent-based simulator and compare the fit of the simulated SFS with the observed SFS
STEP 7: Assess how realistic the model and parameters are in the context of the population in question
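The comparison of a model SFS with the observed SFS (Steps 4 to 6) can be sketched generically as follows. This is not dadi's API. It is a plain NumPy illustration that assumes a Poisson likelihood per frequency class, with the model rescaled to the observed number of sites. The observed counts are made-up toy numbers.

```python
import math
import numpy as np

def neutral_sfs_shape(n_chromosomes):
    """Expected SFS shape under the standard neutral model: proportional to 1/i."""
    i = np.arange(1, n_chromosomes)
    return 1.0 / i

def poisson_log_likelihood(observed, model):
    """Log-likelihood of observed SFS counts given a model SFS shape."""
    observed = np.asarray(observed, dtype=float)
    model = np.asarray(model, dtype=float)
    scaled = model * observed.sum() / model.sum()   # match the total number of sites
    log_fact = np.array([math.lgamma(d + 1.0) for d in observed])
    return float(np.sum(-scaled + observed * np.log(scaled) - log_fact))

observed_sfs = [520, 250, 170, 125, 100]            # hypothetical counts, 6 chromosomes
ll_neutral = poisson_log_likelihood(observed_sfs, neutral_sfs_shape(6))
ll_flat = poisson_log_likelihood(observed_sfs, np.ones(5))
```

The model with the higher log-likelihood (here the neutral shape rather than the flat one) reproduces the observed SFS more closely.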

9
Q

Assessing the best fit, using the log-likelihood and the Akaike information criterion (AIC)

A

The log-likelihood tells us how probable the observed data are under a given model (higher means a better fit)
The AIC assesses the relative amount of information lost by a given model…
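A small sketch of the standard AIC formula, AIC = 2k - 2 ln(L), where k is the number of free parameters and ln(L) is the maximized log-likelihood. The two log-likelihood values below are hypothetical and only show how the penalty on parameters works.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: a 2-parameter model and a 5-parameter model.
aic_simple = aic(log_likelihood=-1050.0, n_params=2)    # 2104.0
aic_complex = aic(log_likelihood=-1048.5, n_params=5)   # 2107.0
preferred = "simple" if aic_simple < aic_complex else "complex"
```

Here the complex model has a slightly higher log-likelihood, but not by enough to offset its three extra parameters, so the simpler model has the lower AIC.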

10
Q

FSC2 - fastsimcoal2

A

Demographic inference
Likelihood-based
SFS-based
Assesses the likelihood of a variety of models based on how well each model fits the data

11
Q

The challenges of ancient DNA (aDNA) analysis

A
  • Deamination of cytosine (i.e., causes ‘artificial’ C to T transitions in DNA)
  • Most of the DNA is not from the sample you want (e.g., it is rather from microbes that colonized the bone sample after death)
  • Human contamination (anthropologists, lab techs, etc.)
12
Q

The primary result here is that Neanderthals are more closely related to…

A

Modern non-African populations (i.e., non-African populations are less diverged from Neanderthal)

13
Q

Recurrent Positive Selection

A
  • Multiple selective sweeps
  • Using divergence data
14
Q

dN/dS

A
  • Non-synonymous substitutions per non-synonymous site (dN) / synonymous substitutions per synonymous site (dS)
  • A dN/dS value that is >1 is evidence for recurrent positive selection/multiple selective sweeps.
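A minimal sketch of the ratio, assuming the substitution and site counts have already been obtained from an alignment. It uses raw proportions with no correction for multiple hits, which real estimators typically apply. The function name and the counts are hypothetical.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of non-synonymous to synonymous substitutions, each per available site."""
    dn = nonsyn_subs / nonsyn_sites    # non-synonymous substitutions per non-synonymous site
    ds = syn_subs / syn_sites          # synonymous substitutions per synonymous site
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds

# Hypothetical counts for one gene.
omega_neutral_like = dn_ds(nonsyn_subs=30, nonsyn_sites=300, syn_subs=10, syn_sites=100)
omega_elevated = dn_ds(nonsyn_subs=60, nonsyn_sites=300, syn_subs=10, syn_sites=100)
```

Dividing by the number of sites matters because a typical coding sequence has more non-synonymous sites than synonymous ones. Comparing raw substitution counts would overstate dN.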
15
Q

Recurrent positive selection things to worry about and how to check

A
  • Check quality of sequence alignment
  • Relaxed constraint
  • Check alignment
  • Check for premature stop codons
  • Check for duplicates
16
Q

Incomplete Selective Sweep

A

A sweep that hasn’t yet reached fixation in a population

17
Q

Patterns of variation

A
  • High LD
  • Mutations at intermediate frequency
  • Long haplotypes
18
Q

Incomplete selective sweep things to worry about and how to check

A
  • Bottleneck
  • Population structure
  • Low recombination could result in high LD/long haplotypes
  • It could be balancing selection
  • SNP ascertainment
  • Meiotic drive
  • Estimate a population history from neutral data
  • Estimate a recombination map
19
Q

Complete selective sweep things to worry about

A
  • Bottleneck
  • Structure
  • Allele surfing or low recombination
20
Q

Allele surfing

A
  • Whatever alleles happen to be on the expanding “edge” of a range are more likely to be passed to the next generation and carried into the expanding population as it “expands westward” (or spreads in one direction). An allele that happened by chance to be prominent at the edge is “surfing” through the generations
  • “New colonization” type of model; doesn’t work very well in colonized populations where there could be admixing
  • Recurrent bottlenecks in one spatial direction
21
Q

Balancing selection

A
  • Selection to maintain variation
  • Selects to keep variation instead of eliminating it
  • Heterozygote advantage
  • Temporally-varying selection
  • Frequency-dependent selection
    • The individuals with the rarest allele have a fitness advantage
22
Q

Soft Selective Sweep

A
  • At the end of the sweep, there are at least two haplotypes
  • The sweep has generated variation rather than reduced it
    2+ haplotypes possible
23
Q

Selection On Common Standing Variation model

A
  • A neutral/nearly neutral allele became beneficial
  • Possible in humans
  • Considered by those who think soft selective sweeps are frequent
24
Q

Selection On Recurrent Identical Beneficial Mutations model

A
  • Two separate mutation events happening at the same site
  • Very unlikely in humans
  • Present in organisms with very high mutation rates (like HIV)
25
Q

False Discovery Rate (FDR)

A
  • ~5% for a soft sweep and ~0% for a hard sweep
  • Under neutrality, there are hardly any soft selective sweeps, despite what is reported
  • The bias against detecting hard sweeps was also not accounted for
  • There was no correction for multiple tests, or any attempt at one