Population Structure Flashcards
What is the definition of a genetically structured population?
A population that is subdivided in some way, so that individuals are more likely to breed with a near neighbour than with a more distant individual, i.e. mating deviates from random with respect to location
- Structure can be in discrete units, e.g., subpopulations (demes)
- Or it can be continuous, e.g., allele frequencies change gradually with geographical distance
Why is population structure important?
- It allows allele frequencies to vary in different places, so populations can evolve apart, primarily due to genetic drift
- It is important for local adaptation
What effect does gene flow have?
- Gene flow homogenises allele frequencies (which limits local adaptation) and so counteracts the effects of population structure
What is genetic differentiation?
Genetic differentiation is the accumulation of differences in allele frequencies among sub-populations
What are some of the different models that describe population structure?
- Island: a number of discrete populations, with an equal probability of migration between any pair of populations
- Stepping stone (1D or 2D): sequential, stepwise migration between adjacent populations, so genes move one population at a time
- Continuous: migration through a continuous range, limited by dispersal distance, so individuals have a higher probability of breeding with nearer neighbours
- Source-sink: migration flows from the core of a species' range (sources) towards the edge (sinks)
How can we think of population structure in a hierarchical way?
- Individuals
- Sub-populations
- Regions - groups of sub-pops
- Total population
How can we measure/quantify population structure / genetic differentiation?
F statistics:
- Can be seen as the probability of Identity by Descent (IBD) between 2 sequences or 2 alleles
- Under random mating it doesn’t matter where you pick the sequences/alleles from; under non-random mating (e.g., population structure) it does
What are the 3 types of F statistic?
- Fis = probability of IBD of 2 gene copies in a single individual (‘inbreeding coefficient’)
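- Fst = probability of IBD of 2 gene copies drawn at random from the same deme
- Fit = probability of IBD of the 2 gene copies within an individual, relative to the whole (total) population; combines the effects of Fis and Fst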
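A common way to link the three statistics is through heterozygosities. This is a standard formulation, though the cards above do not state it. H_I is the observed heterozygosity of individuals. H_S is the HW-expected heterozygosity within sub-populations, averaged over sub-populations. H_T is the HW-expected heterozygosity of the total population. The sketch below gives the definitions under that formulation. The last line shows how the levels of the hierarchy multiply together.

```latex
F_{IS} = \frac{H_S - H_I}{H_S}, \qquad
F_{ST} = \frac{H_T - H_S}{H_T}, \qquad
F_{IT} = \frac{H_T - H_I}{H_T}
\\[6pt]
1 - F_{IT} = \frac{H_I}{H_T} = \frac{H_I}{H_S}\cdot\frac{H_S}{H_T} = (1 - F_{IS})(1 - F_{ST})
```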
What does Identity by Descent (IBD) mean?
Two gene copies are IBD if they are both derived from the same ancestral allele copy, i.e. inherited from a common ancestor in some previous generation
What is inbreeding and when does it arise?
- Inbreeding occurs when related individuals mate
- They can be close relatives, or related because they share ancestry in the past, e.g., as a result of a bottleneck or founder effect
What is the result of inbreeding?
- The progeny of inbred matings are more likely to be homozygous than offspring of random mating in a population with large Ne
- Recessive deleterious mutations are therefore much more likely to occur in homozygous form, where they are expressed
- Inbred individuals are therefore more likely to show physical and health defects and reduced fertility/fecundity
- As a result, they tend to be less viable and have lower fitness than individuals that are not the result of inbreeding
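The rise in homozygosity can be made concrete with the standard genotype frequencies for a biallelic locus under an inbreeding coefficient f: P(AA) = p² + fpq, P(Aa) = 2pq(1 - f), P(aa) = q² + fpq. The sketch below assumes a single locus. The helper name `genotype_freqs` and the allele frequency (q = 0.01) are illustrative choices. f = 1/16 is the inbreeding coefficient of offspring of first cousins.

```python
def genotype_freqs(p: float, f: float) -> dict:
    """Expected genotype frequencies at a biallelic locus with allele
    frequencies p (allele A) and q = 1 - p (allele a), given an inbreeding
    coefficient f. With f = 0 these are Hardy-Weinberg proportions."""
    q = 1.0 - p
    return {
        "AA": p * p + f * p * q,     # each homozygote class gains f*p*q
        "Aa": 2 * p * q * (1 - f),   # heterozygotes lose the 2*f*p*q the homozygotes gain
        "aa": q * q + f * p * q,
    }


# Illustrative rare recessive allele a (q = 0.01): random mating vs
# offspring of first cousins (f = 1/16).
outbred = genotype_freqs(0.99, 0.0)
cousins = genotype_freqs(0.99, 1 / 16)
fold = cousins["aa"] / outbred["aa"]
print(f"aa frequency, random mating: {outbred['aa']:.6f}")
print(f"aa frequency, first cousins: {cousins['aa']:.6f}")
print(f"fold increase in aa:         {fold:.1f}")
```

With these illustrative numbers, the frequency of aa homozygotes rises roughly seven-fold. Rare recessive variants are affected most, because q² is tiny while the extra term fpq is not.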
What is inbreeding depression?
Inbreeding depression is the reduction in fitness of inbred individuals relative to outbred individuals
Where is inbreeding often observed?
- Domestic animals - result of selective breeding
- Royalty and nobility, or human ethnic groups with consanguineous marriages
- Populations of conservation concern
How can you measure inbreeding?
Fis statistic:
- If inbreeding occurs, gametes don’t meet at random, so the 2 gene copies in an individual have a higher probability of IBD (higher Fis)
- Fis = 0 : genotype frequencies match Hardy-Weinberg (HW) equilibrium
- Fis > 0 : deficit of heterozygotes relative to HW expectations; may indicate inbreeding or another departure from HW, e.g., population structure
- Fis < 0 : excess of heterozygotes relative to HW expectations
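In practice, Fis at a locus is often estimated as Fis = 1 - Ho/He. Ho is the observed heterozygote frequency and He = 2pq is the HW expectation. The sketch below assumes a single biallelic locus and genotype counts as input. The function name is hypothetical, and it ignores the small-sample bias corrections used by published estimators.

```python
def fis_from_counts(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Fis = 1 - Ho/He at one biallelic locus, where Ho is the observed
    heterozygote frequency and He = 2pq is the Hardy-Weinberg expectation."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    h_obs = n_Aa / n
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("Fis is undefined at a monomorphic locus")
    return 1 - h_obs / h_exp


print(fis_from_counts(25, 50, 25))   # HW proportions: 0.0
print(fis_from_counts(40, 20, 40))   # heterozygote deficit: positive Fis
print(fis_from_counts(10, 80, 10))   # heterozygote excess: negative Fis
```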
How is pop structure related to inbreeding?
- Population structure generates deviations from HW equilibrium and can be considered a type of inbreeding, as it reduces heterozygosity compared with that expected under random mating
- Individuals in a sub-population are more likely to mate within their sub-population than with a member of a neighbouring sub-population
What is the Wahlund Effect?
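The reduction in heterozygosity, relative to HW expectations for a single randomly mating population, that occurs when genetically differentiated sub-populations are sampled or treated as one population
- Often seen in the real world; the underlying subdivision is often caused by physical barriers such as rivers and mountains
- A heterozygote deficit of this kind is therefore a useful signal of population structure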
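The Wahlund effect can be reproduced with simple arithmetic. The sketch assumes two equally sized sub-populations, each in HW equilibrium internally, with made-up allele frequencies. Pooling them and applying HW to the pooled allele frequency overestimates heterozygosity. The helper name `expected_het` is hypothetical.

```python
def expected_het(p: float) -> float:
    """Hardy-Weinberg expected heterozygosity 2pq at a biallelic locus."""
    return 2 * p * (1 - p)


# Two equally sized demes, each in HW internally (illustrative frequencies of A).
p1, p2 = 0.9, 0.1

# Heterozygosity actually present when the two samples are pooled.
h_observed_pooled = (expected_het(p1) + expected_het(p2)) / 2

# What HW would predict if the pooled sample were one random-mating population.
p_pooled = (p1 + p2) / 2
h_expected_pooled = expected_het(p_pooled)

deficit = h_expected_pooled - h_observed_pooled
print(f"observed het in pooled sample: {h_observed_pooled:.3f}")
print(f"HW expectation for pooled p:   {h_expected_pooled:.3f}")
print(f"Wahlund deficit:               {deficit:.3f}")
```

For equally sized demes, the deficit equals twice the variance in allele frequency among demes (2σ²). The more the demes differ, the larger the apparent heterozygote deficit.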
How can you measure subdivision?
Fst:
- Fst is the probability of IBD of 2 gene copies taken at random from the same deme
- Or: how much less heterozygous the deme is than expected from the whole population
- Similar to Fis: estimate Fst from the increased chance of finding 2 copies of the same allele within the deme (vs what you would expect under random mating across the whole population)
- Can be thought of as a ‘measure of the proportion of genetic variation that lies between populations’
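The "how much less heterozygous" reading of Fst translates directly into Fst = (HT - HS)/HT, a heterozygosity-based (Nei's GST-style) form. The sketch below assumes one biallelic locus, equally sized demes and known allele frequencies. It applies no sample-size correction, and the function name is hypothetical.

```python
from statistics import mean


def fst_from_freqs(freqs: list) -> float:
    """Fst = (HT - HS) / HT for one biallelic locus, assuming equally sized demes.
    freqs holds the frequency of allele A in each deme."""
    hs = mean(2 * p * (1 - p) for p in freqs)   # mean within-deme expected het
    p_bar = mean(freqs)
    ht = 2 * p_bar * (1 - p_bar)                 # expected het of the total population
    return (ht - hs) / ht


print(fst_from_freqs([0.50, 0.52, 0.48]))  # very similar demes: close to 0
print(fst_from_freqs([0.90, 0.10]))        # strongly diverged demes: about 0.64
```

Because HT - HS equals twice the variance in allele frequency among demes, this value can be read as the share of total variation that lies between demes.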
What do the values of Fst suggest about pop structure?
- Low (Fst = 0 to 0.05) = little genetic differentiation = high gene flow (i.e. populations are very similar)
- Intermediate (Fst = 0.05 to 0.25) = moderate to great genetic differentiation
- Fst > 0.25 = very high genetic differentiation = low gene flow (i.e. populations are very different/isolated)
What is a permutation test and what is it used for?
- It is used to assess whether the observed value of Fst is significantly greater than zero
- Individuals are repeatedly shuffled at random among populations and Fst is recalculated each time, giving a null distribution of Fst under no structure; the p-value is the proportion of permuted Fst values that are at least as large as the observed value
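A permutation test for Fst can be sketched as follows, under these assumptions: a single biallelic locus, genotypes coded as allele counts (0, 1, 2), equally weighted populations, and the common "+1" p-value convention. The genotype data are made up. The function names, `n_perm` and `seed` are hypothetical.

```python
import random
from statistics import mean


def fst(genotypes, labels):
    """Heterozygosity-based Fst = (HT - HS) / HT at one biallelic locus.
    genotypes[i] = copies (0, 1 or 2) of allele A carried by individual i;
    labels[i] = the population individual i was sampled from."""
    freqs = []
    for pop in sorted(set(labels)):
        g = [x for x, lab in zip(genotypes, labels) if lab == pop]
        freqs.append(sum(g) / (2 * len(g)))
    hs = mean(2 * p * (1 - p) for p in freqs)   # mean within-population het
    p_bar = mean(freqs)
    ht = 2 * p_bar * (1 - p_bar)                 # het expected in the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht


def permutation_test(genotypes, labels, n_perm=999, seed=1):
    """Shuffle individuals among populations to build a null distribution of Fst
    (no structure); return the observed Fst and a one-sided p-value."""
    rng = random.Random(seed)
    observed = fst(genotypes, labels)
    shuffled = list(labels)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                    # same group sizes, random membership
        if fst(genotypes, shuffled) >= observed:
            n_extreme += 1
    # Counting the observed arrangement as one permutation avoids p = 0.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Made-up genotypes for two samples of 20 individuals each.
geno = [2] * 15 + [1] * 5 + [0] * 15 + [1] * 5
labs = ["X"] * 20 + ["Y"] * 20
obs, p_value = permutation_test(geno, labs)
print(f"observed Fst = {obs:.3f}, permutation p = {p_value:.3f}")
```

Shuffling the labels keeps sample sizes fixed but destroys any link between genotype and location. The permuted values therefore show how large Fst gets by chance alone.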
What components of a population determine the value of Fst?
The equilibrium between drift and migration:
- Drift = demes become more genetically different from each other
- Gene flow (migration) = makes demes genetically similar - homogenizes allele frequencies
- So - equilibrium depends on deme size (N) and migration rate (proportion of migrants)
How can you model the drift-migration balance?
- Wright's idealised ‘Island Model’: Fst ≈ 1 / (1 + 4Nm), where N = deme size and m = migration rate (proportion of migrants per generation), so Nm = number of migrant individuals per generation; no geographic substructure (any deme can exchange migrants with any other)
- Stepping stone model: allows geographic structure; exchange occurs only between adjacent demes, so mating choice is limited by distance, producing isolation by distance
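The island-model formula can be evaluated, and rearranged to give Nm from an observed Fst, in a few lines. It assumes Wright's idealisation: many equally sized demes at drift-migration equilibrium, with no selection or mutation. Estimates of Nm from real Fst values are only as good as those assumptions. The function names are hypothetical.

```python
def island_fst(n: float, m: float) -> float:
    """Equilibrium Fst in Wright's island model, Fst ~ 1 / (1 + 4Nm), where
    n = deme size and m = migration rate (proportion of migrants per generation)."""
    return 1 / (1 + 4 * n * m)


def migrants_from_fst(fst: float) -> float:
    """Rearranged island-model formula: Nm ~ (1/Fst - 1) / 4."""
    return (1 / fst - 1) / 4


# Only the product Nm matters, so N = 100 is an arbitrary choice here.
for nm in (0.1, 0.25, 1, 5, 10):
    print(f"Nm = {nm:>4}: equilibrium Fst = {island_fst(100, nm / 100):.3f}")
```

One migrant per generation (Nm = 1) gives Fst = 0.2. Fst drops below 0.05 once Nm exceeds about 5, so even modest gene flow strongly limits differentiation by drift.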
What is principal component analysis (PCA)?
- Multivariate method of extracting and summarising relevant information from complex datasets
- Transforms a large number of possibly correlated variables (e.g., SNP frequencies, haplotype frequencies or pairwise differences between pops) into a smaller number of uncorrelated variables, the principal components
- Used to explain the variance structure of a set of variables through linear combinations of these variables (the principal components); e.g., components can be plotted against each other to visualise the data
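The core of PCA on genotype data can be sketched in pure Python. The data are centred, the covariance between SNPs is computed, and the leading eigenvector (PC1) is found by power iteration. Each individual is then projected onto it. This only computes PC1 for a tiny made-up dataset. Real analyses typically use a linear-algebra library routine (e.g., an SVD such as `numpy.linalg.svd`) on thousands of SNPs. The function name and parameters are hypothetical.

```python
import random


def pc1_scores(matrix, n_iter=200, seed=0):
    """Score each individual (row) on the first principal component.
    matrix[i][j] = copies (0, 1 or 2) of the reference allele at SNP j in individual i.
    PC1 is found by power iteration on the SNP-by-SNP covariance matrix."""
    n, m = len(matrix), len(matrix[0])
    # Centre each SNP column on its mean.
    col_means = [sum(row[j] for row in matrix) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in matrix]
    # Sample covariance between every pair of SNPs (m x m).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration: repeatedly multiplying by cov and renormalising converges
    # to the eigenvector with the largest eigenvalue, i.e. the PC1 loadings.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    # An individual's score is the projection of its centred genotypes onto PC1.
    return [sum(row[j] * v[j] for j in range(m)) for row in x]


# Made-up genotypes: 4 individuals from each of two diverged populations, 6 SNPs.
pop_a = [[2, 2, 1, 0, 0, 1], [2, 1, 2, 0, 1, 0], [1, 2, 2, 0, 0, 0], [2, 2, 2, 1, 0, 0]]
pop_b = [[0, 0, 1, 2, 2, 1], [0, 1, 0, 2, 1, 2], [1, 0, 0, 2, 2, 2], [0, 0, 0, 1, 2, 2]]
scores = pc1_scores(pop_a + pop_b)
for label, s in zip(["A"] * 4 + ["B"] * 4, scores):
    print(label, round(s, 2))
```

The two populations fall on opposite sides of zero on PC1. Which side is positive is arbitrary, because an eigenvector's sign is not fixed.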
What is an assignment test?
- Aims to match an individual’s genetic profile to the population it has come from
- Important in conservation, forensics etc
- One way is the frequency method: at each locus, calculate the probability of the individual’s genotype from each candidate population’s allele frequencies, then multiply across loci (assuming they are independent)
- Lends itself to likelihood tests
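The frequency method can be sketched as a likelihood calculation, under these assumptions: HW proportions within each candidate population, independent loci, and an ad hoc floor frequency for alleles missing from a reference sample. The populations, loci, allele sizes and frequencies below are made up. They are not data from any study, and the function names are hypothetical.

```python
import math


def log10_likelihood(genotype, freqs, floor=0.001):
    """log10 probability of a multilocus genotype given one population's allele
    frequencies: HW proportions within each locus, loci treated as independent."""
    total = 0.0
    for locus, (a1, a2) in genotype.items():
        # Alleles absent from the reference sample get a small floor frequency
        # (an ad hoc choice here) so one unseen allele does not give log(0).
        p1 = max(freqs[locus].get(a1, 0.0), floor)
        p2 = max(freqs[locus].get(a2, 0.0), floor)
        prob = p1 * p1 if a1 == a2 else 2 * p1 * p2
        total += math.log10(prob)
    return total


def assign(genotype, reference):
    """Rank candidate populations by log10 likelihood, most likely first."""
    scores = {pop: log10_likelihood(genotype, f) for pop, f in reference.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up reference allele frequencies at three microsatellite loci
# (allele names are fragment sizes).
reference = {
    "lake":  {"locA": {"120": 0.7, "124": 0.3},
              "locB": {"88": 0.6, "90": 0.4},
              "locC": {"200": 0.9, "204": 0.1}},
    "river": {"locA": {"120": 0.1, "124": 0.9},
              "locB": {"88": 0.2, "90": 0.8},
              "locC": {"200": 0.3, "204": 0.7}},
}
suspect = {"locA": ("124", "124"), "locB": ("90", "90"), "locC": ("204", "200")}

for pop, score in assign(suspect, reference):
    print(f"{pop}: log10 L = {score:.2f}")
```

Here the suspect genotype is about 84 times more likely under the "river" frequencies than under the "lake" ones. A likelihood-based assignment test reports this kind of ratio between candidate populations.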
Give an example of an assignment test
Fishing fraud in a Finnish fishing competition:
- One overly impressive salmon
- Compared fish against a reference panel of fish to assess the likelihood of this fish coming from this lake/river
- 7 microsatellite loci
- 126 reference salmon from neighbouring lakes and rivers
- Suspect fish had extremely low probability of being from the competition lake
What is a limitation of an assignment test?
- Need lots of prior information about population structure - reference to compare to
- Need prior assumptions about population designation or allele frequencies
What is Bayesian cluster analysis?
Based on the Wahlund effect:
- i.e. mixing of genetically distinct populations generates HW disequilibrium
- Assumes that this disequilibrium arises because of pop structure
- Algorithm that groups genotypes into clusters so as to minimise HW and gametic (linkage) disequilibrium within each cluster
- Finds which clustering (including the number of clusters) has the highest probability given the data
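The clustering idea can be sketched as a toy Gibbs sampler (a form of MCMC). It alternates between two steps: drawing each cluster's allele frequencies given the current assignments, and reassigning each individual according to how likely its genotype is under each cluster. This is an illustrative sketch with heavy assumptions, not the algorithm of any particular program. K is fixed in advance, loci are biallelic, there is no admixture, and allele frequencies get a uniform Beta(1, 1) prior. Programs such as STRUCTURE build on a more elaborate version of this idea. Choosing K itself would require comparing runs and is not shown. The function name, parameters and simulated data are hypothetical.

```python
import math
import random


def cluster_gibbs(genos, k, n_iter=200, seed=0):
    """Toy no-admixture clustering by Gibbs sampling. genos[i][loc] = copies
    (0, 1 or 2) of allele A in individual i at biallelic locus loc. Within each
    cluster, loci are assumed to be in HW and linkage equilibrium.
    Returns one posterior draw of the cluster labels."""
    rng = random.Random(seed)
    n_loci = len(genos[0])
    z = [rng.randrange(k) for _ in genos]           # random starting clusters
    for _ in range(n_iter):
        # Step 1: draw each cluster's allele-A frequencies from their
        # posterior Beta(1 + #A, 1 + #a) under the uniform prior.
        p = []
        for c in range(k):
            members = [g for g, zi in zip(genos, z) if zi == c]
            row = []
            for loc in range(n_loci):
                n_A = sum(g[loc] for g in members)
                n_a = sum(2 - g[loc] for g in members)
                x = rng.betavariate(1 + n_A, 1 + n_a)
                row.append(min(max(x, 1e-9), 1 - 1e-9))  # keep log() finite
            p.append(row)
        # Step 2: reassign each individual with probability proportional to the
        # likelihood of its genotype in each cluster (the heterozygote factor 2
        # is the same for every cluster, so it cancels and is omitted).
        for i, g in enumerate(genos):
            logw = [sum(g[loc] * math.log(p[c][loc]) + (2 - g[loc]) * math.log(1 - p[c][loc])
                        for loc in range(n_loci))
                    for c in range(k)]
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            z[i] = rng.choices(range(k), weights=weights)[0]
    return z


# Simulated (made-up) data: two source populations with very different
# allele frequencies at 15 loci, 20 individuals drawn from each.
sim = random.Random(42)


def draw_individual(freqs):
    return [sum(sim.random() < f for _ in range(2)) for f in freqs]


data = ([draw_individual([0.9] * 15) for _ in range(20)]
        + [draw_individual([0.1] * 15) for _ in range(20)])
labels = cluster_gibbs(data, k=2)
print(labels)
```

No population labels are given to the sampler. It recovers the two groups only because pooling them would create HW and linkage disequilibrium, which the Wahlund-effect reasoning above relies on. Cluster numbers are arbitrary, so group 1 may come out as 0 or 1.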
Why is Bayesian analysis good?
You don’t need any prior assumptions of population designation or allele frequencies / don’t need lots of information about the population
What can Bayesian analysis be used for?
To infer the number of populations contributing to a pool of individuals of unknown origin
- Assign individuals to populations
- Estimate the proportion of an individual’s ancestry that originates from each pop (admixed individuals); can be used to identify the offspring of immigrants or hybrids
Give an example of the use of Bayesian cluster analysis
Kenyan Taita thrushes
- Identifies which cluster (population) each genotype matches
- Identifies which individuals may be migrants / offspring of migrants
What are the limitations of Bayesian Cluster analysis?
- Assumption of random mating limits use with selfing species
- Processes other than subdivision can generate HW and gametic disequilibrium
- For pop assignment, the true population of origin must have been sampled