Topic 13: Assignment Tests and Population Structure Flashcards
What are assignment tests based on?
The likelihood of an individual's genotype occurring in each candidate population
What is the principle behind assignment tests, and how are they used?
Each individual genotype is assigned to the subpopulation in which it has the highest likelihood of occurring.
When subpopulations have very different allele frequencies, individuals should be assigned to the subpopulation in which they were born (their natal population), where their genotype has the higher probability.
Immigrants should therefore be assigned to their natal population, not to the subpopulation where they were sampled after dispersal.
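The assignment principle can be sketched in a few lines. This is a minimal illustration, assuming diploid genotypes, Hardy-Weinberg proportions within each population and independent loci; the function names, the allele labels and the 0.001 floor for alleles unseen in a population are all hypothetical choices made here, not a specific published program.

```python
import math

def genotype_log_likelihood(genotype, freqs, floor=1e-3):
    """Log-likelihood of a multilocus diploid genotype in one population.

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    freqs: list of dicts mapping allele -> frequency, one dict per locus.
    Assumes HWE within the population and independence between loci.
    """
    total = 0.0
    for (a, b), locus_freqs in zip(genotype, freqs):
        # Alleles absent from a population get an arbitrary floor frequency (assumption)
        p = locus_freqs.get(a, floor)
        q = locus_freqs.get(b, floor)
        prob = p * p if a == b else 2 * p * q  # HWE genotype frequency
        total += math.log(prob)
    return total

def assign(genotype, populations):
    """Return the population name in which the genotype is most likely."""
    return max(populations,
               key=lambda name: genotype_log_likelihood(genotype, populations[name]))

# Hypothetical allele frequencies at two loci in two populations
populations = {
    "A": [{"1": 0.9, "2": 0.1}, {"x": 0.8, "y": 0.2}],
    "B": [{"1": 0.1, "2": 0.9}, {"x": 0.3, "y": 0.7}],
}
individual = [("1", "1"), ("x", "y")]
print(assign(individual, populations))  # A
```

Here the genotype has probability 0.81 × 0.32 ≈ 0.26 in A but only 0.01 × 0.42 ≈ 0.004 in B, so it is assigned to A regardless of where it was sampled.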
What two things can assignment tests do? What can they not do?
They can measure population structure and quantify contemporary dispersal.
They CANNOT quantify effective migration.
Which F stat is applicable to assignment tests?
FST
What are strongly differentiated populations?
Populations with LARGE FST values, and therefore large genetic distances and very different allele frequencies. On the assignment plot, both groups sit far from the middle line and their symbols fall in separate regions.
What are weakly differentiated populations?
Populations with little structure or genetic distance between them: similar allele frequencies and small FST values. On the plot, both groups sit close to the middle line.
What are non-differentiated populations?
Effectively a single population with no clear pattern. Both groups cluster together on the center line, and individuals from each fall on both sides of it.
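To see how FST separates these three cases, here is a rough sketch for a single biallelic locus using FST = (HT − HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. It assumes equal weighting of subpopulations and no sample-size correction; the function name and the example frequencies are hypothetical.

```python
def fst_biallelic(p_values):
    """FST for one biallelic locus from per-subpopulation allele frequencies.

    Uses FST = (HT - HS) / HT with equal subpopulation weights and no
    sample-size correction (simplifying assumptions).
    """
    k = len(p_values)
    # HS: mean expected heterozygosity within subpopulations
    hs = sum(2 * p * (1 - p) for p in p_values) / k
    # HT: expected heterozygosity if all subpopulations were pooled
    p_bar = sum(p_values) / k
    ht = 2 * p_bar * (1 - p_bar)
    return (ht - hs) / ht if ht > 0 else 0.0

print(round(fst_biallelic([0.9, 0.1]), 2))    # strongly differentiated: 0.64
print(round(fst_biallelic([0.55, 0.45]), 2))  # weakly differentiated: 0.01
print(round(fst_biallelic([0.5, 0.5]), 2))    # non-differentiated: 0.0
```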
What do the shape and location of a point on an assignment plot mean?
Shape: the population where the individual was sampled
Location: the population to which the individual is assigned based on its genotype
What is a measure of genetic divergence on an assignment plot?
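The number of cross-assigned individuals (those assigned to a population other than the one they were sampled in); fewer cross-assignments indicate greater divergence. Cross-assignments can arise by chance, or they can indicate a migrant that moved from its natal population into the population where it was found.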
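Counting cross-assigned individuals only needs each individual's sampling location and assigned population. A small sketch, assuming the assignments have already been made; the function name and the labels "A" and "B" are hypothetical.

```python
from collections import Counter

def cross_assignments(sampled, assigned):
    """Count individuals assigned to a population other than where they were sampled.

    sampled, assigned: equal-length sequences of population labels, one per individual.
    Returns a Counter keyed by (sampled_in, assigned_to).
    """
    if len(sampled) != len(assigned):
        raise ValueError("sampled and assigned must be the same length")
    return Counter((s, a) for s, a in zip(sampled, assigned) if s != a)

# Hypothetical results for six individuals
sampled = ["A", "A", "A", "B", "B", "B"]
assigned = ["A", "B", "A", "B", "B", "A"]
counts = cross_assignments(sampled, assigned)
print(sum(counts.values()))  # 2 cross-assigned individuals
```

Keeping the direction as (sampled_in, assigned_to) pairs means a possible migrant from B found in A shows up separately from one moving the other way.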
What are the three major uses of assignment plots?
Population structure (genetic differentiation), wildlife forensics (matching specimens to source populations), measuring contemporary gene flow (identifying recent immigrants or dispersers)
What is the traditional approach to determine the differentiation between subpopulations?
Treat the geographic sampling locations as subpopulations, then estimate genetic structure and genetic distance, and test for differentiation between these designated subpopulations.
When do we use Bayesian methods, and what are they?
When population structure is cryptic and we do not know how many randomly mating subpopulations there really are.
Genetic information is used first to cluster individuals so as to maximize differentiation between groups (FST) and minimize differentiation within groups (FIS) for any given number of groups, K.
For each K, the method calculates the likelihood of observing the actual genotype data given K groups.
The K that gives the highest probability of the data is the best way of designating the subpopulations.
What do Bayesian methods also allow for?
Admixture (interbreeding) between genetic groups, which results in individuals with mixed ancestry.
What is an admixture or Structure plot?
A graph that visualizes each individual's genetic assignment.
Each individual is a column, and each genetically distinct cluster (corresponding to a genetic background) is plotted in a different colour.
The colour composition of each column shows how much of that individual's genetic background can be assigned to each of the K clusters.
If there is complete population structure and every subpopulation is genetically distinct, then all individuals within each subpopulation will be a single solid colour.
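The data behind a Structure-style plot can be pictured as a table of ancestry proportions, one row per individual and one value per cluster, with each row summing to 1. The sketch below reads such a table: it reports each individual's main cluster, flags individuals as admixed when no single cluster reaches a threshold, and prints each column as a text bar. The 0.9 threshold, the function names and the proportions are hypothetical assumptions for illustration.

```python
def summarize_ancestry(q_matrix, pure_threshold=0.9):
    """Classify individuals from their ancestry proportions (one row per individual).

    Each row gives the proportion of an individual's genetic background assigned to
    each of the K clusters and must sum to 1. Individuals whose largest proportion is
    below pure_threshold (an arbitrary cut-off assumed here) are flagged as admixed.
    """
    results = []
    for row in q_matrix:
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError(f"ancestry proportions must sum to 1, got {row}")
        best = max(range(len(row)), key=lambda k: row[k])
        status = "pure" if row[best] >= pure_threshold else "admixed"
        results.append((best, status))
    return results

def text_bar(row, width=20, symbols="#=.*+"):
    """Render one individual's column as a text bar (one symbol per cluster, K <= 5)."""
    return "".join(symbols[k] * round(q * width) for k, q in enumerate(row))

# Hypothetical proportions for three individuals at K = 2
q = [[0.97, 0.03], [0.5, 0.5], [0.02, 0.98]]
for row, (cluster, status) in zip(q, summarize_ancestry(q)):
    print(text_bar(row), cluster, status)
```

Because of rounding, a bar can be one symbol shorter or longer than `width`; it is meant only as a rough picture of one column.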
What does underestimating structure lead to?
Management actions at the wrong spatial scale, inflated population size estimates, and loss of protection under legislation such as the Species at Risk Act.
What can overestimating structure lead to?
Costly conservation actions for “rare taxa” such as translocations or habitat restoration to unnecessarily promote gene flow.
How can you tell on a graph which K is best?
The best K is where the likelihood curve reaches a plateau; if there is a big jump followed by only small increases, take the lower K at which the plateau begins.
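The plateau rule can be made explicit with a simple heuristic: take the smallest K after which adding another cluster raises the mean ln P(X|K) by only a small fraction of the largest jump. This is an assumed rule of thumb written for illustration, not a standard published criterion (Evanno's ΔK is a more formal alternative); the function name, the 10% cut-off and the likelihood values are hypothetical.

```python
def plateau_k(mean_log_likelihoods, min_gain_fraction=0.1):
    """Pick the smallest K at which mean ln P(X|K) stops improving appreciably.

    mean_log_likelihoods[i] is the mean ln P(X|K) for K = i + 1 (averaged over runs).
    The curve is treated as having plateaued at K when the gain from K to K + 1 is
    below min_gain_fraction of the largest single gain (an assumed heuristic).
    """
    gains = [b - a for a, b in zip(mean_log_likelihoods, mean_log_likelihoods[1:])]
    if not gains:
        return 1
    biggest = max(gains)
    for k, gain in enumerate(gains, start=1):
        if gain < min_gain_fraction * biggest:
            return k
    return len(mean_log_likelihoods)

# Hypothetical mean ln P(X|K) for K = 1..5: a big jump from K = 1 to K = 2, then small gains
print(plateau_k([-5000, -4200, -4150, -4140, -4135]))  # 2
```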
What is principal component analysis?
A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
For n observations of p variables, the number of principal components is min(n-1, p).
In all cases, the first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible while being uncorrelated with the preceding components.
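A compact way to compute PCA on genetic data is a singular value decomposition of the centred genotype matrix. The sketch below assumes individuals are rows, loci are columns and genotypes are coded as allele counts (0, 1, 2); the function name and the toy data are hypothetical, and real analyses usually also scale loci and handle missing data.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of a genotype matrix via SVD.

    genotypes: array of shape (n_individuals, n_loci) holding allele counts (0, 1, 2).
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    x = np.asarray(genotypes, dtype=float)
    # Centre each locus so the components describe variation around the mean genotype
    x = x - x.mean(axis=0)
    # Singular values come out in decreasing order, so PC1 explains the most variance
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    scores = u * s  # coordinates of each individual on the principal components
    return scores[:, :n_components], ratio[:n_components]

# Hypothetical genotypes: individuals 0-2 and 3-5 come from two diverged groups
genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 1, 0],
]
scores, ratio = genotype_pca(genotypes)
print(np.round(scores[:, 0], 2), np.round(ratio, 2))
```

With these toy data, PC1 places the two groups on opposite sides of zero and carries most of the variance, which is the "individuals close together are genetically similar" reading of a PCA plot.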
What is the goal of Principal Component Analysis?
The goal is to reduce the complexity of the dataset to ease visualization, showing as much genetic variability as possible in a few axes. Individuals close to each other on the axes are more genetically similar.
What are the pros and cons of Bayesian methods?
PROS: can be used to find K and can identify admixed individuals
CONS: assumes HWE within populations, assumes linkage equilibrium between loci, computationally intensive.
What are the pros and cons of PCA?
PROS: no assumptions made about underlying genetic model, computationally quick, can handle large data sets
CONS: inherently exploratory, generally cannot be used to find K, cannot be used to identify admixed individuals