Measures of Diversity Flashcards

1
Q

What different statistics can you use to measure diversity?

A
  • Allelic diversity
  • Observed and expected heterozygosity
  • Measures of identity - Nei’s gene diversity
  • Measures of nucleotide diversity - population mutation parameter (theta), no. segregating sites, nucleotide diversity
  • Other - mismatch distribution, site frequency spectrum
2
Q

How do you measure allelic diversity?

A

Mean number of alleles per locus
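
A minimal sketch of this calculation, assuming the data are held as a dictionary mapping each locus to the list of allele copies sampled there. The locus names and alleles are hypothetical.

```python
def mean_alleles_per_locus(loci):
    """Average the number of distinct alleles observed at each locus."""
    return sum(len(set(alleles)) for alleles in loci.values()) / len(loci)


# Hypothetical sample: three loci with 3, 2 and 1 distinct alleles.
loci = {
    "locus1": ["A", "A", "B", "C"],
    "locus2": ["x", "y", "y", "y"],
    "locus3": ["1", "1", "1", "1"],
}
allelic_diversity = mean_alleles_per_locus(loci)
```

Because rare alleles are easily missed in small samples, this count depends on sample size, so it is only comparable between samples of similar size.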

3
Q

How do you measure observed and expected heterozygosity?

A

Observed heterozygosity (Ho): the observed proportion of heterozygous genotypes at a locus - can be calculated for one locus or averaged across many

Expected heterozygosity (He): the heterozygosity expected at a locus under Hardy-Weinberg (HW) equilibrium: He = 1 - Σ(p_i^2), where p_i = frequency of allele ‘i’
- If the population is in HW equilibrium - observed and expected heterozygosity should be the same/very close
- If they are very different - we can infer things about the population (e.g., a deficit of heterozygotes can indicate inbreeding or population substructure)
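
The two quantities can be sketched for a single locus as follows, assuming genotypes are stored as pairs of allele labels. The sample itself is made up for illustration.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles at the locus."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2), with p_i estimated from the sampled allele copies."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    return 1.0 - sum((c / total) ** 2 for c in Counter(alleles).values())


# Hypothetical sample: 10 diploid individuals, alleles "A" and "a".
genotypes = [("A", "A")] * 4 + [("A", "a")] * 2 + [("a", "a")] * 4
ho = observed_heterozygosity(genotypes)
he = expected_heterozygosity(genotypes)
```

In this made-up sample both alleles are at frequency 0.5, so He is 0.5, while only 2 of 10 individuals are heterozygous (Ho = 0.2): a heterozygote deficit relative to HW expectations.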

4
Q

Explain Nei’s gene diversity

A

Nei’s gene diversity, H, is the probability that two alleles drawn at random from the population will be different from each other
- Can be used for diploids and haploids
- Diploids - heterozygosity - scaled by sample size
- Haploids - virtual heterozygosity
- H = n(1 - Σ(x_i^2))/(n - 1)
- n = sample size (number of alleles)
- x_i = frequency of the ith allele
- Uses haplotypes - good for mitochondrial data
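
A short sketch of the formula, assuming each sampled gene copy is represented by a haplotype label. The labels "H1" to "H3" are hypothetical.

```python
from collections import Counter


def nei_gene_diversity(haplotypes):
    """H = n * (1 - sum(x_i^2)) / (n - 1) for a sample of n haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n * (1.0 - sum(x * x for x in freqs)) / (n - 1)


# Hypothetical mitochondrial sample of six individuals.
haplotypes = ["H1", "H1", "H1", "H2", "H2", "H3"]
h = nei_gene_diversity(haplotypes)
```

The n/(n - 1) factor is the sample-size correction: without it, small samples would underestimate the chance that two randomly drawn alleles differ.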

5
Q

Explain what population mutation parameter (theta) is

A
  • Theta = population mutation parameter = mutation-drift parameter = neutral parameter
  • Theta is the expected value of diversity under the neutral model
  • Theta = 4Nu in diploids - N = effective population size, u = mutation rate per generation
  • Can be estimated in different ways, which allows hypothesis testing
  • Can be basis for selection tests - e.g., Tajima’s D
6
Q

What is a feature of the assumptions of different theta estimators?

A
  • They all have different assumptions
  • But they should all give the same result under neutrality (with a constant population size)
6
Q

What different ways can the population mutation parameter (theta) be estimated ?

A
  • Number of segregating sites (S)
  • Site frequency spectrum
  • Number of singletons (n)
  • Mismatch distribution
  • Mean number of pairwise differences (PI)
7
Q

Explain ‘Number of segregating sites’

A

The number of segregating sites is the number of nucleotide positions that vary within a set of DNA sequences - written S_n (where n is the sample size)
- Weak measure because it depends on the length of the sequence
- Can be converted into the proportion of segregating sites - which is less dependent on the length of the sequence but still depends on sample size
- Can be used with the coalescent - Tajima’s test compares the theta estimated from S with the theta estimated from the mean number of pairwise differences (PI)
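
As a sketch, S can be counted directly from an alignment and converted to a theta estimate by dividing by a_1 = 1 + 1/2 + ... + 1/(n - 1), the estimator usually attributed to Watterson (not named on this card, so treat the label as an assumption). The aligned sequences below are hypothetical.

```python
def segregating_sites(sequences):
    """Count alignment columns in which more than one nucleotide is present."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences):
    """Theta estimate from S: S divided by a_1 = sum of 1/i for i = 1..n-1."""
    n = len(sequences)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(sequences) / a1


# Hypothetical alignment: 4 sequences of 5 bp, equal length and no gaps.
sequences = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
s = segregating_sites(sequences)
theta_s = watterson_theta(sequences)
```

Dividing by a_1 is what corrects S for sample size; dividing the result by the sequence length would give a per-site value.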

8
Q

Explain Nucleotide Diversity

A

The average proportion of nucleotide differences between all possible pairs of sequences in the sample
- Analogous to Nei’s gene diversity measure - but is applied to nucleotide polymorphisms
- Largely independent of sample size
- If nucleotides evolve under neutrality, PI should be the same as theta
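
The definition above can be sketched as follows, assuming an ungapped alignment of equal-length sequences. The sequences are hypothetical.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff / len(sequences[0])


# Hypothetical alignment: 4 sequences of 5 bp.
sequences = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
pi = nucleotide_diversity(sequences)
```

Here the six pairs differ at 6 sites in total, so the mean number of pairwise differences is 1 and the per-site diversity is 1/5.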

9
Q

Explain the Mismatch Distribution

A

The distribution (histogram) of the number of pairwise differences between sequences
- Requires discrete differences in the data
- The shape of the mismatch distribution is very informative - indicative of the history of the population - e.g., a particular number of differences dominates when lots of sequences coalesce at the same time point
- Plotted as a histogram: the proportion of sequence pairs (y-axis) with a given number of nucleotide differences between them (x-axis)
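
A minimal sketch of how the histogram values could be computed, assuming an ungapped alignment. The sequences are hypothetical and plotting is left out.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Return a list where entry k is the proportion of pairs differing at k sites."""
    diffs = [
        sum(x != y for x, y in zip(a, b))
        for a, b in combinations(sequences, 2)
    ]
    counts = Counter(diffs)
    return [counts.get(k, 0) / len(diffs) for k in range(max(diffs) + 1)]


# Hypothetical alignment: 4 sequences of 5 bp, giving 6 pairs.
sequences = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
mismatch = mismatch_distribution(sequences)
```

The mean of this distribution is the mean number of pairwise differences, so the mismatch distribution carries the same information as PI plus the shape around it.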

10
Q

What are the different shapes of the mismatch distribution and what do they imply about the population’s history?

A
  • Constant-sized population: has a ‘spiky’ (ragged, multimodal) mismatch distribution - high raggedness statistic (r) + long internal branches on trees - spikes mean many sequences coalesce at the same time point
  • Expanding population: characterised by a smooth, unimodal mismatch distribution and low raggedness score + short internal branches on tree - because variation arises after the expansion, variable sites are distributed on the terminal ‘tip’ branches - so most pairs of sequences will have a similar number of differences between them
  • Curve moves further right as time since the expansion increases
  • e.g., recent expansion = not been a lot of time for new differences to accumulate - so sequences differ by only one or two nucleotides
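
As a rough sketch, the raggedness statistic can be computed from a mismatch distribution as the sum of squared differences between neighbouring classes. This assumes the usual Harpending form, r = sum over i = 1..d+1 of (x_i - x_(i-1))^2, where x_i is the proportion of pairs with i differences and d is the largest observed number of differences. Both example distributions are made up.

```python
def raggedness(mismatch):
    """Sum of squared differences between adjacent mismatch classes.

    `mismatch[k]` is the proportion of sequence pairs with k differences.
    A trailing zero is appended so the last class is compared with the
    (empty) class d + 1, following the assumed definition in the lead-in.
    """
    x = list(mismatch) + [0.0]
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))


# Hypothetical distributions: one smooth and unimodal, one spiky.
smooth = [0.05, 0.2, 0.5, 0.2, 0.05]
spiky = [0.3, 0.0, 0.4, 0.0, 0.3]
r_smooth = raggedness(smooth)
r_spiky = raggedness(spiky)
```

The spiky distribution gives the larger value, matching the card: high r for constant-sized populations, low r for expanding ones.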
11
Q

How do nucleotide diversity and coalescent theory link mathematically?

A
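  • T x u - expected number of mutations on a specific branch of length T (in generations), where u = mutation rate
  • (T1 + T2 + T3 … ) x u - expected total number of mutations in the tree
  • 4N x u = 4Nu - two sequences coalesce on average 2N generations ago, so the average branch length separating two tips is 4N generations - expected number of differences between them = PI
  • Under the neutral model 4Nu = PI = theta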
12
Q

Explain Site Frequency Spectrum

A

Compare the number of sequences in which the variant at each segregating site is present
- Looking at the relative occurrence of variants with a particular frequency
- Some variants appear just once, others are in multiple sequences - a variant that occurs in only one sequence = ‘singleton’
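
A sketch of a folded spectrum, which counts the rarer allele at each site and so avoids having to know which state is ancestral. Skipping sites with more than two alleles is a simplifying assumption, and the sequences are hypothetical.

```python
from collections import Counter


def folded_sfs(sequences):
    """Return a list where entry k is the number of sites whose minor allele occurs k times."""
    n = len(sequences)
    spectrum = [0] * (n // 2 + 1)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) != 2:
            continue  # skip monomorphic sites and (by assumption) multi-allelic sites
        spectrum[min(counts.values())] += 1
    return spectrum


# Hypothetical alignment: 4 sequences of 5 bp.
sequences = ["ATGCA", "ATGCT", "ACGTA", "ATGTA"]
sfs = folded_sfs(sequences)
singletons = sfs[1]
```

In this made-up alignment two sites are singletons and one site has its minor allele in two sequences, so the spectrum is weighted towards rare variants.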

13
Q

How does the site frequency spectrum differ depending on the population?

A
  • Recent growth/population expansion or positive selection - variants arose recently so most are present at low frequencies - excess of rare variants - e.g., positive selection (selective sweep) can mimic the effect of a population expansion (potentially after a founder effect, for example) - need to be able to differentiate
  • Balancing selection or genetic subdivision - causes an excess of intermediate-frequency variants and a lack of low-frequency variants
  • Example of how demographic and selection processes can lead to the same effects - need to think about how we can differentiate between the causes of these effects
14
Q

How can you differentiate between the effects of demographic and selection processes?

A
  • Demographic processes - e.g., changes in population size - will affect all loci across the genome
  • Whereas selection - will only affect a smaller number of loci - so can test for this by comparing loci
  • Can also calculate the number of singletons - these can only arise on the terminal branches of the tree
15
Q

What exceptions are there to theta = 4Nu? And what else do you need to consider?

A
  • Diploid autosomal loci - 2N gene copies - theta = 4Nu
  • Haploid, uniparentally inherited loci: Y chromosome and mitochondria - N/2 gene copies - theta = Nu
  • X chromosome - 3N/2 gene copies - theta = 3Nu
    Also need to consider:
  • Unusual effective population sizes - due to variability in reproductive success in males / females
  • Average generation time differs between males and females
16
Q

By calculating theta in different ways and obtaining different results, what can you infer?

A

Use the differences between the estimates to infer what processes (e.g., selection or changes in population size) might have acted on the population in the past
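
One common way to make this comparison is Tajima’s D, which contrasts the mean number of pairwise differences with the theta estimate from the number of segregating sites. The sketch below uses the standard constants from Tajima’s test and assumes an ungapped alignment of at least four sequences. The sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D: (mean pairwise differences - S/a1) / estimated standard deviation."""
    n = len(sequences)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites, D is undefined")

    # Mean number of pairwise differences (theta estimate from PI).
    pairs = list(combinations(sequences, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Sample-size constants used by the test.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical alignment in which every variant is a singleton.
sequences = ["ATGCA", "ATGCT", "ACGCA", "ATGCA"]
d = tajimas_d(sequences)
```

Because every variant in this made-up sample is a singleton, the pairwise estimate falls below the estimate from S and D is negative, the pattern associated with an excess of rare variants (recent expansion or a selective sweep). A sample this small would not support any real inference.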