Statistical Genetics - week 7 Flashcards
Hardy Weinberg equilibrium. When to use?
when a gene has 2 versions (2 alleles), e.g. the normal and the affected allele, or the 2 alleles of a SNP
what is hardy weinberg equilibrium?
p^2 + 2pq + q^2 = 1
p + q = 1
p =
the frequency of the normal allele
q =
the frequency of the affected allele
p^2 =
represents the proportion of individuals homozygous for the normal allele
2pq =
represents the proportion of heterozygous (carrier) individuals
q^2 =
represents the proportion of individuals homozygous for the disease allele, who are affected with the disease
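A minimal sketch of the arithmetic on the cards above: given the affected-allele frequency q, it returns the expected p^2, 2pq and q^2. The function name and the 1% example frequency are assumptions made for illustration, not values from the lecture.

```python
def hwe_genotype_frequencies(q):
    """Return the expected (p^2, 2pq, q^2) for an affected-allele frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    p = 1.0 - q  # because p + q = 1
    return p * p, 2 * p * q, q * q


# Hypothetical example: a disease allele at 1% frequency.
hom_normal, carriers, affected = hwe_genotype_frequencies(0.01)
print(f"p^2 = {hom_normal:.4f}, 2pq = {carriers:.4f}, q^2 = {affected:.4f}")
# p^2 = 0.9801, 2pq = 0.0198, q^2 = 0.0001
```

Even for a rare disease allele, carriers (2pq) are far more common than affected individuals (q^2).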
If the observed genotype or allele frequencies are significantly different from what you’d expect under Hardy Weinberg equilibrium (not in equilibrium) then it can be an indicator that
there’s a genotyping error with the technology. Checking for this is one of the main reasons you use Hardy Weinberg equilibrium in genetic analysis.
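One common way to check whether genotypes are "significantly different" from Hardy Weinberg expectations is a chi-square goodness-of-fit test with 1 degree of freedom. The sketch below assumes that test (the cards do not name a specific one), and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of observed genotype counts against HWE expectations.

    n_aa, n_bb: counts of the two homozygotes; n_ab: count of heterozygotes.
    Returns (chi2, p_value), using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency estimated from the counts
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first set matches HWE, the second has too few heterozygotes.
print(hwe_chi_square(490, 420, 90))
print(hwe_chi_square(600, 200, 200))
```

A very small p-value for a SNP would be a reason to suspect a genotyping problem before interpreting any association result.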
Defn of linkage disequilibrium
The non-random association of alleles at different loci
What results in linkage disequilibrium in populations?
shared ancestral chromosome segments
- Imagine there’s an ancestral chromosome in yellow that picks up a mutation in a disease causing gene
The chromosome also carries a marker: allele 2 of the marker locus. As the ancestral chromosome gets passed through many generations and the disease causing gene becomes prevalent in the population, there’s recombination of this chromosome during meiosis. You no longer have a whole yellow chromosome; it gets broken up with the other chromosome in meiosis. So you might see a pattern like the image, depending on recombination (if it occurs).
- If the marker locus carrying allele 2 is sufficiently close to the disease causing gene
then they will not be separated by recombination very often, so they will be inherited together. If they are inherited together more often than you’d expect by chance (i.e. more often than if they were on different chromosomes, with a 50/50 chance of being inherited together) then you say that the 2 variants are in linkage disequilibrium
When is linkage disequilibrium useful?
Linkage disequilibrium is really useful for mapping genes
Can think about how variation occurs within the genome i.e. SNPS
If everything segregated at random and if you assume that the allele frequency of each of these alleles is 50% then you’d expect to see the following pattern:
For two SNPs with two alleles each (four alleles in total, each at 50% frequency), we’d expect four haplotypes (“chromosomes”), each at 25%
i.e. A-C, A-D, B-C, B-D
each at 25%
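The 25% figure follows because, under random association, each haplotype frequency is the product of its two allele frequencies. A small sketch of that calculation, using the allele labels from the cards (the function name is an assumption):

```python
from itertools import product


def expected_haplotype_frequencies(snp1, snp2):
    """Expected haplotype frequencies if alleles at the two SNPs associate at random.

    snp1, snp2: dicts mapping allele label -> allele frequency.
    """
    return {
        f"{a1}-{a2}": f1 * f2
        for (a1, f1), (a2, f2) in product(snp1.items(), snp2.items())
    }


expected = expected_haplotype_frequencies({"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5})
print(expected)
# {'A-C': 0.25, 'A-D': 0.25, 'B-C': 0.25, 'B-D': 0.25}
```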
Can think about how variation occurs within the genome i.e. SNPS
However in reality if 2 loci are in linkage disequilibrium what you see is:
2 options
1st option: A-C 33%, A-D 33% and B-C 33%
2nd option: A-C 50%, B-D 50%
For two loci that are in linkage disequilibrium - option 1
only 3 of the 4 possible haplotypes occur, unless there is recurrent mutation or recombination
For two loci that are in linkage disequilibrium - option 2
only 2 of the 4 possible haplotypes occur, unless there is recurrent mutation, recombination, or selection/drift
Measuring linkage disequilibrium between alleles at two SNPS
D’ (D prime) and r^2
another definition for linkage disequilibrium
Measures whether alleles occur together more or less frequently than expected by chance
D’ =
measures linkage disequilibrium and is not affected by differences in the allele frequencies of the 2 SNPs; takes a value between 0 and 1.
when does D’ = 1?
If you never see recombination events between the 2 loci, so that only 2 or 3 of the 4 possible haplotypes are ever present in the population, then you’ve got complete linkage disequilibrium, so D’ = 1.
perfectly correlated?
If you also always see allele G of SNP 1 with allele T of SNP 2, and allele A of SNP 1 with allele C of SNP 2, then they’re also perfectly correlated: you only see 2 of the 4 haplotypes, which means that r^2 = 1 (this depends on the allele frequencies)
r^2 =
correlation; 1 means perfect correlation between the alleles at the 2 SNPs, which happens when only 2 of the 4 possible haplotypes are present
when do you have a high D’ and high r^2
if SNPs are of a similar allele frequency and there’s no recombination then you have a high D’ and a high r^2
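To make the difference between D’ and r^2 concrete, here is a sketch using the standard textbook definitions, which the cards do not spell out: D = h(AC) - p(A)p(C), D’ = |D| / Dmax, and r^2 = D^2 / (p(A)p(B)p(C)p(D)). The haplotype frequencies in the examples are hypothetical.

```python
def ld_measures(h_ac, h_ad, h_bc, h_bd):
    """Return (D, D', r^2) from the four haplotype frequencies of two biallelic SNPs.

    SNP 1 has alleles A/B and SNP 2 has alleles C/D; the frequencies must sum to 1.
    """
    if abs(h_ac + h_ad + h_bc + h_bd - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = h_ac + h_ad  # frequency of allele A at SNP 1
    p_c = h_ac + h_bc  # frequency of allele C at SNP 2
    denom = p_a * (1 - p_a) * p_c * (1 - p_c)
    if denom <= 0:
        raise ValueError("both SNPs must be polymorphic")
    d = h_ac - p_a * p_c  # observed minus expected frequency of haplotype A-C
    # Largest |D| that these allele frequencies allow, given the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_c), (1 - p_a) * p_c)
    else:
        d_max = min(p_a * p_c, (1 - p_a) * (1 - p_c))
    return d, abs(d) / d_max, d * d / denom


# Only 2 haplotypes present (A-C and B-D): complete LD and perfect correlation.
_, d_prime, r2 = ld_measures(0.5, 0.0, 0.0, 0.5)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 1.00, r^2 = 1.00

# 3 haplotypes present (A-C, A-D, B-C): still complete LD, but weaker correlation.
_, d_prime, r2 = ld_measures(1 / 3, 1 / 3, 1 / 3, 0.0)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 1.00, r^2 = 0.25
```

The second example shows why D’ = 1 does not guarantee r^2 = 1: the two SNPs have no recombinant haplotype between them, yet genotyping one does not fully predict the other.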
Why is measuring linkage disequilibrium between alleles at two SNPS important?
Important in GWAS, as haplotype blocks mean there is a lot of redundancy when you come to genotyping
In populations, non-random recombination causes
haplotype blocks
haplotype blocks
within a block SNPs are in linkage disequilibrium and tag each other
what is another option apart from genotyping all SNPs in GWAS?
For GWAS, instead of genotyping all the SNPs in the genome, we can capture many of them by using tag SNPs
GWAS signals capture SNPs in linkage disequilibrium in the same haplotype block
If 2 SNPs are in linkage disequilibrium and they have a high r^2, then you don’t need to genotype both of them, because by genotyping one you will know the genotype of the other. Some tag SNPs may capture 2 or 3 or 4 or 5 different SNPs.
In GWAS this reduces the number of SNPs that you have to genotype
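As an illustration of how tag SNPs cut down the genotyping list, the sketch below picks tags greedily from a table of pairwise r^2 values. The greedy strategy, the r^2 >= 0.8 cut-off, the SNP names and the r^2 values are all assumptions for the example; the cards do not specify a selection method.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedily choose tag SNPs so that every SNP is captured by some tag.

    snps: list of SNP names.
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2 (missing pairs count as 0).
    Returns a dict mapping each tag SNP to the list of SNPs it captures.
    """
    def captures(tag, candidates):
        # A tag captures itself plus any candidate whose r^2 with it meets the threshold.
        return [
            s for s in snps
            if s in candidates
            and (s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold)
        ]

    remaining = set(snps)
    tags = {}
    while remaining:
        # Choose the uncaptured SNP that captures the most uncaptured SNPs
        # (ties go to the SNP listed first).
        best = max(
            (s for s in snps if s in remaining),
            key=lambda s: len(captures(s, remaining)),
        )
        tags[best] = captures(best, remaining)
        remaining -= set(tags[best])
    return tags


# Hypothetical block structure: snp1-snp3 are in strong LD, snp4 and snp5 are not.
pairwise_r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp4", "snp5")): 0.30,
}
print(pick_tag_snps(["snp1", "snp2", "snp3", "snp4", "snp5"], pairwise_r2))
# {'snp1': ['snp1', 'snp2', 'snp3'], 'snp4': ['snp4'], 'snp5': ['snp5']}
```

Here 3 genotyped SNPs stand in for 5, because snp1 tags the other two SNPs in its block.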
why do we only need to analyse a small proportion of SNPs?
We only need to analyse a small proportion of SNPs because there is extensive linkage disequilibrium across tens to hundreds of kb
Take home message: Hardy Weinberg
Hardy Weinberg equilibrium is used to calculate allele and genotype frequencies
Take home message: LD
Genetic variants in linkage disequilibrium tend to be inherited together on the same haplotype
Take home message: how is LD measured?
Linkage disequilibrium is measured by D’
Take home message: SNPs in LD
SNPs in linkage disequilibrium and of similar allele frequency are highly correlated (high r^2)
Take home message: GWAS
In association studies (GWAS), we can use genotyped tag SNPs that are in linkage disequilibrium with other SNPs to capture much of the variation within the whole genome. (In association studies, significant signals tag causal SNPs that may not have been included in the genotyped SNPs.)