Genome-wide association studies Flashcards
What types of diseases do rare but high impact alleles cause?
Mendelian disorders
What types of diseases do common but low impact alleles cause?
Multifactorial disorders
What types of diseases do common but high impact alleles cause?
Mendelian disorders with a heterozygote advantage. The allele is favourable in heterozygotes but causes disease in homozygotes
What is linkage disequilibrium?
Non-random association of alleles at two or more loci. A particular grouping of alleles at linked loci occurs more often than by chance
How do we determine the max number of haplotypes for a set of loci?
2^x, where x = number of loci being examined (assuming each locus has two alleles)
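A minimal sketch of the 2^x count, assuming every locus is biallelic. The allele labels and the `max_haplotypes` helper are made up for illustration.

```python
from itertools import product


def max_haplotypes(num_loci, alleles_per_locus=2):
    """Upper bound on the number of distinct haplotypes across num_loci loci."""
    return alleles_per_locus ** num_loci


# Hypothetical example: three biallelic loci with made-up allele labels.
loci = [("A", "a"), ("B", "b"), ("C", "c")]

# Enumerate every possible combination of one allele per locus.
haplotypes = ["".join(combo) for combo in product(*loci)]

print(max_haplotypes(len(loci)))  # 2^3 = 8
print(haplotypes)
```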
How do we determine if loci are showing linkage disequilibrium?
Determine the allele frequencies in the population, then use them to calculate the haplotype frequencies expected under random association (the product of the allele frequencies). Then do a Chi square test to determine whether the observed haplotype frequencies differ significantly from the expected frequencies
If the Chi square test was significant, would there be evidence of linkage disequilibrium between the loci?
Yes. If the observed haplotype frequencies differ significantly from the expected frequencies, some set of alleles is inherited together more often than expected by chance
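The test described above could be sketched as follows for two biallelic loci. The haplotype counts are invented, the function name is hypothetical, and the critical value of 3.841 assumes 1 degree of freedom at alpha = 0.05.

```python
def ld_chi_square(n_AB, n_Ab, n_aB, n_ab):
    """Chi square statistic (1 degree of freedom) for two biallelic loci.

    Assumes both alleles are present at each locus, so no expected count is zero.
    """
    observed = {"AB": n_AB, "Ab": n_Ab, "aB": n_aB, "ab": n_ab}
    n = sum(observed.values())

    # Allele frequencies estimated from the haplotype counts.
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n

    # Expected counts if alleles at the two loci associate at random.
    expected = {
        "AB": n * p_A * p_B,
        "Ab": n * p_A * (1 - p_B),
        "aB": n * (1 - p_A) * p_B,
        "ab": n * (1 - p_A) * (1 - p_B),
    }

    chi2 = sum((observed[h] - expected[h]) ** 2 / expected[h] for h in observed)
    return chi2, expected


CRITICAL_VALUE = 3.841  # assumed: 1 df, alpha = 0.05

# Hypothetical haplotype counts in a sample of 100 chromosomes.
chi2, expected = ld_chi_square(40, 10, 10, 40)
print(expected)
print(chi2, chi2 > CRITICAL_VALUE)
```

With these made-up counts the statistic exceeds the critical value, so the loci would be judged to be in linkage disequilibrium.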
What does D mean in linkage disequilibrium?
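The deviation: the difference between the observed haplotype frequency and the frequency expected under random association (the product of the allele frequencies). Its magnitude reflects the strength of the disequilibrium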
What do we usually use as a measurement of linkage disequilibrium?
r^2
What does an r^2 of 0 mean for linkage disequilibrium?
Complete equilibrium and random association of alleles at the loci
What does an r^2 of 1 mean for linkage disequilibrium?
Complete disequilibrium. The allele at one locus perfectly predicts the allele at the other
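A small sketch of how D and r^2 relate, using the standard two-locus definitions D = p_AB - p_A * p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). The frequencies are invented and the function name is hypothetical.

```python
def ld_stats(p_AB, p_A, p_B):
    """Return (D, r^2) for two biallelic loci.

    p_AB is the observed frequency of the A-B haplotype; p_A and p_B are the
    allele frequencies. Assumes both loci are polymorphic (0 < p < 1).
    """
    d = p_AB - p_A * p_B
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r2


# Made-up frequencies covering the two extremes and an intermediate case.
for label, p_AB in [("equilibrium", 0.25), ("intermediate", 0.40), ("complete LD", 0.50)]:
    d, r2 = ld_stats(p_AB, p_A=0.5, p_B=0.5)
    print(label, round(d, 4), round(r2, 4))
```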
What are the two factors that will affect how long a region stays in linkage disequilibrium?
Time and distance. Recombination breaks down LD over generations, and it breaks down faster between loci that are further apart
What are two international projects that have focused on documenting genomic variation in different populations?
International HapMap Project and the 1000 Genomes Project. Have identified and mapped millions of SNPs
What are haplotype blocks?
Regions in a chromosome with high degree of linkage disequilibrium and little diversity in which alleles are present
What defines a haplotype block?
Fewer haplotypes are present in the population than the 2^x maximum would allow
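As a rough illustration of that definition, the sketch below counts the distinct haplotypes in a sample and compares the count with the 2^x maximum. The haplotype strings are invented, and the helper assumes biallelic loci and equal-length haplotypes.

```python
def haplotype_diversity(observed_haplotypes):
    """Return (number of distinct haplotypes seen, 2^x maximum).

    Assumes every haplotype string has one character per locus and that
    each locus is biallelic.
    """
    distinct = set(observed_haplotypes)
    num_loci = len(next(iter(distinct)))
    return len(distinct), 2 ** num_loci


# Hypothetical sample: four loci, but only two haplotypes ever appear.
sample = ["ACGT", "ACGT", "GTAC", "ACGT", "GTAC", "GTAC", "ACGT"]

seen, maximum = haplotype_diversity(sample)
print(seen, maximum)  # far fewer seen than possible suggests a haplotype block
```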
Why are haplotype blocks useful for studying the association between a SNP and a disease locus?
Because alleles within a block are almost always inherited together, one allele can be used to predict the others. We can genotype a representative SNP for the block, which cuts down our work
Are allele frequencies static across populations?
Not in the slightest. Need to consider the population being studied
What does LDlink do?
Lets us select SNPs in a chosen population and generates a heat map of pairwise LD while also calculating r^2 and D'. Cooler colours indicate lower r^2 and warmer colours indicate higher r^2 (stronger LD)
Will the same alleles be in linkage disequilibrium between populations?
Nope. Can be in LD in one population and not the other
What are LD maps?
Triangular heat map plots displaying the degree of LD between selected SNP loci. Haplotype blocks stand out clearly as regions of strong LD
What would be the ideal end goal of a GWAS?
Look at every single SNP in the human genome for association with a disease
Why don’t we need to look at every single SNP when doing a GWAS?
Haplotype blocks. SNPs within a block are in such strong LD that we only need to test 1 or 2 of them for association to know whether the rest of the block is associated
What are tag/proxy SNPs?
Representative SNPs from a haplotype block
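One way tag SNPs could be chosen is a greedy pass over pairwise r^2 values. This is only a sketch under stated assumptions: the SNP names and r^2 values are invented, the 0.8 cutoff is an assumed convention, and the function names are hypothetical.

```python
def pair_r2(r2, a, b):
    """Look up r^2 for an unordered pair; a SNP is in perfect LD with itself."""
    if a == b:
        return 1.0
    return r2.get((a, b), r2.get((b, a), 0.0))


def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedy sketch: repeatedly pick the SNP that covers the most untagged SNPs."""
    untagged = list(snps)
    tags = {}
    while untagged:
        best = max(
            untagged,
            key=lambda s: sum(1 for t in untagged if pair_r2(r2, s, t) >= threshold),
        )
        covered = [t for t in untagged if pair_r2(r2, best, t) >= threshold]
        tags[best] = covered
        untagged = [t for t in untagged if t not in covered]
    return tags


# Made-up pairwise r^2 values: snp1 to snp3 form a block, snp4 stands alone.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    ("snp1", "snp2"): 0.95, ("snp1", "snp3"): 0.90, ("snp2", "snp3"): 0.85,
    ("snp1", "snp4"): 0.10, ("snp2", "snp4"): 0.05, ("snp3", "snp4"): 0.20,
}

tags = pick_tag_snps(snps, r2)
print(tags)
```

Here one tag SNP stands in for the three-SNP block, so only two SNPs would need genotyping instead of four.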
What are the two critical advances that have allowed us to do GWAS in the first place?
- Ability to do high-throughput SNP genotyping with SNP arrays
- Knowledge that the genome is organised into haplotype blocks with strong LD
How is genotyping done in GWAS?
SNP chip arrays. Each person is genotyped on their own array, which carries >500 000 SNPs
How is a GWAS set up?
- Get a big group of people with the disease, and another big group of people without it
- Genotype every single individual with SNP arrays
- Score the allele frequencies in each group
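The scoring step above could look something like this for a single SNP. The genotypes are invented, and encoding each genotype as a two-character string is an assumption made for illustration.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles across diploid genotypes written as two-character strings."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each genotype contributes two alleles
    return counts


def allele_frequencies(genotypes):
    """Convert allele counts to frequencies within one group."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


# Hypothetical genotypes at one SNP for a tiny case group and control group.
cases = ["AA", "AG", "AG", "GG", "AA"]
controls = ["AG", "GG", "GG", "AG", "AA"]

print(allele_frequencies(cases))
print(allele_frequencies(controls))
```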
How do we determine if a SNP allele and a disease state are associated?
Do a Chi square test to see if the SNP allele and the disease state are independent
How do we tell if a SNP and disease state are associated with a Chi square test?
If the test statistic > critical value, reject the null hypothesis of independence: the SNP allele and the disease state are associated
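A minimal sketch of that test on a 2x2 table of allele counts (cases vs controls, risk allele vs other allele). The counts are invented and the function name is hypothetical. The p-value uses the identity that, for 1 degree of freedom, the Chi square upper tail equals erfc(sqrt(x / 2)).

```python
import math


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Chi square test of independence between allele and disease status (1 df)."""
    observed = [[case_risk, case_other], [control_risk, control_other]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and disease state were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail for 1 df
    return chi2, p_value


CRITICAL_VALUE = 3.841  # assumed: 1 df, alpha = 0.05

# Hypothetical allele counts: 400 case alleles and 400 control alleles.
chi2, p_value = allelic_chi_square(240, 160, 200, 200)
print(chi2, p_value, chi2 > CRITICAL_VALUE)
```

In a real GWAS the threshold would be far stricter than 0.05 because so many SNPs are tested at once.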
Why isn’t it always feasible to have the same sample size for both the control and affected groups in a GWAS?
- Some people might die before the study is complete
- Some people could revoke their consent
- Sometimes can’t recruit enough people for both groups (like if the disease is rare)
What is the odds ratio?
The ratio of the odds of disease in one genotype group to the odds in another. It compares the risk of a complex disease between two groups with different genotypes
What is the typical odds ratio we see for complex diseases? Why?
1.1 to 1.2, very small increases in risk. Because these alleles alone don’t do much, but they interact with other alleles and the environment
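For illustration, an allelic odds ratio can be computed from the same kind of 2x2 count table. The counts below are invented and chosen only so the result lands in the 1.1 to 1.2 range; the function name is hypothetical.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    case_odds = case_risk / case_other
    control_odds = control_risk / control_other
    return case_odds / control_odds


# Hypothetical allele counts from 10 000 case alleles and 10 000 control alleles.
result = odds_ratio(2300, 7700, 2060, 7940)
print(round(result, 3))
```

An odds ratio of 1 would mean the allele is equally common in both groups; values just above 1 correspond to the very small risk increases described above.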
How do you select individuals for your case group?
Select those most likely to reveal associations, such as the most severely affected individuals, those with a family history, or those with an early age of onset
How do you select individuals for your control group?
Select those with similar sex, age, and demographics to the case group, but who are at the lowest risk of developing the disease
What’s the most common way to display the results of a GWAS?
Manhattan plot
How do we interpret a Manhattan plot?
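All the SNPs on the array are arranged by chromosome and position on the x-axis. The y-axis is the negative log10 of the p-value from each SNP’s Chi square test, so stronger associations sit higher. A significance threshold is defined (commonly p < 5 × 10^-8 for genome-wide significance). Look for any SNPs above that threshold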
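The values behind a Manhattan plot could be prepared roughly as follows. This sketch only computes the plotted coordinates and flags hits; it does no drawing. The SNP names, positions, and p-values are invented, and the 5e-8 threshold is an assumed convention.

```python
import math

GENOME_WIDE_P = 5e-8  # assumed genome-wide significance threshold


def manhattan_points(results, p_threshold=GENOME_WIDE_P):
    """Return plot-ready points and the height of the significance line.

    results is a list of (snp, chromosome, position, p_value) tuples.
    Each point is (snp, chromosome, position, -log10(p), is_significant).
    """
    line = -math.log10(p_threshold)
    points = []
    # Order SNPs by chromosome, then position, as they appear on the x-axis.
    for snp, chrom, pos, p in sorted(results, key=lambda r: (r[1], r[2])):
        y = -math.log10(p)
        points.append((snp, chrom, pos, y, y > line))
    return points, line


# Hypothetical association results for three SNPs.
results = [
    ("snp_a", 1, 1000, 0.2),
    ("snp_b", 2, 500, 3e-9),
    ("snp_c", 1, 2500, 1e-5),
]

points, line = manhattan_points(results)
hits = [snp for snp, _, _, _, significant in points if significant]
print(round(line, 2), hits)
```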