Association Analysis Flashcards
Define the term “Genetic Association”
The presence of an allele at a higher frequency in unrelated subjects with a particular trait, compared to those that do not have the trait
How do we determine whether variants in the genome are associated with a disease?
Substitute the word “disease” for “trait” in the definition of genetic association, then compare allele frequencies between two groups:
- With disease = cases
- Without disease = controls
Describe how a genetic association study is conducted
- Cases are subjects with the disease of interest, e.g. obesity
- Definition of the disease must be applied consistently
- Controls must be matched to the cases as closely as possible for non-disease traits
- such as age, sex, ethnicity, location
What other factors need to be catered for?
Match for all other risk factors
- Recruit affected subjects (cases) and unaffected subjects (controls)
- Measure genetic loci of interest
- Statistical analysis of which genetic loci are associated with disease
- Identify genes/genomic regions
How would you make the study fair?
- Use a large number of well defined cases
- Use an equal number of matched controls
- Reliable genotyping technology (SNP microarray)
- Standard statistical analysis (PLINK)
- Positive associations should be replicated
How does using a genetic marker fit well in a genetic association study?
Individuals in a population are genetically far more diverse than individuals in a single family
How is this genetic diversity captured?
- This genetic diversity is captured through reliable genetic markers
- Genetic markers are alleles that we can genotype and assess whether they are associated with the disease
- Association means the marker lies <100 kb from a causal variant
What is the ideal Genetic Marker?
A marker is ideal if it is:
- Polymorphic
- Randomly distributed across the genome
- Fixed location in genome
- Frequent in genome
- Frequent in population
- Stable with time
- Easy to assay
How is a Single Nucleotide Polymorphism (SNP) used in a genetic association study?
- Common in the genome ~1/300 nucleotides
- 12 million common SNPs identified in human genome
- Generated by DNA replication errors that escape mismatch repair during cell division
Where could SNPs be found?
In the gene (Coding region)
- No amino acid change (synonymous)
- Amino acid change (non-synonymous)
- New stop codon (nonsense)
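The three coding-region outcomes above can be checked mechanically by translating the codon before and after the change. The sketch below is illustrative only: the function name `classify_coding_snp` is hypothetical, it assumes the standard genetic code and DNA codons written 5' to 3' on the coding strand, and it does not treat stop-loss changes as a separate class.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"        # no amino acid change
    if alt_aa == "*":
        return "nonsense"          # new stop codon
    return "non-synonymous"        # amino acid change


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TAC", "TAA"))  # Tyr -> stop
```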
Where else could SNPs be found?
In the gene (Non coding region)
- Promoter: mRNA and protein level changed
- Terminator: mRNA and protein level changed
- Splice site: Altered mRNA, altered protein
Could also be found in the intergenic region (98% of genome)
Describe what dbSNP is
- An online database of SNPs hosted at NCBI
- The rs number is a unique identifier given to each SNP
- Each entry records the unique flanking sequence on either side of the single polymorphic site
Describe what minor allele frequency (MAF) is
SNPs usually have two forms (alleles): the major and the minor form.
- The less common allele is called the minor allele
- Major allele frequency + Minor allele frequency = 1
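As a worked example of the relationship above, allele frequencies can be counted from genotype counts at a biallelic SNP, since each subject carries two alleles. This is a minimal sketch; the function names and the genotype counts are made up for illustration.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (freq_A, freq_a) from genotype counts at a biallelic SNP."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)   # two alleles per subject
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    freq_a = (2 * n_aa + n_Aa) / total_alleles
    return freq_A, freq_a


def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """MAF is the frequency of whichever allele is less common."""
    return min(allele_frequencies(n_AA, n_Aa, n_aa))


# Hypothetical sample: 360 AA, 480 Aa and 160 aa subjects.
freq_A, freq_a = allele_frequencies(360, 480, 160)
print(freq_A, freq_a)                          # 0.6 0.4
print(minor_allele_frequency(360, 480, 160))   # 0.4
```

With these counts the SNP passes the MAF > 0.05 cut-off that is usually applied in association studies.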
Why are SNPs chosen for genetic association studies?
- SNPs are chosen on the basis of their MAF
- Common diseases are likely to be caused by common variants
- SNPs with MAF >0.05 (5%) are usually used in association studies - GWAS
- Exceptions are known monogenic disease SNPs
How is a Genome Wide Association Study carried out?
- Recruit large numbers of cases and controls
- Genotype markers across the whole genome
- Look for association between disease and alleles of each marker (Chi squared test)
- Positive association is at p < 5×10^-8 (multiple testing correction)
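The per-marker test can be sketched as a chi-squared test on a 2×2 table of allele counts (cases vs controls, minor vs major allele). The allele counts below are invented, the function name is hypothetical, and the threshold is derived on the common assumption of a Bonferroni correction for roughly one million independent tests (0.05 / 1,000,000). Real analyses would normally use a dedicated tool such as PLINK.

```python
import math


def allelic_chi_squared(case_minor, case_major, control_minor, control_major):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 allele-count table."""
    observed = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed Bonferroni correction: alpha 0.05 over ~1 million independent tests.
GENOME_WIDE_THRESHOLD = 0.05 / 1_000_000

# Hypothetical counts: 1000 cases and 1000 controls, i.e. 2000 alleles each.
chi2, p = allelic_chi_squared(900, 1100, 700, 1300)
print(chi2, p, p < GENOME_WIDE_THRESHOLD)
```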
How are the GWAS results plotted?
- Presented as a single graph called a Manhattan plot
- All results are plotted typically for >1M SNPs
- X-axis is the position of the SNP on the chromosome
- Y-axis is -log10 (p-value) of the association
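The two axes can be computed from a list of per-SNP results before any plotting library is involved. In this sketch the function name and the toy results are hypothetical, and chromosomes are laid end to end using the largest position observed on each one, which is a simplifying assumption rather than true chromosome lengths.

```python
import math


def manhattan_coordinates(results):
    """Convert (chromosome, position, p_value) tuples into (x, y) plot points.

    x = position shifted so chromosomes sit side by side; y = -log10(p).
    """
    # Largest observed position per chromosome stands in for its length.
    max_pos = {}
    for chrom, pos, _ in results:
        max_pos[chrom] = max(pos, max_pos.get(chrom, 0))

    # Cumulative offset for each chromosome, in chromosome order.
    offsets, running = {}, 0
    for chrom in sorted(max_pos):
        offsets[chrom] = running
        running += max_pos[chrom]

    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


# Toy data: two SNPs on chromosome 1 and one on chromosome 2.
toy_results = [(1, 100, 0.01), (1, 500, 1e-8), (2, 50, 0.5)]
for x, y in manhattan_coordinates(toy_results):
    print(x, round(y, 2))
```

Smaller p-values give taller points, so strongly associated regions stand out as peaks.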
View the results from the Wellcome Trust Case Control Consortium study. How are these results interpreted?
- The peak of association often does not identify the gene causing the disease
- The peak identifies the genomic region associated with disease and this is usually smaller than 100kb.
When is a meta analysis used?
- When it is difficult to do large studies (>1K cases/controls)
- Easier to combine smaller studies
- Meta analysis allows the statistical combination of results from multiple studies
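One common way to combine studies, assumed here for illustration since the card does not name a method, is a fixed-effect inverse-variance meta-analysis: each study's effect estimate is weighted by 1/SE². The function name and the effect sizes below are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Two hypothetical small studies reporting the same effect for one SNP.
single_beta, single_se, single_p = fixed_effect_meta([0.10], [0.05])
pooled_beta, pooled_se, pooled_p = fixed_effect_meta([0.10, 0.10], [0.05, 0.05])
print(single_p, pooled_p)
```

Pooling leaves the effect estimate unchanged here but shrinks the standard error, so the combined p-value is smaller than that of either study alone.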
What are the problems with GWAS?
- GWAS has identified associations that are statistically strong and reproducible
- However their contribution to the genetic component of disease is estimated to be low (<5%)
Possible explanations for the low contribution:
- Many common SNPs of very small effect
- Rare SNPs
- Copy Number Variation
- Epigenetic variation
What are some of the medical complications of obesity?
- Pulmonary disease
- Coronary Heart Disease
- Severe pancreatitis
- Stroke
- Cancer
How strongly is Obesity linked to genetics?
- Twin studies: 70-80% of body shape is genetically determined
- Adoption Studies: 30-40%
- Family Studies: 40-60%
What were the results of the large scale meta analysis done on obesity?
- BMI meta analysis in 322k subjects
- 97 BMI associated loci
- 125 separate studies
- > 60l authors and >2000 collaborators
What are some advantages of GWAS?
- They have already highlighted hundreds of loci that are associated with obesity or BMI related traits
- Results are reproducible
- However, there is still a long way to go in understanding how all these genes contribute to obesity
What is the relationship between linkage analysis and recombination?
- During meiosis, recombination occurs at random points along the chromosome
- Recombination reduces the linkage between the two loci over time in proportion to the distance between them
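A standard population-genetics model makes this decay concrete: if D0 is the initial association between alleles at two loci and c is the recombination fraction between them per generation, the association after t generations is D0 × (1 − c)^t. The sketch below assumes that simple model (random mating, no selection or drift); the function name and the values of c are illustrative only.

```python
def association_after_generations(d0, recombination_fraction, generations):
    """Expected association between two loci after repeated recombination.

    Assumes the textbook decay model D_t = D_0 * (1 - c) ** t.
    """
    return d0 * (1 - recombination_fraction) ** generations


# Loci that are close together recombine rarely (small c); distant loci more often.
close_loci = association_after_generations(0.25, 0.001, 100)
distant_loci = association_after_generations(0.25, 0.1, 100)
print(close_loci, distant_loci)
```

After the same number of generations the closely spaced pair keeps most of its association, while the distant pair has lost almost all of it.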
What is Linkage Disequilibrium?
- LD is when two alleles are inherited together more often than expected by chance
- This is usually because they are close together in the genome
- Alleles that are physically close together are rarely separated by recombination
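"More often than expected by chance" can be quantified with the usual LD statistics: D = P(AB) − P(A)·P(B), and r² = D² / (P(A)(1 − P(A))·P(B)(1 − P(B))). The sketch below is a minimal illustration with made-up frequencies; the function name is hypothetical and it assumes haplotype frequencies are already known.

```python
def linkage_disequilibrium(p_AB, p_A, p_B):
    """Return (D, r_squared) for alleles A and B at two loci.

    p_AB: frequency of haplotypes carrying both A and B
    p_A, p_B: frequencies of allele A and allele B
    """
    d = p_AB - p_A * p_B                      # excess over chance co-occurrence
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared


# Alleles always inherited together: complete LD.
print(linkage_disequilibrium(0.5, 0.5, 0.5))    # (0.25, 1.0)
# Alleles co-occur exactly as often as chance predicts: no LD.
print(linkage_disequilibrium(0.25, 0.5, 0.5))   # (0.0, 0.0)
```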
How is Linkage disequilibrium used in disease gene mapping?
- If a genetic marker allele and an undiscovered allele for disease susceptibility are in LD then the genetic marker will be associated with disease
- Genotype many marker alleles and we can find the regions of associations with disease - GWAS
- Remember that the marker alleles in a GWAS are close together, so because of LD we would expect to see many alleles in a region associated with disease