Association analysis Flashcards
Genetic association
The presence of an allele at a higher frequency in unrelated subjects with a particular trait than in subjects who do not have the trait
Gene variant is associated with disease
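The comparison in this definition can be sketched in a few lines. This is a minimal illustration, assuming genotype counts are given as (AA, Aa, aa) for one biallelic SNP; the genotype counts are hypothetical, and the allelic odds ratio is a common summary of association that the cards above do not themselves mention.

```python
def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele A from genotype counts (each person carries two alleles)."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    return (2 * n_AA + n_Aa) / total_alleles


def allelic_odds_ratio(case_counts, control_counts) -> float:
    """Odds of allele A (vs allele a) in cases divided by the same odds in controls.

    Both arguments are (AA, Aa, aa) genotype counts; alleles are counted, not people.
    """
    case_A = 2 * case_counts[0] + case_counts[1]
    case_a = 2 * case_counts[2] + case_counts[1]
    ctrl_A = 2 * control_counts[0] + control_counts[1]
    ctrl_a = 2 * control_counts[2] + control_counts[1]
    return (case_A * ctrl_a) / (case_a * ctrl_A)


# Hypothetical genotype counts (AA, Aa, aa); not real data.
cases = (300, 500, 200)
controls = (200, 500, 300)
print(allele_frequency(*cases))     # 0.55
print(allele_frequency(*controls))  # 0.45
print(round(allelic_odds_ratio(cases, controls), 3))  # 1.494
```

Allele A is more frequent in the hypothetical cases (0.55) than in the controls (0.45), so it would be "associated" with the trait in the sense of the definition; whether that difference is statistically significant still needs a test.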
Case-control study
- Cases are subjects with the disease of interest, e.g. obesity, schizophrenia, hypertension
- Definition of the disease must be applied in a rigorous and consistent way
- Controls must be as well-matched as possible for non-disease traits
- Such as age, sex, ethnicity, location, etc.
What is needed for a case-control study?
- Large numbers of well-defined cases (10 000s)
- Equal numbers of matched controls
- Reliable genotyping technology (SNP microarray)
- Standard statistical analysis (PLINK)
- Positive associations should be replicated in an independent sample
Genetic markers
- Individuals in a population are genetically far more diverse than individuals in a single family.
- To capture this genetic diversity we need reliable genetic markers
- Genetic markers are alleles that we can genotype and assess whether they are associated with disease
- An associated marker usually lies <100 kb from a causal variant; the marker itself need not be causal
Ideal genetic marker
- Polymorphic
- Randomly distributed across the genome
- Fixed location in genome
- Frequent (densely spaced) across the genome
- Common in the population (both alleles reasonably frequent)
- Stable with time
- Easy to assay (genotype)
Single Nucleotide Polymorphism (SNP)
- Common in the genome: ~1 in every 300 nucleotides
- ~12 million common SNPs identified in human genome
- Arise from DNA replication errors that escape mismatch repair; only those arising in the germline are inherited
SNP location and effect
- In a gene (coding region)
  - No amino acid change (synonymous)
  - Amino acid change (non-synonymous)
  - New stop codon (nonsense)
- In a gene (non-coding region)
  - Promoter: mRNA and protein levels may change
  - Terminator: mRNA and protein levels may change
  - Splice site: altered mRNA splicing, altered protein
- Intergenic region (~98% of the genome is non-coding, so most SNPs fall outside coding sequence)
SNP minor allele frequency (MAF)
- SNPs are chosen for genetic association studies on the basis of their MAF
- Common diseases are assumed to be caused largely by common variants (the "common disease, common variant" hypothesis)
- SNPs with MAF >0.05 (5%) are usually used in association studies such as GWAS
- Exceptions are known SNPs that cause monogenic disease, which are often rare
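The MAF-based selection described above can be sketched as a simple filter. This assumes genotypes are coded as 0, 1 or 2 copies of the alternate allele; the SNP names and genotypes are hypothetical, and the 0.05 cut-off is the one quoted on the card. Tools such as PLINK offer a tested MAF filter for real data.

```python
from __future__ import annotations


def minor_allele_frequency(genotypes: list[int]) -> float:
    """MAF from genotypes coded as 0, 1 or 2 copies of the alternate allele."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever allele is rarer, so take the smaller frequency.
    return min(alt_freq, 1 - alt_freq)


def common_snps(snp_genotypes: dict[str, list[int]], threshold: float = 0.05) -> list[str]:
    """Keep SNPs whose MAF is strictly above the threshold."""
    return [snp for snp, g in snp_genotypes.items() if minor_allele_frequency(g) > threshold]


# Hypothetical genotypes for 10 individuals at three SNPs.
snps = {
    "snp_common": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],   # MAF 0.45
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],     # MAF 0.05, not above threshold
    "snp_flipped": [2, 2, 1, 2, 2, 1, 2, 2, 2, 2],  # alt allele common, so MAF ~0.10
}
print(common_snps(snps))  # ['snp_common', 'snp_flipped']
```

Note that "snp_flipped" is kept even though most people carry two alternate alleles: the MAF always refers to the rarer allele. Known monogenic-disease SNPs, the exception on the card, would have to be added back by hand.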
GWAS
Genome-wide association study
GWAS Process
• Recruit large numbers of cases and controls
• Genotype markers across the whole genome
• Look for association between disease and the alleles of each marker (e.g. chi-squared test)
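The per-marker test in the last step can be sketched as a Pearson chi-squared test on a 2x2 table of allele counts (cases vs controls, allele A vs allele a). This is a minimal version without continuity correction; the counts are hypothetical, and in practice PLINK or `scipy.stats.chi2_contingency` would be used. The p-value uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper tail equals `erfc(sqrt(x / 2))`.

```python
import math


def allelic_chi_squared(case_alleles, control_alleles):
    """Pearson chi-squared test (1 df, no continuity correction) on allele counts.

    case_alleles, control_alleles: (count of allele A, count of allele a).
    Returns (chi-squared statistic, p-value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in cases and controls.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts for one SNP: (A, a) in 1000 cases and 1000 controls.
chi2, p = allelic_chi_squared((1100, 900), (900, 1100))
print(chi2)  # 40.0
print(p)
```

Here the statistic is 40.0 and the p-value is about 2.5e-10. In a GWAS this test is simply repeated for every marker, which is why so many p-values end up on the plot described next.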
GWAS results
- GWAS results are presented as a single graph called a Manhattan plot
- All results are plotted, typically for >1M SNPs
- X-axis is the genomic position of the SNP, with chromosomes placed end to end
- Y-axis is -log10(p-value) of the association, so stronger associations give taller peaks
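The two axes described above can be computed directly from per-SNP results. This sketch assumes results arrive as (chromosome, position, p-value) tuples with integer chromosome numbers and that chromosome lengths are known; the lengths and results in the example are toy values. The returned points could then be passed to any scatter-plot function, for example matplotlib's `scatter`.

```python
import math


def manhattan_coordinates(results, chrom_lengths):
    """Convert (chromosome, position, p-value) tuples into Manhattan-plot (x, y) points.

    x: position along the genome, with chromosomes laid end to end in numeric order.
    y: -log10(p-value), so smaller p-values give taller points.
    """
    offsets = {}
    running_total = 0
    for chrom in sorted(chrom_lengths):
        offsets[chrom] = running_total
        running_total += chrom_lengths[chrom]
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


# Toy chromosome lengths and results (not real data).
lengths = {1: 1000, 2: 800}
results = [(1, 500, 1e-3), (2, 100, 1e-8)]
print(manhattan_coordinates(results, lengths))
```

The second SNP ends up at x = 1100 (chromosome 1's length plus 100) with a height of 8, five units taller than the first SNP because its p-value is 10^5 times smaller.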
WTCCC (Wellcome Trust Case Control Consortium)
Manhattan plots of the association of SNP markers with seven diseases
Green peaks indicate SNPs with significant p-values
• The peak of association often does not identify the gene causing the disease.
• The peak identifies the genomic region associated with the disease; this region is usually smaller than 100 kb.
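To make "the peak identifies a region, not a gene" concrete, the sketch below finds the strongest association and reports the span of significant SNPs within 100 kb around it. This is a deliberately simple, hypothetical rule: the 5e-8 threshold is the conventional genome-wide significance level (an assumption, not stated in the cards), and real pipelines usually define regions by linkage disequilibrium, for example with PLINK's `--clump`. The function name and example results are illustrative only.

```python
def peak_region(results, threshold=5e-8, window=100_000):
    """Hypothetical helper: region around the strongest association.

    results: (chromosome, position, p-value) tuples.
    Returns (chromosome, start, end, lead_position) spanning significant SNPs
    within window / 2 of the lead SNP, or None if nothing passes the threshold.
    """
    significant = [r for r in results if r[2] < threshold]
    if not significant:
        return None
    chrom, lead_pos, _ = min(significant, key=lambda r: r[2])
    nearby = [
        pos for c, pos, _ in significant
        if c == chrom and abs(pos - lead_pos) <= window // 2
    ]
    return chrom, min(nearby), max(nearby), lead_pos


# Hypothetical results: one peak on chromosome 3, plus a separate signal 200 kb away.
results = [
    (3, 1_000_000, 1e-12),
    (3, 1_020_000, 1e-9),
    (3, 980_000, 3e-8),
    (3, 1_200_000, 1e-9),
    (5, 2_000, 0.2),
]
print(peak_region(results))  # (3, 980000, 1020000, 1000000)
```

The output is a 40 kb interval rather than a gene name: deciding which gene (if any) in that interval causes the disease needs further work such as fine-mapping or functional studies.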
Meta-analysis
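Large studies (>1,000 cases and >1,000 controls) are difficult for a single group to run
It is easier to combine smaller studies:
• Before the experiment: form a consortium that pools samples and genotyping
• After the experiment: meta-analysis of results from separate studies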
Meta-analysis allows the statistical combination of results from multiple studies
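One common way to combine results is a fixed-effect, inverse-variance weighted meta-analysis of per-study effect sizes. The cards do not say which method is used, so this is an assumed technique; the sketch takes log odds ratios and their standard errors from each study, and the three studies in the example are hypothetical.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance weighted (fixed-effect) meta-analysis.

    Each study is weighted by 1 / SE^2, so more precise (usually larger) studies count more.
    Returns (pooled log odds ratio, pooled standard error, two-sided p-value).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    # Two-sided p-value from a standard normal z-score.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Three hypothetical studies, each with log OR 0.2 and standard error 0.1.
pooled, se, p = fixed_effect_meta([0.2, 0.2, 0.2], [0.1, 0.1, 0.1])
print(round(pooled, 3), round(se, 4), p)
```

Each hypothetical study on its own gives z = 2 (p of about 0.046), but the pooled estimate has a standard error of about 0.058 and p of about 5e-4. This is why combining smaller studies increases the power to detect small effects.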
Problems with GWAS
- GWAS has identified associations that are statistically strong and reproducible
- However, together they are estimated to explain only a small fraction (<5%) of the genetic component (heritability) of disease
- Possible explanations for this missing heritability:
  - Many common SNPs of very small effect
  - Rare SNPs
  - Copy number variation
  - Epigenetic variation
Obesity is strongly heritable
- Twin studies: 70-80% of body shape is genetically determined
- Adoption studies: 30-40%
- Family studies: 40-60%