Association Analysis Flashcards
What is genetic association?
Genetic association is the presence of an allele at a higher frequency in unrelated subjects with a particular trait than in those who do not have the trait
If we substitute the word “disease” for “trait”, this is how we determine whether variants in the genome are associated with a disease
With disease = cases
Without disease = controls
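As a rough illustration of the definition above, the sketch below compares the frequency of one allele in cases and controls. The genotype counts and the allele labels A and B are invented for illustration and do not come from any real study.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele B given counts of AA, AB and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each subject carries two alleles
    return (n_ab + 2 * n_bb) / total_alleles

# Hypothetical genotype counts (AA, AB, BB); assumed values, not real data.
cases = (360, 480, 160)
controls = (490, 420, 90)

freq_cases = allele_frequency(*cases)
freq_controls = allele_frequency(*controls)

print(f"Allele B frequency in cases:    {freq_cases:.2f}")
print(f"Allele B frequency in controls: {freq_controls:.2f}")
```

A higher frequency in cases than in controls only suggests association; whether the difference is larger than chance would allow needs a statistical test.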
What are case-control studies?
Cases are subjects with the disease of interest, e.g. obesity, schizophrenia, hypertension
Definition of the disease must be applied in a rigorous and consistent way
Controls must be as well-matched as possible for non-disease traits
- Such as age, sex, ethnicity, location, etc.
How would you carry out a case-control genetic study?
Take your affected cases and unaffected controls and match them for all other risk factors
Measure the genetic loci of interest for both groups, usually done by genotyping
Then carry out a statistical analysis to determine which genetic loci are associated with the disease
We need:
Large numbers of well-defined cases (tens of thousands)
Equal numbers of matched controls
Reliable genotyping technology (SNP microarray)
Standard statistical analysis (usually PLINK for genome-wide analysis)
Positive associations should be replicated
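The card does not name a particular effect-size measure. One common summary for case-control data is the odds ratio, sketched below with an approximate 95% confidence interval (Woolf's log method). The allele counts are hypothetical and the function name is an assumption for this example.

```python
import math

def odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for carrying the risk allele, with an approximate 95% CI."""
    estimate = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

# Hypothetical allele counts: risk allele vs other allele, in cases and controls.
or_estimate, ci_lower, ci_upper = odds_ratio(800, 1200, 600, 1400)
print(f"OR = {or_estimate:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

An interval that excludes 1 is consistent with association, but replication in an independent set of cases and controls is still needed.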
How does a SNP's location affect its effect?
Gene (coding region)
- No amino acid change (synonymous)
- Amino acid change (non-synonymous)
- New stop codon (nonsense)
Gene (non-coding region)
- Promoter: mRNA and protein level changed
- Terminator: mRNA and protein level changed
- Splice site: altered mRNA, altered protein
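For the coding-region cases, a minimal sketch of how a single-base change can be classified is shown below. It uses the standard genetic code; the function name and the example codons are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) within a codon."""
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "non-synonymous"

print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu
print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG, Glu -> Val
print(classify_coding_snp("TGG", 2, "A"))  # TGG -> TGA, Trp -> stop
```

This sketch only covers the three coding categories on the card; a change that removes a stop codon would simply be reported as non-synonymous here.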
Why are SNPs used as genetic markers for association analysis?
SNPs are chosen for genetic association studies on the basis of their minor allele frequency (MAF)
Common diseases are likely to be caused by common variants
SNPs with MAF > 0.05 (5%) are usually used in association studies such as GWAS
Exceptions are known monogenic disease SNPs
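The MAF cut-off can be sketched as a simple filter. The SNP names and genotype counts below are hypothetical; only the 0.05 threshold comes from the card.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts: the frequency of whichever allele is rarer."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_b = (n_ab + 2 * n_bb) / total_alleles
    return min(freq_b, 1 - freq_b)

# Hypothetical genotype counts (AA, AB, BB) per SNP; assumed values only.
snps = {
    "snp1": (810, 180, 10),
    "snp2": (980, 20, 0),
    "snp3": (40, 320, 640),
}

MAF_THRESHOLD = 0.05  # threshold stated on the card

common_snps = [
    name for name, counts in snps.items()
    if minor_allele_frequency(*counts) > MAF_THRESHOLD
]
print(common_snps)
```

Taking the minimum of the two allele frequencies means the result does not depend on which allele happens to be labelled B.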
What’s a GWAS?
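A genome-wide association study (GWAS) tests markers across the whole genome for association with a disease
We need to recruit large numbers of cases and controls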
We need genotype markers across the whole genome
- SNP Microarrays – see separate session
We look for association between disease and alleles of each marker – chi-squared test
Positive association is at p < 5 × 10⁻⁸ (multiple testing correction)
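A minimal sketch of the test for one marker is given below: a Pearson chi-squared test on a 2×2 table of allele counts (1 degree of freedom, no continuity correction). The counts are hypothetical. The threshold is shown as a Bonferroni-style correction of 0.05 for roughly one million independent tests, which is the usual rationale for 5 × 10⁻⁸ and is an assumption here, since the card does not spell it out.

```python
import math

def allelic_chi_squared(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-squared statistic and p-value for a 2x2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    chi2 = numerator / denominator
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts (allele A, allele B) in cases and controls.
chi2, p_value = allelic_chi_squared(1200, 800, 1400, 600)

# Assumed rationale: 0.05 corrected for about one million independent tests.
GENOME_WIDE_THRESHOLD = 0.05 / 1_000_000
significant = p_value < GENOME_WIDE_THRESHOLD

print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, genome-wide significant: {significant}")
```

In practice this calculation is repeated for every marker on the array, which is why a tool such as PLINK is normally used rather than hand-written code.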
How do we plot GWAS results?
GWAS results are presented as a single graph called a Manhattan plot
All results are plotted, typically for >1M SNPs
X-axis is position of the SNP on the chromosome
Y-axis is -log10(p-value) of the association, so smaller p-values plot higher
The plot resembles the Manhattan skyline, so we are looking for the ‘skyscrapers’
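The coordinates of a Manhattan plot can be sketched without any plotting library. The results and chromosome lengths below are invented for illustration; a real plot would pass the resulting points to a plotting tool.

```python
import math

# Hypothetical results: (chromosome, base-pair position, p-value).
results = [
    (1, 1_000_000, 0.20),
    (1, 5_000_000, 1e-9),
    (2, 2_000_000, 0.01),
    (2, 3_000_000, 3e-6),
]

# Hypothetical chromosome lengths, used only to lay chromosomes end to end.
chrom_lengths = {1: 10_000_000, 2: 8_000_000}

# Cumulative offset so that each chromosome starts where the previous one ends.
offsets = {}
running_total = 0
for chrom in sorted(chrom_lengths):
    offsets[chrom] = running_total
    running_total += chrom_lengths[chrom]

# x = genome-wide position, y = -log10(p-value)
points = [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]

# Height of the genome-wide significance line at p = 5e-8.
threshold_y = -math.log10(5e-8)
skyscrapers = [(x, y) for x, y in points if y > threshold_y]

for x, y in points:
    print(f"x = {x:>10}, y = {y:.2f}")
print("Above threshold:", skyscrapers)
```

Taking the negative logarithm turns very small p-values into tall points, which is what makes the strongest associations stand out.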
Does a GWAS result identify the causal gene?
The peak of association often does not identify the gene causing the disease.
The peak identifies the genomic region associated with disease and this is usually smaller than 100 kb.
What are the problems with GWAS?
GWAS has identified associations that are statistically strong and reproducible
However, their contribution to the genetic component of disease is estimated to be low (<5%)
Possible explanations for the remaining genetic component:
- Many common SNPs of very small effect
- Rare SNPs
- Copy Number Variation
- Epigenetic variation