Association analysis Flashcards
Define genetic association
Genetic Association is the presence of an allele at a higher frequency in unrelated subjects with a particular trait, compared to those that do not have the trait
How do we determine whether variants in the genome are associated with a disease?
Apply the definition of genetic association with “disease” in place of “trait”: look for alleles that are more frequent in people with the disease than in those without it
With disease = cases
Without disease = controls
In a case-control study, when is a gene associated with the disease?
A gene is associated with the disease when one of its alleles is more frequent in cases than in controls
What are the 4 major rules of case-control studies?
- Cases are subjects with the disease of interest, e.g. obesity, schizophrenia, hypertension
- Definition of the disease must be applied in a rigorous and consistent way
- Controls must be as well-matched as possible for non-disease traits
- Such as age, sex, ethnicity, location, etc.
Using a simple flow map, how do we identify regions that are responsible for causing disease?
On image
How do we carry it out in practice?
- Large numbers of well-defined cases (tens of thousands)
- Equal numbers of matched controls
- Reliable genotyping technology (SNP microarray)
- Standard statistical analysis (PLINK)
- Positive associations should be replicated
Why do we need reliable genetic markers?
• Individuals in a population are genetically far more diverse than individuals in a single family.
What are genetic markers?
• Genetic markers are alleles that we can genotype and assess whether they are associated with disease
Define association
• For a marker, association means it lies <100kb from a causal variant
What is the ideal genetic marker?
- Polymorphic
- Randomly distributed across the genome
- Fixed location in genome
- Frequent in genome
- Frequent in population
- Stable with time
- Easy to assay (genotype)
What is an SNP?
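- Single nucleotide polymorphism: a single-base position in the genome where individuals in a population differ
- Common in the genome, ~1 per 300 nucleotides
- ~12 million common SNPs identified in the human genome
- Generated by DNA replication errors that escape mismatch repair during cell division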
How might an SNP arise?
On image
Where are SNPs found?
• Gene (coding region)
  - No amino acid change (synonymous)
  - Amino acid change (non-synonymous)
  - New stop codon (nonsense)
• Gene (non-coding region)
  - Promoter: mRNA and protein level changed
  - Terminator: mRNA and protein level changed
  - Splice site: altered mRNA, altered protein
• Intergenic region (98% of genome)
What is dbSNP?
The Single Nucleotide Polymorphism Database
Allows us to find information about SNPs
What is the minor allele in dbSNP?
The less common of the alleles at a SNP; dbSNP reports how frequent it is in the population
On what basis are SNPs chosen?
- SNPs are chosen for genetic association studies on the basis of their minor allele frequency (MAF)
- Common diseases are likely to be caused by common variants
- SNPs with MAF >0.05 (5%) are usually used in association studies such as GWAS
- Exceptions are known monogenic diseases
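The MAF filter above can be sketched in a few lines. This is only an illustration: the function name `minor_allele_frequency` and the toy genotypes are invented for this card, and the SNP is assumed to have exactly two alleles.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for one SNP.

    `genotypes` is a list of two-letter strings such as "AG", one per person.
    Assumes a biallelic SNP (hypothetical helper, not a dbSNP or PLINK call).
    """
    # Count every allele: each person contributes two.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        # Only one allele observed: the site is not polymorphic in this sample.
        return None, 0.0
    total = sum(counts.values())
    minor, n_minor = min(counts.items(), key=lambda item: item[1])
    return minor, n_minor / total


# Toy sample of 10 people (made-up data).
genotypes = ["AA", "AG", "AA", "GG", "AG", "AA", "AA", "AG", "AA", "AA"]
minor, maf = minor_allele_frequency(genotypes)

# Typical association-study filter from the card above: keep SNPs with MAF > 0.05.
passes_filter = maf > 0.05
print(minor, maf, passes_filter)
```

In this toy sample G is the minor allele (5 of 20 alleles), so the SNP would pass the 5% filter.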
What is GWAS?
Genome Wide Association Study (GWAS)
• Recruit large numbers of cases and controls
• Genotype markers across the whole genome
SNP Microarrays – see separate session
• Look for association between disease and alleles of each marker – chi-squared test
• A positive association is declared at p < 5 × 10^-8 (the threshold after multiple testing correction)
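The chi-squared step can be sketched for a single SNP as below. This is a minimal sketch, not what PLINK does internally: the function name and the allele counts are invented, the test is a plain Pearson chi-squared on a 2x2 table of allele counts, and the 5 × 10^-8 threshold is reproduced on the assumption that it corresponds to 0.05 divided by roughly one million independent tests.

```python
import math


def allelic_chi_squared(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-squared test (1 degree of freedom) on allele counts.

    Hypothetical helper for illustration; assumes every row and column
    total is non-zero.
    """
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed Bonferroni-style correction: 0.05 spread over ~1 million tests.
GENOME_WIDE_THRESHOLD = 0.05 / 1_000_000

# Made-up counts: 2000 cases and 2000 controls, two alleles per person.
chi2, p_value = allelic_chi_squared(1400, 2600, 1200, 2800)
print(chi2, p_value, p_value < GENOME_WIDE_THRESHOLD)
```

With these invented counts the SNP is significant at the conventional 0.05 level but does not reach the genome-wide threshold, which is why the correction matters when about a million markers are tested.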
What does a GWAS give us in terms of results?
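A p-value for each SNP: the probability of seeing an association at least this strong by chance if there were no true association
The smaller the p-value (for example 10^-9 rather than 10^-5), the more significant the association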
Refer to table
How do we plot the results of a GWAS?
What is a Manhattan plot?
- GWAS results are presented as a single graph called a Manhattan plot
- All results are plotted, typically for >1M SNPs
- X-axis is position of the SNP on the chromosome
- Y-axis is -log10(p-value) of the association
The Manhattan plot is a simple way to visualise the markers across the genome associated with the disease. The y-axis of the plot is the -log10 of the p-value, so if a marker is associated with disease with a p-value of 1 × 10^-9 then the value on the y-axis for this would be 9. The x-axis is the location on the chromosome. Each chromosome is a different colour in the plot above and chromosome locations are given by the number of bases from the start of the chromosome sequence.
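The y-axis transform described above is small enough to check by hand. A minimal sketch (the function name `manhattan_y` and the example SNP records are invented for illustration):

```python
import math


def manhattan_y(p_value):
    """Height of a SNP on a Manhattan plot: -log10 of its p-value."""
    return -math.log10(p_value)


# Where the genome-wide significance line (p = 5e-8) sits on the y-axis.
significance_line = manhattan_y(5e-8)

# Made-up results: (chromosome, position in bases, p-value).
results = [(1, 1_050_000, 1e-9), (1, 2_300_000, 0.2), (2, 560_000, 3e-5)]

# Each SNP becomes one point: x is its position, y is -log10(p).
points = [(chrom, pos, manhattan_y(p)) for chrom, pos, p in results]
hits = [pt for pt in points if pt[2] > significance_line]
print(points, hits)
```

Only the first made-up SNP (p = 1 × 10^-9, plotted at a height of 9) rises above the significance line, which sits at about 7.3.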
What did the Wellcome Trust Case Control Consortium (WTCCC), one of the first large genome-wide association studies (2007), look at?
• It looked at several common diseases
What were the results?
On image
Looking at the regional association plot, what does the red SNP identify?
- The red SNP is the most significant marker in the region and is responsible for the peak; the region around it covers a few genes.
- The peak of association often does not identify the gene causing the disease.
- The peak identifies the genomic region associated with disease and this is usually smaller than 100kb.
What is a meta-analysis?
A way to combine different studies:
• Difficult to do large studies (>1K cases/controls)
• Easier to combine smaller studies
Combining before the experiment: consortium
Combining after the experiment: meta-analysis
Meta-analysis allows the statistical combination of results from multiple studies
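One common way to combine results statistically is a fixed-effect, inverse-variance weighted meta-analysis. The card does not say which method is meant, so treat this as an assumed example; the function name, effect sizes and standard errors are all invented.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted (fixed-effect) pooling of per-study effects.

    Hypothetical helper: `effects` are per-study effect sizes for one SNP
    (for example log odds ratios) and `std_errors` their standard errors.
    """
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    # Two-sided p-value from the normal distribution.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Three small made-up studies of the same SNP.
effects = [0.10, 0.12, 0.08]
std_errors = [0.05, 0.04, 0.05]
pooled, pooled_se, p_value = fixed_effect_meta(effects, std_errors)
print(pooled, pooled_se, p_value)
```

The pooled standard error is smaller than that of any single study, so the combined p-value is smaller than each study's own, which is the point of combining smaller studies.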
What are the problems with GWAS?
• GWAS has identified associations that are statistically strong and reproducible
• However, their contribution to the genetic component of disease is estimated to be low (<5%)
• Possible explanations:
  - Many common SNPs of very small effect
  - Rare SNPs
  - Copy Number Variation
  - Epigenetic variation
What are the medical implications of obesity?
On image
What is the evidence that obesity is strongly genetic?
• Twin studies: 70-80% of body shape is genetically determined
• Adoption studies: 30-40%
• Family studies: 40-60%
What did large scale meta-analysis reveal for obesity?
- BMI meta-analysis in ~322k subjects
- Locke et al (2015) Nature 518:197–206
- 97 loci with a statistically significant association with BMI
- 125 separate studies
- > 600 authors and >2000 collaborators
- New loci are shown in red