Association Analysis Flashcards
What is genetic association?
The presence of a variant allele at a higher frequency in unrelated subjects with a particular disease (cases), compared to those that do not have the disease (controls).
What is an allele?
One form of a variant in the genome
What is a locus?
A position in the genome
What is a genotype?
The pair of alleles at a locus, e.g. locus 1: 1,4 and locus 2: 1,1
What is a haplotype?
This is the order of alleles along a chromosome
Why are case-control studies used?
- Cases are subjects with the disease of interest e.g. obesity, schizophrenia, hypertension.
- Definition of the disease must be applied in a rigorous and consistent way
- Controls must be as well-matched as possible for non-disease traits such as age, sex, ethnicity, location etc.
What is case-control association?
A gene variant is associated with disease when it is found at a higher frequency in cases than in controls.
Describe how the case control study works
There are two groups:
- Affected cases
- Unaffected controls
Then measure the genetic loci of interest
Statistical analysis to determine which genetic loci correlate with disease
Identify genomic region associated with disease
What is needed in a case-control genetic study?
- Large number of well-defined cases
- Equal numbers of matched controls
- Reliable genotyping technology (SNP array)
- Standard statistical analysis (PLINK)
- Positive associations should be replicated
What is the ideal genetic marker?
- Polymorphic
- Randomly distributed across the genome
- Fixed location in genome
- Frequent in genome
- Frequent in population
- Stable with time
- Easy to assay (genotype)
What is a SNP?
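- A single nucleotide polymorphism: a single-base position at which alleles differ between individuals
- Generated when a replication mismatch is repaired incorrectly
- Common in the genome: about 1 in every 300 nucleotides
- About 12 million common SNPs identified in the human genome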
How do SNPs arise?
- The two DNA strands separate so that each can be replicated ahead of cell division.
- One DNA strand replicates correctly.
- The other DNA strand replicates, but a mismatched base is incorporated.
- Usually the mismatch would be corrected by the mismatch repair system.
- Instead of replacing the mismatched base, the repair system replaces the opposite base on the original strand.
- This becomes the SNP, e.g. a T/C SNP.
Where are SNPs located?
In the gene coding region:
- No amino acid change (synonymous)
- Amino acid change (non-synonymous)
- New stop codon (nonsense)
In the gene non-coding region:
- Promoter - mRNA and protein level changed
- Terminator - mRNA and protein level changed
- Splice site - altered mRNA, altered protein
In the intergenic region
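The coding-region categories above can be illustrated with a minimal sketch. It assumes a deliberately partial codon table (only the codons used in the example, taken from the standard genetic code) and a hypothetical helper named classify_coding_snp; it is not a full annotation tool.

```python
# Partial codon table from the standard genetic code.
# Only the codons needed for the examples below are included (assumption:
# a real tool would use the full 64-codon table).
CODON_TABLE = {
    "GAA": "Glu",
    "GAG": "Glu",
    "GTG": "Val",
    "TAC": "Tyr",
    "TAA": "Stop",
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base change inside a codon (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # no amino acid change
    if alt_aa == "Stop":
        return "nonsense"        # new stop codon
    return "non-synonymous"      # amino acid change


for ref, alt in [("GAA", "GAG"), ("GAG", "GTG"), ("TAC", "TAA")]:
    print(ref, "->", alt, ":", classify_coding_snp(ref, alt))
```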
What is the dbSNP?
It is an online database at NCBI of single nucleotide polymorphisms (SNPs) and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants.
What is the minor allele?
It is the less common allele. Each allele has a frequency in the general population, and the frequency of the minor allele is the minor allele frequency (MAF).
What does the Minor AF + Major AF add up to?
1
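As a worked example of why the two frequencies sum to 1 at a SNP with two alleles, here is a small sketch. The genotype counts and the function name allele_frequencies are made up for illustration.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_a, freq_b) from genotype counts at a two-allele SNP.

    Each person carries two alleles, so homozygotes contribute two copies
    of one allele and heterozygotes contribute one copy of each.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    freq_b = (2 * n_bb + n_ab) / total_alleles
    return freq_a, freq_b


# Hypothetical counts for a T/C SNP: 640 T/T, 320 T/C, 40 C/C
freq_t, freq_c = allele_frequencies(640, 320, 40)
maf = min(freq_t, freq_c)        # minor allele frequency
major_af = max(freq_t, freq_c)   # major allele frequency
print(f"T: {freq_t:.2f}, C: {freq_c:.2f}, MAF: {maf:.2f}")
print("Sum:", maf + major_af)
```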
What is a genome wide association study (GWAS)?
A study that uses markers across the whole genome to test for association with a disease.
What do SNP microarrays do?
- Look for association between disease and each marker - chi-squared test
- This has resulted in the detection of large numbers of disease-associated genes
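The chi-squared test mentioned above can be sketched for a single marker as follows. This assumes an allelic test on a 2x2 table of allele counts (cases vs controls, minor vs major allele) with 1 degree of freedom; the counts and function names are hypothetical, and tools such as PLINK offer this and other tests.

```python
import math


def allelic_chi_squared(case_minor, case_major, control_minor, control_major):
    """Pearson chi-squared test (1 d.f.) on a 2x2 table of allele counts."""
    observed = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_minor + control_minor, case_major + control_major]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(case_minor, case_major, control_minor, control_major):
    """Odds of carrying the minor allele in cases relative to controls."""
    return (case_minor * control_major) / (case_major * control_minor)


# Hypothetical allele counts: 1000 cases and 1000 controls (2000 alleles each)
counts = (600, 1400, 400, 1600)
chi2, p_value = allelic_chi_squared(*counts)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.2e}, OR = {odds_ratio(*counts):.2f}")
```

In a GWAS the same test is repeated for every marker, which is why a very strict p-value threshold is needed.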
How is GWAS data presented?
It is presented as a single graph called a Manhattan plot.
What is the X-axis and Y-axis in a Manhattan plot?
- X-axis is position of the SNP on the chromosome
- Y-axis is -log10(p-value) of the association test for each SNP
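A minimal sketch of how each SNP becomes a point on the plot, assuming results are stored as (chromosome, position, p-value) rows. The rows and the function name manhattan_points are invented for illustration.

```python
import math

# Hypothetical association results: (chromosome, base-pair position, p-value)
results = [
    (1, 1_200_000, 0.5),
    (1, 5_600_000, 1e-3),
    (2, 900_000, 1e-8),
]


def manhattan_points(rows):
    """Convert (chrom, pos, p) rows to (chrom, pos, -log10(p)) plot points."""
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in rows]


points = manhattan_points(results)
for chrom, pos, y in points:
    print(f"chr{chrom}:{pos}  -log10(p) = {y:.2f}")
```

Because of the -log10 transform, smaller p-values give taller points, so the strongest associations stand out as peaks.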
What is a Manhattan plot?
A simple way to visualise the markers across the genome associated with the disease.
What is the WTCCC?
It is the Wellcome Trust Case Control Consortium
- Uses the 1958 Birth Cohort and UK Blood Service donors as the controls.
- Looks at cases of CAD, Type 1 and 2 diabetes, hypertension, rheumatoid arthritis, Crohn’s disease and bipolar disorder
What do the peaks indicate in Manhattan plots?
Significant p-values of p < 5 x 10^-5 (the conventional genome-wide significance threshold is stricter, p < 5 x 10^-8)
What are some misconceptions of the peaks in GWAS results?
- The peak does not identify the gene causing the disease
- The peak identifies the genomic region associated with the disease
What is another graph that can be used to show GWAS results?
Regional Association plot
Advantages and Disadvantages of meta-analysis
- Difficult to do very large studies (>10K cases)
- Easier to combine smaller studies
- Pre-experiment - consortium
- Post-experiment - meta-analysis
- Meta-analysis allows the statistical combination of results from multiple studies
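One common way to combine results statistically is inverse-variance weighted fixed-effect meta-analysis. The sketch below assumes each study reports an effect estimate (beta) and its standard error for the same SNP; the numbers are hypothetical, and this is only one of several possible methods.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-study effect sizes and standard errors for the same SNP
betas = [0.10, 0.12, 0.08]
standard_errors = [0.05, 0.04, 0.06]

pooled_beta, pooled_se, p_value = fixed_effect_meta(betas, standard_errors)
print(f"pooled beta = {pooled_beta:.4f}, SE = {pooled_se:.4f}, p = {p_value:.2e}")
```

The pooled standard error is smaller than that of any single study, which is how combining smaller studies gains power.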
What are the medical complications of obesity?
- Pulmonary disease
- Idiopathic intracranial hypertension
- Stroke
- Cataracts
- Coronary heart disease
- Severe pancreatitis
- Diabetes
- Cancer
- Phlebitis
- Gout
- Osteoarthritis
- Gynecologic abnormalities
- Gall bladder disease
- Nonalcoholic fatty liver disease
What studies are used to investigate the genetic components of common obesity?
- Twin studies
- Adoption studies
- Family Studies
What did an obesity GWAS show?
Genes were associated with waist size, extremes, fat mass and BMI, and the sets of associated genes overlapped.
What is the problem with GWAS?
- It has identified associations that are statistically strong and reproducible, but their contribution to the genetic component of disease is estimated to be low (less than 5%)
- The rest of the genetic component may in fact be explained by other things such as:
- Common SNPs of small effect
- Rare SNPs
- Copy Number Variation
- Epigenetic variation
- Heritability is overestimated