Association Analysis Flashcards
What is genetic association?
Genetic association is the presence of a variant allele at a higher frequency in unrelated subjects with a particular disease (cases) than in those without the disease (controls)
What broader term is used for ‘disease’?
The broader term is “trait”; height, for example, is a trait but not a disease
What is an allele?
One form of a variant in the genome
What is a locus?
A position in the genome
What is a genotype?
Both alleles at a locus
What is the haplotype?
This is the order of alleles along a chromosome
What are cases in case control studies?
Cases are subjects with the disease of interest, e.g. obesity, schizophrenia, hypertension
In order to have a successful case control study, what requirements must be fulfilled?
The definition of the disease must be applied in a rigorous and consistent way
Controls must be as well matched as possible for non-disease traits
Such as age, sex, ethnicity, location, etc.
What forms a more reliable case control study?
Measure as many / all relevant factors as possible when taking people into a study
How is a case control study carried out?
A case control study is carried out by:
- Recruiting cases and controls
- Identifying gene variants in both groups
- Comparing the variant frequencies between the two groups
How does the variant frequency differ between cases and controls?
The variant occurs at a higher frequency in cases than in controls ∴
the gene variant is associated with the disease
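The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequency` and the genotype counts are hypothetical and not taken from any real study.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele B given counts of the genotypes AA, AB and BB."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each subject carries two alleles
    b_alleles = n_ab + 2 * n_bb               # heterozygotes carry one B, BB carry two
    return b_alleles / total_alleles

# Hypothetical genotype counts (AA, AB, BB), invented for illustration only.
cases = (360, 480, 160)
controls = (490, 420, 90)

freq_cases = allele_frequency(*cases)
freq_controls = allele_frequency(*controls)

print(f"Variant frequency in cases:    {freq_cases:.3f}")
print(f"Variant frequency in controls: {freq_controls:.3f}")
```

A higher frequency in cases only suggests association; whether the difference is statistically significant still has to be tested.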
What is the purpose of a case control study?
Allows the identification of a genomic region associated with disease, tagged either by a single marker or by a group of markers
What are the trademarks of a good case control study?
- Large numbers of well-defined cases (1000s)
- Equal numbers of matched controls
- Reliable genotyping technology (SNP array)
- Standard statistical analysis (PLINK)
- Positive associations should be replicated
How diverse is a population?
Individuals in a population are genetically far more diverse than individuals in a single family
How can we identify how diverse a population is?
To capture this genetic diversity we need to use 100,000s or millions of genetic markers
Outline features of an ideal genetic marker
- Polymorphic
- Randomly distributed across the genome
- Fixed location in genome
- Frequent in genome
- Frequent in population
- Stable with time
- Easy to assay (genotype)
What is an SNP?
A single nucleotide polymorphism: a variation at a single base position in the genome
How commonly do SNPs appear in the genome?
Common in the genome ~1/300 nucleotides
~12 million common SNPs identified in human genome
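As a rough consistency check on the two figures above, the density of ~1 SNP per 300 nucleotides can be turned into an expected count. The genome size used here (about 3.1 billion base pairs for the haploid human genome) is an assumption that does not appear in the flashcards.

```python
# Assumption: approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3_100_000_000
# From the flashcard: roughly one common SNP every 300 nucleotides.
SNP_SPACING_BP = 300

expected_snps = GENOME_SIZE_BP / SNP_SPACING_BP
print(f"Expected number of SNPs: about {expected_snps / 1e6:.1f} million")
```

The estimate is on the same order of magnitude as the ~12 million common SNPs quoted above.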
How do SNPs arise?
Generated when DNA replication errors are not correctly resolved by mismatch repair in dividing cells
Where are SNPs found within the genome?
Gene (coding region)
- No amino acid change (synonymous)
- Amino acid change (non-synonymous)
- New stop codon (nonsense)
Gene (non-coding region)
- Promoter – mRNA and protein level changed
- Terminator – mRNA and protein level changed
- Splice site – Altered mRNA, altered protein
Intergenic region
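The three coding-region categories listed above can be told apart by translating the codon before and after the base change. The sketch below assumes the standard genetic code; the function name `classify_coding_snp` and the example codons are illustrative choices, not part of the flashcards.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base change inside a codon (DNA coding strand)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # no amino acid change
    if alt_aa == "*":
        return "nonsense"        # new stop codon
    return "non-synonymous"      # amino acid change (also covers loss of a stop codon)

print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TAC", "TAA"))  # Tyr -> stop
```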
How are SNPs presented within databases?
The SNP is shown with the flanking sequence on either side
How do we calculate the frequency of SNPs?
The frequency of an SNP is characterised by its minor allele frequency
What are the two forms of SNP frequency?
Frequency in general population e.g.
C 0.567
G 0.433
→ the less common allele (G in this example) is called the minor allele
SNPs are often described in terms of their minor allele
What must the allele frequencies add up to?
Major Allele Frequency + Minor Allele Frequency = 1
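A minimal sketch of how the minor allele frequency could be computed from allele counts. The counts below are hypothetical and chosen to reproduce the C 0.567 / G 0.433 example; the function name is also an assumption.

```python
def minor_allele_frequency(allele_counts):
    """Return (minor allele, its frequency, all frequencies) from allele counts."""
    total = sum(allele_counts.values())
    freqs = {allele: count / total for allele, count in allele_counts.items()}
    minor = min(freqs, key=freqs.get)  # the less common allele
    return minor, freqs[minor], freqs

# Hypothetical counts out of 1,000 chromosomes, matching the example frequencies.
minor, maf, freqs = minor_allele_frequency({"C": 567, "G": 433})

print(f"Minor allele: {minor}, MAF = {maf:.3f}")
print(f"Sum of allele frequencies = {sum(freqs.values()):.3f}")
```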
How are SNPs identified using GWAS?
Use markers across the whole genome
SNP Microarrays
How are GWAS results recorded?
Test each marker for association with the disease using a chi-squared test
This has resulted in the detection of large numbers of disease-associated genes
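For a single marker, the chi-squared test can be sketched as a Pearson test on a 2×2 table of allele counts (cases vs controls, minor vs major allele). This is an illustrative version without continuity correction, not necessarily what a package such as PLINK computes; the function name and counts are hypothetical. With one degree of freedom the p-value can be obtained from the standard library's `math.erfc`.

```python
import math

def allelic_chi_squared(case_minor, case_major, control_minor, control_major):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 allele-count table."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts, invented for illustration only.
chi2, p = allelic_chi_squared(800, 1200, 600, 1400)
print(f"chi-squared = {chi2:.2f}, p = {p:.2e}")

# Identical allele frequencies in cases and controls give no evidence of association.
chi2_null, p_null = allelic_chi_squared(500, 500, 500, 500)
```

In a GWAS this test is repeated for every marker, which is why very small p-values are required before an association is believed.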
Describe how to analyse GWAS results
A P value is used to describe the significance of each SNP's association
smaller P value = more significant
P value of 1 = no significance
How are GWAS results plotted?
GWAS data is presented as a single graph called a Manhattan plot
X-axis : position of the SNP on the chromosome
Y-axis : –log10(p-value) of the association
if p = 10^-9 then –log10(p-value) = 9
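The y-axis transformation is a one-line calculation. The p-values below are arbitrary examples and the helper name `neg_log10` is an assumption made for this sketch.

```python
import math

def neg_log10(p_value):
    """Transform a p-value into the height plotted on a Manhattan plot."""
    return -math.log10(p_value)

# Arbitrary example p-values: smaller p-values give taller points.
for p in (1.0, 0.05, 1e-5, 1e-9):
    print(f"p = {p:g} -> -log10(p) = {neg_log10(p):.2f}")
```

Because of the minus sign, the most significant SNPs end up at the top of the plot, and a p-value of 1 sits at zero.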
What is a manhattan plot?
The Manhattan plot is a simple way to visualise markers across the genome associated with the disease
Describe the y axis of a manhattan plot
The y-axis of the plot is the –log(base10) of the p-value
If a marker is associated with disease with a p-value of 1 × 10^-9 then the value on the y-axis for this would be 9
Describe the x axis of a manhattan plot
The x-axis is the location on the chromosome
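Manhattan plots usually lay the chromosomes end to end, so each SNP's x-coordinate is its position plus the combined length of the preceding chromosomes. The sketch below assumes that convention; the chromosome lengths and SNP positions are toy values, not real genome coordinates.

```python
def cumulative_positions(snps, chrom_lengths):
    """Convert (chromosome, position) pairs into a single genome-wide x-coordinate.

    chrom_lengths must list chromosomes in plotting order.
    """
    offsets = {}
    running_total = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running_total  # start of this chromosome on the x-axis
        running_total += length
    return [offsets[chrom] + pos for chrom, pos in snps]

# Toy chromosome lengths and SNP positions (hypothetical values).
chrom_lengths = {"1": 1000, "2": 800, "3": 600}
snps = [("1", 100), ("2", 50), ("3", 10)]

x_coords = cumulative_positions(snps, chrom_lengths)
print(x_coords)
```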
What do the peaks in manhattan plots show?
The peak does not identify the gene causing the disease.
The peak identifies the genomic region associated with disease
After carrying out GWAS, why is further investigation still required to identify the disease gene?
Many associated SNPs are found in adjacent genes
Still need to identify which gene is actually causing the disease
What is the purpose of meta-analysis?
Meta-analysis allows the statistical combination of results from multiple studies
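One common way to combine results statistically is a fixed-effect, inverse-variance weighted meta-analysis, in which each study's effect estimate is weighted by the inverse of its squared standard error. The flashcards do not say which method is meant, so treat this as one possible approach; the effect sizes and standard errors are hypothetical.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis of per-study effects."""
    weights = [1 / se ** 2 for se in std_errors]  # more precise studies count more
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value from a normal z
    return pooled, pooled_se, p_value

# Hypothetical effect sizes (e.g. log odds ratios) and standard errors from two studies.
pooled, pooled_se, p = fixed_effect_meta([0.10, 0.20], [0.1, 0.1])
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, p = {p:.3f}")
```

The pooled standard error is smaller than that of either study alone, which is why combining smaller studies can reveal associations that no single study could detect.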
Why is meta-analysis used instead of a single very large study?
Difficult to do very large studies (>10K cases)
Easier to combine smaller studies
- Pre-experiment – Consortium
- Post-experiment – Meta-analysis
Why is a lot of GWAS orientated towards obesity?
Obesity has significant primary effects, but the majority of problems are caused by its secondary complications
Describe the relationship between obesity and cancer
Obesity is a strong predisposing factor for cancer; it is predicted that within the next 3-5 years obesity will overtake smoking as the primary cause of cancer in the UK
What is the significance of carrying out GWAS for obesity?
Identifying and understanding the underlying cause of obesity is vital for prevention and treatment
Outline the evidence suggesting obesity is strongly genetic
Twin studies - 70-80% of body shape is genetically determined
Adoption studies - 30-40%
Family studies - 40-60%
What are the genes associated with obesity?
rs8050136 is in the FTO gene
rs12970134 is near to the MC4R gene
What is the advantage of using GWAS?
GWAS has identified associations that are statistically strong and reproducible
What is the downfall of relying solely on GWAS?
The contribution of the associations identified by GWAS to the genetic component of disease is estimated to be low (<5%)
What factors could explain the low contribution of GWAS-identified associations to the genetic component of disease?
- Many common SNPs of small effect
- Rare SNPs
- Copy Number Variation
- Epigenetic variation
- Heritability is overestimated