Association Analysis Flashcards
Genetic Association
presence of a variant allele in the genome at a higher frequency in unrelated subjects with a particular disease of interest/trait (cases) compared to those that do not have the trait/disease (controls)
with disease =
cases
without disease =
controls
Controls vs case in association analysis
must be identical to the cases APART from not having the disease = well matched
e.g. same age, sex, ethnicity, location etc.
Difference in variant frequency between cases and controls
in cases, the gene variant is at a higher frequency than in the controls and is associated with the disease
What confirms the strength of association between a gene variant and the disease?
Statistics e.g. p-value
Quality case-control genetic studies must have:
- large numbers of well defined cases (1000s)
- equal numbers of matched controls
- reliable genotyping technology (SNP array)
- standard statistical analysis (PLINK)
- positive associations should be replicated
What allows us to capture genetic diversity?
The use of genetic markers
Features of an ideal genetic marker (e.g. SNP)
- polymorphic (more than 1 form)
- randomly distributed across the genome
- fixed location in genome
- frequent in the genome
- frequent in the population
- stable with time
- easy to assay (genotype)
How are SNPs generated?
through errors during DNA replication that escape mismatch repair
Possible location of an SNP
Gene (coding region)
- no amino acid change (synonymous)
- amino acid change (non-synonymous)
- new stop codon (nonsense)
Gene (non-coding region)
- promoter: mRNA and protein changed
- terminator: mRNA and protein changed
- splice site: altered mRNA, altered protein
Intergenic Region
- 98% of SNPs are in this region
dbSNP
online database of SNPs and multiple small-scale variations that include insertions/deletions, microsatellites and non-polymorphic variants
Minor allele frequency (MAF)
the frequency of the less common variant in a population
SNP will have two alleles:
- major allele
- minor allele
their frequencies must add up to 1
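A minimal sketch of how MAF could be computed from genotype calls. The genotype list and the function names are made up for illustration, and the SNP is assumed to be biallelic with both alleles observed in the sample.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotype strings such as 'AG'."""
    # Each genotype contributes two alleles to the count.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele (assumes both alleles are observed)."""
    return min(allele_frequencies(genotypes).values())


# Hypothetical genotypes for 10 people at one SNP with alleles A and G.
genotypes = ["AA", "AG", "AG", "GG", "AA", "AA", "AG", "AA", "AA", "AA"]
freqs = allele_frequencies(genotypes)
maf = minor_allele_frequency(genotypes)

print(freqs)                # major allele A and minor allele G
print(maf)                  # frequency of the minor allele G
print(sum(freqs.values()))  # major + minor frequencies add up to 1
```

Here 15 of the 20 alleles are A and 5 are G, so G is the minor allele.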
Genome Wide Association Studies (GWAS)
tests across the whole genome for the presence of an allele at a higher frequency in unrelated subjects with a particular trait, compared to those that do not have the trait
- recruit large numbers of cases and controls
- SNP markers are used across the whole genome, and these are genotyped using SNP microarrays.
- We look for association between disease and each marker by doing a chi-square test
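The chi-square test for one marker can be sketched as below. This is an illustrative allelic test on a 2x2 table of allele counts (cases vs controls, minor vs major allele), with no continuity correction; the counts and the function name are assumptions, and tools such as PLINK may compute the test differently.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were the same in both groups.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1000 cases and 1000 controls, i.e. 2000 alleles each.
chi2, p_value = allelic_chi_square(600, 1400, 500, 1500)
print(chi2, p_value)
```

In a GWAS the same test is repeated for every SNP on the array, which is why very small p-values are needed before an association is believed.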
Steps of GWAS
Obtain DNA from people with disease of interest (cases) and unaffected people (controls)
Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls
Identify SNPs where one allele is significantly more common in cases than controls
- the SNP is associated with disease
How is GWAS data presented?
on a Manhattan plot
What is a Manhattan plot in relation to the GWAS results?
The end result of a GWAS: it shows the score for each SNP marker tested in the study, across each chromosome in the genome
- a peak indicates that the location of that particular SNP may be very close to a disease-related locus
- low p-value = high peak, so reject the null hypothesis for that SNP
- shows evidence of correlation of each marker tested with a disease phenotype across the genome
* highest point is the SNP MOST significantly associated with the disease state
X-axis is position of SNP on chromosome
Y-axis is -log10(p-value) of the association between each marker and the disease, calculated using a chi-squared test
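A small sketch of the y-axis transformation. The SNP names, positions and p-values are invented for illustration only.

```python
import math


def manhattan_height(p_value):
    """Y-axis value on a Manhattan plot: -log10 of the association p-value."""
    return -math.log10(p_value)


# Hypothetical results: (SNP id, chromosome, position in bp, p-value).
results = [
    ("rs_example_1", 1, 1_200_000, 0.5),
    ("rs_example_2", 1, 1_250_000, 1e-8),
    ("rs_example_3", 2, 3_400_000, 0.01),
]

# The smallest p-value gives the tallest point on the plot.
top_snp = max(results, key=lambda r: manhattan_height(r[3]))

for snp_id, chrom, pos, p in results:
    print(snp_id, chrom, pos, round(manhattan_height(p), 2))
print("Most significant:", top_snp[0])
```

Because of the minus sign, a smaller p-value gives a larger y value, so the strongest associations sit highest on the plot.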
Why are chromosomes more solid at the bottom of the Manhattan plot?
because the vast majority of SNPs are not associated with disease
*associations are further up the graph
What does the peak on a Manhattan plot show?
The peak does not identify the gene causing the disease, instead it only identifies the genomic REGION associated with the disease and this is usually very small (<100kb)
- if you zoom in on a SNP with strong disease association (high up in the plot), you will find lots of other associated SNPs in the adjacent region, so more work has to be done to work out which SNP is causal and where the actual disease-causing variant is
Disadvantage of GWAS + what is meta analysis
Difficult to do very large association studies (>1K cases), and therefore meta-analysis of GWAS is done to combine statistical results from multiple smaller studies
Pre-experiment: consortium
Post-experiment: meta analysis
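One common way to combine results is a fixed-effect, inverse-variance weighted meta-analysis; the cards do not say which method is meant, so treat this as an assumed example. The effect sizes (log odds ratios) and standard errors below are invented.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect, its standard error and p-value."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical per-study results for the same SNP in three smaller studies.
effects = [0.10, 0.12, 0.08]
std_errors = [0.05, 0.06, 0.04]

pooled, pooled_se, p_value = fixed_effect_meta(effects, std_errors)
print(pooled, pooled_se, p_value)
```

The pooled standard error is smaller than that of any single study, which is how combining studies gains statistical power.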
Problems with GWAS
Expensive
Requires large sample size
Associated SNPs only explain a small part of the disease phenotype: their contribution to the genetic component of the disease is estimated to be low (<5%). This could be because of:
- many common SNPs of small effect
- rare SNPs
- don’t look at copy number variation or epigenetic variation
Common Obesity GWAS
Obesity is strongly genetic
Through GWAS we can clearly see genes associated with obesity