Association analysis Flashcards
What is genetic association?
presence of a variant allele at a higher frequency in unrelated subjects with a particular disease of interest (cases) compared to those that do not have the disease (controls)
What are cases?
subjects with the disease of interest
What are the controls in association analysis?
should be matched to our cases in every respect APART from not having the disease
e.g. same age, sex, ethnicity, location etc.
What is the difference in variant frequency between cases and controls?
in cases, the gene variant is at a higher frequency than in the controls and is associated with the disease
What confirms the strength of association between a gene variant and the disease?
Statistics, e.g. the p-value (the smaller the p-value, the stronger the evidence for association)
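A minimal sketch of how a p-value can be obtained for a single SNP, assuming a Pearson chi-square test (1 degree of freedom, no continuity correction) on a 2 × 2 table of allele counts in cases and controls. The function name `allelic_test` and the counts are hypothetical. The odds ratio is included because it describes the size of the effect, whereas the p-value describes how unlikely the frequency difference would be if there were no association.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2 x 2
    table of allele counts, plus the allelic odds ratio."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected if no association
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the variant allele in cases relative to controls
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical study: 1,000 cases and 1,000 controls = 2,000 alleles per group
chi2, p_value, odds_ratio = allelic_test(case_alt=600, case_ref=1400,
                                         ctrl_alt=500, ctrl_ref=1500)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}, OR = {odds_ratio:.2f}")
```

With these made-up counts the variant allele frequency is 0.30 in cases and 0.25 in controls, giving an odds ratio of about 1.29 and p below 0.001.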
What do quality case control genetic studies require?
- large numbers of well defined cases (1000s)
- equal numbers of matched controls
- reliable genotyping technology (SNP array)
- standard statistical analysis
- positive associations should be replicated
What allows us to capture genetic diversity?
The use of genetic markers
Give some features of an ideal genetic marker (e.g. SNP).
- polymorphic
- randomly distributed across the genome
- fixed location in genome
- frequent in the genome
- frequent in the population
- stable with time
- easy to assay (genotype)
How are SNPs generated?
through base mismatches made during DNA replication that escape mismatch repair
What are the possible locations for SNPs?
Gene (coding region)
- no amino acid change (synonymous)
- amino acid change (non-synonymous)
- new stop codon (nonsense)
Gene (non-coding region)
- promoter: amount of mRNA and protein changed
- terminator: mRNA and protein changed
- splice site: altered mRNA, altered protein
Intergenic Region
- 98% of SNPs lie in this region
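For a SNP inside a codon, the coding-region categories can be worked out by translating the codon before and after the change. This sketch assumes the standard genetic code and a DNA codon read on the coding strand; the function names are hypothetical. It cannot classify promoter, terminator or splice-site SNPs, which need the surrounding gene structure.

```python
# Standard genetic code: codons ordered with bases T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon):
    """Return the one-letter amino acid for a DNA codon ('*' = stop)."""
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a base change inside a codon of a coding region."""
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid
    if alt_aa == "*":
        return "nonsense"        # new stop codon
    return "non-synonymous"      # different amino acid


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu: synonymous
print(classify_coding_snp("GAA", "GTA"))  # Glu -> Val: non-synonymous
print(classify_coding_snp("GAA", "TAA"))  # Glu -> stop: nonsense
```

A change that turns a stop codon into an amino acid (stop-loss) falls into the non-synonymous branch in this simplified scheme.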
What is dbSNP?
online database of SNPs and other small-scale variations, including insertions/deletions, microsatellites and non-polymorphic variants
What is the minor allele frequency (MAF)?
the frequency of the less common variant in a population
A SNP will usually have two alleles:
- major allele
- minor allele
What is the sum of the major and minor allele frequencies?
major allele frequency + minor allele frequency = 1
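A minimal sketch, assuming a biallelic SNP with genotype counts for AA, Aa and aa: each person carries two alleles, so each allele frequency is its allele count divided by twice the number of people, and the smaller of the two is the MAF. The function name and counts are hypothetical.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (major, minor) allele frequencies from genotype counts
    at a biallelic SNP; every person contributes two alleles."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    freq_A = (2 * n_AA + n_Aa) / total_alleles  # homozygotes carry two copies
    freq_a = (2 * n_aa + n_Aa) / total_alleles
    return max(freq_A, freq_a), min(freq_A, freq_a)


# Hypothetical sample of 1,000 people
major, minor = allele_frequencies(n_AA=640, n_Aa=320, n_aa=40)
print(f"major = {major}, MAF = {minor}, sum = {major + minor}")
```

Here the A allele is carried 1,600 times out of 2,000, so the major allele frequency is 0.8, the MAF is 0.2, and the two add up to 1.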
What are genome-wide association studies (GWAS)?
Studies of variation across the entire human genome to identify associations between variants and particular behaviours, traits, or disorders.
SNP markers spread across the whole genome are genotyped using SNP microarrays.
We test for association between the disease and each marker using a chi-square test.
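A sketch of the per-marker chi-square test, assuming genotype counts (AA, Aa, aa) are compared between cases and controls in a 2 × 3 table; this is one common form, and a 2 × 2 test on allele counts is also widely used. With 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-χ²/2), so no statistics library is needed. The function name and counts are hypothetical.

```python
import math


def genotypic_chi_square(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 table of genotype counts
    (AA, Aa, aa) in cases vs controls, with 2 degrees of freedom."""
    observed = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in observed]
    col_totals = [cases + controls for cases, controls in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n  # expected under no association
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 2 df the chi-square upper-tail probability is exactly exp(-chi2 / 2)
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa) for one marker
chi2, p_value = genotypic_chi_square((400, 480, 120), (490, 420, 90))
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

In a GWAS this test is simply repeated for every marker on the array.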
Explain the steps of GWAS.
Obtain DNA from people with disease of interest (cases) and unaffected controls
Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls
Identify SNPs where one allele is significantly more common in cases than in controls
- such a SNP is associated with the disease
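The three steps can be sketched as a loop over genotyped SNPs. This assumes allele counts have already been called from the SNP chip, uses a 1-df allelic chi-square test, and applies a Bonferroni threshold (alpha divided by the number of SNPs tested) as a simple multiple-testing correction; real GWAS conventionally use a fixed genome-wide significance threshold of 5 × 10^-8, roughly a Bonferroni correction for one million tests. Function names, SNP names and counts are hypothetical.

```python
import math


def allelic_p_value(case_alt, case_total, ctrl_alt, ctrl_total):
    """1-df Pearson chi-square p-value for allele counts in cases vs controls."""
    observed = [[case_alt, case_total - case_alt], [ctrl_alt, ctrl_total - ctrl_alt]]
    n = case_total + ctrl_total
    col_totals = [case_alt + ctrl_alt, n - case_alt - ctrl_alt]
    chi2 = 0.0
    for i, row_total in enumerate((case_total, ctrl_total)):
        for j in range(2):
            expected = row_total * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))


def gwas_scan(snps, alpha=0.05):
    """Test every SNP; keep those passing a Bonferroni threshold (alpha / number of SNPs)."""
    threshold = alpha / len(snps)
    hits = []
    for snp_id, (case_alt, case_total, ctrl_alt, ctrl_total) in snps.items():
        p = allelic_p_value(case_alt, case_total, ctrl_alt, ctrl_total)
        if p < threshold:
            # Record which allele is the more common one in cases
            risk = "alt" if case_alt / case_total > ctrl_alt / ctrl_total else "ref"
            hits.append((snp_id, p, risk))
    return sorted(hits, key=lambda hit: hit[1])  # most significant first


# Hypothetical counts: (alt alleles in cases, case alleles, alt alleles in controls, control alleles)
snps = {
    "snp1": (600, 2000, 500, 2000),
    "snp2": (510, 2000, 500, 2000),
    "snp3": (800, 2000, 500, 2000),
}
for snp_id, p, risk in gwas_scan(snps):
    print(snp_id, f"p = {p:.1e}", f"{risk} allele more common in cases")
```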
How is GWAS data presented?
on a Manhattan plot:
the end result of a GWAS, showing the score for each SNP marker tested in the study, chromosome by chromosome across the genome
- a peak indicates that the SNP at that location may be very close to a disease-related locus
- low p-value = high peak -> reject the null hypothesis for that SNP
- shows the evidence of association between each tested marker and the disease phenotype across the genome
*the highest point is the SNP MOST significantly associated with the disease state
X-axis: position of each SNP along each chromosome
Y-axis: -log10(p-value) of the association between each marker and the disease, calculated using a chi-squared test
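A sketch of how the plot coordinates could be computed, assuming each result is a (chromosome, position, p-value) tuple with numeric chromosome labels: chromosomes are laid end to end along the x-axis by adding an offset (here, the largest position seen on each earlier chromosome), and the y-value is -log10(p). The function name and data are hypothetical; the points could then be drawn with any scatter-plot tool.

```python
import math


def manhattan_points(results):
    """Convert (chromosome, position_bp, p_value) tuples into
    (chromosome, x, y) points for a Manhattan plot."""
    chromosomes = sorted({chrom for chrom, _, _ in results})
    offsets, running_total = {}, 0
    for chrom in chromosomes:
        offsets[chrom] = running_total  # this chromosome starts where the last one ended
        running_total += max(pos for c, pos, _ in results if c == chrom)
    return [
        (chrom, offsets[chrom] + pos, -math.log10(p))  # y = -log10(p)
        for chrom, pos, p in sorted(results)
    ]


# Hypothetical results for three SNPs on chromosomes 1 and 2
results = [(1, 1_000, 0.5), (1, 5_000, 1e-9), (2, 2_000, 0.01)]
for chrom, x, y in manhattan_points(results):
    print(chrom, x, round(y, 2))
```

The SNP with p = 1e-9 gets a height of 9, towering over the others, which is exactly the "low p-value = high peak" relationship.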
Why are chromosomes more solid at the bottom of the Manhattan plot?
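because the vast majority of SNPs are not associated with disease, so their p-values are large and their -log10(p) values small, crowding the points near the x-axis
*true associations rise further up the graph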
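If a SNP has no association, its p-value is uniformly distributed between 0 and 1, so P(-log10 p > k) = 10^-k: only 10% of such SNPs rise above 1, 1% above 2 and 0.1% above 3. The small simulation below assumes every SNP is unassociated and shows this crowding near the bottom of the plot.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible
# Assumption: none of the 100,000 SNPs is associated, so each p-value is uniform on (0, 1)
p_values = [random.random() for _ in range(100_000)]
heights = [-math.log10(p) for p in p_values if p > 0]  # y-values on the Manhattan plot

for k in (1, 2, 3):
    observed = sum(h > k for h in heights) / len(heights)
    print(f"-log10(p) > {k}: observed {observed:.4f}, expected {10 ** -k}")
```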
What is the Wellcome Trust Case Control Consortium?
genome-wide association study of 14,000 cases of 7 common diseases (about 2,000 per disease) and 3,000 shared controls
- looked for genes associated with each disease
- look at the p-value (the smaller the p-value, the stronger the association)
e.g. LDLR, SORT1 for lipids and CAD
What does the peak on a Manhattan plot show?
The peak does not identify the gene causing the disease; it only identifies the genomic REGION associated with the disease, which is usually very small (<100 kb)
- if you zoom in on a SNP with a strong disease association (high up in the plot), you will find many other associated SNPs in the adjacent region, so more work has to be done to work out which SNP is causal and where the actual disease-causing variant is
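A sketch of how the region under a peak might be delimited, assuming the SNPs from one chromosome are given as (id, position in bp, p-value) tuples: take the lead SNP (smallest p-value), then keep the significant SNPs within a fixed window around it. The 100 kb window, the 5 × 10^-8 threshold, the function name and the data are illustrative assumptions; in practice the region is usually defined using linkage disequilibrium with the lead SNP, which this sketch does not model.

```python
def peak_region(snps, threshold=5e-8, window=100_000):
    """snps: (snp_id, position_bp, p_value) tuples from one chromosome.
    Returns the lead SNP and the span of significant SNPs near it, or None."""
    lead_id, lead_pos, lead_p = min(snps, key=lambda snp: snp[2])  # smallest p-value
    if lead_p >= threshold:
        return None  # no significant peak on this chromosome
    nearby = sorted(
        (pos, snp_id)
        for snp_id, pos, p in snps
        if p < threshold and abs(pos - lead_pos) <= window
    )
    start, end = nearby[0][0], nearby[-1][0]
    return lead_id, start, end, [snp_id for _, snp_id in nearby]


# Hypothetical SNPs around one peak
snps = [
    ("snpA", 1_000_000, 1e-3),
    ("snpB", 1_020_000, 3e-9),
    ("snpC", 1_035_000, 2e-12),
    ("snpD", 1_050_000, 4e-10),
    ("snpE", 1_400_000, 1e-9),
]
lead, start, end, members = peak_region(snps)
print(lead, members, f"region = {(end - start) / 1000:.0f} kb")
```

Three significant SNPs sit under the same peak, so any of them (or an untyped variant nearby) could be the causal one; snpE is significant too but lies outside the window and would form a separate peak.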
Give some disadvantages of GWAS.
Difficult to do very large association studies (>1,000 cases), and therefore meta-analysis of GWAS is done to combine the statistical results from multiple smaller studies
Ways round this:
- before the experiment: form a consortium to pool samples
- after the experiment: meta-analysis
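Meta-analysis can be done in several ways; one common choice, sketched here, is an inverse-variance weighted fixed-effect model that pools each study's log odds ratio, weighting it by 1/SE². The function name and the study results are made up, chosen to show how several individually non-significant studies can give a significant pooled result.

```python
import math


def fixed_effect_meta(studies):
    """Inverse-variance weighted fixed-effect meta-analysis.
    studies: list of (odds_ratio, standard_error_of_log_odds_ratio) tuples."""
    weights = [1 / se ** 2 for _, se in studies]  # more precise studies count more
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_log_or / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(pooled_log_or), pooled_se, p_value


# Hypothetical results for one SNP from three smaller GWAS
studies = [(1.20, 0.10), (1.15, 0.08), (1.25, 0.12)]
for study in studies:
    print("single study p =", round(fixed_effect_meta([study])[2], 3))
pooled_or, pooled_se, p_value = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.3f}, p = {p_value:.4f}")
```

Each study alone gives p > 0.05, but pooling them narrows the standard error enough for the combined p-value to drop to about 0.002.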
Give the Medical Complications of Obesity.
- Pulmonary disease
- Non-alcoholic fatty liver disease
- Gall bladder disease
- Osteoarthritis
- Skin
- Gout
- Phlebitis
- Cataracts
- Coronary heart disease
- Cancer
- Severe pancreatitis
- Stroke
- Idiopathic intracranial hypertension
- Gynaecological abnormalities
Give some of the problems of GWAS.
Expensive
Requires a large sample size
The associated SNPs only explain a small part of the disease phenotype: their contribution to the genetic component of the disease is estimated to be low (<5%). This could be because of:
- many common SNPs of small effect that do not individually reach significance
- rare SNPs, which are poorly captured by SNP arrays
- GWAS don't look at copy number variation or epigenetic variation
How are genes associated with obesity identified using GWAS?
Obesity has a strong genetic component
Through GWAS we can clearly see genes associated with obesity