W3 - Association Analysis Flashcards
What is Genetic Association?
Genetic association is the presence of an allele at a higher frequency in unrelated subjects with a particular trait than in those who do not have the trait.
Substituting the word “disease” for “trait” is how we determine whether variants in the genome are associated with a disease.
With disease = cases
Without disease = controls
What is used in conducting a case control genetic study?
Large numbers of well-defined cases (10 000s)
Equal numbers of matched controls
Reliable genotyping technology (SNP microarray)
Standard statistical analysis (PLINK)
Positive associations should be replicated
What features does an ideal genetic marker have?
Polymorphic
Randomly distributed across the genome
Fixed location in genome
Frequent in genome
Frequent in population
Stable with time
Easy to assay (genotype)
What is a Single Nucleotide Polymorphism (SNP)?
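A single-base position in the genome where different individuals carry different nucleotides
Common in the genome: ~1 in every 300 nucleotides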
~12 million common SNPs identified in human genome
Generated by mismatch repair during mitosis
How are SNPs formed?
CAAGTG is a regular set of bases.
They can be replicated to create another CAAGTG.
However, during replication a wrong base can be inserted, creating a mismatch at one position: CA-A-GTG. Mismatch repair systems are active all the time, so most mismatches are corrected back to the original sequence.
However, these repair systems cannot always tell which of the two mismatched bases is the wrong one. If the original base is altered instead, the sequence now reads CAGGTG.
The A and the G at that position are therefore the two alleles of a SNP.
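The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name find_mismatches is made up for this card, and it assumes the two sequences are already aligned and the same length.

```python
def find_mismatches(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for each difference."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, ref_base, var_base)
        for i, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


# The two sequences from the card: the third base differs (A vs G).
snps = find_mismatches("CAAGTG", "CAGGTG")
print(snps)  # [(3, 'A', 'G')]
```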
Where might you find a SNP?
Gene (coding region)
No amino acid change (synonymous)
Amino acid change (non-synonymous)
New stop codon (nonsense)
Gene (non-coding region)
Promoter – mRNA and protein level changed
Terminator – mRNA and protein level changed
Splice site – Altered mRNA, altered protein
Intergenic region (98% of genome)
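For SNPs in a coding region, the three outcomes listed above can be worked out by translating the codon before and after the change. The sketch below assumes the standard genetic code and DNA codons written 5' to 3'; the function name classify_coding_snp is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change as synonymous, nonsense or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"       # no amino acid change
    if alt_aa == "*":
        return "nonsense"         # new stop codon
    return "non-synonymous"       # amino acid change


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu: synonymous
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val: non-synonymous
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop: nonsense
```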
How do you find out what SNPs are?
Online database at NCBI: dbSNP
rs number is a unique identifier given to each SNP.
The SNP will be shown with unique flanking sequence on either side.
What is minor allele frequency?
SNPs usually have two forms (alleles), e.g. [C/G]
C 0.557
G 0.433
The less common allele is called the “minor allele”.
Major allele frequency + minor allele frequency = 1
These frequencies can be used to compare how common each allele is in different populations.
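As a worked example, allele frequencies can be counted from genotypes, because each person carries two alleles. The genotype counts below are invented for illustration and the function name is hypothetical.

```python
def allele_frequencies(n_cc: int, n_cg: int, n_gg: int) -> tuple[float, float]:
    """Return (frequency of C, frequency of G) from genotype counts."""
    total_alleles = 2 * (n_cc + n_cg + n_gg)  # two alleles per person
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    freq_c = (2 * n_cc + n_cg) / total_alleles
    freq_g = (2 * n_gg + n_cg) / total_alleles
    return freq_c, freq_g


# Hypothetical sample of 1000 people: 320 CC, 480 CG, 200 GG.
freq_c, freq_g = allele_frequencies(320, 480, 200)
maf = min(freq_c, freq_g)  # the minor allele is the less common one
print(freq_c, freq_g, maf)
```

In this made-up sample G is the minor allele, and the two frequencies add up to 1.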
What is the significance of SNP Minor Allele frequency?
SNPs are chosen for genetic association studies on the basis of their MAF
Common diseases are likely to be caused by common variants
SNPs with MAF >0.05 (5%) are usually used in association studies (GWAS)
Exceptions are known monogenic disease SNPs
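The MAF cut-off can be pictured as a simple filter. The SNP names and MAF values below are placeholders, not real rs numbers.

```python
MAF_THRESHOLD = 0.05  # the 5% cut-off quoted on this card

# Hypothetical SNPs and their minor allele frequencies.
snp_mafs = {"snp_1": 0.44, "snp_2": 0.03, "snp_3": 0.12}

# Keep only SNPs common enough to be used in an association study.
common_snps = [name for name, maf in snp_mafs.items() if maf > MAF_THRESHOLD]
print(common_snps)  # ['snp_1', 'snp_3']
```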
What is a Genome-Wide Association Study (GWAS)?
Recruit large numbers of cases and controls
Genotype markers across the whole genome
SNP Microarrays
Look for association between disease and alleles of each marker – chi-squared test
Positive association is at p < 5 × 10⁻⁸ (multiple testing correction)
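A minimal sketch of the test for one SNP, assuming a 2×2 table of allele counts (cases vs controls, allele A vs allele B) and one degree of freedom. The counts are invented and the function name is hypothetical; real studies would normally use a package such as PLINK. The threshold is commonly explained as 0.05 divided by roughly one million independent tests (a Bonferroni correction).

```python
import math

GENOME_WIDE_THRESHOLD = 0.05 / 1_000_000  # = 5e-8


def allelic_chi_squared(case_a: int, case_b: int,
                        control_a: int, control_b: int) -> tuple[float, float]:
    """Pearson chi-squared test on a 2x2 allele-count table (1 degree of freedom)."""
    observed = [[case_a, case_b], [control_a, control_b]]
    row_totals = [case_a + case_b, control_a + control_b]
    col_totals = [case_a + control_a, case_b + control_b]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A is more common in cases than in controls.
chi2, p_value = allelic_chi_squared(1200, 800, 1000, 1000)
print(round(chi2, 2), p_value < GENOME_WIDE_THRESHOLD)
```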
What is a Manhattan Plot?
GWAS results are presented as a single graph called a Manhattan plot
All results are plotted, typically for >1M SNPs
X-axis is position of the SNP on the chromosome
Y-axis is –log10(p-value) of the association
Peaks (highlighted in green) indicate significant p-values.
The peak of association does not identify the gene causing the disease.
The peak identifies the genomic region associated with disease, and this is usually smaller than 100 kb.
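The y-axis transform is what turns tiny p-values into tall peaks. The sketch below only computes the plotted values, without drawing anything; the chromosome positions and p-values are invented and the function name is hypothetical.

```python
import math

SIGNIFICANCE_LINE = -math.log10(5e-8)  # about 7.3 on the y-axis


def manhattan_points(results):
    """Convert (chromosome, position, p-value) into (chromosome, position, -log10 p)."""
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in results]


# Hypothetical association results for three SNPs.
results = [(1, 1_500_000, 0.2), (1, 1_520_000, 3e-9), (2, 800_000, 1e-4)]
points = manhattan_points(results)

# SNPs plotted above the line are genome-wide significant.
significant = [(chrom, pos) for chrom, pos, y in points if y > SIGNIFICANCE_LINE]
print(significant)  # [(1, 1520000)]
```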
What is a regional association plot?
Position on the chromosome (x-axis; chromosome 1 in the example) is plotted against recombination rate on one y-axis and -log10(p) on the other y-axis.
Diamond-shaped points mark the SNPs and narrow down the region where the disease-associated gene is likely to be.
What is Meta Analysis?
Difficult to do large studies (>1K cases/controls)
Easier to combine smaller studies
Pre-experiment – Consortium
Post-experiment – Meta-analysis
Meta-analysis allows the statistical combination of results from multiple studies
Used to expand knowledge and data across the world.
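One common way to combine studies, shown here as an assumption since the card does not name a method, is a fixed-effect inverse-variance meta-analysis: each study's effect size (for example a log odds ratio) is weighted by 1 / SE². The effect sizes and standard errors below are invented.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights."""
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect size")
    weights = [1 / se ** 2 for se in std_errors]  # precise studies count more
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, p_value


# Hypothetical log odds ratios and standard errors from three small studies.
pooled, pooled_se, p_value = fixed_effect_meta([0.20, 0.10, 0.15], [0.10, 0.05, 0.10])
print(round(pooled, 3), round(pooled_se, 4))
```

The pooled standard error is smaller than that of any single study, which is why combining small studies increases power.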
What are some problems with GWAS?
GWAS has identified associations that are statistically strong and reproducible
However, their contribution to the genetic component of disease is estimated to be low (<5%)
Possible explanations:
Many common SNPs of very small effect
Rare SNPs
Copy Number Variation
Epigenetic variation
What is linkage disequilibrium?
LD is when two alleles are inherited together more often than expected by chance
This is usually because they are close together in the genome
Alleles that are physically close together are rarely separated by recombination
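"More often than expected by chance" can be put into numbers. A minimal sketch, assuming two SNPs with alleles A and B: D is the observed frequency of the A-B haplotype minus the frequency expected if the alleles were independent, and r² rescales D² to lie between 0 and 1. The frequencies are invented and the function name is hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r squared) from haplotype frequency p_ab and allele frequencies."""
    d = p_ab - p_a * p_b  # observed minus expected under independence
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d, d ** 2 / denominator


# Alleles A and B always travel together: complete LD.
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0)
# Alleles A and B are independent: no LD.
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0)
```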