Association analysis Flashcards
What is genetic association?
Genetic association is the presence of a variant allele at a higher frequency in unrelated subjects with a particular disease (cases) than in those who do not have the disease (controls)
Recap terms
Allele, locus, genotype and haplotype
Allele = one form of a variant in the genome
Locus = position in the genome
Genotype = both alleles at a locus; the genetic makeup which gives rise to the phenotype
Haplotype = a set of alleles at linked loci which are inherited together from a single parent
Describe case control studies?
What are cases?
- Cases are subjects with the disease of interest, e.g. obesity, schizophrenia, hypertension
- Definition of the disease must be applied in a rigorous and consistent way
- Controls must be as well-matched as possible for non-disease traits
- Such as age, sex, ethnicity, location, etc.
How do you use case control studies for genetic association?
- Match the affected cases and unaffected controls for all the other risk factors
- Measure the genetic loci of interest
- Statistical analysis to determine which genetic loci correlate with disease
- Identify genomic region associated with disease
What do the best case control genetic studies have?
Have:
- Large number of well-defined cases
- Equal numbers of matched controls
- Reliable genotyping technology (SNP array)
- Standard statistical analysis (PLINK)
- Positive results must be replicated
Why do we need many genetic markers?
What are ideal characteristics of genetic markers?
- Individuals in a population are genetically far more diverse than individuals in a single family
To capture this genetic diversity we use 100,000s or millions of genetic markers
THE IDEAL GENETIC MARKER
- Polymorphic
- Randomly distributed across the genome
- Fixed location in genome
- Frequent in genome
- Frequent in population
- Stable with time
- Easy to assay (genotype)
What is dbSNP?
A database of SNPs
What is on either side of a SNP?
There are unique flanking sequences on either side of the SNP
What is the rs number?
The rs number is a unique identifier given to each SNP
What do GWAS use? Give an example.
- Use markers across the whole genome
- SNP Microarrays
- Look for association between disease and each marker – chi-squared test
- This has resulted in the detection of large numbers of disease-associated genes
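The per-marker chi-squared test mentioned above can be sketched for a single SNP as follows. This is a minimal illustration, assuming a 2x2 table of allele counts (alternative vs reference allele, cases vs controls) and a Pearson chi-squared statistic with 1 degree of freedom. The function name and the counts are hypothetical, and real analyses would normally use a dedicated tool such as PLINK.

```python
import math

def allelic_chi_squared(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in cases and controls
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 1000 cases and 1000 controls, two alleles per subject
chi2, p = allelic_chi_squared(case_alt=600, case_ref=1400,
                              control_alt=500, control_ref=1500)
print(f"chi-squared = {chi2:.2f}, p = {p:.2e}")
```

In a GWAS this test would be repeated for every marker on the array, which is why very small p-values are needed before an association is believed.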
How is GWAS data presented?
Presented as a single graph called a Manhattan plot
The higher the peak, the greater the association with the trait
However, NOTE: the peak identifies the genomic region associated with disease, NOT the actual gene causing disease
What do the x and y axes in a GWAS results plot represent?
- GWAS data is presented as a single graph called a Manhattan plot
- X-axis is position of the SNP on the chromosome
- Y-axis is –log10(p-value) of the association
- If p = 10^-9 then –log10(p-value) = 9
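The y-axis transformation can be checked with a few lines of code. This is a small sketch only; the SNP labels and p-values are made up for illustration.

```python
import math

# Hypothetical association p-values for three SNPs (labels are placeholders)
p_values = {"snp_A": 1e-9, "snp_B": 0.03, "snp_C": 5e-5}

def neg_log10(p):
    """Return -log10(p), the value plotted on the y-axis of a Manhattan plot."""
    return -math.log10(p)

scores = {snp: neg_log10(p) for snp, p in p_values.items()}

# The SNP with the smallest p-value has the tallest point on the plot
top_snp = max(scores, key=scores.get)

for snp, score in scores.items():
    print(f"{snp}: p = {p_values[snp]:g}, -log10(p) = {score:.2f}")
```

Because the scale is logarithmic, each extra unit on the y-axis corresponds to a p-value ten times smaller.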
What does meta analysis allow and why is it easier?
- Difficult to do very large studies (>10K cases)
- Easier to combine smaller studies
- Pre-experiment – Consortium
- Post-experiment - Meta-analysis
- Meta-analysis allows for the statistical combination of results from multiple studies
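One common way to combine results statistically is a fixed-effect, inverse-variance weighted meta-analysis. The sketch below assumes that approach; the notes do not say which method is used. It also assumes each study reports an effect size (for example a log odds ratio) and its standard error for the same SNP. The function name and the numbers are hypothetical.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Combine per-study effect sizes using inverse-variance weights."""
    # More precise studies (smaller standard error) get more weight
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value

# Hypothetical results for one SNP from three smaller studies
betas = [0.10, 0.14, 0.08]
standard_errors = [0.05, 0.07, 0.04]

pooled_beta, pooled_se, p_value = fixed_effect_meta(betas, standard_errors)
print(f"pooled effect = {pooled_beta:.4f}, SE = {pooled_se:.4f}, p = {p_value:.2e}")
```

In this example each study on its own gives only modest evidence, but the pooled standard error is smaller than any single study's, so the combined p-value is considerably smaller.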
What are the problems with GWAS?
- GWAS has identified associations that are statistically strong and reproducible
- However contribution to the genetic component of disease is estimated to be low due to:
- Epigenetic variation
- Heritability is overestimated
- Common SNPs of small effect
- Rare SNPs
- Copy number variation
What proof is there that obesity is highly genetic?
What gene has been found in relation to obesity?
- Twin studies determine that 70-80% of body shape is genetically determined
- But adoption studies estimate 30-40%
- In family studies obesity is 40-60% genetically determined
- These studies all conclude that genetic factors play an important role in obesity
- The FTO gene has been found to be significantly associated with obesity in GWAS