Association Analysis Flashcards
What is genetic association?
→ The presence of a variant allele at a higher frequency in unrelated subjects with a particular disease (cases) than in those who do not have the disease (controls)
What is a haplotype?
→ The order of alleles along a chromosome
What is an allele?
→ One form of a variant in the genome
What is a locus?
→ A position in the genome
What is a genotype?
→ The pair of alleles an individual carries at a locus
What is a case control study?
→ Cases: a group who all have the disease
→ Controls: a group without the disease, matched to the cases for non-disease traits such as age, location and ethnicity
What are the requirements for a good case control study?
→ Large numbers of well defined cases
→ Equal numbers of matched controls
→ Reliable genotyping technology
→ Standard statistical analysis
What extra step do you need to do when you find a positive association?
→ Replicate
→ To show that the association did not arise by chance
What are characteristics of an ideal genetic marker?
→ Polymorphic
→ Randomly distributed across the genome
→ Fixed location in the genome
→ Frequent in genome
→ Frequent in population
→ Stable with time
→ Easy to assay (genotype)
How often are SNPs found in the genome?
→ 1 in every 300 nucleotides
How many SNPs have been identified in the genome?
→ 12 million
How are SNPs formed?
→ A mismatched base arises and the repair mechanism inserts a nucleotide complementary to the wrong base, so the resulting base pair differs from the original
What is the effect of SNPs found in the coding region?
→ No amino acid change (synonymous)
→ Amino acid change (missense)
→ New stop codon (nonsense)
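The three outcomes above can be checked by translating the codon before and after the change. The following is a minimal sketch that assumes the standard genetic code and DNA codons on the coding strand. The function name `classify_coding_snp` is made up for this illustration.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Describe the effect of a SNP that turns ref_codon into alt_codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "no amino acid change"
    if alt_aa == "*":
        return "new stop codon"
    return "amino acid change"


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TAC", "TAA"))  # Tyr -> stop
```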
Where can SNPs be found in the non coding region and what is the effect?
→ Promoter - mRNA and protein level changed
→ Terminator - mRNA and protein level changed
→ Splice site - altered mRNA, altered protein
What do the major and minor allele frequency add up to?
→ 1
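To see why the two frequencies sum to 1, both can be computed from genotype counts. This is a minimal sketch that assumes a SNP with exactly two alleles, A and a. The genotype counts are invented for illustration.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (major, minor) allele frequencies from genotype counts."""
    # Each individual carries two alleles at the locus.
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    freq_a = (2 * n_aa + n_Aa) / total_alleles
    return max(freq_A, freq_a), min(freq_A, freq_a)


# Hypothetical sample of 1000 individuals.
major, minor = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)
print(major, minor, major + minor)
```

With only two alleles at the locus, every allele counted is either the major or the minor one, so the two frequencies must add up to 1.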
What is a GWAS?
→ Genome-wide association study
→ Tests for association between the disease and the alleles of each marker, using a chi-squared test
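The chi-squared test for a single marker can be sketched as below. It assumes a 2x2 table of allele counts (minor and major alleles in cases and controls), no continuity correction, and no empty row or column. The counts and the function name `allelic_chi_squared` are made up for this illustration.

```python
import math


def allelic_chi_squared(case_minor, case_major, control_minor, control_major):
    """Chi-squared statistic and p-value (1 degree of freedom) for a 2x2 allele table."""
    observed = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1000 cases and 1000 controls, two alleles each.
chi2, p = allelic_chi_squared(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

In a GWAS this test is repeated for every marker, which is why the resulting p-values need a stringent significance threshold.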
How is GWAS data represented?
→ a single graph called the Manhattan plot
What are the axes on a Manhattan plot?
→ X axis is the position of the SNP on the chromosome
→ Y axis is the -log10(p-value) of the association, obtained from the chi-squared test
What does a peak on the Manhattan plot signify?
→ The peak does not identify the gene causing the disease
→ It identifies a genomic region associated with the disease, typically smaller than 100 kb
Why is the scale -log10 on the manhattan plot?
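→ So many SNPs are tested that some associations will appear by chance, so only very small p-values are meaningful
→ -log10 turns these very small p-values into large positive numbers, so the strongest associations appear as the tallest peaks on a linear axis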
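A quick way to see the effect of the transformation is to apply it to a few p-values. This is a minimal sketch with example p-values chosen only for illustration.

```python
import math


def neg_log10(p_value):
    """Return -log10(p), the quantity plotted on the Y axis of a Manhattan plot."""
    return -math.log10(p_value)


for p in [0.05, 1e-3, 5e-8, 1e-20]:
    print(f"p = {p:g} -> -log10(p) = {neg_log10(p):.2f}")
```

Each tenfold drop in the p-value adds 1 to the plotted height, so p-values that span many orders of magnitude fit on one readable axis.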
What is meta analysis?
→ Allows the statistical combination of results from multiple studies
→ Increases statistical power
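One common way to combine results is a fixed-effect, inverse-variance weighted meta-analysis. The sketch below assumes that method, which the cards do not specify, and that each study reports an effect estimate with a standard error. The numbers are invented for illustration.

```python
import math


def fixed_effect_meta(effects, standard_errors):
    """Pool per-study effect estimates using inverse-variance weights."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Three hypothetical studies of the same SNP.
pooled, pooled_se = fixed_effect_meta([0.10, 0.12, 0.08], [0.05, 0.05, 0.05])
print(f"pooled effect = {pooled:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is where the gain in statistical power comes from.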
What % of body shape is genetically determined in twin studies?
→ 70-80%
What is the gene associated with obesity?
→ FTO
What is the accepted p-value threshold for genome-wide significance?
→ p < 5x10^-8
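This threshold is conventionally explained as a Bonferroni correction: a significance level of 0.05 divided by roughly one million independent tests. The sketch below assumes that explanation. The figure of one million tests is the usual approximation, not a value stated on the cards.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold):
    """True when the p-value falls below the corrected threshold."""
    return p_value < threshold


# Assumption: about one million independent common variants are tested.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(f"threshold = {threshold:.0e}")
print(is_genome_wide_significant(1e-9, threshold))
print(is_genome_wide_significant(1e-5, threshold))
```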
What is the problem with GWAS?
→ The contribution of GWAS findings to the genetic component of disease is estimated to be low (<5%)
Why is the GWAS contribution to the genetic component of disease low?
→ Many common SNPs of very small effect
→ Rare SNPs
→ Copy number variation
→ Epigenetic variation
In what two ways can you combine smaller studies?
→ Pre-experiment – Consortium
→ Post-experiment – Meta-analysis
What is the rs number?
→ A unique identifier given to each SNP
What does Association mean?
→ The associated marker lies <100 kb from a causal variant