12- Association Analysis Flashcards
define genetic association
the presence of an allele at a higher frequency in unrelated subjects with a particular trait, compared to those that don’t have this trait
e.g. patients with a disease sharing an allele more than controls without the disease
how is a genetic association study conducted?
conducted as a case-control study
cases have the disease trait, are well-defined based on consistent disease criteria
controls don’t have the disease, as well-matched as possible to cases for non-disease traits and risk factors = e.g. sex, age, location
requirements:
- large number of well-defined cases and well-matched controls
- reliable genotyping technology = SNP microarrays
- statistical analysis tools - e.g. PLINK - to analyse genetic data
method:
1. genetic loci of interest measured in cases and controls = involves SNP microarrays to identify genetic variations
2. statistical analysis methods - e.g. PLINK - to identify genetic loci likely associated with the disease
3. positive associations between specific gene loci and the disease are identified, conduct replication studies to validate results
describe an ideal genetic marker and how SNPs fit this description
genetic marker = alleles that we can genotype to assess if they’re associated with a disease
ideal genetic marker:
- polymorphic = multiple alleles/variants within a population, to detect genetic diversity
- high allelic frequency in population
- in linkage disequilibrium with the causal variant associated with disease
- stable over time and across populations, so always the same and reproducible
- easy to assay and genotype
SNPs fit this description:
- have multiple variants, stable inheritance, abundant
- SNP microarrays available for genotyping, and SNP data integrated in large public databases, making statistical analysis possible
- linkage disequilibrium possible if in close proximity to causal variant
- many disease traits are associated with SNPs
- varying allelic frequencies measured by MAF
what is a SNP? how does it form? how is SNP frequency measured?
SNP - a single nucleotide base change at a specific gene locus/position
- can occur in coding, non-coding and intergenic regions = in coding regions cause synonymous or non-synonymous (e.g. missense) variants
- at splice sites, promoters and terminators = affect mRNA and protein
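mechanism: wrong base incorporated during DNA synthesis, causing a mismatch. the mismatch is detected by repair mechanisms, but the original base on the template strand is replaced instead of the wrongly incorporated one. still end up with a standard Watson-Crick pair, but a different pair from the original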
SNP is inherited and passed on if it isn’t deleterious and occurs in gametes
measure SNP frequency in a population through minor allele frequency (MAF) - the less common allele is the minor allele
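A minimal sketch of how MAF could be computed for one biallelic SNP. The input format (two-character genotype strings) and the function name are assumptions made for illustration; they are not from the notes.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for a biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG",
    one per diploid individual (hypothetical input format).
    """
    # Every individual contributes two alleles to the count.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) > 2:
        raise ValueError("expected a biallelic SNP")
    if len(allele_counts) < 2:
        # Only one allele seen: the site is not polymorphic in this sample.
        return None, 0.0
    total = sum(allele_counts.values())
    # The minor allele is the less common one in this sample.
    minor_allele, minor_count = min(allele_counts.items(), key=lambda kv: kv[1])
    return minor_allele, minor_count / total

# Made-up sample of 10 individuals (20 alleles): 15 x A, 5 x G.
sample = ["AA", "AG", "AA", "GG", "AG", "AA", "AA", "AG", "AA", "AA"]
allele, maf = minor_allele_frequency(sample)
print(allele, maf)
```

In this toy sample G is the minor allele, because it accounts for 5 of the 20 alleles.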
the principles of a genome-wide association study (GWAS)
- clearly defining the trait of interest - e.g. disease, measurable characteristic
- recruiting a large, well-matched equal sample of cases and controls, quality control to ensure accuracy and reliability
- genotyping using high-throughput techniques - e.g. SNP microarrays - to genotype SNP markers across the entire genome and understand the genetic variation in the sample
- statistical analysis for associations between disease and SNP genetic marker alleles = chi-squared test
- genome wide significance threshold of p values is p < 5x10^-8 as lots of markers are tested
= positive association at p < 5x10^-8
can see the strength of evidence for a difference between case and control frequencies by looking at the p value - lower p value means more significant (the size of the difference itself is measured by an effect size, e.g. odds ratio)
what does the chi-squared test do?
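examines whether two categorical variables - the genetic marker allele/genotype and disease status (case or control) - are independent. compares observed counts with the counts expected under no association; a large test statistic (small p value) suggests the marker is associated with the disease/phenotype of interest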
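A rough sketch of an allelic chi-squared test for a single SNP, assuming a 2x2 table of allele counts (cases vs controls) with no zero row or column. The counts are invented for illustration, and tools such as PLINK may compute this differently in detail.

```python
import math

def allelic_chi_squared(case_counts, control_counts):
    """Pearson chi-squared test on a 2x2 table of allele counts.

    Each argument is (count_allele_1, count_allele_2).
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    observed = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

GENOME_WIDE_THRESHOLD = 5e-8

# Made-up counts: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p = allelic_chi_squared(case_counts=(700, 1300), control_counts=(500, 1500))
print(f"chi2 = {chi2:.2f}, p = {p:.3g}, genome-wide significant: {p < GENOME_WIDE_THRESHOLD}")
```

In a GWAS this test would be repeated for every genotyped SNP, which is why the much stricter genome-wide threshold is used instead of p < 0.05.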
what is a Manhattan plot?
= a type of scatter plot that displays high-magnitude values, represents P values of entire GWAS
X axis shows position of the SNP on a chromosome
Y axis is the -log10 of the association P value, shown through peaks
higher peaks are smaller (more significant) P values = identifies the genomic region associated with disease, can include more than one gene
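A small sketch of how the points on a Manhattan plot could be prepared, assuming the usual -log10 transform of the P value on the Y axis and chromosomes laid end to end on the X axis. The input formats and toy numbers are hypothetical, and the actual plotting step is left out.

```python
import math

def manhattan_coordinates(results, chromosome_lengths):
    """Turn GWAS results into (x, y) points for a Manhattan plot.

    `results` is a list of (chromosome, position, p_value) tuples and
    `chromosome_lengths` maps chromosome -> length in base pairs
    (both are hypothetical input formats).
    """
    # Lay chromosomes end to end so each SNP gets one genome-wide x value.
    offsets, running_total = {}, 0
    for chrom in sorted(chromosome_lengths):
        offsets[chrom] = running_total
        running_total += chromosome_lengths[chrom]

    points = []
    for chrom, position, p_value in results:
        x = offsets[chrom] + position
        y = -math.log10(p_value)  # small p values become tall peaks
        points.append((x, y))
    return points

# Toy data: two tiny "chromosomes" and three SNPs.
toy_lengths = {1: 1000, 2: 800}
toy_results = [(1, 100, 0.5), (1, 600, 1e-9), (2, 50, 0.01)]
points = manhattan_coordinates(toy_results, toy_lengths)

# Height of the genome-wide significance line on the Y axis.
threshold_height = -math.log10(5e-8)
significant = [pt for pt in points if pt[1] > threshold_height]
print(points)
print(significant)
```

Only the SNP with p = 1e-9 rises above the threshold line, which sits at about 7.3 on the Y axis.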
what is a regional association plot?
= enlarged Manhattan plot, can look at a pre-defined genomic area
red coloured peak/point is the most significant SNP - can be between two genes
describe meta-analysis and how it is used in genetic studies
meta-analysis allows for the statistical combination of data from multiple studies post-experiment
increases statistical power and precision in estimating the association between genetic variations and a specific disease/trait
involves statistical analysis and quality assessment
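The notes do not name a specific method, so the sketch below assumes one common choice: a fixed-effect, inverse-variance weighted meta-analysis of per-study effect estimates (e.g. log odds ratios for the same SNP allele). The study values are invented.

```python
import math

def fixed_effect_meta_analysis(estimates):
    """Combine per-study effects with inverse-variance weights.

    `estimates` is a list of (beta, standard_error) pairs, e.g. log odds
    ratios for the same SNP allele from different studies.
    Returns (pooled_beta, pooled_se, p_value).
    """
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return pooled_beta, pooled_se, p_value

# Four made-up studies reporting the same effect with the same precision.
studies = [(0.20, 0.10), (0.20, 0.10), (0.20, 0.10), (0.20, 0.10)]
beta, se, p = fixed_effect_meta_analysis(studies)
_, _, p_single = fixed_effect_meta_analysis(studies[:1])
print(beta, se, p, p_single)
```

The pooled effect is unchanged, but the standard error halves and the p value drops well below that of any single study, which illustrates the gain in statistical power and precision.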
describe the known problems with GWAS
many common SNPs associated with small effect sizes - only contribute moderately to the overall disease
GWAS only focuses on common SNP variants, rare variants and CNVs may contribute more to a disease
diseases are complex = often involve interactions between various genes and environmental factors, but GWAS focuses on individual genetic markers
limited coverage
describe the relationship between genetic association and linkage disequilibrium
genetic association is the statistical association between a genetic variant/SNP and a trait/disease in a population
linkage disequilibrium is the non-random association of alleles at different loci on a chromosome that are inherited together frequently
genetic association studies rely on linkage disequilibrium - identifying a genetic marker in linkage disequilibrium with the causal variant can allow the identification of associated regions, and help identify genomic regions of interest
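One way to put a number on linkage disequilibrium between two biallelic loci is the coefficient D and the derived r^2. The sketch below assumes phased haplotypes are already available as two-character strings, which is a simplification made for illustration.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D and r^2 for two biallelic loci from phased haplotypes.

    `haplotypes` is a list of two-character strings; the first character is
    the allele at locus 1, the second the allele at locus 2 (hypothetical format).
    """
    n = len(haplotypes)
    allele_a = haplotypes[0][0]  # reference allele at locus 1
    allele_b = haplotypes[0][1]  # reference allele at locus 2
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h == allele_a + allele_b for h in haplotypes) / n
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d ** 2 / denominator if denominator > 0 else float("nan")
    return d, r_squared

# Alleles always travel together: complete LD.
d_linked, r2_linked = linkage_disequilibrium(["AG"] * 5 + ["CT"] * 5)
# Every combination equally common: no LD.
d_free, r2_free = linkage_disequilibrium(["AG", "AT", "CG", "CT"])
print(d_linked, r2_linked, d_free, r2_free)
```

An r^2 near 1 means the marker allele predicts the allele at the other locus almost perfectly, which is what allows a genotyped marker to stand in for an untyped causal variant.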
describe haplotype analysis - what is it, haplotypes as genetic markers, haplotype analysis in association studies, clinical uses?
haplotype = set of genetic markers - e.g. SNPs/ genetic variants - on a single chromosome, often inherited together from one parent
haplotype analysis = studying the combination of alleles present on one chromosome that tend to be inherited together
important in understanding patterns of genetic variation in a population, study association between certain genetic variants and diseases
- associated with linkage disequilibrium = the non-random association of alleles at different loci on a chromosome, often close together and are thus inherited together more frequently than by chance
- SNPs used as genetic markers for haplotype analysis = examine SNP combinations on a chromosome to identify haplotypes
- haplotype analysis in association studies = identify associations between specific haplotypes and phenotypic traits/ diseases. studying the frequency of haplotypes in affected vs unaffected people, can identify genetic factors associated with a trait/ disease
- clinically used to identify genetic risk factors for disease, predict medication responses, understand individual variations in drug metabolism
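A toy sketch of the comparison described above: haplotype frequencies in affected vs unaffected people. It assumes each chromosome's haplotype is already known and written as a string of SNP alleles; in real data the phase usually has to be inferred first. All haplotypes listed are made up.

```python
from collections import Counter

def haplotype_frequencies(haplotypes):
    """Return a dict mapping each haplotype string to its frequency."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: count / total for hap, count in counts.items()}

# Hypothetical three-SNP haplotypes, one string per chromosome.
case_haplotypes = ["ACG", "ACG", "ACG", "TCG", "ATG", "ACG"]
control_haplotypes = ["ACG", "TCG", "TCG", "ATG", "TCG", "ATG"]

case_freq = haplotype_frequencies(case_haplotypes)
control_freq = haplotype_frequencies(control_haplotypes)

# Print each haplotype with its frequency in cases and in controls.
for hap in sorted(set(case_freq) | set(control_freq)):
    print(hap, round(case_freq.get(hap, 0.0), 2), round(control_freq.get(hap, 0.0), 2))
```

Here ACG is much more common in cases than in controls, so it would be a candidate risk haplotype; a formal test on larger samples would still be needed before calling it an association.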