Lecture 16 - human genetics Flashcards
What are the different possible architectures of complex diseases?
- a small number of dominant alleles confer a large increase in risk
- common disease, common variant model: many alleles each confer a small increase in risk
- intermediate: one major allele exerts a large effect, alongside numerous other lower-risk alleles
What are single nucleotide polymorphisms (SNPs)?
- single-base variations at a specific position in the genome
- between 11 and 15 million common SNPs (minor allele frequency > 5%)
- SNPs are unevenly distributed across the genome
Where do SNPs occur?
- coding regions
- non-coding regions
What types of SNPs occur in coding regions?
- synonymous (no change in encoded amino acid)
- non-synonymous (e.g. missense or nonsense mutation)
What effects can SNPs in non-coding regions have?
- can affect expression/regulation of associated genes
- complex diseases arise from combinations of multiple SNPs
What is the Exome Aggregation Consortium (ExAC)?
- exomes from unrelated individuals, sequenced as part of various disease-specific and population genetic studies
- 7.4 million variants mapped
- records frequency of alleles in a population
- documents rare mutations
- highly pathogenic variants are seen at a lower frequency in the general population
- gnomAD (the Genome Aggregation Database, the successor to ExAC) aggregates over 125,000 exome and 15,000 genome datasets
What are genome-wide association studies (GWAS)?
Population-based studies comparing individuals with a condition against a control population
- examine a panel of SNPs in the genome for association with the disease phenotype
- search for alleles that occur more frequently in disease cases than in matched controls
- requires many participants
- GWAS have been performed for most common diseases
- many risk loci have yet to be identified
- missing loci contribute to the ‘missing heritability problem’
What do genome-wide association studies do?
- GWAS compare allele frequencies across the entire genome in case and control populations
- a significant difference in allele frequency constitutes an association with disease
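A minimal sketch of the per-SNP comparison, assuming a simple Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts; real GWAS pipelines often use logistic regression with covariates instead. The function name and the counts are hypothetical.

```python
import math

def allelic_chi_square(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows: cases / controls; columns: risk allele / other allele.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # count expected if allele frequency were the same in cases and controls
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# hypothetical: 2,000 cases and 2,000 controls (4,000 alleles each);
# risk allele frequency 0.32 in cases vs 0.28 in controls
chi2, p = allelic_chi_square(1280, 2720, 1120, 2880)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```

With these made-up counts p is roughly 1e-4: nominally significant, but far from the genome-wide threshold of 5 x 10^-8.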
How are SNPs associated with disease?
- association studies can tell us if an allele is associated with a disease, but the associated SNP may be either:
- the causal SNP itself
- a SNP that correlates with the true risk allele due to linkage disequilibrium
What is linkage disequilibrium?
the non-random association of alleles at different genomic sites
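One common way to quantify LD between two biallelic SNPs (alleles A/a and B/b) is the coefficient D = p(AB) - p(A)p(B) and its normalised form r^2. This sketch assumes the four haplotype frequencies are already known (in practice they are usually estimated from genotype data); the function name and example frequencies are hypothetical.

```python
import math

def linkage_disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r^2) for two biallelic loci from haplotype frequencies.

    Assumes both loci are polymorphic (otherwise r^2 divides by zero).
    """
    if not math.isclose(p_AB + p_Ab + p_aB + p_ab, 1.0):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    # D = 0 when alleles combine at random (linkage equilibrium)
    d = p_AB - p_A * p_B
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r2

# AB haplotype seen more often than allele frequencies alone would predict
d, r2 = linkage_disequilibrium(0.5, 0.1, 0.1, 0.3)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")  # D = 0.14, r^2 = 0.34
```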
What does linkage disequilibrium depend on?
- distance between alleles
- recombination rate
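Under idealised assumptions (random mating, a large population, no selection or mutation), LD decays each generation by a factor of (1 - c), where c is the recombination fraction between the two sites, so D_t = D_0 x (1 - c)^t. Because c generally grows with the distance between sites, this captures both dependencies above. A minimal sketch with hypothetical values:

```python
def ld_after_generations(d0, recomb_fraction, generations):
    """D after t generations: D_t = D_0 * (1 - c)^t (idealised population)."""
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return d0 * (1 - recomb_fraction) ** generations

d0 = 0.14
# closely linked sites (rare recombination) retain most of their LD
print(ld_after_generations(d0, 0.001, 100))  # about 0.127
# distant sites (frequent recombination) lose it almost entirely
print(ld_after_generations(d0, 0.1, 100))    # about 3.7e-06
```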
What can patterns of linkage disequilibrium be summarised as?
Haplotype blocks
What are haplotype blocks?
regions of high linkage disequilibrium that are separated from other haplotype blocks by sites of frequent historical recombination
What occurs in haplotype mapping?
alleles in strong linkage disequilibrium are grouped so that a single SNP can identify the whole cluster of alleles (a tag SNP)
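Tag SNP selection can be sketched as a greedy set cover: repeatedly choose the SNP in strong LD with the most SNPs not yet covered. This is a simplified illustration, not the method of any particular tool; the r^2 >= 0.8 cut-off is a commonly used convention assumed here, and the SNP names and pairwise r^2 values are hypothetical.

```python
def select_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    snps: list of SNP identifiers
    r2: dict mapping (snp1, snp2) pairs to pairwise r^2 (either order)
    """
    def ld(a, b):
        if a == b:
            return 1.0  # a SNP always tags itself
        return r2.get((a, b), r2.get((b, a), 0.0))

    untagged = set(snps)
    tags = []
    while untagged:
        # SNP capturing the most still-untagged SNPs (first in list order on ties)
        best = max(snps, key=lambda s: sum(ld(s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(best, t) >= threshold}
    return tags

snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {("s1", "s2"): 0.95, ("s1", "s3"): 0.90, ("s2", "s3"): 0.85,
      ("s4", "s5"): 0.92, ("s3", "s4"): 0.10}
print(select_tag_snps(snps, r2))  # ['s1', 's4']
```

Here two tag SNPs are enough to represent two hypothetical haplotype blocks (s1 to s3 and s4 to s5), which is the saving that haplotype mapping exploits.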
How does identification of risk alleles occur?
- GWAS identify SNPs associated with disease, not necessarily the causal risk alleles
- integration with functional data on candidate regions is needed to establish causality
How is the association between an allele and a disease measured?
- the strength of association between a SNP and a disease is measured by an odds ratio (OR)
What is an odds ratio (OR)?
a statistic that quantifies the strength of the association between two events:
OR = 1: events are independent
OR > 1: events are positively associated (e.g. the allele increases disease risk)
OR < 1: events are negatively associated (e.g. the allele is protective)
Common disease, common variant (CDCV) model of complex disease: multiple alleles, each with OR < 1.2, showing weak association with the disease phenotype
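An allelic odds ratio can be computed from a 2x2 table of allele counts as OR = (a x d) / (b x c), where a and b are the risk and other allele counts in cases and c and d the same counts in controls. This sketch also gives an approximate 95% confidence interval using Woolf's log method, chosen here for simplicity; the function name and counts are hypothetical.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Allelic odds ratio with an approximate 95% CI (Woolf's method).

    a, b: risk / other allele counts in cases
    c, d: risk / other allele counts in controls
    """
    or_ = (a * d) / (b * c)
    # approximate standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

or_, lower, upper = odds_ratio(1280, 2720, 1120, 2880)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")  # OR = 1.21 (95% CI 1.10-1.33)
```

An OR of about 1.2 is the kind of weak effect the CDCV model describes; a confidence interval that excludes 1 indicates the association is unlikely to be due to chance at the 5% level.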
Explain how GWAS relies on statistical analysis & large cohorts
- statistical significance is needed to differentiate true positives from false positives
- genome-wide significance is defined as p < 5 x 10^-8
- at nominal significance (p < 0.05), 1 in 20 tests will appear significant by chance alone
- so testing 1 million SNPs at p < 0.05 would give about 50,000 false positives
- very large number of participants are required
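The arithmetic above can be checked in a few lines. The 5 x 10^-8 threshold corresponds to a Bonferroni correction of the nominal 0.05 for roughly one million independent tests; the helper names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing alpha by chance if no SNP is truly associated."""
    return n_tests * alpha

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level keeping the family-wise error rate at or below alpha."""
    return alpha / n_tests

n_snps = 1_000_000
print(expected_false_positives(n_snps, 0.05))       # about 50,000 at nominal significance
threshold = bonferroni_threshold(0.05, n_snps)
print(threshold)                                    # about 5e-08, the genome-wide threshold
print(expected_false_positives(n_snps, threshold))  # about 0.05 across the whole scan
```

The stringent threshold is what drives the need for very large cohorts: only a large sample gives small effects enough statistical power to reach it.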
How are risk variants defined?
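- results are displayed in a Manhattan plot (-log10 p-value of each SNP plotted against its genomic position)
- the genome-wide significance threshold is shown by the red horizontal line
- 44 risk loci were defined with significant p-values (green stacks)
- GWAS are susceptible to a high number of false negatives, because true associations with small effects often fail to reach the stringent threshold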
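A Manhattan plot places each SNP by chromosome and position on the x-axis and -log10(p) on the y-axis, so smaller p-values give taller points and the 5 x 10^-8 threshold becomes a horizontal line at about 7.3. A minimal, plotting-free sketch of that transformation; the SNP identifiers, positions and p-values are hypothetical, and the drawing itself would typically be done with a plotting library.

```python
import math

GENOME_WIDE_P = 5e-8

def manhattan_points(results):
    """Convert (snp_id, chromosome, position, p_value) rows into plot coordinates.

    Returns the y-value of the significance line and a list of
    (snp_id, chromosome, position, -log10 p, is_significant) in genome order.
    """
    threshold_line = -math.log10(GENOME_WIDE_P)
    points = []
    for snp_id, chrom, pos, p in sorted(results, key=lambda r: (r[1], r[2])):
        points.append((snp_id, chrom, pos, -math.log10(p), p < GENOME_WIDE_P))
    return threshold_line, points

results = [
    ("snp_c", 10, 114_000_000, 2e-12),
    ("snp_a", 1, 5_200_000, 3e-9),
    ("snp_b", 2, 60_100_000, 1e-4),
]
line, points = manhattan_points(results)
print(f"significance line at y = {line:.2f}")  # significance line at y = 7.30
for snp_id, chrom, pos, y, significant in points:
    print(snp_id, chrom, pos, round(y, 2), "significant" if significant else "")
```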
Describe type 2 diabetes
- common chronic condition in which cells cannot effectively take up sugar (glucose) from the blood
- characterised by high blood sugar, insulin resistance and inadequate insulin production
- diabetes is multifactorial (genetic & environmental), with risk factors including:
- familial clustering
- geography & ethnicity
- age, weight, diet and level of physical activity
Describe the GWAS of type 2 diabetes
- previously identified common risk alleles (red SNPs)
- novel associated loci detected owing to increased statistical power (green SNPs)
- these novel loci have low odds ratios (1.06-1.27), each causing only a small increase in risk
What are 3 key loci identified by the GWAS of type 2 diabetes?
- TCF7L2
- FTO
- CDKN2A/B
What does TCF7L2 do?
- carries the alleles providing the greatest risk of type 2 diabetes
- intronic variant
- transcription factor required for pancreatic development
What does FTO do?
- intronic variant
- involved in body weight regulation