genetics of common disease Flashcards
Variant frequency in cases vs controls
Disease-associated genetic variants, and variants tightly linked to them in the same region of the chromosome, are present at a higher frequency in cases than in controls.
Mendelian disease
monogenic
clear inheritance pattern
minimal environmental influence
does not apply to common diseases or most phenotypic traits (e.g. height, high blood pressure, heart rate)
Common disease
multifactorial disease:
- multiple genes affect the disease/trait, with effect of each gene variant being very small/negligible
- strong influence from environment
E.g.:
- type II diabetes
- hypertension
- Alzheimer disease
Heritability
a measure of how well differences in people’s genes account for differences in their traits
Heritability close to 1 indicates…
almost all of the variability in a trait comes from genetic differences, with very little contribution from environmental factors
e.g. Cystic Fibrosis (heritability is 1)
How can we calculate heritability?
Using twin studies:
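>Monozygotic twins share 100% of their DNA
>Dizygotic twins share on average 50% of their DNA
>Both twins in a pair share the same environment, therefore any difference in concordance between monozygotic and dizygotic pairs is attributed to genetic factors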
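A minimal sketch of how twin data can be turned into a heritability estimate. It uses Falconer's formula, h² = 2 × (r_MZ − r_DZ), which is a standard approximation but is not named in these cards. The twin heights are made-up numbers for illustration, and a plain Pearson correlation stands in for the intraclass correlation that is normally used.

```python
from statistics import mean

def pearson_r(pairs):
    """Pearson correlation between twin 1 and twin 2 values across pairs."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def falconer_h2(r_mz, r_dz):
    """Falconer's estimate of heritability: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)

# Hypothetical heights in cm (twin 1, twin 2); not real data
mz_pairs = [(170, 171), (160, 161), (180, 179), (175, 176), (165, 164)]
dz_pairs = [(170, 175), (160, 168), (180, 172), (175, 170), (165, 169)]

r_mz = pearson_r(mz_pairs)  # similarity within monozygotic pairs
r_dz = pearson_r(dz_pairs)  # similarity within dizygotic pairs
h2 = falconer_h2(r_mz, r_dz)
print(f"r_MZ={r_mz:.2f}, r_DZ={r_dz:.2f}, h2={h2:.2f}")
```

The larger the gap between the monozygotic and dizygotic correlations, the larger the estimate. Estimates can fall outside the 0 to 1 range on small or noisy samples, so this is only a rough guide.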
Interpreting Twin Studies
When looking at a trait, e.g. height, measure it in both twins of each monozygotic and dizygotic pair and compare the concordance within pairs. The higher the concordance, the more similar the twins in a pair are. The stronger the genetic contribution to the trait, the greater the difference in concordance between monozygotic and dizygotic pairs, because monozygotic twins share 100% of their DNA, whereas dizygotic twins share on average 50%.
Once we’ve done our heritability study, we then need to identify which genes contribute to that trait.
How can we find out which genes contribute to a trait/disease?
-genetic linkage analysis (in Mendelian diseases)
-genetic association: GWAS (genome wide association study)
GWAS (genome wide association study)
a method for identifying gene variants (SNPs) involved in complex diseases by using genetic markers scored in hundreds or thousands of individuals who have the disease (cases) and individuals who do not have the disease (controls)
a typical GWAS uses genome wide SNP arrays to genotype common variants across the genome in individuals with and without a common trait/disease, and then compares the two groups
SNP Microarray
1) DNA Sample prepared and fragmented
2) DNA tagged/labelled with fluorescent probe
3) Mix DNA with the slide, which carries oligonucleotides that match the region of the genome around each variant being tested
4) If the DNA sample contains a variant, it binds to the specific matching oligonucleotide and fluoresces
5) Signal produced which can be detected
Encoding SNP Chip data for analysis
After we have our data from the SNP chip, software works out the genotype of each individual at each SNP and converts the genotypes to a binary code that computer programs can deal with.
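A minimal sketch of what such an encoding can look like. It assumes the common additive scheme, where each genotype is stored as the number of copies (0, 1 or 2) of one chosen allele; the card does not say which scheme is meant, and real tools use their own compact binary formats. The function name `encode_genotype` and the sample IDs are hypothetical.

```python
def encode_genotype(genotype, alt_allele):
    """Return the number of copies of alt_allele (0, 1 or 2); None if the call is missing."""
    if genotype is None or len(genotype) != 2:
        return None
    return sum(1 for allele in genotype if allele == alt_allele)

# Hypothetical genotype calls at one SNP with alleles A and G
calls = {"person_1": "AA", "person_2": "AG", "person_3": "GG", "person_4": None}

# Count copies of the G allele for each individual
encoded = {pid: encode_genotype(g, "G") for pid, g in calls.items()}
print(encoded)
# {'person_1': 0, 'person_2': 1, 'person_3': 2, 'person_4': None}
```

Coding genotypes as allele counts lets statistical software treat each SNP as a simple numeric column.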
SNP Chips
Rather than directly measuring genotypes at all genetic polymorphisms, we rely on the association between SNPs we do assay and SNPs we don’t assay
SNP-SNP association, or linkage disequilibrium (LD), is fundamental to our ability to sample the whole genome with relatively few SNPs
Linkage disequilibrium (LD)
non-random association of alleles at two or more loci in a general population
linkage disequilibrium between two SNPs decreases with physical distance, because recombination between them becomes more likely
If LD is strong (the chance of variants in that region being inherited together is high), fewer SNPs are needed to capture the variation in that region, so genotyping is cheaper and the analysis is easier/quicker
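A minimal sketch of how LD between two SNPs can be quantified. It uses the r² statistic, r² = D² / (pA(1−pA) pB(1−pB)) with D = pAB − pA·pB, which is a standard measure but is not named in these cards. It assumes phased haplotypes with alleles coded 0/1; the example haplotypes are made up.

```python
def ld_r2(haplotypes):
    """r^2 between two SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2), each coded 0 or 1.
    Returns None if either SNP shows no variation.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom

# Hypothetical haplotypes: alleles always travel together (strong LD)
linked = [(1, 1)] * 4 + [(0, 0)] * 6
# Hypothetical haplotypes: every combination equally common (no LD)
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r2(linked), ld_r2(unlinked))
```

An r² near 1 means one SNP predicts the other almost perfectly, so only one of them needs to be on the chip; an r² near 0 means each has to be assayed separately.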
Where does most of the common variation occur?
most of the common variation occurs in the non-coding regions. Often the causal variant is not included on the SNP chip, so further work is required to narrow down the region of association and identify the causal variant
What analysis is carried out to indicate how likely a variant is to be associated with a trait?
statistical analysis (the p-value indicates the significance of the association)
-lower p-value = more significant
the p-values for all of the SNPs on the chip are then plotted on a Manhattan plot, as -log10(p-value) against genomic position, so smaller p-values give taller points
high peaks = highly significant association between a genomic region and the trait
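A minimal sketch of the analysis for a handful of SNPs. It assumes a simple allelic chi-square test (1 degree of freedom) on allele counts in cases vs controls, which is one of several tests used in practice; the cards do not say which test is meant. The SNP names and counts are made up, and the 5e-8 cutoff is the conventional genome-wide significance threshold rather than something stated in these cards.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional threshold; an assumption, not from the cards

def allelic_chi2_p(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 df) on a 2x2 table of allele counts. Returns (chi2, p)."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    margins = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
               * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if margins == 0:
        return 0.0, 1.0  # no variation in a row or column: nothing to test
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / margins
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref)
snps = {
    "snp_a": (600, 1400, 400, 1600),   # allele clearly more common in cases
    "snp_b": (500, 1500, 495, 1505),   # almost identical frequencies
}

results = {}
for name, counts in snps.items():
    chi2, p = allelic_chi2_p(*counts)
    results[name] = {
        "p": p,
        "neg_log10_p": -math.log10(p),       # y-axis value on a Manhattan plot
        "significant": p < GENOME_WIDE_P,
    }

for name, r in results.items():
    print(name, f"p={r['p']:.3g}", f"-log10(p)={r['neg_log10_p']:.2f}", r["significant"])
```

Plotting `neg_log10_p` against each SNP's chromosomal position gives the Manhattan plot; SNPs such as the hypothetical `snp_a` would form the high peaks.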