Lecture 14: GWAS 2 Flashcards
What are the potential sources of bias for GWAS?
Multiple testing, ill-defined sample size, population stratification, choice of cases and controls, merging datasets and the need for genotype imputation
Explain the issues surrounding multiple testing and the different corrections used. What is the point of doing this?
A p value threshold of 0.05 is usually used, but because thousands of SNPs are tested this would give lots of false positives, so the threshold needs to be reduced. The Bonferroni correction is where you divide 0.05 by the number of tests you are doing to get a new threshold. The Benjamini-Hochberg correction is where you arrange all the SNPs from smallest to largest p value and compare each one against a threshold that grows with its rank: with n = 10 tests, for example, the smallest p value is compared with 0.05 × 1/10, the second with 0.05 × 2/10, and so on, and every SNP up to the largest rank that passes is called significant. This controls the false discovery rate rather than the chance of any single false positive. The point is to maximise the chances of finding something whilst minimising the false positive rate.
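The two corrections can be compared on a small example. This is a minimal sketch in plain Python; the function names and the ten p values are made up for illustration and do not come from the lecture.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is called significant
    by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p value is <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Every test ranked at or below largest_k is called significant.
    significant = [False] * m
    for idx in order[:largest_k]:
        significant[idx] = True
    return significant


# Hypothetical p values for 10 SNPs (illustrative only).
p_values = [0.001, 0.004, 0.012, 0.019, 0.030, 0.200, 0.350, 0.500, 0.700, 0.900]

threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [p <= threshold for p in p_values]
bh_hits = benjamini_hochberg(p_values, alpha=0.05)

print("Bonferroni hits:", sum(bonferroni_hits))
print("Benjamini-Hochberg hits:", sum(bh_hits))
```

On these made-up values Bonferroni keeps fewer SNPs than Benjamini-Hochberg, which is the usual trade-off: Bonferroni is stricter about false positives, Benjamini-Hochberg finds more at the cost of tolerating a controlled fraction of false discoveries.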
What is two stage GWAS used for and what is it?
A way to work around small sample sizes. A first scan with a moderate sample size identifies areas of interest, which may not reach statistical significance. These areas of interest are then genotyped or sequenced in an independent data set to confirm the associations.
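The two stages can be sketched as a pair of filters. This is only an illustration: the function name, the lenient stage-one cutoff of 1e-3, the choice to Bonferroni-correct stage two over the candidates only, and the SNP names and p values are all assumptions, not values from the lecture.

```python
def two_stage_gwas(stage1_p, stage2_p, stage1_cutoff=1e-3, alpha=0.05):
    """Sketch of a two-stage design.

    stage1_p: dict of SNP name -> p value from the first, moderate-size scan.
    stage2_p: dict of SNP name -> p value from the independent follow-up set.
    """
    # Stage 1: keep SNPs that pass a lenient cutoff, even though they would
    # not reach genome-wide significance.
    candidates = [snp for snp, p in stage1_p.items() if p <= stage1_cutoff]
    if not candidates:
        return [], []
    # Stage 2: only the candidates are tested, so the correction is over
    # far fewer tests (assumed here to be a Bonferroni correction).
    threshold = alpha / len(candidates)
    confirmed = [snp for snp in candidates if stage2_p[snp] <= threshold]
    return candidates, confirmed


# Hypothetical p values (illustrative only).
stage1 = {"snp1": 2e-4, "snp2": 8e-4, "snp3": 0.04, "snp4": 0.6}
stage2 = {"snp1": 0.003, "snp2": 0.2}

candidates, confirmed = two_stage_gwas(stage1, stage2)
print("Candidates:", candidates)
print("Confirmed:", confirmed)
```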
What is the problem of population stratification?
Variants that are more common in people from particular regions can come back as highly associated with a phenotype simply because the cases and controls come from different regions. There is a possible correspondence between genetic and geographic distances.
What geographical considerations have to be taken into account?
Some SNPs may have nothing to do with a particular disease but rather act as markers of an individual’s origin
What has to be matched properly? What is the consequence if they are not?
Cases and controls. There can be spurious associations.
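A small worked example shows how poor matching creates a spurious association. The counts below are invented: in each of two hypothetical populations the variant has no effect (odds ratio of 1), but the variant and the disease are both more common in population A, so pooling the two produces an apparent association.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying the variant in cases versus controls."""
    return (case_with * control_without) / (case_without * control_with)


# Hypothetical counts: (cases with variant, cases without,
#                       controls with variant, controls without).
population_a = (160, 40, 40, 10)   # variant common, many cases sampled
population_b = (10, 40, 40, 160)   # variant rare, few cases sampled

# Pool the two populations by adding the counts cell by cell.
pooled = tuple(a + b for a, b in zip(population_a, population_b))

or_a = odds_ratio(*population_a)
or_b = odds_ratio(*population_b)
or_pooled = odds_ratio(*pooled)

print("Odds ratio in population A:", or_a)
print("Odds ratio in population B:", or_b)
print("Odds ratio when pooled:", or_pooled)
```

Within each population the odds ratio is 1, yet the pooled odds ratio is well above 1, because most cases come from population A and most controls from population B.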
What can it be difficult to measure?
Some phenotypes, e.g. mental health disorders.
What is genotype imputation?
The process of predicting or imputing genotypes that are not directly assayed in a sample of individuals.
What is missing heritability? Give an example of this. What does this mean?
Most SNPs linked to a condition have low predictive power, with each SNP explaining less than 1% of the variance. All the SNPs identified, put together, explain only a fraction of the heritable component. Breast cancer has a 30% genetic component but only 80% of this is explained by GWAS. This means there are additional things that we aren’t testing for.
What can happen with associated SNPs?
They might be in linkage disequilibrium with the SNP actually causing the disease but which hasn’t been tested.
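Linkage disequilibrium between a tested SNP and an untested one is commonly summarised by r², the squared correlation between the alleles at the two sites. The sketch below computes it from phased haplotypes; the function name and the haplotype counts are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, where a and b are 0 or 1 and give
    the allele carried at SNP A and SNP B on the same chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n      # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n      # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a SNP has only one allele")
    d = p_ab - p_a * p_b                         # disequilibrium coefficient D
    return d * d / denominator


# Hypothetical haplotypes: the tested SNP (A) usually travels with the
# untested causal SNP (B), but not always.
partial_ld = [(1, 1)] * 4 + [(1, 0)] * 1 + [(0, 0)] * 5
perfect_ld = [(1, 1)] * 5 + [(0, 0)] * 5

r2_partial = ld_r_squared(partial_ld)
r2_perfect = ld_r_squared(perfect_ld)
print("r^2 (partial LD):", round(r2_partial, 3))
print("r^2 (perfect LD):", round(r2_perfect, 3))
```

An r² close to 1 means the tested SNP is a near-perfect stand-in for the untested one, so an association signal at the tested SNP cannot by itself say which of the two is causal.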
What are the alternatives to GWAS?
Copy number variations can be associated with disease, though they don’t explain the missing heritability.
Sequencing of extra genomes, e.g. one project sequenced 5000 genomes.
What is the 10,000 genomes project?
Full sequences of 4000 people from TwinsUK and ALSPAC, together with sequencing of people with extreme obesity, neurodevelopmental disease and other conditions.
Explain what exome sequencing is.
It involves sequencing only the exons of a whole genome. The DNA is shredded and the segments containing exons are captured with probes. The exons are then sequenced using next-generation sequencing and aligned against the reference genome.
What is good about exome sequencing?
It allows identification of very rare variants and can be used to try to identify changes in copy number. Indels present in most of the population can be discarded, as they are unlikely to be the cause of disease.
How was exome sequencing involved with Miller syndrome?
It was the first time exome sequencing was used to identify phenotype-associated mutations. The coding regions of 4 affected individuals from 3 different families were sequenced, and Miller syndrome was linked to mutations in the DHODH gene. This gene was then sequenced in 3 unrelated patients, and they all carried mutations in it.