Week 5 Flashcards
In the depression study, comparing QQ plots revealed that associated SNPs are there, each having a tiny effect, but none reaching genome-wide significance. How can we get at these SNPs and see if they make a real contribution?
We can use polygenic risk scoring (this tends to be used once genome-wide significant hits have been found, but it can also be used with associations that fall short of significance)
Who carried out the large schizophrenia GWAS in 2014? How big was the sample, and how many hits were reported?
Ripke et al. (2014)
35000 participants
108 hits
What is polygenic risk scoring?
Take SNPs that are thought to be associated with a phenotype
Genotype some individuals for those SNPs
Then add up the number of risk alleles each person carries to see how much at risk they are.
E.g.
SNP 1 (A is the risk allele)
Person 1 has AA (2)
Person 2 has TT (0)
Person 3 has TA (1)
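The counting in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `count_risk_alleles` and the dictionary layout are assumptions made here, not part of any genetics package.

```python
def count_risk_alleles(genotype: str, risk_allele: str) -> int:
    """Return how many copies of the risk allele a genotype carries (0, 1 or 2)."""
    return genotype.upper().count(risk_allele.upper())


# SNP 1 from the card above: A is the risk allele.
genotypes = {"Person 1": "AA", "Person 2": "TT", "Person 3": "TA"}

counts = {person: count_risk_alleles(g, "A") for person, g in genotypes.items()}
print(counts)  # {'Person 1': 2, 'Person 2': 0, 'Person 3': 1}
```

With several SNPs, the unweighted score is simply the sum of these counts across SNPs for each person.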
What can you do if the SNPs investigated using polygenic risk scoring have different effect sizes?
You can weight each SNP by its effect size (the log of its odds ratio) to calculate more accurate polygenic risk scores
Explain the steps for polygenic risk scoring (5 steps) when using genome-wide significant hits
- Do a GWAS or get the results from one
- Select a set of SNPs based on the p-value
- Genotype an independent sample (the target sample) genome-wide
- Calculate the polygenic risk score
- Test the polygenic risk score as a predictor of an outcome
When calculating a polygenic risk score using hits that did not reach genome-wide significance, what p-value threshold should be used?
There is no answer… yet, and it is likely to differ between disorders
Try using multiple thresholds
E.g. p < 0.001, p < 0.01, etc.
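As a rough sketch of trying multiple thresholds, the snippet below builds one SNP set per threshold from a discovery results table. The SNP names and p-values are made up, and the thresholds beyond 0.001 and 0.01 are illustrative assumptions rather than recommended values.

```python
# Hypothetical discovery GWAS results: (SNP id, p-value). All values are invented.
discovery_results = [
    ("rs_a", 0.0004),
    ("rs_b", 0.003),
    ("rs_c", 0.02),
    ("rs_d", 0.3),
]

# Candidate thresholds; 0.05 and 0.5 are added here purely for illustration.
thresholds = [0.001, 0.01, 0.05, 0.5]


def select_snps(results, threshold):
    """Keep the SNPs whose discovery p-value is below the threshold."""
    return [snp for snp, p in results if p < threshold]


# One SNP set (and later one polygenic risk score) per threshold.
snp_sets = {t: select_snps(discovery_results, t) for t in thresholds}
for t, snps in snp_sets.items():
    print(f"p < {t}: {len(snps)} SNPs -> {snps}")
```

Each SNP set would then be scored and tested separately to see which threshold predicts the outcome best.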
What does linkage disequilibrium have to do with choosing the number of SNPs to use when calculating a polygenic risk score?
Nearby SNPs are often correlated with one another because they tend to be inherited together (linkage disequilibrium)
There may be whole clumps of SNPs that are associated with depression (for example)
But the signal could be driven by a single SNP.
We would inflate our scores by counting each one
Therefore pruning based on linkage disequilibrium is crucial.
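One common way to do this is p-value-informed clumping, a close relative of pruning: keep the most significant SNP in each correlated group and drop the SNPs in LD with it. The sketch below is a toy version under stated assumptions: the SNP names, p-values and pairwise r² values are invented, the r² cut-off of 0.1 is an arbitrary illustration, and real tools also restrict comparisons to a physical window.

```python
def clump(p_values, r2, r2_threshold=0.1):
    """Greedy clumping: walk SNPs from most to least significant and keep a SNP
    only if it is not in LD (r2 >= threshold) with a SNP already kept."""
    kept = []
    for snp in sorted(p_values, key=p_values.get):
        in_ld = any(r2.get(frozenset((snp, k)), 0.0) >= r2_threshold for k in kept)
        if not in_ld:
            kept.append(snp)
    return kept


# Hypothetical discovery p-values.
p_values = {"rs1": 1e-6, "rs2": 1e-5, "rs3": 1e-4, "rs4": 2e-4}

# Hypothetical pairwise LD (r2); pairs not listed are treated as uncorrelated.
r2 = {
    frozenset(("rs1", "rs2")): 0.9,
    frozenset(("rs1", "rs3")): 0.8,
    frozenset(("rs2", "rs3")): 0.85,
}

index_snps = clump(p_values, r2)
print(index_snps)  # ['rs1', 'rs4']
```

Here rs1, rs2 and rs3 form one correlated clump, so only rs1 contributes to the score, alongside the independent rs4.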
When running a GWAS in an independent sample, what is the population called?
Target population
What should the target population be like in an independent GWAS sample used for calculating a polygenic risk score?
It should be as similar as possible to the discovery data
Oh no! Your precious GWAS used an old microarray chip to genotype the discovery data in your polygenic risk score analysis. What can you do?
You can impute any SNPs not genotyped in the target population
Imputation takes advantage of linkage disequilibrium to fill in the gaps in our sample.
What program can solve array mismatch between the discovery and target populations in a polygenic risk score analysis?
Programs such as IMPUTE
It will also give you a level of certainty for each imputed genotype.
How do we calculate the individual polygenic risk score?
Number of risk alleles weighted by effect size
Score = SUM (number of risk alleles * log odds ratio), summed over SNPs
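The formula above can be sketched as follows. The SNP names, risk alleles and odds ratios are invented for illustration, and `polygenic_risk_score` is a hypothetical helper, not a function from an existing package.

```python
import math

# Hypothetical SNPs with their risk allele and discovery odds ratio (made-up values).
snp_info = {
    "snp1": {"risk_allele": "A", "odds_ratio": 1.10},
    "snp2": {"risk_allele": "G", "odds_ratio": 1.05},
    "snp3": {"risk_allele": "C", "odds_ratio": 1.20},
}


def polygenic_risk_score(genotypes, snp_info):
    """Sum over SNPs of (number of risk alleles) * log(odds ratio)."""
    score = 0.0
    for snp, info in snp_info.items():
        dosage = genotypes[snp].count(info["risk_allele"])  # 0, 1 or 2
        score += dosage * math.log(info["odds_ratio"])
    return score


# One hypothetical person: two risk alleles at snp1, one at snp2, none at snp3.
person = {"snp1": "AA", "snp2": "GT", "snp3": "TT"}
prs = polygenic_risk_score(person, snp_info)
print(round(prs, 4))
```

Taking the log matters because odds ratios combine multiplicatively, whereas log odds ratios can be added across SNPs.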
How can we test the polygenic risk score as a predictor of outcome?
We can do a logistic regression with cases and controls
- remembering to covary for population stratification
This means we can get a measure of the variance explained, expressed as R² (a pseudo-R² in the case of logistic regression)
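A minimal sketch of this test, using only the Python standard library, is given below. Everything in it is an assumption for illustration: the data are simulated, one ancestry principal component (`pc1`) stands in for the population stratification covariates, and Nagelkerke's pseudo-R² is one common choice rather than the only one. The variance attributed to the score is taken as the R² of the full model minus the R² of the covariates-only model.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def predict(beta, row):
    """Predicted probability of being a case."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))


def fit_logistic(X, y, iterations=25):
    """Newton-Raphson logistic regression; every row of X starts with 1.0 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, target in zip(X, y):
            p = predict(beta, row)
            for i in range(k):
                grad[i] += (target - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def log_likelihood(beta, X, y):
    return sum(math.log(predict(beta, r)) if t else math.log(1.0 - predict(beta, r))
               for r, t in zip(X, y))


def nagelkerke_r2(ll_model, ll_intercept_only, n):
    cox_snell = 1.0 - math.exp(2.0 * (ll_intercept_only - ll_model) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll_intercept_only / n))


# Simulated target sample (all values invented): case status depends on score and pc1.
random.seed(1)
n = 400
pc1 = [random.gauss(0, 1) for _ in range(n)]
prs = [random.gauss(0, 1) for _ in range(n)]
status = [1 if random.random() < 1 / (1 + math.exp(-(0.8 * s + 0.3 * c))) else 0
          for s, c in zip(prs, pc1)]

X_intercept = [[1.0] for _ in range(n)]
X_covariates = [[1.0, c] for c in pc1]
X_full = [[1.0, s, c] for s, c in zip(prs, pc1)]

ll_intercept = log_likelihood(fit_logistic(X_intercept, status), X_intercept, status)
ll_covariates = log_likelihood(fit_logistic(X_covariates, status), X_covariates, status)
beta_full = fit_logistic(X_full, status)
ll_full = log_likelihood(beta_full, X_full, status)

r2_covariates = nagelkerke_r2(ll_covariates, ll_intercept, n)
r2_full = nagelkerke_r2(ll_full, ll_intercept, n)
r2_prs = r2_full - r2_covariates  # variance explained by the score beyond the covariate
print(f"PRS coefficient: {beta_full[1]:.3f}, pseudo-R2 for PRS: {r2_prs:.3f}")
```

In practice a statistics package would be used for the regression; the hand-written fit is here only to keep the example self-contained.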
List four other things that can be done with polygenic risk scoring.
Testing the generalist gene hypothesis
Cross trait/disorder analyses
Gene-environment correlations
Gene-environment interactions
Using polygenic risk scoring, how much of the variance in depression was explained?
About 1.4%!