W8L1 Thu: From GWAS to phenotype Flashcards
rs8050136 SNP marker
- Identified in a GWAS of 72,598 European ancestry individuals.
- p = 5 x 10^-36
- Strong association and large effect on:
- BMI
- Weight
- T2D risk (comorbidity?)
- Risk allele accounts for ~1 kg of body weight in someone 1.7 m tall
- Located in an intron of the FTO gene
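The ~1 kg figure can be related to BMI units through the standard definition BMI = weight / height^2. A minimal sketch of that arithmetic; the function name is illustrative and not from the lecture:

```python
def weight_shift_to_bmi_shift(delta_weight_kg: float, height_m: float) -> float:
    """Convert a weight difference (kg) into a BMI difference (kg/m^2) at fixed height."""
    # BMI = weight / height^2, so at constant height the shift scales by 1 / height^2.
    return delta_weight_kg / height_m ** 2


bmi_shift = weight_shift_to_bmi_shift(1.0, 1.7)
print(round(bmi_shift, 2))  # 0.35
```

So ~1 kg at 1.7 m corresponds to roughly a third of a BMI unit.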
understanding a Manhattan plot
- If a SNP is above the threshold line, its association is significant
- y axis is significance (-log10 of the p-value) and x axis is genomic position
- Lead SNP is the most significant SNP at a locus, the one that is normally reported
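A small sketch of how the plot is read: the y value is -log10 of the p-value, the line is a significance threshold, and the lead SNP is the one with the smallest p-value. The SNP ids, positions, and most p-values below are made up for illustration, and the 5e-8 threshold is the conventional genome-wide cutoff, assumed here because the notes only mention "the line":

```python
import math

# Hypothetical summary statistics: (SNP id, chromosome, position in bp, p-value).
results = [
    ("snp_a", 16, 53_700_000, 3e-4),
    ("snp_b", 16, 53_780_000, 2e-12),
    ("snp_c", 16, 53_800_000, 5e-36),
    ("snp_d", 16, 53_900_000, 0.2),
]

# Conventional genome-wide significance threshold (an assumption, see above).
THRESHOLD = 5e-8


def neg_log10(p):
    """y-axis value on a Manhattan plot."""
    return -math.log10(p)


# SNPs "over the line" and the lead SNP (smallest p-value).
significant = [r for r in results if r[3] < THRESHOLD]
lead_snp = min(results, key=lambda r: r[3])

for snp, chrom, pos, p in results:
    flag = "significant" if p < THRESHOLD else ""
    print(f"{snp}\tchr{chrom}:{pos}\t-log10(p) = {neg_log10(p):.1f}\t{flag}")
print("lead SNP:", lead_snp[0])
```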
rs8050136 and GWAS
- One of the early successes of the GWAS field!
- The logical assumption was that it impacted FTO expression levels somehow, since the phenotype made perfect sense.
- All that was missing was the actual mechanism.
problem of rs8050136
- Variation in FTO expression could never be correlated with variation in genotype at rs8050136
- rs8050136 was the wrong SNP… and FTO the wrong gene!
- rs9930506: less significant, not an eQTL for FTO… but an eQTL for the nearby gene IRX3
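What "being an eQTL for a gene" means can be sketched as a regression of expression on genotype dosage (0, 1 or 2 copies of an allele). This is only an illustration with made-up numbers and hypothetical gene names; real eQTL analyses also adjust for covariates and test significance:

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage (0, 1 or 2 alleles)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    return sxy / sxx


# Made-up data for six individuals.
dosages = [0, 0, 1, 1, 2, 2]
gene_a = [5.0, 5.2, 5.1, 4.9, 5.0, 5.2]  # flat across genotypes: no eQTL signal
gene_b = [4.0, 4.2, 5.0, 5.2, 6.0, 6.2]  # rises with dosage: eQTL-like pattern

print("gene_a slope:", round(eqtl_slope(dosages, gene_a), 3))
print("gene_b slope:", round(eqtl_slope(dosages, gene_b), 3))
```

A slope near zero means expression does not track genotype (the situation described for FTO), while a clearly non-zero slope is the pattern expected for the gene the SNP actually regulates.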
What are the main challenges to decoding trait biology?
- > 90% of GWAS hits fall outside protein-coding regions of the genome
* How to link them to genes?
- Relevant cell type is not always readily apparent from GWAS hits
* How to link them to cell types?
- Most loci have small effect sizes on overall trait values
* How to link them to phenotypes?
Need to consider expression level
The human transcriptome
- Definitely a lot fewer protein-coding genes than we expected (19K), but transcriptionally (63K) and translationally rather complex… and tightly regulated
The human genome
- Protein-coding sequence makes up only about 1.5% of the total genome
Large scale catalogues of expression and regulation in humans
- GTEx: 17,382 RNA sequencing samples from 52 tissues from 838 postmortem donors
- eQTL meta-analyses (combining a bunch of studies together, harder than it sounds):
- eQTLs in whole blood across > 30,000 people
- ENCODE:
- Extensive regulatory characterisation of 6 cell types, less extensive in 147 cell lines
- Roadmap Epigenomics:
- Chromatin state across 111 primary cell types and 16 cultured cell types
What we learn from GTEx
- Looks at a lot of eQTLs, including for long noncoding RNA genes, across many tissues
- cis-sQTLs for 66% of protein-coding genes
- trans-eQTLs for 121 protein-coding genes and 22 lincRNA genes
eQTLs are overrepresented within GWAS hits
- Looked at over 200 studies with GWAS hits
- 64 studies where the eQTL gene is the causal gene
- 51 studies found an eQTL but no causal gene was identified
- 29 studies found an eQTL, but the causal gene was not present
- 104 did not find an eQTL at the locus
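The breakdown above is easier to compare as proportions. A short tally, using only the four counts listed in the notes (the category labels are paraphrased):

```python
# Counts as given in the notes above.
counts = {
    "eQTL gene is the causal gene": 64,
    "eQTL found, no causal gene identified": 51,
    "eQTL found, causal gene not present": 29,
    "no eQTL at locus": 104,
}

total = sum(counts.values())
with_eqtl = total - counts["no eQTL at locus"]

for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
print(f"any eQTL at locus: {with_eqtl}/{total} ({100 * with_eqtl / total:.1f}%)")
```

The four categories sum to 248, consistent with "over 200", and 144 of them have an eQTL at the locus.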
looking at gene regulation with GWAS
- Profiled multiple histone modifications across 127 human cell types, alongside whole genome sequences, RNA-seq, DNA-methylation and others
- Use regulatory annotations to provide useful information for interpreting GWAS SNPs
problem with variants, CF example
- CFTR is one of the longest genes in the genome.
- ~200,000 bp long
- Many mutations in CFTR lead to cystic fibrosis.
- Effect of some mutations is known, but most are classified as Variants of Unknown Significance
Variants of Unknown Significance
- We don't know whether or how the variant causes a problem
- VUS are widespread even within protein-coding regions, more so within non-coding…
- 4.6 million VUS in gnomAD
- 50% of the variants that lead to a problem
Testing VUS the old-fashioned way
- Put each version of a VUS into a plasmid with a reporter construct (GFP, luciferase, whatever)
- Compare the reporter activity of cells carrying each version of the plasmid, learn something.
- You will be dead well before you finish with all 4.6 million VUS
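The comparison step for a single variant can be sketched as a log2 ratio of reporter activity between the two plasmid versions. The readings below are made up, and the function name is illustrative only:

```python
import math
from statistics import mean


def log2_fold_change(alt_readings, ref_readings):
    """log2 ratio of mean reporter activity: variant plasmid vs reference plasmid."""
    return math.log2(mean(alt_readings) / mean(ref_readings))


# Made-up luciferase readings (arbitrary units) from replicate transfections.
ref_plasmid = [100.0, 110.0, 90.0]
alt_plasmid = [50.0, 55.0, 45.0]

print(log2_fold_change(alt_plasmid, ref_plasmid))  # -1.0, i.e. the variant halves reporter activity
```

Each variant needs its own pair of plasmids and its own set of readings, which is why this approach does not scale to millions of VUS.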
Multiplex Assays of Variant Effects (MAVE)
- MAVEs are a family of experiments designed to test the effect of many variants at once by using DNA barcodes
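A rough sketch of how barcodes let many variants be scored in one pooled experiment: each barcode is linked to one variant, barcode reads are counted before and after the assay, and each variant is scored by its change in abundance relative to the reference. The barcodes, read counts, and scoring rule here are assumptions for illustration, not a specific published protocol:

```python
import math
from collections import Counter

# Hypothetical barcode-to-variant lookup (two barcodes per construct).
barcode_to_variant = {
    "AAAC": "reference", "AAGT": "reference",
    "CCTA": "variant_1", "CCGG": "variant_1",
    "GGAT": "variant_2", "GGCA": "variant_2",
}

# Made-up barcode reads from the input library and from the assay output.
input_reads = (["AAAC"] * 10 + ["AAGT"] * 10 + ["CCTA"] * 10
               + ["CCGG"] * 10 + ["GGAT"] * 10 + ["GGCA"] * 10)
output_reads = (["AAAC"] * 20 + ["AAGT"] * 20 + ["CCTA"] * 5
                + ["CCGG"] * 5 + ["GGAT"] * 20 + ["GGCA"] * 20)


def counts_per_variant(reads):
    """Sum barcode reads per variant, ignoring barcodes not in the lookup."""
    counts = Counter()
    for barcode in reads:
        variant = barcode_to_variant.get(barcode)
        if variant is not None:
            counts[variant] += 1
    return counts


def scores(input_reads, output_reads, reference="reference"):
    """log2(output/input) per variant, expressed relative to the reference.

    Assumes every variant has non-zero counts in both read sets.
    """
    before = counts_per_variant(input_reads)
    after = counts_per_variant(output_reads)
    raw = {v: math.log2(after[v] / before[v]) for v in before}
    return {v: raw[v] - raw[reference] for v in raw}


print(scores(input_reads, output_reads))
```

In this toy data variant_1 is depleted relative to the reference (score -2) while variant_2 behaves like the reference (score 0), and both are measured from the same pool of cells.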