L32 Functional Human Genetics 2 Flashcards
GWAS hits
see onenote
- Most GWAS hits aren’t causal but are in LD with causal variant
- majority of GWAS hits are located outside protein coding regions
- How do we prioritise each variant?
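A rough sketch of how LD between a GWAS hit and a nearby variant can be quantified, using the standard r² statistic on phased haplotypes. The haplotype data and the function name `ld_r_squared` are made up for illustration; this is not taken from the lecture.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with each allele coded 0 or 1 (hypothetical toy input).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Toy data: the tag SNP (first allele) and the assumed causal SNP
# (second allele) travel together on most haplotypes.
toy = [(1, 1)] * 4 + [(0, 0)] * 5 + [(1, 0)] * 1
r2 = ld_r_squared(toy)
```

A high r² means the GWAS hit is a good proxy for the other variant, which is why association alone cannot tell us which of the two is causal.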
What does rs8050136 do?
see onenote
- strong association and large effect on BMI, weight, T2D
- located in intron of FTO gene
How do we know the genome works?
see onenote
- Protein code - triplets
- Each triplet (codon) encodes an amino acid; amino acids are strung together into polypeptides
- But most of the genome is not protein-coding genes: only about 1.5% codes for proteins, but the rest isn’t all junk DNA, much of it has a function
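A minimal sketch of the triplet code: read a coding sequence three bases at a time and look each codon up in a table. Only the five codons needed for the toy sequence are included (a full table has 64 entries), and the sequence itself is made up.

```python
# Partial codon table: only the codons used in the toy sequence below.
CODON_TABLE = {"ATG": "M", "TTT": "F", "GGC": "G", "AAA": "K", "TAA": "*"}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate("ATGTTTGGCAAATAA")
```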
From genotype to function
see onenote slides
- functional genomics, deciphering how the genome actually works
transcriptomics
- Transcriptomics, also referred to as expression profiling, examines the expression levels of RNAs in a given cell population
epigenomics
- Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome
Why do we measure RNA?
- Easy to measure at high throughput
- Close to DNA, which we understand fairly well
- But RNA may be far removed from the ultimate phenotype - us
Two main technologies today:
- Microarray
- RNA seq
Expression microarrays
see onenote
- Start with sample of RNA
- if RNA molecule matches to known DNA probe, it will bind
- 25bp DNA probes, we know the sequence of these probes
- The more RNA that binds, the brighter the signal
- Microarray is ultimately a fluorescent based technology
- But the signal can only get so bright: it saturates, so you lose resolution between highly expressed genes
RNA-seq
see onenote
- Sequence every single RNA molecule in that sample
- Map reads back to the genome
- How many reads map to the 1st gene, 2nd gene etc.
- Doesn’t depend on having the right probe, depends on the gene being in your sample
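The "map reads, then count per gene" step above can be sketched as follows. The gene coordinates, read positions, and the function name `count_reads` are hypothetical toy values; real pipelines use an aligner and a dedicated counting tool rather than this simple loop.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def count_reads(read_positions, genes=GENES):
    """Count reads whose mapped start position falls inside each annotated gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in read_positions:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Toy mapped reads as (chromosome, start position); two fall outside any gene.
reads = [("chr1", 120), ("chr1", 450), ("chr1", 900), ("chr1", 650), ("chr2", 300)]
counts = count_reads(reads)
```

Reads that land outside the annotation are still sequenced and mapped here, even though this sketch does not count them; that is the sense in which RNA-seq does not depend on having the right probe.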
Microarrays vs RNA-seq
- Microarray
○ Shallow, fast, cheap overviews of expression across many individuals
○ Straightforward
○ Relies on known DNA probes, can miss genes that don’t have probes present
- RNA seq
○ Unbiased, can be used to uncover new biology
○ Quantitative, more precision
What can we learn from gene expression levels?
see onenote
- Cell-type specific, foundation of tissue specificity
- Look at same tissue between healthy people vs patients
- Tissues within people
- Tissues within people across time
The human transcriptome
- Fewer protein coding genes than we expected
- but transcriptionally complex, complexity must be somehow encoded in our DNA but how?
Large-scale quantification of human transcription: GTEx
see onenote
Genotype-tissue expression project
- RNA-seq on 40+ human tissues across 400+ (deceased, healthy) humans
- capture both intra- and inter-individual variation
- Main lesson: human tissues are transcriptionally complex and fairly distinct
- Differences between tissues don’t depend on a lot of genes
- Specificity is encoded by a small number of genes in comparison to housekeeping genes, which are required in most cells and tissues
- transcription can vary in association with age, sex, ethnicity etc.
- Good reference source for QTL identification and GWAS annotation
Quantitative trait loci can impact expression levels
QTL - genetic region associated with a particular quantitative phenotype
- by this definition, all GWAS hits are QTLs
- using RNA-seq we can measure the impact of individual SNPs on nearby gene expression levels; SNPs with such an effect are known as eQTLs (expression QTLs)
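A minimal sketch of the idea behind an eQTL test: regress a gene's expression on genotype dosage (0, 1 or 2 copies of the alternate allele) across individuals and look at the slope. The dosages and expression values are invented, and `eqtl_slope` is a hypothetical helper; real analyses also test significance, adjust for covariates and correct for multiple testing.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage (0, 1 or 2 alt alleles)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    var = sum((g - mean_g) ** 2 for g in dosages)
    if var == 0:
        raise ValueError("no genotype variation, slope is undefined")
    return cov / var


# Toy data: one SNP and one nearby gene measured in the same six individuals.
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
beta = eqtl_slope(dosages, expression)  # change in expression per extra alt allele
```

A slope clearly different from zero suggests that the SNP is associated with the gene's expression level.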
How does eQTL impact expression levels?
see onenote
biological process of going from DNA to RNA to protein can be regulated at many different levels
- eQTL testing requires both genotype and expression measurements in the same individual
- Gene specific
- When, where, how long, how many transcripts are made, how quickly are they made
Post-transcriptional regulation
- affects stability and localisation of transcripts
Identifying eQTLs brings us one step closer to understanding the cellular mechanisms that underlie differences between individuals
Gene regulation simultaneously occurs at multiple levels
see onenote
- DNA methylation
- chromatin modifications
- DNase-1 hypersensitive sites
- TF binding sites
- long range regulatory elements
- promoter architecture
- protein-coding and non-coding transcripts
Chromatin accessibility
see onenote
- DNA coiling around histones => nucleosomes
- euchromatin
- heterochromatin, inaccessible
- accessibility impacts whether a gene can be transcribed
Histone modification
see onenote
- Histone modification of histone tails, different modifications can impact gene expression and chromatin structure
- different modifications mean different things; some repress, activate, enhance
TF binding
see onenote
- for transcription to take place, a complex of proteins must interact with DNA, e.g. general transcription factors (GTFs, with RNA pol II) and TFs (activators, repressors)
- TFs tend to act additively or combinatorially
- TFs can act locally or at a distance (promoters vs enhancers)
- many TFs have preferred sequences they bind with high affinity
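A sketch of how a TF's sequence preference is often represented: a position weight matrix (PWM) that scores each base at each position of the motif. The log-odds values below are made up for a hypothetical 4 bp motif and do not describe any real TF.

```python
# Made-up log-odds scores for a hypothetical 4 bp motif; one dict per position.
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -0.8, "C": -1.5, "G": -1.0, "T": 1.4},
    {"A": 1.0, "C": -0.7, "G": -1.0, "T": -0.9},
]


def motif_score(site):
    """Sum the per-position scores for a candidate binding site."""
    if len(site) != len(PWM):
        raise ValueError("site length must match motif length")
    return sum(column[base] for column, base in zip(PWM, site))


ref_score = motif_score("AGTA")  # every position matches the preferred base
alt_score = motif_score("AGCA")  # a single T -> C change at position 3
```

The single-base change lowers the score, which is how a SNP inside a binding site could weaken TF binding.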
ENCODE
see onenote slides
- cataloguing regulatory diversity
- RNA-seq, ChIP-seq, DNase-seq, bisulfite-seq
DNase-1 hypersensitivity
see onenote slides
- DNase-1 cleaves DNA that is not wrapped around nucleosomes; the resulting DNase-1 hypersensitive sites (DHS) are good markers of open chromatin
- 1 million of these sites are cell-type specific; much of chromatin structure varies by cell type, i.e. it is context dependent
- If a DHS overlaps a TF binding site, DNase won’t cut where the DNA is bound by the TF (not just where it’s bound to nucleosomes)
- Can show whether a TF is bound at its binding site in one individual compared with another, e.g. if T changes to C => TF doesn’t bind, gene expression is lost
What is function?
- no widespread definition of function
- just because something occurs does not mean it has function
- just because something occurs often does not mean it always has function
Beyond ENCODE: roadmap epigenomics
- followup effort to characterise regulatory variation and mechanisms
- profiled a lot of things across a lot of samples with a big focus on histone modifications
Epigenetics
means different things to different people
Our definition:
set of chemical modifications on DNA of a cell that are associated with changes in gene expression
Regulating expression through fine-tuning chromatin
see onenote
- identified 15 distinct chromatin states beyond the traditional open/closed (euchromatin/heterochromatin) dichotomy
Insights into gene regulation
- regulation of gene expression involves interactions between multiple cellular mechanisms, sometimes at great physical distances from one another
- still mostly correlation, not causation
rs8050136
- one of the early successes
- assumed it impacted FTO expression levels somehow since the phenotype made perfect sense
FTO impacts BMI, regulatory variants within FTO impact BMI
see onenote
problem = variation in FTO expression could never be correlated to variation in genotypes at regulatory region identified by GWAS
rs8050136 turned out to be the wrong SNP
rs9930506
- less significant, also not an eQTL for FTO but an eQTL for a nearby gene IRX3 which also impacts BMI