Transcriptome Analysis Flashcards
Why do we care about transcription?
It is the primary means of interpreting info in the genome
it plays a central role in evolution
Often misregulated in disease
Complex traits
- > 85% of GWAS associations lie in non-coding regions
- enriched for eQTLs, overlap with regulatory elements
Basic principles of gene regulation
Gene expression varies in quantity, space, time, and in response to stimuli
We typically measure steady state RNA
RNA is regulated at the level of transcription, promoter usage, splicing, poly A site usage, stability, and localization
Perspectives to study RNA
spatial localization
abundance quantification
transcript isoforms and structure
emphasis on response to stimuli
Spatial localization of RNA
Techniques: in situ hybridization, immunohistochemistry, gene fusions
Can provide very precise (sub)cellular resolution
Often on fixed tissues, but live imaging becoming more common
often difficult to quantify due to technical variations
immunohistochemistry
treat tissue with antibodies
Quantifying RNA abundance
Techniques: Northern blots, qPCR, microarrays, NanoString, RNAseq
Isolate cells, extract RNA, measure steady state RNA
Isolating cells can be difficult
transcript isoform usage and structure
Techniques: qPCR, NanoString, long-read or paired-end RNAseq
microarrays were not particularly good for this
short read RNAseq data has inherent limitations
Response to stimulus
Perturb system, measure gene expression
-knock down TF + measure RNA
Knock down TF and measure RNA
how do you know if change is direct?
pulse chase experiment
method based on pulse chase experiment
nascent transcription quantification (GROseq)
measuring RNA stability
EST Sanger sequencing
Great for gene identification and characterization; long reads enabled isoform reconstruction; too expensive for accurate quantification
SAGE
Serial Analysis of Gene Expression
cDNAs cleaved into short (<20 bp) fragments, concatemerized, and sequenced
RNAseq molecular biology
Extract RNA; purify RNAs of interest (mRNA, miRNA); fragment; prime; convert to cDNA; attach adapters; sequence single or paired-end reads
RNAseq analysis outline
Align reads to a reference (typically the genome; some applications align to the transcriptome)
Assemble and quantify transcript abundance
Test for differential expression (data are count-based)
RNAseq complications
Alignment –>short reads, large gaps, 1% error
- using annotated gene models helps
- paired end and longer reads help
Experimental design - replicates –> because the experiment is fairly expensive and complicated, many people do not perform (enough) replicates
Confounding variables:
-randomization is critical in experimental design
small n, large p
empirical bayes approaches
Computation cost
Tophat ~1 hr/ 1 M reads on standard workstation
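The randomization point above can be sketched in code. This is a minimal illustration, assuming a simple two-condition design; the function name `randomize_to_batches`, the sample labels, and the batch count are hypothetical and not from the original notes. It shuffles samples within each condition and deals them across batches, so that condition is not confounded with batch (for example, library prep day or sequencing run).

```python
import random


def randomize_to_batches(samples_by_condition, n_batches, seed=0):
    """Randomly assign samples to batches, balanced by condition.

    samples_by_condition: dict mapping condition name -> list of sample IDs.
    Returns a list of batches; each batch is a list of (condition, sample) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    batches = [[] for _ in range(n_batches)]
    slot = 0
    for condition, samples in samples_by_condition.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)  # random order within each condition
        for sample in shuffled:
            # Deal samples round-robin so every batch gets a mix of conditions.
            batches[slot % n_batches].append((condition, sample))
            slot += 1
    return batches


# Hypothetical design: 4 control and 4 treated samples, 2 library-prep batches.
design = {
    "control": ["c1", "c2", "c3", "c4"],
    "treated": ["t1", "t2", "t3", "t4"],
}
for i, batch in enumerate(randomize_to_batches(design, n_batches=2, seed=1)):
    print(f"batch {i}: {batch}")
```

Processing all controls in one batch and all treated samples in another would make the batch effect indistinguishable from the treatment effect; balancing conditions across batches avoids that.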
Confounding variables:
Difficult-to-control variables can have large effects on RNAseq data
RNA extraction date, person performing library construction, kit batch, sequencing run, temperature, time of day…
hidden variables
latent variable techniques: PCA, factor analysis, PEER, SVA
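As a rough illustration of the PCA entry in that list, the sketch below finds the first principal component of a small expression matrix using power iteration in plain Python. The function name, the toy matrix, and the batch structure are made up for illustration; real analyses would use an established implementation, and PEER and SVA are different, more elaborate methods not shown here.

```python
import math


def first_pc_sample_scores(matrix, iterations=200):
    """Return each sample's loading on the first principal component.

    matrix: list of rows (genes), each a list of values (one per sample).
    Works on the sample-by-sample matrix, which is small when n << p.
    """
    n = len(matrix[0])
    # Center each gene across samples.
    centered = []
    for row in matrix:
        mean = sum(row) / n
        centered.append([v - mean for v in row])
    # n x n matrix of sample-sample cross-products.
    cov = [[sum(g[i] * g[j] for g in centered) for j in range(n)]
           for i in range(n)]
    # Power iteration converges to the dominant eigenvector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            return [0.0] * n  # no variation in the data
        vec = [x / norm for x in new]
    return vec


# Toy data: 4 genes x 4 samples. Samples 3 and 4 share a hypothetical hidden
# batch effect that shifts three of the four genes upward.
expr = [
    [5.0, 5.1, 8.0, 8.2],
    [3.0, 2.9, 6.1, 6.0],
    [7.0, 7.2, 7.1, 6.9],
    [1.0, 1.1, 4.0, 4.1],
]
print(first_pc_sample_scores(expr))
```

In this toy case the first two samples get scores of one sign and the last two the opposite sign, which is how an unrecorded batch can show up as a dominant axis of variation.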
FPKM
Fragments per kilobase per million mapped reads
standard unit of measurement for RNA abundance from RNAseq
normalizes by transcript length and read depth
Relative measure: depends on the abundance of all transcripts
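A small worked example of the definition above, assuming the usual formula FPKM = fragments / (transcript length in kb × total mapped fragments in millions). The function name `fpkm` and the toy counts and lengths are made up for illustration.

```python
def fpkm(fragment_counts, transcript_lengths_bp):
    """Compute FPKM per transcript.

    fragment_counts: dict transcript -> number of mapped fragments.
    transcript_lengths_bp: dict transcript -> length in base pairs.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    # count / (length_bp / 1e3) / (total / 1e6) == count * 1e9 / (length_bp * total)
    return {
        t: count * 1e9 / (transcript_lengths_bp[t] * total)
        for t, count in fragment_counts.items()
    }


lengths = {"A": 1000, "B": 3000, "C": 2000}

counts = {"A": 100, "B": 300, "C": 600}
print(fpkm(counts, lengths))
# → {'A': 100000.0, 'B': 100000.0, 'C': 300000.0}

# Double the fragments from C only; A's own count is unchanged.
counts_c_up = {"A": 100, "B": 300, "C": 1200}
print(fpkm(counts_c_up, lengths)["A"])
# → 62500.0
```

B has three times as many fragments as A but is three times as long, so both get the same FPKM. In the second call A's FPKM drops even though its count did not change, which is what "relative measure" means here.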
Surrogate variable analysis
Ranks features by association while accounting for hidden variables that are unmeasured