Module 8.2 RNA Sequencing Flashcards
RNA-Seq
- presence and quantity of RNA molecules in a biological sample
- gene expression in sample
- Alternatively spliced gene transcripts, exon-intron boundaries
- Post-transcriptional modifications
- Genetic variants such as fusions, indels, SNVs
- Changes or differences in gene expression levels or patterns
- Different populations of RNA subtypes
RNA sequencing methods
types (2)
- direct sequencing
- indirect sequencing (cDNA)
Indirect RNA sequencing
process overview (6)
- Isolate RNA from samples
- Fragment RNA into short segments
- convert RNA fragments into cDNA
- Ligate sequencing adapters and amplify
- Perform NGS sequencing
- Map sequencing reads to transcriptome/genome
Indirect RNA sequencing
benefits and drawbacks
Benefits
- can be used for measuring RNA abundance or assembly
- widely used and enables functional dissection of complex transcripts
Drawbacks
- can’t tell which strand RNA was transcribed from
- after fragmentation, poly(A) tail information at the 3’ end is lost for most mRNA fragments
- requires relatively large quantity of input RNA
- often results in underrepresentation of the 5’ end sequence of the RNA
Indirect sequencing
Isolate RNA from samples
targeted sequencing methods
- affinity chromatography
- gel electrophoresis
- enzyme depletion
- targeted enrichment (e.g., poly(A) library preparation, size selection for miRNA, rRNA removal)
Indirect RNA sequencing
Convert RNA fragments into cDNA
reverse transcription
- aka first-strand cDNA synthesis
- oligo(dT) (poly-T) primer binds the poly(A) tail
- gene specific primers
- random hexamer primers for total RNA
- 5’ end of mRNA may be underrepresented in cDNA if transcript is long and fragmented
- oligo(dT) primers + random hexamers = more full-length cDNA and less 5’ end bias
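As a rough illustration of the priming points above, here is a minimal Python sketch of first-strand synthesis on a toy sequence. The function names, the example sequences, and the `min_tail` threshold are all hypothetical choices for illustration; real primer annealing depends on chemistry this toy model ignores.

```python
# Toy model of first-strand cDNA synthesis (reverse transcription).
# All names, sequences, and thresholds here are illustrative assumptions.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> DNA base


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an RNA written 5'->3'."""
    # The cDNA is antiparallel to its template, so walk the RNA in reverse
    # while complementing each base.
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def oligo_dt_can_prime(rna: str, min_tail: int = 10) -> bool:
    """True if the RNA ends in a poly(A) run of at least min_tail bases."""
    tail_length = len(rna) - len(rna.rstrip("Aa"))
    return tail_length >= min_tail


mrna = "AUGGCCUUAGC" + "A" * 12  # hypothetical polyadenylated transcript
fragment = "AUGGCCUUAGC"         # hypothetical internal fragment without a tail

print(first_strand_cdna(mrna))       # cDNA starts with the poly(T) stretch
print(oligo_dt_can_prime(mrna))      # tail present
print(oligo_dt_can_prime(fragment))  # no tail: would need random hexamers
```

The fragment without a tail is the case where random hexamers or gene-specific primers are needed instead of an oligo(dT) primer.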
retrovirus
- virus that uses RNA as genomic material
- uses reverse transcriptase to convert the viral RNA genome into a complementary DNA molecule, which integrates into the host cell’s genome
- cell can produce more retrovirus that infects other cells
reverse transcriptase
features
- synthesize DNA from RNA template
- polymerase and nuclease active sites
- found in retroviruses, prokaryotes, and eukaryotes
- used to extend telomeres in eukaryotes (telomerase)
second strand synthesis
methods (3)
- Hairpin-primed synthesis
  - cDNA 3’ hairpin primes the second strand
  - NaOH removes RNA, S1 nuclease removes loop
  - priming is random and hairpin hydrolysis step may lead to loss of information
- RNase H activity
  - RNase H nicks the RNA; nicks act as primers for the 2nd DNA strand
  - RNA degraded and nicks between synthesized DNA products ligated via DNA ligase
- Random hexamers
  - NaOH removes RNA
  - random hexamers prime synthesis of the 2nd DNA strand
strand specific seq
UDG
- Uracil DNA glycosylase
- recognizes and removes uracil bases (incorporated as dUTP) from a DNA molecule
- USER = uracil-specific excision reagent
dUTP
features
- deoxyuridine triphosphate
- very similar structure to dTTP
- With certain DNA polymerases, can be readily incorporated into newly synthesized DNA in place of dTTP, pairing with adenine
Strand-specific RNA-seq
dUTP method
process
- mRNA nicked with RNase H; DNA polymerase incorporates dUTP instead of dTTP into the 2nd strand
- double-stranded cDNA molecules ligated to Y-shaped double-stranded sequencing adapters
- UDG removes the uracils, eliminating the 2nd cDNA strand, so only the first cDNA strand with both adapters is PCR amplified
- Read 1 = antisense to the original RNA (first-strand cDNA sequence), Read 2 = sense (transcription direction)
strand-specific seq
Directly Ligate Sequencing Adapters to the First-Strand cDNA
(DLAF)
process (5)
- double-stranded adapters ligate to single-stranded cDNA via random nucleotide overhangs
- left adapter: blocked 3’ end, 5’ phosphate -> ligate to 3’ end of cDNA
- right adapter: 3’ OH group, no 5’ phosphate -> ligate to 5’ end of cDNA
- top strand of adapter contains dUTP bases removed by UDG enzyme
- only strand with left and right adapters (reads 1 and 2) amplified and sequenced
Template Switching Technology
- terminal transferase activity of MMLV reverse transcriptase adds additional nucleotides (mostly dC) to 3’ end of newly synthesized cDNA strand
- template switching oligo (TS Oligo) has 3 riboguanosines at its 3’ end and universal adapter sequence at 5’ end
- Upon base pairing between TS Oligo and 3’ dC stretch, reverse transcriptase “switches” template strands and continues synthesis to 5’ end of TS Oligo
- resulting cDNA contains complete transcript and universal adapters
- reduces number of steps -> less sample loss during library preparation
- shown promise in generating full length cDNA libraries even for single cell derived RNA samples
Factors that influence read count
factors (3)
- Sequencing depth
- Transcript length
- RNA composition
Read count influences
Sequencing depth
- total number of reads for each sample
- deeper sequencing = higher read count
- need to remove effect of different read numbers between samples
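One common way to remove the effect of different read numbers is to scale each count to counts per million (CPM). The sketch below is a minimal illustration with made-up gene names and counts, not the exact method of any particular tool.

```python
def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts so that every sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical samples: same expression proportions, sample_b sequenced twice as deep.
sample_a = {"geneX": 100, "geneY": 300, "geneZ": 600}
sample_b = {"geneX": 200, "geneY": 600, "geneZ": 1200}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a["geneX"], cpm_b["geneX"])  # identical after depth scaling
```

The raw counts differ twofold between the two samples, but the CPM values agree because only the sequencing depth changed.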
Read count influences
Gene/Transcript length
- a long transcript will have more reads mapping to it than a short transcript with similar expression
- cannot directly compare the read count for two different transcripts of different size
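To compare transcripts of different sizes, counts are usually divided by transcript length before scaling. The sketch below computes transcripts per million (TPM) as one such length-aware unit; the gene names, counts, and lengths are hypothetical.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million: length-normalize first, then scale to one million."""
    # Reads per kilobase of transcript removes the length effect.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Scaling by the sum of RPK values removes the sequencing depth effect.
    scale = sum(rpk.values()) / 1_000_000
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical example: the long gene collects 4x the reads only because it is 4x longer.
counts = {"long_gene": 400, "short_gene": 100}
lengths_bp = {"long_gene": 4000, "short_gene": 1000}

tpm_values = tpm(counts, lengths_bp)
print(tpm_values)  # both genes end up with the same TPM
```

The raw counts suggest a fourfold difference, while the length-normalized values show the two transcripts at the same expression level.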
Read count influences
RNA composition
- Read counts measure relative abundance of transcripts within a sample
- Highly abundant transcript may take over significant portion of reads
- few highly expressed genes may contribute to very large part of sequenced reads in experiment, leaving only a few reads to be distributed among remaining genes
- Removal of highly abundant transcripts (e.g., subtractive hybridization for rRNA) enables detection of low-level transcripts
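The composition effect can be shown with simple arithmetic. In this sketch the abundance values and the fixed depth are hypothetical, and transcript length is ignored for simplicity; it only shows how one dominant species takes reads away from everything else.

```python
def expected_reads(abundance: dict, depth: int) -> dict:
    """Split a fixed number of reads in proportion to relative abundance."""
    total = sum(abundance.values())
    return {name: depth * amount / total for name, amount in abundance.items()}


depth = 1_000_000  # hypothetical number of reads sequenced per library

# Hypothetical relative abundances (arbitrary units); rRNA dominates total RNA.
with_rrna = {"rRNA": 9000, "geneA": 600, "geneB": 400}
rrna_depleted = {"geneA": 600, "geneB": 400}

before = expected_reads(with_rrna, depth)
after = expected_reads(rrna_depleted, depth)
print(before["geneA"], after["geneA"])  # geneA gains 10x more reads after depletion
```

The amount of geneA is unchanged between the two libraries; only its share of the reads changes, which is why depletion helps low-level transcripts get detected.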
RNA-Seq
Other sources of technical variation
types (4)
- GC bias
- Library preparation methods
- Sequencing batches
- Experimental designs
Differential gene expression analysis
- analysis and interpretation of differences in abundance of gene transcripts within a transcriptome according to phenotypes or experimental conditions
- RNA-seq measures gene expression level by read count distribution
- gene is declared differentially expressed if an observed difference or change in read counts between two experimental conditions is statistically significant
- results commonly visualized with volcano plots and heat maps
- may guide discovery of biomarkers, therapy targets, and treatment decisions
RNA seq applications
fusion and translocation variant detection
features
- Gene fusions often result in chimeric transcripts where exons from two different genes are joined together
- fusion breakpoints usually lie in intronic regions
- RNA-seq identifies reads spanning the junction where exons from the two genes are joined
- detection at the RNA level is efficient, and the information is valuable for understanding the functional consequences of a gene fusion and its potential role in disease
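The idea of finding reads that span two joined exons can be sketched as follows. This is a toy illustration only: the read representation (a list of `(gene, exon)` segments per split read), the gene names, and the `min_support` threshold are hypothetical, and real fusion callers apply many more filters.

```python
from collections import Counter


def fusion_candidates(split_reads, min_support=2):
    """Count split reads whose adjacent segments map to two different genes."""
    support = Counter()
    for segments in split_reads:
        # Look at each pair of neighbouring aligned segments within one read.
        for (gene_a, exon_a), (gene_b, exon_b) in zip(segments, segments[1:]):
            if gene_a != gene_b:
                support[(gene_a, exon_a, gene_b, exon_b)] += 1
    # Keep only junctions seen in at least min_support reads.
    return {junction: n for junction, n in support.items() if n >= min_support}


# Hypothetical split-read alignments: each read is a list of (gene, exon number).
reads = [
    [("geneA", 13), ("geneB", 2)],  # chimeric junction
    [("geneA", 13), ("geneB", 2)],  # second read supporting the same junction
    [("geneA", 12), ("geneA", 13)], # ordinary splice junction within one gene
    [("geneC", 1), ("geneD", 5)],   # single chimeric read, below the threshold
]

candidates = fusion_candidates(reads)
print(candidates)
```

Only the junction supported by two reads is reported; the within-gene splice junction and the singleton chimeric read are ignored.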
RNA isoform detection
- isoforms produced either from alternative splicing or through different transcriptional start or stop positions
- identifying reads spanning different exons detects various isoforms present in sample
- mammalian transcripts: 1-2kb
- short read length: 150-600 bp
- 3rd gen long read sequencing better for isoforms
Oxford Nanopore Technologies
Direct RNA sequencing
library preparation process
- double-stranded Reverse Transcription Adapter (RTA) ligated to RNA through attached complementary sequences (mRNA polyA tail or target specific)
- Y-shaped sequencing adapter carrying the motor protein is ligated to the RNA strand
- Motor protein directs RNA through nanopore in 3’ -> 5’ direction
- read lengths can exceed 30 kb
differential gene expression analysis
Volcano plot
- scattered points represent genes
- X axis = log2 fold change (expression ratio between the two conditions)
- Y axis = -log10 p-value (higher value = stronger statistical evidence that the gene is differentially expressed)
- red dots = genes significantly overexpressed in metastatic samples compared to primary tumors
- blue dots = genes significantly underexpressed (downregulated) in metastatic tumors compared to primary tumors
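The volcano plot coordinates can be computed directly from a fold change and a p-value. The sketch below is a minimal illustration; the cutoffs (`fc_cutoff`, `p_cutoff`), the pseudocount, and the example numbers are assumptions chosen for the example, not values from the source.

```python
import math


def log2_fold_change(mean_a: float, mean_b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of condition A over condition B; the pseudocount avoids dividing by zero."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def volcano_point(log2_fc: float, p_value: float, fc_cutoff: float = 1.0, p_cutoff: float = 0.05):
    """Return (x, y, label) for one gene on a volcano plot."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    y = -math.log10(p_value)  # small p-values land high on the Y axis
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        label = "up"    # would be drawn as a red dot
    elif p_value < p_cutoff and log2_fc <= -fc_cutoff:
        label = "down"  # would be drawn as a blue dot
    else:
        label = "not significant"
    return log2_fc, y, label


# Hypothetical genes: (log2 fold change, p-value)
genes = {"gene1": (2.5, 0.001), "gene2": (-1.8, 0.01), "gene3": (0.3, 0.5)}
points = {name: volcano_point(fc, p) for name, (fc, p) in genes.items()}
print(points)
print(log2_fold_change(7, 1))  # (7 + 1) / (1 + 1) = 4, so log2 is 2
```

Genes far from zero on the X axis and high on the Y axis are the ones that combine a large change with strong statistical support.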