Module 8.2 RNA Sequencing Flashcards

1
Q

RNA-Seq

A
  • presence and quantity of RNA
    molecules in a biological sample
  • gene expression in sample
  • Alternatively spliced gene transcripts, exon-intron boundaries
  • Post-transcriptional modifications
  • Genetic variants such as fusions, indels, SNVs
  • Changes or differences in gene expression levels or patterns
  • Different populations of RNA subtypes
2
Q

RNA sequencing methods

types (2)

A
  1. direct sequencing
  2. indirect sequencing (cDNA)
3
Q

Indirect RNA sequencing

process overview (6)

A
  1. Isolate RNA from samples
  2. Fragment RNA into short segments
  3. convert RNA fragments into cDNA
  4. Ligate sequencing adapters and amplify
  5. Perform NGS sequencing
  6. Map sequencing reads to transcriptome/genome
4
Q

indirect RNA sequencing

benefits and drawbacks

A

Benefits
- can be used for measuring RNA abundance or assembly
- widely used and enables functional dissection of complex transcripts

Drawbacks
- can’t tell which strand RNA was transcribed from
- when RNA is fragmented, poly(A) tail information at the 3’ end is lost in most mRNA fragments
- requires a relatively large quantity of input RNA
- often results in underrepresentation of the 5’ end sequence of RNA

5
Q

Indirect sequencing

Isolate RNA from samples

targeted sequencing methods

A
  • affinity chromatography
  • gel electrophoresis
  • enzyme depletion
  • target enrichment (e.g., poly(A) library formation, size selection for miRNA, rRNA removal)
6
Q

Indirect RNA sequencing

Convert RNA fragments into cDNA

reverse transcription

A
  • aka first strand DNA synthesis
  • PolyT primer to bind poly A tail
  • gene specific primers
  • random hexamer primers for total RNA
  • 5’ end of mRNA may be underrepresented in cDNA if transcript is long and fragmented
  • oligo dT primers + random hexamer = more full length cDNA and less 5’ end bias
7
Q

retrovirus

A
  • virus that uses RNA as genomic material
  • uses reverse transcriptase to convert viral RNA genome into complementary DNA molecule, which integrates into the host cell’s genome
  • cell can produce more retrovirus that infects other cells
8
Q

reverse transcriptase

features

A
  • synthesize DNA from RNA template
  • polymerase and nuclease active sites
  • retroviruses, prokaryotes, and eukaryotes
  • used to extend telomeres in eukaryotes
9
Q

second strand synthesis

methods (3)

A
  1. hairpin primed synthesis
    - cDNA 3’ hairpin to prime second strand
    - NaOH removes RNA, S1 nuclease removes loop
    - priming is random and hairpin hydrolysis step may lead to loss of information
  2. RNase H activity
    - Nick RNA with RNase H. Nicks act as primers for 2nd DNA strand
    - RNA degraded and nicks of synthesized DNA products ligated via DNA ligase
  3. Random hexamers
    - NaOH removes RNA
    - prime and synthesize 2nd strand of DNA
10
Q

strand specific seq

UDG

A
  • Uracil DNA glycosylase
  • recognizes and removes uracil bases (incorporated as dUTP) from DNA molecule
  • USER- uracil-specific excision reagent
11
Q

dUTP

features

A
  • deoxyuridine triphosphate
  • very similar structure to dTTP
  • With certain DNA polymerases, can be easily incorporated into DNA in place of dTTP, pairing with adenine
12
Q

Strand-specific RNA-seq

dUTP method

process

A
  1. mRNA nicked with RNase H, dUTP incorporated into 2nd strand instead of dTTP with DNA polymerase
  2. double-stranded cDNA molecules ligated with double stranded Y sequencing adapters
  3. 2nd cDNA strand removed when UDG removes the uracil bases, so only the first strand of cDNA with two adapters is PCR amplified
  4. Read 1= transcription direction, Read 2 = opposite direction
13
Q

strand-specific seq

Directly Ligate Sequencing Adapters to the First-Strand cDNA
(DLAF)

process (5)

A
  1. double-stranded adapters ligate to single-stranded cDNA via random nucleotide overhangs
  2. left adapter: blocked 3’ end, 5’ phosphate -> ligate to 3’ end of cDNA
  3. right adapter: 3’ OH group, no 5’ phosphate -> ligate to 5’ end of cDNA
  4. top strand of adapter contains dUTP bases removed by UDG enzyme
  5. only strand with left and right adapters (reads 1 and 2) amplified and sequenced
14
Q

Template Switching Technology

A
  • terminal transferase activity of MMLV reverse transcriptase adds additional nucleotides (mostly dC) to 3’ end of newly synthesized cDNA strand
  • template switching oligo (TS Oligo) has 3 riboguanosines at its 3’ end and universal adapter sequence at 5’ end
  • Upon base pairing between TS Oligo and 3’ dC stretch, reverse transcriptase “switches” template strands and continues replication to 5’ end of TS Oligo
  • resulting cDNA contains complete transcript and universal adapters
  • reduces number of steps -> less sample loss during library preparation
  • shown promise in generating full length cDNA libraries even for single cell derived RNA samples
15
Q

Factors that influence read count

3

A
  1. Sequencing depth
  2. Transcript length
  3. RNA composition
16
Q

Read count influences

Sequencing depth

A
  • total number of reads for each sample
  • deeper sequencing = higher read count
  • need to remove effect of different read numbers between samples
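A minimal sketch of removing the depth effect, assuming a simple reads-per-million scaling. The sample names, gene names, and counts are made up for illustration.

```python
# Hypothetical raw read counts for the same two genes in two samples.
# Sample B was sequenced twice as deeply as sample A.
counts = {
    "sample_A": {"gene1": 100, "gene2": 900},
    "sample_B": {"gene1": 200, "gene2": 1800},
}

def reads_per_million(sample_counts):
    """Scale each gene's count by the sample's total reads, in millions."""
    total = sum(sample_counts.values())
    return {gene: n * 1_000_000 / total for gene, n in sample_counts.items()}

rpm = {name: reads_per_million(c) for name, c in counts.items()}

for name, values in rpm.items():
    print(name, values)
```

The raw count of gene1 doubles in sample B, but after scaling both samples give the same value, so the difference came from depth rather than expression.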
17
Q

Read count influences

Gene/Transcript length

A
  • a long transcript will have more reads mapping to it than a short transcript of similar expression
  • cannot directly compare the read count for two different transcripts of different size
18
Q

Read count influences

RNA composition

A
  • Read counts measure relative abundance of transcripts within a sample
  • A highly abundant transcript may take over a significant portion of reads
  • a few highly expressed genes may contribute a very large part of the sequenced reads in an experiment, leaving only a few reads to be distributed among the remaining genes
  • Removal of overabundant transcripts (e.g., subtractive hybridization for rRNA) enables detection of low-abundance transcripts
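The composition effect can be illustrated with a small calculation. This is a sketch that assumes reads are split in exact proportion to molecule abundance. The gene names, molecule counts, and depth are hypothetical.

```python
def expected_reads(abundance, total_reads):
    """Split a fixed read budget in proportion to each transcript's share."""
    total_abundance = sum(abundance.values())
    return {g: total_reads * a / total_abundance for g, a in abundance.items()}

# Hypothetical molecule counts; geneX is equally abundant in both samples.
control = {"geneX": 10, "geneY": 90}
with_dominant = {"geneX": 10, "geneY": 90, "rRNA_like": 900}

depth = 1_000_000  # same sequencing depth for both samples
control_reads = expected_reads(control, depth)
dominant_reads = expected_reads(with_dominant, depth)

print(control_reads["geneX"], dominant_reads["geneX"])
```

geneX receives ten times fewer reads in the second sample even though its abundance is unchanged, because the dominant transcript consumes most of the read budget.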
19
Q

RNA-Seq

Other sources of technical variation

types (4)

A
  • GC bias
  • Library preparation methods
  • Sequencing batches
  • Experimental designs
20
Q

Differential gene expression analysis

A
  • analysis and interpretation of differences in abundance of gene transcripts within a transcriptome according to phenotypes or experimental conditions
  • RNA-seq measures gene expression level by read count distribution
  • gene is declared differentially expressed if an observed difference or change in read counts between two experimental conditions is statistically significant
  • Volcano plots and heat maps
  • may guide discovery of biomarkers, therapy targets, and treatment decisions
21
Q

RNA seq applications

fusion and translocation variant detection

features

A
  • Gene fusions often result in chimeric transcripts where exons from two different genes are joined together
  • fusion breakpoints usually fall in intron regions
  • RNA-seq identifies reads spanning the junction between the two joined exons
  • more efficient than searching large introns for breakpoints, and the information is valuable for understanding functional consequences of gene fusion and potential role in disease
22
Q

RNA isoform detection

A
  • isoforms produced either from alternative splicing or through different transcriptional start or stop positions
  • identifying reads spanning different exons detects various isoforms present in sample
  • mammalian transcripts: 1-2kb
  • short read length: 150-600 bp
  • 3rd gen long read sequencing better for isoforms
23
Q

Oxford Nanopore Technologies

Direct RNA sequencing

library preparation process

A
  1. double-stranded Reverse Transcription Adapter (RTA) ligated to RNA through attached complementary sequences (mRNA poly(A) tail or target specific)
  2. Y-shaped sequencing adapter carrying the motor protein ligated to RNA strand
  3. Motor protein directs RNA through nanopore in 3’ to 5’ direction
  4. read lengths can exceed 30 kb
24
Q

differential gene expression analysis

Volcano plot

A
  • scattered points represent genes
  • X axis = log2 fold change (expression ratio between conditions)
  • Y axis = -log10 p-value (the higher the point, the smaller the p-value and the stronger the statistical evidence for differential expression)
  • red dots = genes significantly overexpressed in metastatic samples compared to primary tumors
  • blue dots = genes significantly downregulated in metastatic tumors compared to primary tumors
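A minimal sketch of how one gene becomes a coloured point on a volcano plot. The function names and the cutoffs (absolute log2 fold change of 1, p-value of 0.05) are assumptions chosen for illustration, not values from the card. Real analyses typically use p-values adjusted for multiple testing.

```python
import math

def volcano_point(mean_metastatic, mean_primary, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one gene."""
    x = math.log2(mean_metastatic / mean_primary)
    y = -math.log10(p_value)
    return x, y

def point_colour(x, y, min_abs_log2fc=1.0, max_p=0.05):
    """Colour a point using hypothetical significance cutoffs."""
    if y < -math.log10(max_p) or abs(x) < min_abs_log2fc:
        return "grey"  # not significant or change too small
    return "red" if x > 0 else "blue"

# Hypothetical gene: 4-fold higher in metastatic samples, p = 0.001.
x, y = volcano_point(400.0, 100.0, 0.001)
print(x, y, point_colour(x, y))
```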
25
Q

differential gene expression analysis

Heat map

A
  • row = gene
  • column = patient
  • Patients grouped together by histologic types of cancer
  • red dot = upregulated gene in patient
  • green dot = downregulated gene in patient
  • patients and genes grouped into clusters = identify differential gene expression patterns associated with different subtypes of cancer
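One common way to get the red/green colouring is to standardize each gene (row) across patients before plotting. The sketch below assumes row z-scores and a cutoff of 1. Both are illustrative choices, and the expression values are made up.

```python
from statistics import mean, pstdev

def row_z_scores(values):
    """Standardize one gene's expression values across patients."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # flat row: no up- or downregulation
    return [(v - mu) / sd for v in values]

def cell_colour(z, cutoff=1.0):
    """Map a z-score to a heat map colour (hypothetical cutoff)."""
    if z >= cutoff:
        return "red"    # upregulated relative to the row mean
    if z <= -cutoff:
        return "green"  # downregulated relative to the row mean
    return "black"

# Hypothetical gene measured in four patients (two per cancer subtype).
gene_row = [1.0, 1.0, 9.0, 9.0]
colours = [cell_colour(z) for z in row_z_scores(gene_row)]
print(colours)
```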
26
Q

Single-cell RNA sequencing

features

A
  • provides expression profile of individual cells
  • can identify and characterize transcriptionally distinct subpopulations and states within cell population
  • enables detailed unbiased characterization of transcriptional features underlying important phenotypes
  • enable accurate characterization of heterogeneity
  • understand mechanisms of cancer pathogenesis,
  • develop effective treatment strategies
  • identify novel targets for immunotherapy and drug development
  • current gold standard for defining cell states and phenotypes
27
Q

Single-cell RNA sequencing

process

A
  1. isolate single cells from tissue
  2. reverse transcription: individual cells separated into microwells or emulsion droplets
  3. Amplification and sequencing like bulk RNA
28
Q

Single cell RNA sequencing

Reverse Transcription Droplet Method

features

A
  • 10X Genomics
  • data can be used for cellular phenotype classification or new subpopulation identification
  • allows detection of rare cell types
  • high throughput: 500 to 10,000 cells can be captured per sample from single cell suspension
  • need fresh samples
  • need to preserve initial relative abundance of mRNA in cell to identify rare transcripts
  • requires tissue dissociation and cell isolation
29
Q

Single cell sequencing

Reverse Transcription Droplet Method

process

A
  1. Cells encapsulated into droplets in automated machine via microfluidic chip that combines all components with oil
  2. Each droplet contains one cell, one gel bead, and reverse transcription reagents
  3. 4-part oligos attached to gel bead.
    - sequencing primer
    - 10X barcode unique to gel bead
    - UMI sequence unique to cDNA molecule
    - poly(dT) sequence that captures RNA (mRNA poly(A) tail or target RNA) and functions as primer for reverse transcription.
  4. cell is lysed in droplet and undergoes reverse transcription
  5. oil is removed and cDNA pooled together for PCR amplification and sequencing.
  6. cDNA from the same cell are identified through 10X barcode.
  7. number of distinct UMIs can be used as a digital count of cDNA copies to analyze gene expression level
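The barcode and UMI logic in the last two steps can be sketched as follows. The read tuples, barcode strings, and function name are hypothetical. Real pipelines also correct sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical aligned reads: (cell barcode, UMI, gene).
reads = [
    ("CELL_A", "UMI1", "GAPDH"),
    ("CELL_A", "UMI1", "GAPDH"),  # PCR duplicate of the read above
    ("CELL_A", "UMI2", "GAPDH"),
    ("CELL_B", "UMI1", "GAPDH"),
    ("CELL_B", "UMI3", "ACTB"),
]

def umi_counts(reads):
    """Count distinct UMIs per (cell barcode, gene) pair."""
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

counts = umi_counts(reads)
print(counts)
```

Reads sharing a cell barcode are assigned to the same cell, and duplicate UMIs collapse to one, so PCR amplification does not inflate the count.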
30
Q

RPKM

A
  • RPKM: Reads Per Kilobase of transcript per Million mapped reads (single-end RNA-seq)
  • FPKM: Fragments Per Kilobase of transcript per Million mapped reads (paired-end RNA-seq)
  • RPKM = (10^9 × reads mapped to transcript) / (total mapped reads × transcript length in bp)
  • per-million scaling factor (PMSF) = total reads / 1,000,000
  • number of transcript reads / PMSF = RPM (reads per million)
  • RPM / gene length in kilobases = RPKM
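A short sketch checking that the one-line formula and the stepwise route give the same RPKM. It assumes transcript length is given in base pairs. The function names and example numbers are illustrative.

```python
def rpkm_formula(reads, length_bp, total_reads):
    """RPKM from the single formula: 10^9 * reads / (total reads * length)."""
    return 1e9 * reads / (total_reads * length_bp)

def rpkm_stepwise(reads, length_bp, total_reads):
    """RPKM via the per-million scaling factor, then per kilobase."""
    pmsf = total_reads / 1_000_000    # per-million scaling factor
    rpm = reads / pmsf                # reads per million
    return rpm / (length_bp / 1000)   # divide by length in kilobases

# Hypothetical transcript: 500 reads, 2,000 bp long, 10 million total reads.
print(rpkm_formula(500, 2000, 10_000_000))
print(rpkm_stepwise(500, 2000, 10_000_000))
```

The factor 10^9 is the product of 10^6 (per million reads) and 10^3 (per kilobase), which is why the two routes agree.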
31
Q

TPM

A
  • Transcripts Per Million
  • TPM = 10^6 × (RPKM / sum of RPKM over all transcripts)
  • number of transcript reads / gene size in kb = RPK (reads per kilobase)
  • sum of all RPK values / 1,000,000 = PMSF (per-million scaling factor)
  • RPK / PMSF = TPM
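A minimal sketch of the TPM steps, normalizing for length first and for depth second. Gene names, counts, and lengths are hypothetical, and lengths are assumed to be in base pairs.

```python
def tpm(read_counts, lengths_bp):
    """Compute TPM for each gene from raw counts and lengths in bp."""
    # Step 1: reads per kilobase (length normalization).
    rpk = {g: read_counts[g] / (lengths_bp[g] / 1000) for g in read_counts}
    # Step 2: per-million scaling factor from the summed RPK values.
    pmsf = sum(rpk.values()) / 1_000_000
    # Step 3: scale each RPK value.
    return {g: value / pmsf for g, value in rpk.items()}

counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

result = tpm(counts, lengths)
print(result)
print(sum(result.values()))
```

Because the scaling factor is computed after length normalization, TPM values sum to one million in every sample, which makes proportions comparable between samples.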
32
Q

Spatial Transcriptomics

process (8)

A
  1. Tissue samples cut into very thin slices, fixed, stained and put on spatial transcriptomic slide
  2. undergoes enzymatic permeabilization process so molecules in cell diffuse down to slide
  3. mRNA released and binds to barcoded probes
  4. Reverse transcription carried out in situ
  5. synthesized cDNA contains spatial barcode and UMI, which provide information about gene expression and location
  6. libraries prepared and analyzed by sequencing
  7. spatial barcode present within each generated sequence allows data for each mRNA transcript to be mapped back to position on slide.
  8. by overlaying picture of tissue with slide, can analyze point of origin for mRNA transcripts within tissue section
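The mapping from spatial barcode back to slide position can be sketched as a lookup table. The barcode strings, coordinates, gene names, and function name below are hypothetical; actual slides carry far more spots and longer barcodes.

```python
from collections import Counter

# Hypothetical slide layout: spatial barcode -> (row, column) of the spot.
barcode_to_xy = {"AAAC": (0, 0), "AAAG": (0, 1), "AACT": (1, 0)}

# Hypothetical sequenced reads: (spatial barcode, UMI, gene).
reads = [
    ("AAAC", "U1", "KRT5"),
    ("AAAC", "U1", "KRT5"),  # PCR duplicate, same molecule
    ("AAAC", "U2", "KRT5"),
    ("AACT", "U1", "CD3E"),
    ("TTTT", "U9", "KRT5"),  # barcode not on the slide, discarded
]

def expression_by_spot(reads, barcode_to_xy):
    """Count unique molecules per (slide position, gene)."""
    molecules = {r for r in reads if r[0] in barcode_to_xy}
    return Counter((barcode_to_xy[bc], gene) for bc, _umi, gene in molecules)

spot_counts = expression_by_spot(reads, barcode_to_xy)
print(spot_counts)
```

Each count is tied to a slide coordinate, so overlaying the coordinates on the tissue image gives the point of origin of each transcript.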
33
Q

Spatial transcriptomics

features

A
  • method for assigning cell types identified by mRNA readouts to their locations in histological section
  • used to determine subcellular localization of messenger RNA molecules
  • able to capture positional context of transcriptional activity within intact tissues (regions or single cells)
  • use intact tissue sections and spatial transcriptome slide (glass slide containing arrays of spatially barcoded oligo probes)
34
Q

spatial transcriptomics

slide oligo probe components (5)

A

(Top to bottom)
1. oligo dT tails to capture mRNA
2. UMI
3. spatial barcode/ID to indicate oligo position on slide
4. Reverse Transcription primer
5. Cleavage Site

35
Q

Stereo-Seq

features

A
  • combines in situ capture with DNB-Seq for in situ sequencing
  • DNB-seq = 400 million spots per square centimeter = much higher resolution and wider field of view
  • Stereo-Seq chip (top to bottom): Poly T, UMI, coordinate ID (CID)
  • By processing multiple different slices of sample, can construct 3D picture of gene expression