RNA-sequencing: Global Gene Expression Analysis (4) Flashcards
What is the difference between mRNA and miRNA?
mRNA:
- Messenger RNA
- Is transcribed as a template for protein synthesis
miRNA:
- Micro RNA
- Regulates gene expression, mainly by blocking translation of target mRNAs or promoting their degradation
How is the transcriptome analysed?
Low-to-mid-plex techniques:
- Northern blot
- Fluorescent in situ hybridization
- Reverse transcription PCR (RT-PCR)
Higher-plex techniques: Unbiased
- DNA microarray
- Tiling array (subtype of microarray chips)
- RNA-Seq
What are some applications of RNA sequencing?
- Abundance estimation
- Alternative splicing
- RNA editing
- Finding novel transcripts
- Ribo-seq (analysis of those RNAs in active translation)
- Single cell sequencing
What does RNA sequencing show?
Provides quantitative data for all the transcripts expressed in a particular sample (tissue, condition, stage, etc.)
What are some advantages and disadvantages of RNA sequencing?
Advantages:
- Coupled with high-throughput sequencing
- Quantitative
Disadvantages:
- Highly technical and relatively expensive (becoming cheaper)
- Relies heavily on computational and statistical analysis
How is RNA isolated for RNA sequencing?
Under denaturing conditions (to preserve RNA integrity)
e.g. phenol-based extraction
How is RNA converted to cDNA?
RNA fragments are reverse transcribed into cDNA using random hexamer primers and reverse transcriptase
How and why is the chosen RNA fragmented further?
Why: size limitation of sequencers
How:
- Chemical cleavage: divalent cations, high pH and high temperature
- Enzymatic digestion: RNase III
How are cDNA fragments prepared for amplification and sequencing?
Ligated to DNA adapters
What are the potential issues of cDNA ligation to DNA adapters?
- Loses the information about which DNA strand corresponds to the sense strand of RNA
- Lack of strand specificity would make it difficult to identify antisense and novel RNA species and cause inaccurate measurement of sense RNA expression
What is the simple approach of attaching adapters before amplification?
Attaching different adapters directly to the 5’ and 3’ ends of the RNA molecule
- Removal of 3’ phosphate group from fragmented RNA and addition of a 5’ phosphate group
- Sequential ligation of a 5’-adenylated 3’ adapter using a truncated RNA ligase II, followed by ligation of a 5’ adapter using RNA ligase I
- The sequence difference between 5’ and 3’ adapters preserves RNA strandedness
What is the dUTP method of attaching adapters before amplification?
Incorporates dUTP into the second strand of cDNA
- The labelled strand can be degraded before PCR amplification with uracil DNA glycosylase (UDG) - an enzyme that cleaves the uracil base in dUTP-containing DNA
- The U-containing strand is a very poor template for thermostable polymerases (essentially not amplifiable)
- Only the first strand cDNA with defined adapter sequences is amplified
- This confers directional (strand) information on the sequencing reads
How and why is cDNA amplified before sequencing?
Most sequencers have a detection limit so cDNA libraries are amplified by PCR
- Only a small number of amplification cycles (8–12) are used during PCR
- Variations in cDNA size and composition can result in uneven amplification
- Amplification of some cDNAs plateaus while others continue to amplify exponentially
How are cDNA libraries sequenced?
On a high throughput platform to generate tens of millions of short reads
- In a flow cell
- Bridge amplification → cluster (polony) formation → forward read → reverse read
How are the reads from RNA/cDNA sequencing initially processed?
They are aligned to an existing gene framework (annotated genome/transcriptome)
Aligners index the reference using hash tables or the Burrows-Wheeler transform
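A minimal sketch of the Burrows-Wheeler idea, assuming a tiny toy reference and exact matching only. The function names `bwt` and `count_matches` are illustrative, not taken from any aligner; real tools use compressed indexes and tolerate mismatches.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text, with '$' appended as an end marker."""
    s = text + "$"
    # Sort all cyclic rotations and keep the last character of each.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_matches(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text (backward search)."""
    # first[c] = index of the first row starting with character c
    first = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Number of times c occurs in bwt_str[:i] (naive; real indexes precompute this)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    # Extend the match one character at a time, from the end of the pattern.
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGT"          # hypothetical toy reference sequence
index = bwt(reference)
print(count_matches(index, "CGT"))  # number of places the read "CGT" matches
```

The search cost depends on the read length rather than the reference length, which is why this kind of index suits mapping tens of millions of short reads.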
How is RNA/cDNA data mapped?
Summation to determine counts per gene
- Reads from cDNA may map to coding sequences, to 5’ and 3’ UTRs, to introns and to exon junctions
- Complicated due to alternative splicing
- The accuracy depends on the quality (annotation) of the reference sequence
- Depending on which reads are included, estimates of mRNA expression will differ
- A common procedure is to include all counts mapped to exons of a gene
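The exon-based counting rule above can be sketched as follows. The gene names, exon coordinates and read positions are hypothetical, and the sketch ignores chromosomes, strand and spliced reads.

```python
# Hypothetical annotation: gene -> list of exon intervals (start, end), half-open
exons = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1200)],
}

# Hypothetical aligned reads as (start, end), half-open
reads = [(110, 160), (190, 240), (250, 290), (350, 400), (1050, 1100)]


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def count_reads_per_gene(reads, exons):
    """Count reads that overlap any exon of each gene."""
    counts = {gene: 0 for gene in exons}
    for r_start, r_end in reads:
        for gene, intervals in exons.items():
            if any(overlaps(r_start, r_end, s, e) for s, e in intervals):
                counts[gene] += 1
    return counts


print(count_reads_per_gene(reads, exons))  # {'geneA': 3, 'geneB': 1}
```

The read at (250, 290) falls in the intron of geneA and is not counted, which illustrates how the choice of included reads changes the expression estimate. A read overlapping exons of two genes would be counted for both in this sketch.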
How and why is RNA sequencing data normalised to account for varying gene length?
Allows accurate comparison of expression levels within and between samples
Divide read count by gene length = RPK (reads per kilobase of exon sequence) - for comparison within samples
Adjust for total number of reads in each sample = RPKM (RPK per million mapped reads) - for comparison between samples
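A worked example of the two steps, with made-up numbers. The function names `rpk` and `rpkm` are just labels for this sketch.

```python
def rpk(read_count, exon_length_bp):
    """Reads per kilobase of exon sequence."""
    return read_count / (exon_length_bp / 1_000)


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """RPK scaled per million mapped reads in the sample."""
    return rpk(read_count, exon_length_bp) / (total_mapped_reads / 1_000_000)


# Hypothetical gene: 500 reads, 2,000 bp of exon, 10 million mapped reads in the sample
print(rpk(500, 2_000))               # 250.0
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```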
How and why is RNA sequencing data normalised to account for varying library size?
To make the library sizes comparable by scaling raw read counts in each sample by a single sample-specific factor reflecting its library size
Trimmed mean of M values (TMM): based on the hypothesis that most genes are not differentially expressed
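A simplified sketch of the TMM idea, not the full published procedure: it trims only on M values (log2 ratios of library-size-scaled counts) and takes an unweighted mean, whereas the full method also trims on average expression and weights each gene. The function name, the 30% trim fraction and the counts are assumptions for illustration.

```python
import math


def simplified_tmm_factor(sample, reference, trim_fraction=0.3):
    """Scaling factor for `sample` relative to `reference` (simplified TMM)."""
    n_s, n_r = sum(sample), sum(reference)
    # M value per gene: log2 ratio of the proportions in each library
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    # Trim the most extreme M values at both ends before averaging
    k = int(len(m_values) * trim_fraction)
    kept = m_values[k:len(m_values) - k]
    if not kept:
        raise ValueError("too few genes left after trimming")
    return 2 ** (sum(kept) / len(kept))


reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# Same library except one strongly up-regulated gene (last entry)
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]

factor = simplified_tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
print(factor, effective_library_size)
```

Because most genes are unchanged, the trimmed mean ignores the single outlier and the effective library size of the sample comes out close to that of the reference, rather than being inflated by one highly expressed gene.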
How and why is RNA sequencing data probability-based normalised?
- Regulated genes yield mRNA amounts that vary between two different conditions
- Unregulated genes yield the same or similar mRNA amounts under different conditions
To minimise the false discovery rate, use probability-based normalisation
Poisson distribution, where mean = variance, is prone to a high false-positive rate
Negative binomial distribution, where mean < variance, reflects the over-dispersion seen in RNA-seq read counts
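A small sketch of why the Poisson assumption fails on over-dispersed counts. The replicate counts are invented, and the dispersion estimate assumes the common negative binomial parameterisation variance = mean + alpha * mean^2, solved for alpha by the method of moments.

```python
import statistics

# Hypothetical read counts for one gene across five biological replicates
counts = [90, 110, 150, 50, 100]

mean = statistics.mean(counts)
variance = statistics.variance(counts)  # sample variance

# Under a Poisson model the variance would be expected to equal the mean.
poisson_expected_variance = mean

# Method-of-moments dispersion, assuming variance = mean + alpha * mean**2
alpha = (variance - mean) / mean ** 2

print(mean, variance, poisson_expected_variance, alpha)  # 100 1300 100 0.12
```

Here the observed variance is 13 times the mean, so a Poisson test would underestimate the noise and call too many genes differentially expressed.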
What are examples of Next Generation RNA sequencing?
ChIP-seq
Ribo-seq
sncRNA-seq
What is ChIP-seq?
Used to analyse protein interactions with DNA
- Combines chromatin immunoprecipitation (ChIP) with DNA sequencing to identify the binding sites of DNA-associated proteins
- Identifies genomic location of transcriptional regulators and epigenetic markers
- Key reagent: an antibody with high specificity for factor/epitope of interest
What is Ribo-seq?
Correlation between mRNA and protein levels is frequently poor
Ribo-seq analyses the mRNA fragments that are being translated
Compares ribosome-bound mRNAs, which are undergoing active translation, with the total mRNA pool in the cell
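One common way to express this comparison is a per-gene ratio of ribosome-bound signal to total mRNA signal, often called translation efficiency. That term, the function name and the values below are assumptions for illustration and do not come from the cards.

```python
def translation_efficiency(ribo_rpkm, mrna_rpkm):
    """Ratio of ribosome-footprint abundance to total mRNA abundance for one gene."""
    if mrna_rpkm <= 0:
        raise ValueError("mRNA abundance must be positive")
    return ribo_rpkm / mrna_rpkm


# Hypothetical normalised abundances: gene -> (Ribo-seq RPKM, RNA-seq RPKM)
genes = {"geneA": (50.0, 25.0), "geneB": (5.0, 50.0)}

for gene, (ribo, mrna) in genes.items():
    print(gene, translation_efficiency(ribo, mrna))
```

In this toy data geneB has the higher mRNA level but the lower ratio, which is the kind of mRNA/protein mismatch that Ribo-seq is meant to expose.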
What is sncRNA-seq?
Small non-coding RNA-seq
Only a very small percentage of total RNA is messenger RNA; by focusing only on mRNA, we ignore ~95% of RNA transcripts
sncRNA-seq attempts to characterise the landscape of non-coding RNAs in cells