RNA seq Flashcards
What are the objectives of RNA seq?
- Study gene regulation and expression variation (e.g. compare different tissues, time points, disease states)
- Understand the structure, function and organization of information within the genome
- Many more: sub-classifying cancer, spatial transcriptomics, host-pathogen interaction
Describe microarrays
- quick and cost effective
- based on hybridization to complementary sequence
- with Affymetrix (Affy) arrays you need two chips per experiment: one for the control and one for the actual experiment
- very noisy
What are some limitations of microarrays?
- The data is very “noisy”
- Expression levels are determined by a spot of light against a noisy background
- Probes are not available for all genes - Affy probes are only present for approx 75-80% of human genes
- Genes with very low expression may not be detected
- The data requires a large degree of statistical manipulation
- Result only shows a gene is expressed but gives no information about which transcript
Outline the workflow used in RNA seq
Compare RNA sequencing and Microarrays
- The method works because it can be assumed that every mRNA molecule present has the same chance of being sequenced, so read counts are proportional to abundance
- If the experiment shows twice as much mRNA for a particular gene as the control, then gene expression is 2-fold greater
- RNA-seq gives more accurate quantification and has better dynamic range (ability to quantify genes expressed at low and high levels)
- Not limited by microarray probe sequences and availability
- RNA-seq can potentially identify novel transcripts (e.g. new splice sites)
- RNA-seq can be used to study alternative splicing
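The fold-change reasoning above can be sketched in a few lines. This is only an illustration: the read counts, library sizes and function names are made up, and real analyses rely on replicates and dedicated statistical tools.

```python
import math

def counts_per_million(count, library_size):
    """Scale a raw read count by the total number of reads in the sample."""
    return count * 1_000_000 / library_size

def fold_change(treated_cpm, control_cpm):
    """Ratio of normalized expression, treated over control."""
    return treated_cpm / control_cpm

# Hypothetical numbers: same library size, twice as many reads for the gene.
control = counts_per_million(count=500, library_size=20_000_000)
treated = counts_per_million(count=1000, library_size=20_000_000)

fc = fold_change(treated, control)
log2_fc = math.log2(fc)  # fold changes are usually reported on a log2 scale
print(fc, log2_fc)
```

Scaling by library size first matters because a sample sequenced more deeply yields more reads for every gene, even when expression is unchanged.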
Outline the RNA seq analysis procedure
Describe library preparation
- Total RNA extraction and target RNA enrichment
- Poly(A) capture
- Ribosomal RNA depletion
- Fragment RNA and reverse transcribe
- Ligate adapters and PCR amplify
- indexes/barcodes allow multiplexing
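A minimal sketch of how barcodes allow pooled reads to be assigned back to their samples. It assumes, for simplicity, that the barcode is the first four bases of each read; the barcode sequences, sample names and the `demultiplex` function are hypothetical, and in practice the index is often read separately by the sequencer.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real index sequences come from the kit.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}

def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by the barcode at the start of each read, stripping it off."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not in the table go to "undetermined".
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAATTT", "NNNNGGGCCC"]
result = demultiplex(pooled, BARCODES)
print(result)
```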
What should you consider in your experimental design for RNA seq?
- Single vs paired end (latter helps identify e.g. isoforms)
- Sequencing depth (deeper sequencing detects more transcripts)
- Biological replicates (important for differential expression) - you need several samples per condition so that errors and outliers can be identified
- Spike-in RNAs (can help with normalization and quality control) - you add a known amount of RNA and then you can normalize your data
- Multiplexing (pool barcoded samples, then split across lanes)
- Batch design (randomize samples across experimental batches, cannot correct for batch effects if technical and experimental factors are confounded)
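The spike-in idea from the list above can be sketched as follows. This is one simple way to do it, not a standard pipeline: the counts, sample names and both functions are hypothetical, and it assumes the same amount of spike-in RNA was added to every sample.

```python
def spike_in_scale_factors(spike_counts):
    """Per-sample factors that bring spike-in counts to their common mean."""
    mean_spike = sum(spike_counts.values()) / len(spike_counts)
    return {sample: mean_spike / count for sample, count in spike_counts.items()}

def normalize(gene_counts, factors):
    """Multiply each sample's gene count by that sample's scale factor."""
    return {sample: gene_counts[sample] * factors[sample] for sample in gene_counts}

# Hypothetical counts: the same spike-in amount was added to both samples,
# yet the treated sample returned twice as many spike-in reads.
spike = {"control": 1000, "treated": 2000}
gene = {"control": 300, "treated": 600}

factors = spike_in_scale_factors(spike)
normalized = normalize(gene, factors)
print(factors, normalized)
```

Here the apparent 2-fold difference in the gene disappears after scaling, because the spike-in shows it was technical rather than biological.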
Describe the quality control step in RNA seq
- assess the quality and trim the reads if needed
- Quality control is an essential step in the analysis as poor quality reads can significantly impact results
What are some problems you can face during QC?
- Low-quality sequences (low confidence bases)
- Sequencing artefacts (duplicate reads, sequence bias)
- Sequence contamination (reads from another organism)
How can you solve the problems of low quality in QC?
- FastQC for simple QC checks on raw reads - its report helps you decide which low-quality reads to remove or trim
- Discard low quality reads, and trim adapter sequences & poor quality bases (e.g. using Trimmomatic)
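To make "trim poor quality bases" concrete, here is a small sketch of quality trimming from the 3' end of a read. It assumes Phred+33 quality encoding; the thresholds and function names are illustrative, and the code does not reproduce the behaviour or options of Trimmomatic.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def trim_trailing(seq, quality_string, min_quality=20):
    """Remove bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(quality_string)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], quality_string[:end]

def keep_read(seq, min_length=5):
    """Discard reads that are too short after trimming."""
    return len(seq) >= min_length

# Hypothetical read: 'I' encodes quality 40, '#' encodes quality 2.
seq, qual = trim_trailing("ACGTACGTAC", "IIIIIIII##")
print(seq, qual, keep_read(seq))
```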
Describe read alignment
After quality control, reads are aligned to a reference genome or transcriptome.
Method depends on experiment aims and availability of suitable references.
What will you have to do if you’re not confident in your mRNA reads?
If you are not confident in your mRNA reads, you will probably have to map against the genome. This is more difficult because reads then have to be mapped across exon boundaries.
What methods of alignment can you have?
- alignment to reference genome
- alignment to reference transcriptome
- alignment to de novo assembled transcriptome
Describe alignment to reference genome
- requires splice-aware aligners (e.g. STAR, HISAT2)
- use known splice junctions, but can also discover new ones
- computational challenge is to accurately align reads that span splice junctions
- you can give it a table of known exons (an annotation file, e.g. GTF) so it is aware of where they are
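A toy illustration of why reads spanning splice junctions are hard to align to a genome. The read matches the mRNA but not any contiguous stretch of the genome, so the aligner has to split it across an intron. The sequences and the `spliced_match` function are invented for this sketch (exact matching, first hit only); real aligners such as STAR and HISAT2 use far more sophisticated methods.

```python
def spliced_match(read, genome, min_anchor=3):
    """Find a read either contiguously or split across one intron.

    Returns (start, None) for an unspliced hit,
    (start, (intron_start, intron_end)) for a spliced hit,
    or None if the read cannot be placed.
    """
    pos = genome.find(read)
    if pos != -1:
        return pos, None
    # Try every split point, leaving at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        left_end = left_pos + split
        # The right part must lie downstream, with a gap (the intron) between.
        right_pos = genome.find(right, left_end + 1)
        if right_pos != -1:
            return left_pos, (left_end, right_pos)
    return None

# Hypothetical genome: two exons (uppercase) separated by an intron (lowercase).
genome = "AAGGCTTCA" + "gtaagtttttcag" + "CCGATGTTA"
read = "CTTCACCGAT"  # spans the exon-exon junction in the mRNA

hit = spliced_match(read, genome)
print(hit)
```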
Describe alignment to reference transcriptome
- unspliced alignment (e.g. Bowtie2)
- generally faster, but requires comprehensive reference transcriptome
- main challenge is dealing with multi-mapping (reads that map to several transcripts)
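One simple way to handle multi-mapping reads is to split each read equally among the transcripts it aligns to. The sketch below shows only that naive approach; the transcript names, the alignments and the `fractional_counts` function are hypothetical, and dedicated quantification tools use more refined statistical models.

```python
from collections import defaultdict

def fractional_counts(read_hits):
    """Split each read equally among all transcripts it aligns to."""
    counts = defaultdict(float)
    for transcripts in read_hits.values():
        share = 1.0 / len(transcripts)
        for transcript in transcripts:
            counts[transcript] += share
    return dict(counts)

# Hypothetical alignments: read -> transcripts it maps to equally well.
hits = {
    "read1": ["geneX-201"],
    "read2": ["geneX-201", "geneX-202"],
    "read3": ["geneX-201", "geneX-202"],
    "read4": ["geneX-202"],
}
counts = fractional_counts(hits)
print(counts)
```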
Describe alignment to de novo assembled transcriptome
If there is no suitable reference genome, first assemble the reads into contigs, then align the reads to this de novo transcriptome (e.g. for novel genomes, cancer samples)