RNA Sequencing Flashcards
What is RNA Sequencing (RNA-Seq)?
A genomic technique that measures the quantity and presence of RNA molecules in a biological sample.
What is RNA sequencing commonly used for?
Analysing gene expression and transcription at the genome level.
What are the three main steps of RNA-Seq?
- Prepare a sequencing library
- Sequence
- Data analysis
What are the 6 steps of preparing an RNA-Seq library?
- Isolate RNA
- Break the RNA into small fragments (200-300 bp)
- Convert the RNA fragments into dsDNA (via reverse transcription into cDNA)
- Add sequencing adaptors
- PCR amplify the library
- Quality control
Why do we convert the RNA fragments into double stranded DNA (during RNA-seq library preparation)?
Because dsDNA is more stable than RNA, it can be easily amplified and modified.
What is the purpose of sequencing adaptors (in RNA-seq library preparation)?
- Allow the fragments to attach to the flow cell
- Identify which sample each fragment came from (index/barcode sequences), so multiple samples can be sequenced at once
- Allow the sequencing machine to recognise the fragments
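Because adaptors can carry a short index (barcode) sequence, reads from several samples sequenced together can be sorted back to their sample afterwards (demultiplexing). The sketch below assumes the index sequence for each read has already been read out; the sample sheet and barcodes are hypothetical, and it tolerates one mismatch on the assumption that the barcodes differ by enough positions for that to be unambiguous.

```python
# Hypothetical sample sheet: index (barcode) sequence -> sample name.
SAMPLE_SHEET = {
    "ACGTAC": "sample_A",
    "TTGCAG": "sample_B",
}

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(index_read: str, sheet: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the sample whose barcode is closest to index_read, or 'undetermined'."""
    best_sample, best_dist = "undetermined", max_mismatches + 1
    for barcode, sample in sheet.items():
        if len(barcode) != len(index_read):
            continue  # only compare barcodes of the same length
        d = hamming(barcode, index_read)
        if d < best_dist:
            best_sample, best_dist = sample, d
    return best_sample

# (read ID, index sequence) pairs; read2 has one sequencing error in its index.
reads = [("read1", "ACGTAC"), ("read2", "TTGCAA"), ("read3", "GGGGGG")]
for read_id, index_seq in reads:
    print(read_id, assign_sample(index_seq, SAMPLE_SHEET))
```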
What is checked in Quality Control (library preparation)?
- Verification of library concentration
- Verification of library fragment lengths (not too long or too short)
When sequencing, how many fragments are laid out in a grid?
400,000,000
What is the name of the grid with fragments laid out during sequencing?
Flow cell
How does sequencing work in RNA-Seq?
- Inside the flow cell, fluorescent probes are colour-coded according to the type of nucleotide they bind.
- After the next nucleotide in each fragment is tagged, the machine takes a picture of the flow cell from above.
- The probes are washed away.
- The process repeats until the machine has read the full sequence of nucleotides in each fragment.
What are quality scores?
Quality scores reflect how confident the machine is in the base it has called.
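Quality scores are usually expressed on the Phred scale, Q = -10 log10(P), where P is the probability that the base call is wrong. The small sketch below converts in both directions; for example, Q30 corresponds to a 1 in 1,000 chance of an incorrect call.

```python
import math

def phred_from_error(p_error: float) -> float:
    """Phred quality score: Q = -10 * log10(P(base call is wrong))."""
    return -10 * math.log10(p_error)

def error_from_phred(q: float) -> float:
    """Inverse conversion: P(base call is wrong) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

for q in (10, 20, 30, 40):
    p = error_from_phred(q)
    print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.4%}")
```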
What causes low quality scores?
- Low diversity (over-abundance of a single colour), which makes individual sequences hard to identify.
- A probe not shining as brightly as it should.
What are the four lines of data for a sequencing read (FASTQ format)?
- 1st line: Always starts with ‘@’, followed by a unique ID for the sequence.
- 2nd line: Contains the bases called for the sequenced fragment.
- 3rd line: A ‘+’ character (sometimes followed by the ID again).
- 4th line: Contains the quality scores for each base in the sequenced fragment.
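The four-line layout can be read with a short parser. This is a minimal sketch assuming the common Phred+33 encoding (each quality character's code minus 33 gives its score) and one record per four lines with no wrapped sequences; the read in the example is made up.

```python
from typing import Iterable, Iterator, NamedTuple

class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: list[int]  # Phred score for each base

def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per four lines of FASTQ text (assumes Phred+33 by default)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records or at the end
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])

example = """@read_001
GATTACA
+
IIIII#!
"""
for rec in parse_fastq(example.splitlines()):
    print(rec.read_id, rec.sequence, rec.qualities)
```

Here ‘I’ decodes to 40 (a confident call), while ‘#’ and ‘!’ decode to 2 and 0 (very low confidence).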
What are garbage reads?
- Reads with low-quality base calls
- Reads that are clearly artefacts of the chemistry
What is an artefact of the chemistry?
When adaptors bind to each other instead of to DNA fragments (an ‘adaptor dimer’) and create a false ‘read’.
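Both kinds of garbage read can be flagged with simple rules. This sketch assumes a mean-quality cut-off of 20 and treats a read as an adaptor dimer if the adaptor starts within its first 10 bases (almost no real insert). The adaptor is assumed to be AGATCGGAAGAGC, the prefix commonly used to detect Illumina adaptors; check your own library kit. The thresholds are illustrative, not standard values.

```python
ADAPTOR = "AGATCGGAAGAGC"  # assumed adaptor prefix; depends on the library kit

def mean_quality(quals: list[int]) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    return sum(quals) / len(quals) if quals else 0.0

def is_garbage(seq: str, quals: list[int], min_mean_q: float = 20,
               min_insert: int = 10) -> bool:
    """Flag a read that is low quality or looks like an adaptor dimer.

    An adaptor dimer has (almost) no insert, so the adaptor appears
    within the first `min_insert` bases of the read.
    """
    if mean_quality(quals) < min_mean_q:
        return True
    pos = seq.find(ADAPTOR)
    return 0 <= pos < min_insert

reads = [
    ("good", "GATTACA" * 3, [38] * 21),
    ("low_q", "GATTACA" * 3, [5] * 21),
    ("dimer", ADAPTOR + "G" * 10, [38] * 23),  # arbitrary bases after the adaptor
]
for name, seq, quals in reads:
    print(name, "garbage" if is_garbage(seq, quals) else "kept")
```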
What is the first step in aligning the reads to a genome (after sequencing the samples)?
Split the genome sequence into small fragments and create an index of all the fragments and their locations within the genome.
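A simple version of such an index is a dictionary from every short fragment of length k (a k-mer) to all the places it occurs. This is a toy sketch with a made-up two-chromosome genome and 0-based positions; real aligners use more compact structures such as suffix arrays or FM-indexes.

```python
from collections import defaultdict

def build_kmer_index(genome: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer in the genome to all (chromosome, 0-based position) where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for chrom, seq in genome.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((chrom, pos))
    return dict(index)

# Toy genome with made-up sequences.
genome = {"chr1": "ACGTACGTTAGC", "chr2": "TTAGCCGA"}
index = build_kmer_index(genome, k=4)
print(index["TAGC"])
print(index["ACGT"])
```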
Why do we break the sequences up into small fragments?
It allows reads to be aligned even if they do not match the reference genome exactly.
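To see why small fragments help, split a read into k-mers (seeds) and let each exactly matching seed vote for where the read starts. A single mismatch only spoils the seeds that overlap it, so the remaining seeds still point to the right place. This is a simplified seed-and-vote sketch with a made-up genome, not how any particular aligner works internally.

```python
from collections import Counter, defaultdict

def build_kmer_index(genome, k):
    """Map every k-mer to its (chromosome, 0-based position) occurrences."""
    index = defaultdict(list)
    for chrom, seq in genome.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((chrom, pos))
    return index

def locate_read(read, index, k):
    """Return ((chromosome, start), votes) for the best-supported start, or None.

    A seed found at genome position p, taken from read offset o,
    implies the read starts at p - o.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for chrom, pos in index.get(read[offset:offset + k], []):
            votes[(chrom, pos - offset)] += 1
    return votes.most_common(1)[0] if votes else None

genome = {"chr1": "GGCATTACGGATCCTAGGTCAAGT"}
index = build_kmer_index(genome, k=5)
read = "ACGGATCGTAGGTC"  # chr1 from position 6, with one mismatch (C -> G)
print(locate_read(read, index, k=5))
```

Five of the ten seeds avoid the mismatch, and all five vote for a start at position 6 on chr1.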
Why do we want to know the chromosome and position for a read?
To see whether it falls within the coordinates of a gene.
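Once a read has a chromosome and position, checking it against gene coordinates is an interval-overlap test, and tallying the hits per gene gives raw expression counts. The gene names and coordinates below are hypothetical, assumed to be 1-based and inclusive; dedicated counting tools also handle exons, strands and reads that overlap several genes.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

# Hypothetical annotation with made-up coordinates.
GENES = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 8_000, 12_000),
    Gene("GENE_C", "chr2", 500, 2_500),
]

def genes_for_read(chrom: str, pos: int, read_len: int, genes=GENES) -> list[str]:
    """Return the names of genes whose coordinates overlap the aligned read."""
    read_end = pos + read_len - 1
    return [g.name for g in genes
            if g.chrom == chrom and pos <= g.end and read_end >= g.start]

# Count aligned reads (chromosome, start position) per gene.
counts: dict[str, int] = {}
for chrom, pos in [("chr1", 1_200), ("chr1", 4_990), ("chr1", 6_000), ("chr2", 700)]:
    for name in genes_for_read(chrom, pos, read_len=100):
        counts[name] = counts.get(name, 0) + 1
print(counts)
```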
What is Bulk RNA-seq?
- Method that measures the average gene expression levels in a sample that contains a mixture of cells.
- Commonly used
- Original method
What is single cell RNA-seq?
Technique that analyses the RNA of individual cells
What is normalisation of RNA-seq data?
Adjustment of raw data to account for factors that prevent direct comparison of expression measures.
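One such factor is sequencing depth: a sample sequenced twice as deeply gets roughly twice as many reads for every gene. A simple correction is counts per million (CPM), count / total reads × 10^6. This sketch uses made-up counts; CPM is only one option, and other methods also account for gene length (e.g. TPM) or library composition (e.g. DESeq2's median-of-ratios or edgeR's TMM).

```python
def counts_per_million(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Normalise raw counts by library size: CPM = count * 1e6 / total reads in the sample.

    `counts` maps sample -> {gene: raw count}.
    """
    cpm = {}
    for sample, gene_counts in counts.items():
        total = sum(gene_counts.values())
        cpm[sample] = {g: c * 1e6 / total for g, c in gene_counts.items()}
    return cpm

# Made-up counts: sample_2 was sequenced twice as deeply as sample_1.
raw = {
    "sample_1": {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600},
    "sample_2": {"GENE_A": 200, "GENE_B": 600, "GENE_C": 1200},
}
for sample, values in counts_per_million(raw).items():
    print(sample, values)
```

After normalisation the two samples have identical values, showing that the difference in raw counts came from sequencing depth rather than expression.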
What is Principal Component Analysis (PCA)?
Statistical method that reduces the dimensionality of a dataset by transforming the data into a new coordinate system.
(i.e. it reduces the number of axes needed to describe the data)
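PCA can be computed by centring each gene and taking a singular value decomposition, as sketched below with NumPy's `np.linalg.svd` (scikit-learn's `PCA` class does the equivalent). The expression matrix is made up: four samples by five genes, forming two obvious groups, so the first principal component alone captures almost all of the variation.

```python
import numpy as np

def pca(data: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA via singular value decomposition.

    `data` has one row per sample and one column per gene.
    Returns (sample coordinates on the principal components,
             fraction of total variance explained by each component).
    """
    centred = data - data.mean(axis=0)  # centre each gene on its mean
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # coordinates in the new axes
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Made-up log-expression values: 4 samples x 5 genes.
expr = np.array([
    [5.0, 2.0, 7.1, 3.3, 0.5],
    [5.2, 2.1, 7.0, 3.1, 0.4],
    [1.0, 6.0, 2.2, 3.2, 4.8],
    [1.1, 6.2, 2.0, 3.0, 5.1],
])
scores, explained = pca(expr)
print(scores.round(2))
print(explained.round(3))
```

Five gene axes are reduced to one or two, and samples from the same group land close together on PC1.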
What is the function of an MA plot (M = log ratio, A = mean average) and what are its two axes?
Used to display differentially expressed genes in RNA-seq analysis.
X axis: Mean (average) expression
Y axis: Log fold change between the two conditions (e.g., “normal” and “mutant”).
Each dot is a gene.
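The coordinates of each dot can be computed from normalised expression values in the two conditions. This sketch takes A as the average of the two log2 values and M as the log2 fold change (mutant vs normal), adding a pseudocount of 1 (an assumption) so unexpressed genes do not hit log2(0). The gene values are made up.

```python
import math

def ma_points(normal: dict[str, float], mutant: dict[str, float],
              pseudocount: float = 1.0) -> dict[str, tuple[float, float]]:
    """Return {gene: (A, M)} for an MA plot.

    A (x axis) = average of the two log2 expression values.
    M (y axis) = log2 fold change, mutant vs normal.
    """
    points = {}
    for gene in normal.keys() & mutant.keys():
        log_n = math.log2(normal[gene] + pseudocount)
        log_m = math.log2(mutant[gene] + pseudocount)
        points[gene] = ((log_n + log_m) / 2, log_m - log_n)
    return points

# Made-up normalised counts for three genes.
normal = {"GENE_A": 63, "GENE_B": 255, "GENE_C": 0}
mutant = {"GENE_A": 255, "GENE_B": 255, "GENE_C": 7}
for gene, (a, m) in sorted(ma_points(normal, mutant).items()):
    print(f"{gene}: A = {a:.1f}, M = {m:+.1f}")
```

Genes near M = 0 (like GENE_B) are unchanged between conditions; genes far above or below zero are candidates for differential expression.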