Transcriptomics Flashcards
What is RNAseq
This looks at the expression of genes by quantifying messenger RNA. This is known as quantitative transcriptomics (NGS)
What is high throughput quantitative proteomics
This identifies and quantifies the proteins that are in a cell
Sequencing the nucleic acids and then sequencing the proteins
What are microarrays
These are oligos that are synthesised and fixed onto a glass chip. mRNA from cell extracts is washed over the chip, and the more RNA that binds to a given oligo, the stronger the signal detected from that spot on the chip.
What are the benefits of using microarrays
It is a good way to check the levels of expression from a very large number of genes at the same time
It's possible to quantify changes in levels of gene expression between an infected and an uninfected cell
You could interrogate the human genome simultaneously and ask questions about which genes are expressed in an unbiased manner.
What are the disadvantages of using microarrays
It's a closed system from the pov of data - you can only ask questions about things we already know about. If there were unknown genes, there would never be any answers concerning them.
This is not helpful for studying viral genes. Creating, standardising and validating these chips was a hard process, so chips were only made for genes in high demand, such as those of the human genome.
The dynamic range was limited: changes of around 100-fold were not easily detected, so measuring big changes in expression was difficult.
What is Illumina NGS
In this technique, information on changing host RNA levels and on changes in virus gene expression is captured at the same time.
This is an open system - you could collect the sequence data and then interrogate it afterwards at will.
How is RNA made
DNA is transcribed into RNA, which is then polyadenylated and spliced. Splicing brings different exon combinations together.
This might produce mRNA variant 1 or mRNA variant 2, which are translated into two different proteins
Why was it hard for microarrays to detect these two RNA variants
This is because of the way they were made: oligos could not have any sequence overlapping with each other or with any other genes, so trying to capture this isoform variation was extremely hard.
How can RNA be extracted and enriched for mRNA
- RNA is extracted from cells and enriched for polyadenylated RNA
- Oligo (dT) beads (magnetic beads that bind poly A tails) are used to selectively capture mRNA
- Other RNAs may still be present due to non-specific interactions
- Some mRNAs can anneal to other RNA molecules via homologous sequences, leading to unintended co-purification
How is RNA converted to cDNA and sequenced
- RNA is randomly sheared into smaller fragments
- Reverse transcriptase converts RNA fragments into cDNA
- cDNA fragments are filtered so that fragments of similar size are selected
- Around 30 million cDNA fragments are sequenced using PCR-based amplification
- Sequencing reads both ends of each fragment (paired-end reads)
- PCR amplification introduces biases (some fragments amplify better than others)
- Enzymatic sequencing methods can favour certain fragments, leading to uneven data representation.
What is a FASTQ file, and why is sequencing accuracy important
A FASTQ file includes each read's sequence and a quality score for each base in that sequence
Deep sequencing is powerful and generally accurate
However, since it relies on biological processes (enzymes etc.), errors can occur.
It is important to consider potential sequencing errors when analyzing data.
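As an illustration of how per-base quality scores are stored, here is a minimal Python sketch. It assumes the common four-line FASTQ record layout and the Phred+33 quality encoding; the function names and the example read are made up for this card.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def parse_fastq(lines):
    """Yield (read_id, sequence, scores) from four-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, phred_scores(qual)


# Hypothetical single read: header, bases, separator, quality string.
example = [
    "@read1",
    "ACGT",
    "+",
    "II5!",
]
records = list(parse_fastq(example))
for read_id, seq, scores in records:
    for base, q in zip(seq, scores):
        print(read_id, base, q, error_probability(q))
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base is wrong, and a score of 40 to 1 in 10,000, which is why low-scoring bases are often trimmed or filtered before analysis.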
What is the first challenge after sequencing
- Sequencing generates huge files with millions of sequence reads
- The goal is to determine where in the genome each sequence originated, for example whether a sequence comes from a broken splice site, which would indicate a potential genome issue.
How do researchers determine how much of the genome was in the mRNA sample
The next step is to quantify how much of each genome region appears in the mRNA
Software is used to map the sequences back to the human genome
How does mapping sequences to the genome help
Since gene locations are known, sequences can be matched to specific genes.
This helps determine whether a gene was active and how much it was expressed.
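The idea of matching mapped reads to known gene locations can be sketched in Python as below. This is a simplified illustration and not how TopHat or any real tool works internally: the gene coordinates, read positions and function name are all invented, and intervals are treated as half-open (start included, end excluded).

```python
# Hypothetical gene annotation: name -> (chromosome, start, end).
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def count_reads_per_gene(reads, genes):
    """Count mapped reads that overlap each annotated gene interval."""
    counts = {name: 0 for name in genes}
    for chrom, start, end in reads:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == g_chrom and start < g_end and end > g_start:
                counts[name] += 1
    return counts


# Hypothetical mapped reads: (chromosome, start, end).
reads = [
    ("chr1", 120, 170),   # inside geneA
    ("chr1", 480, 530),   # overlaps the end of geneA
    ("chr1", 600, 650),   # between genes, counted for neither
    ("chr1", 900, 950),   # inside geneB
    ("chr2", 120, 170),   # different chromosome
]
gene_counts = count_reads_per_gene(reads, genes)
print(gene_counts)
```

A gene with zero reads was probably not expressed in the sample, while a higher count suggests higher expression, subject to the length and depth corrections that come later.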
How can sequencing reveal different mRNA isoforms
- Mapping also identifies which mRNA isoform was most frequently produced
- This tells researchers whether one version of the mRNA is dominant over others
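One simplified way to picture isoform dominance is to count reads that span the exon-exon junction unique to each variant and compare the proportions. The Python sketch below assumes such junction counts are already available; the counts and names are hypothetical, and real isoform quantification tools use more elaborate statistical models.

```python
def isoform_fractions(junction_counts):
    """Turn isoform-specific junction read counts into fractions of the total."""
    total = sum(junction_counts.values())
    if total == 0:
        return {name: 0.0 for name in junction_counts}
    return {name: n / total for name, n in junction_counts.items()}


# Hypothetical counts of reads spanning each variant's unique junction.
counts = {"variant_1": 90, "variant_2": 30}
fractions = isoform_fractions(counts)
dominant = max(fractions, key=fractions.get)
print(fractions, dominant)
```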
Why do some exons appear more abundant in mRNA sequencing
Some sequences are easier for polymerases to amplify and sequence
This can make certain exons appear more abundant than others
Why do RNA sequencing data show varying abundance across a gene
RNA sequencing is a biological process, and errors can occur
Different enzymes involved may introduce biases in sequencing
Some regions are easier to amplify and sequence, affecting reported abundance
What is ChIP-seq used for
- It identifies where transcription factors bind on the genome
- It helps compare binding in normal vs infected cells
How does ChIP-seq work
1) Covalently cross-link proteins to DNA in cells
2) Shear DNA-protein complexes into small fragments
3) Use an antibody to bind the transcription factor of interest
4) Immunoprecipitate the transcription factor along with the attached DNA
5) Remove proteins and sequence the extracted DNA
What information does ChIP-seq provide
- It reveals which DNA sequences a transcription factor binds to
- Helps understand gene regulation in normal vs. disease conditions.
What is whole exome sequencing (WES)
WES targets the exons - the coding region of the genome
Uses oligos (short DNA probes) to capture exon regions
DNA is fragmented, exons are extracted, and then sequenced
What is polysome sequencing and how does it work?
Polysomes are mRNAs bound by several actively translating ribosomes
RNA associated with polysomes is fragmented and sequenced
Ribosomes are removed before sequencing
What does Polysome Sequencing reveal
Provides a snapshot of which mRNAs are actively being translated into proteins
Helps determine which genes are being expressed at the protein level.
Why is Adenovirus a good model for research
Causes mild symptoms, making it safe to work with
Delivers DNA to the human genome, useful for studying gene regulation
Led to the discovery of RNA splicing
How was Adenovirus used to study gene expression
- Extracted mRNA from uninfected and infected cells
- Compared mRNA at two different time points post-infection
What is TopHat and how was it used
TopHat is a short-read mapping software
It maps sequence reads to both the human and adenovirus genomes
Helps analyse how infection affects gene expression
What is the structure of adenoviruses
- non-enveloped, linear double stranded DNA genome
- virus replication and assembly of particles occurs in the nucleus of an infected cell
- Most famous for its potential as a gene therapy vehicle, an anticancer agent or as a vaccine delivery system.
What is the lifecycle of the adenovirus
1) Early genes are transcribed. This disrupts the cell cycle and forces the cell into the cell cycle. Viral DNA replication proteins are synthesised
2) DNA replication triggers transcription of major late mRNA
3) Host cell splicing subverted. Host mRNA transport shuts off. Host rRNA processing and export stopped
4) Assembly of viral capsids. Packaging of viral DNA. Cell death.
How were HeLa cells used in the experiment
HeLa cells were infected with adenovirus
Cytoplasmic RNA was harvested 9-24 hours post-infection alongside an uninfected control
Poly A+ selection was used to extract mRNA
mRNA was chemically sheared and sequenced using Illumina
How many sequencing reads were generated
Each sample was sequenced in a single Illumina lane
Generated around 30 million 50 bp paired-end reads
The exact number of reads varies, affecting downstream analysis
How were reads mapped and analysed
Reads were mapped to human, adenovirus, and HPV genomes using TopHat.
Gene expression levels were analysed using Cufflinks software
Expression was measured in FPKM (Fragments Per Kilobase of transcript per Million mapped fragments)
Why do longer mRNAs generate more fragments
At the same abundance, a 1000-nucleotide mRNA generates twice as many fragments as a 500-nucleotide mRNA
Longer mRNAs naturally yield more mapped reads
How is gene expression compared across samples
- FPKM values correct for read count and RNA length
- Produces a digital number that allows comparison between samples
- Deep sequencing provides a relative measure of gene expression between genes
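The two corrections in FPKM (for transcript length and for total sequencing depth) can be shown with a short Python sketch. The formula is the standard FPKM definition; the fragment counts, lengths and total are invented numbers chosen to make the arithmetic easy, and this is not the Cufflinks implementation.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Equivalent to fragments / (length in kb) / (total mapped fragments in millions).
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


total = 30_000_000  # hypothetical number of mapped fragments in the sample

# Two hypothetical mRNAs present at the same abundance:
# the longer one yields twice as many fragments.
long_mrna = fpkm(fragments=600, transcript_length_bp=1000, total_mapped_fragments=total)
short_mrna = fpkm(fragments=300, transcript_length_bp=500, total_mapped_fragments=total)
print(long_mrna, short_mrna)
```

Both values come out the same here, which is the point of the length correction: the longer mRNA's extra fragments do not make it look more highly expressed.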
Why is PCR amplification used in virus genome analysis
PCR is used to amplify regions of a virus genome so that those regions can then be deep sequenced
Why is deep sequencing of the virus genome useful
It helps to see the total population of the virus genome present in a sample
How does deep sequencing help in detecting drug-resistant variants of a virus
It can reveal the presence of drug-resistant variants of a virus that may be lurking in the sample, which is particularly useful for studying the emergence of HIV variants
What can deep sequencing of viruses reveal over time
Over time, it can show the emergence of the dominant variants, especially as drug selective pressure is applied
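A rough Python sketch of how minority variants might be spotted at a single genome position is given below. It assumes the bases observed at that position have already been collected from the mapped reads; the 1% reporting threshold, the function names and the example data are illustrative assumptions, and a real threshold has to sit above the sequencing error rate.

```python
from collections import Counter


def base_frequencies(bases):
    """Fraction of reads showing each base at one genome position."""
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


def minority_variants(bases, threshold=0.01):
    """Non-consensus bases at or above the (assumed) reporting threshold."""
    freqs = base_frequencies(bases)
    if not freqs:
        return {}
    consensus = max(freqs, key=freqs.get)
    return {b: f for b, f in freqs.items() if b != consensus and f >= threshold}


# Hypothetical position covered by 1000 reads, before and after drug treatment.
before = "A" * 980 + "G" * 20    # resistant "G" variant lurking at 2%
after = "A" * 300 + "G" * 700    # "G" has become dominant under selection
print(minority_variants(before))
print(minority_variants(after))
```

Comparing the frequencies from samples taken at different times shows a variant moving from a small minority to the dominant sequence once drug selective pressure is applied.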
What is third generation sequencing
Nanopore sequencing is the newest sequencing system
This system does not fragment genetic material prior to reading the sequence, which allows rapid sequencing of genetic material in the field.
It has been one of the dominant technologies used to sequence SARS-CoV-2 genomes from clinical samples.
How does nanopore sequencing work
1) As the molecule passes through the pore the electrical resistance across the pore changes and these changes can be used to determine the sequence of the nucleic acids
2) On the chip there is an array of pores. The nucleic acid is covalently attached to a protein docking primer.
3) The dock then attaches the nucleic acid to the pore and the nucleic acid then goes through the pore.
4) The pore is embedded into a membrane
What are the advantages to third generation sequencing
- long strands of nucleic acid go through as a single read.
- Each pore can sequence many individual molecules one after the other
- It is possible to sequence the whole length of the RNA - you can see exactly where all the exon junction combinations are.
- It sequences the RNA directly and in some circumstances can detect post transcriptional modification
- It has the power to resolve isoforms. The very long reads and the plummeting costs mean that this technology is going to feature strongly in the future.
What are the disadvantages to third generation sequencing
- Approximately 1 in 10-20 nucleotides is read incorrectly, and most of these errors are indels (insertions and deletions) in the sequence.