8. Transcriptomics Flashcards
What is the main method of transcriptomics?
RNAseq
What is RNAseq?
- It measures gene expression via mRNA quantification
- It is a form of Next generation sequencing.
What areas of science fall into “omics”?
- Transcriptomics.
- Proteomics.
What are high throughput Omics methods used to do?
- They are used to study infectious diseases.
- They represent a new frontier of infectious disease research.
What are the new Omics approaches sometimes referred to as?
- Big data biology
- Or systems biology
- Systems biology is controversial as it is vague
What is PCR?
- Polymerase chain reaction
- It has been used everywhere to identify and amplify nucleic acid sequences.
- It is used in cloning and identification.
- There is no area of biological research that has not used PCR.
What is the new PCR?
Next generation sequencing
Why has Next Generation Sequencing become more widely used?
- It has gotten a lot cheaper since the sequencing of the 1st genome.
- Technology has improved so much.
- Everything can get sequenced now.
When were microarrays used?
Before high throughput sequencing was invented.
How do Microarrays work?
- Oligos for specific genes were printed onto a ‘chip’.
- mRNA from a cell is washed over the chip.
- The more mRNA that binds to an oligo, the stronger the signal, which indicates higher expression of that gene.
- It allowed us to measure the expression of a large number of genes at once.
What were microarrays used to investigate?
- They were used to quantify changes in gene expression under different conditions, e.g. infected vs uninfected.
- It was the first way to look at the expression of many different genes at the same time.
What kind of system are microarrays?
Closed systems
What are closed systems?
- We can only ask questions about what we already know about.
- This is because we can only make oligos specific for the genes we know exist.
- We can never know what we don’t know.
Why weren’t microarrays very useful for virology?
- Chips were printed and mass-produced for the human transcriptome.
- This made it cheap.
- It wasn’t really useful for the gene expression of viruses.
- Not many people were using it for this so the chips were expensive to make and standardise.
- As a result, microarrays were not practical for measuring viral gene expression.
What is an open system?
- A system where you collect all the data and interrogate it at will.
- You can go back and re-test the data as new discoveries come to light.
- You can collect all the information you could possibly need.
What is the main limitation of next generation sequencing methods like RNAseq?
How you collect the genetic material.
What do microarrays struggle to distinguish?
- Different isoforms of the same mRNA.
- This is because of the way oligos work through homology.
How is mature mRNA made?
- DNA gets transcribed to mRNA.
- The mRNA is then polyadenylated and spliced.
- It can be spliced in different ways to make different isoforms of the protein.
How does RNAseq work?
- RNA is extracted from the cells.
- The RNA is then enriched for polyadenylated mRNA using oligo(dT) beads.
- The mRNA is then sheared randomly and converted to cDNA.
- The cDNA is selected by size to be about 300bp.
- The cDNA fragments are then sequenced using a PCR-based method.
- This produces paired-end reads of the cDNA fragment.
Why is polyadenylated mRNA enriched for RNAseq?
- This is done using Oligo (dT) beads.
- It is impossible to obtain completely pure mRNA.
- This is due to the amount of other RNAs in the cell.
- mRNAs are also often bound to other RNAs.
How is cDNA made from mRNA?
Using reverse transcriptase
Why is fragmentation of mRNA essential?
Fragmenting is essential because of a limitation of the Illumina sequencing technology: it cannot read fragments longer than about 300bp.
What sequencing technique is used in RNAseq?
- Illumina
- This is a PCR-based amplification method.
- It gives you a 150bp read from each end of the fragment. These are the paired-end reads.
How many fragments can be sequenced in 1 run of RNAseq?
Around 30 million fragments.
What is a paired end read?
- Each fragment is sequenced 150bp in from the left and right hand side.
- When you get the sequence back you know which sequences are from the same piece of cDNA and therefore mRNA.
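A minimal Python sketch of how a pair of reads relates to one cDNA fragment. The function names are illustrative only. It assumes the right-hand read comes off the opposite strand, so it is written as the reverse complement of the fragment's last 150 bases.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def paired_end_reads(fragment: str, read_length: int = 150) -> Tuple[str, str]:
    """Read `read_length` bases inwards from each end of a cDNA fragment."""
    # Left-hand read: the first bases of the fragment, as written.
    left_read = fragment[:read_length]
    # Right-hand read: taken from the other strand, so it is the reverse
    # complement of the last bases of the fragment (an assumption of this sketch).
    right_read = reverse_complement(fragment[-read_length:])
    return left_read, right_read


# Hypothetical 400bp fragment, used only for illustration.
fragment = "ACGT" * 100
left, right = paired_end_reads(fragment)
```

Both reads come from the same fragment, which is why they can be traced back to the same piece of mRNA.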
Why do PCR and PCR-based sequencing introduce bias into a sample?
- This is because some fragments amplify better than others.
- It is also an enzyme-based method, so some sequences are easier, and therefore more likely, to be sequenced than others.
How do you know which paired end reads belong together?
- They will have the same identifier number.
- The left-hand read has /1
- The right-hand read has /2.
- You often don’t know the sequence between the 2 fragments.
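As a rough illustration, reads can be matched up by stripping the /1 or /2 suffix and grouping on the shared identifier. This is a sketch in Python; the read names are made up and `pair_reads` is a hypothetical helper, not part of any sequencing toolkit.

```python
from collections import defaultdict


def pair_reads(read_names):
    """Group read names of the form '<identifier>/1' and '<identifier>/2'."""
    mates_by_id = defaultdict(dict)
    for name in read_names:
        # Split on the last '/', giving the identifier and the mate number.
        identifier, _, mate = name.rpartition("/")
        mates_by_id[identifier][mate] = name
    # Keep only identifiers where both the left (/1) and right (/2) read exist.
    return {
        identifier: (mates["1"], mates["2"])
        for identifier, mates in mates_by_id.items()
        if "1" in mates and "2" in mates
    }


# Made-up read names for illustration.
pairs = pair_reads(["read7/1", "read9/1", "read7/2"])
```

Here read9 has no partner in the list, so only read7 is returned as a complete pair.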
What is the quality score given with an RNAseq file?
- It is a symbol that indicates the probability that the nucleotide above it is correct.
- There is a reference table to look these up.
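The reference table can also be worked out in code. The sketch below assumes the common Phred+33 encoding, where the quality is the symbol's ASCII code minus 33 and the chance of error is 10^(-quality/10). If a file uses a different offset, the numbers will differ.

```python
def phred_quality(symbol: str, offset: int = 33) -> int:
    """Convert one quality symbol to its Phred quality (assumes Phred+33)."""
    return ord(symbol) - offset


def phred_to_error_probability(symbol: str, offset: int = 33) -> float:
    """Probability that the base call under this symbol is wrong."""
    return 10 ** (-phred_quality(symbol, offset) / 10)


def probability_correct(symbol: str, offset: int = 33) -> float:
    """Probability that the base call is correct."""
    return 1.0 - phred_to_error_probability(symbol, offset)
```

Under this assumption the symbol `5` is a quality of 20, which is a 1 in 100 chance of error, and `I` is a quality of 40, which is a 1 in 10,000 chance of error.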
What is a FASTQ file?
- The file you receive the RNAseq data in.
- The Q indicates it has a quality score.
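A FASTQ file stores each read as four lines: an identifier starting with @, the sequence, a separator starting with +, and the quality symbols. The Python sketch below reads records in that layout. `FastqRecord` and `parse_fastq` are names made up for this example, and real files are usually handled with an established library.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per group of four non-empty lines."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(cleaned) - 3, 4):
        header, sequence, separator, quality = cleaned[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence and quality lengths differ for {header}")
        # Drop the leading '@' from the identifier.
        yield FastqRecord(header[1:], sequence, quality)


# Made-up record for illustration.
records = list(parse_fastq(["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]))
```

Each quality symbol lines up with the nucleotide in the same position on the sequence line.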
How often is deep sequencing like RNAseq correct?
- 99% of the time
- But you need to bear in mind that it is a biological system, and things can sometimes go wrong.
How are computational models used to analyse RNAseq files?
- To work out where on the genome the sequence reads came from.
- To work out whether a sequence has been broken across a splice site.
- To work out how much of the mRNA sample came from each region of the human genome.
Why can the mRNA sequence be broken?
- If the mRNA sequences cross an exon splice site, the cDNA won’t match the genomic DNA.
- You need to work out where/if it happens.
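A toy Python example of the problem. The sequences are invented, and `map_read` is a deliberately naive exact-match search written for this card, not how a real read mapper works. A read that crosses the exon junction is not found in the genome as one piece, but it is found if it is allowed to split into two blocks.

```python
def map_read(read: str, genome: str):
    """Return (genome_start, length) blocks for a read, allowing one splice."""
    position = genome.find(read)
    if position != -1:
        return [(position, len(read))]  # Read lies within a single exon.
    # Otherwise try every split point and look for the two halves in order.
    for split in range(1, len(read)):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        if left_pos == -1:
            continue
        right_pos = genome.find(right, left_pos + len(left))
        if right_pos != -1:
            return [(left_pos, len(left)), (right_pos, len(right))]
    return []  # Could not be placed.


# Invented toy gene: exon 1, an intron, then exon 2.
exon1, intron, exon2 = "ATGGCCAAGT", "GTAAGTTTTCAG", "CTGGAGTTCA"
genome = exon1 + intron + exon2
junction_read = exon1[-5:] + exon2[:5]  # Comes from the spliced mRNA.
blocks = map_read(junction_read, genome)
```

The gap between the two blocks is where the intron was removed.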
How do we find out where in the genome the mRNA comes from?
- You use computer programs to map backwards onto the genome.
- This tells us which region or gene on the genome the mRNA comes from.
- This allows us to work out which genes were on and which were off.
- It can also inform us about different isoforms.
How does RNAseq distinguish between different isoforms?
- It shows the different poly A sites.
- It shows which exons are included in the different isoforms.
- The dashed lines show where the introns are.
What do the graphs on the RNAseq read show?
- The amount of sequence reads that are from the same place on the genome.
- It shows how the sequence maps to the genome.
- It does this by placing sequences where they match, not by knowing the genes.
- It works well and is accurate.
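The height of such a graph is the number of reads covering each position. Below is a small Python sketch of that count. It assumes each mapped read is given as a (start, length) pair on the genome, which is a simplification made for this example.

```python
def coverage(genome_length: int, alignments):
    """Count how many mapped reads cover each position of the genome."""
    depth = [0] * genome_length
    for start, length in alignments:
        # Add one to every position the read covers, staying inside the genome.
        for position in range(start, min(start + length, genome_length)):
            depth[position] += 1
    return depth


# Two made-up reads of length 3 on a 6-base genome, overlapping at position 2.
depth = coverage(6, [(0, 3), (2, 3)])
```

Positions covered by both reads have a depth of 2, and positions no read reaches stay at 0.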
Why can there be different numbers of reads for exons that only appear together?
- This shows that some pieces of mRNA are harder to reverse transcribe, amplify and sequence than others.
- Theoretically the exon abundance should be the same.
- The differences are due to the nature of the biological system.
- The different charts also can have different scales that you need to be aware of especially if you are measuring over different time points.
What must the sample be before you can sequence it with Illumina?
DNA
What other methods can be used to obtain DNA samples?
- ChIP-seq.
- Whole exome sequencing.
- Extracting polysomes.
What is ChIP-seq?
- This shears DNA-protein complexes and then extracts them with antibodies.
- You then remove the proteins and get the bits of DNA that were bound to them.
- This is then used to interrogate proteins that bind to the genome, such as transcription factors or histones.
- You can find out where and how they bind, and under what conditions.
What is whole exome sequencing?
- Oligos are used to extract the exons.
- This was used more when sequencing was more expensive.
- Now you would just sequence the whole genome.
What are polysomes?
Actively translating ribosomes
Why are polysomes extracted to sequence the mRNA?
- This shows which bits of mRNA are being actively translated into proteins.
- This is important as mRNA can be made and not translated.
What has adenovirus been used to discover?
- Splicing.
- Cell cycle control
What is TopHat?
- It is a short read mapper.
- It maps the sequence reads to the adenovirus and human genome.
- It takes introns into account.
- It shows what happens to human and virus genes at the same time.
- New programs are faster.
What do adenoviruses do to cells?
- They force the cells to enter the cell cycle.
- This is because they like dividing cells as they are more metabolically active and better for viral replication.
What is the life cycle of adenoviruses?
- Early genes are transcribed, the cell cycle is disrupted and viral DNA replication proteins are made.
- DNA replication triggers transcription of major late mRNA.
- Host cell splicing is subverted, host mRNA transport is shut off, and host rRNA processing and export are stopped.
- The viral capsid is assembled and the DNA is packaged.
- The cell dies.
What are HeLa cells?
Human epithelial cells transformed by HPV into cancer cells.
Do you get an exact number of reads for every sample?
- No, you only get an approximate number of reads.
- You don’t get an absolute say in how many reads you get.
- This can influence the results.
What does the splicing map on the RNAseq read show?
- It maps the splicing events between different exons.
- You can infer changes in splicing events.
- It shows how often different splicing events happen.
What is Cufflinks?
- A gene expression analysis software.
- It maps the number of reads to each gene.
What are FPKMs?
- Fragments per kilobase of transcript per million mapped fragments.
- This tells you how many fragments map to a gene per 1,000 bases of its mRNA, per million fragments that map to the human genome.
- This accounts for differences in read number and mRNA length.
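A worked example in Python, using the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments). The gene lengths and counts are invented for illustration.

```python
def fpkm(fragments_mapped_to_gene: int,
         transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments_mapped_to_gene / (kilobases * millions_mapped)


# Invented numbers: a 2,000bp mRNA with 600 fragments and a 1,000bp mRNA
# with 300 fragments, both from a run with 30 million mapped fragments.
long_gene = fpkm(600, 2_000, 30_000_000)
short_gene = fpkm(300, 1_000, 30_000_000)
```

The long mRNA has twice as many fragments only because it is twice as long, so both genes come out with the same FPKM.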
What type of mRNA gives more sequence reads that map to the genome?
Long mRNA
Why do long pieces of mRNA give you more sequence reads than short pieces?
This is because long mRNA will produce more 300bp fragments than short mRNA.
What do FPKMs allow you to do?
- As they account for the difference in read number and mRNA length, they can be used to compare different samples and different genes.
- It is a normalised number, so values can be compared directly.
- A gene with an FPKM of 8 has twice as many reads, after normalisation, as a gene with an FPKM of 4.
What can cause control samples to contain low levels of sequence reads?
Cross contamination during handling steps
How does the expression of the adenovirus genome change from T8 to T24?
- At T8, the E1-4 region is highly expressed and the rest of the genome is transcriptionally quiet.
- At T24 the rest of the genome is highly expressed
- The early proteins are still expressed, just at much lower levels compared to the late proteins.
What can deep sequencing of viral genomes inside cells tell us?
- You can examine the whole viral population.
- You can identify the dominant virus variant and the minor variants.
- This can be used to inform drug selection if you find a resistant variant.
- You can also see where different variants emerge from before they become big.
- This has been done in HIV and SARS-CoV-2.
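A minimal Python sketch of how dominant and minor variants could be read off at one genome position, assuming you already have the base each mapped read shows at that position. The counts are invented. Very rare variants are hard to tell apart from sequencing errors, so real analyses need a frequency cut-off.

```python
from collections import Counter


def variant_frequencies(bases_at_position):
    """Fraction of reads showing each base at one position, most common first."""
    counts = Counter(bases_at_position)
    total = sum(counts.values())
    return {base: count / total for base, count in counts.most_common()}


# Invented example: 95 reads show A and 5 reads show G at this position.
frequencies = variant_frequencies("A" * 95 + "G" * 5)
dominant_base = next(iter(frequencies))
```

The first entry is the dominant variant, and anything after it is a minor variant in the viral population.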
What has deep sequencing of clinical samples shown?
- Many viruses have more complex genomes than we thought.
- Many novel genes have been identified this way.
What did deep sequencing of human cytomegalovirus show?
It is much more complex than previously thought and contains many novel transcripts.
What is the future for deep sequencing?
- Increasing throughput.
- Reducing cost.
- This is a tough computational challenge as 1GB of raw data generates 10GB of analysis.
- Another option is de novo assembly.
What is de novo assembly?
- You take the short sequence reads and find overlapping regions.
- You use this to attempt to rebuild the mRNA fragments without knowing anything about the target genome.
- This requires supercomputers and takes days.
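The idea can be shown with a toy greedy assembler in Python: repeatedly merge the two sequences with the longest overlap. This is a simplified sketch with invented reads, not the algorithm used by any particular assembler. It compares every sequence with every other, which hints at why millions of real reads need so much computing power.

```python
def overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_length: int = 3):
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # Drop duplicate reads, keep order.
    while len(contigs) > 1:
        best_length, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # Nothing left that overlaps.
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Invented overlapping reads.
contigs = greedy_assemble(["ATGGCC", "GCCAAG", "AAGTCT"])
```

No reference genome is used at any point. The reads are joined purely on the sequence they share.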
What is 3rd generation sequencing?
Nanopore sequencing.
What is nanopore sequencing?
- A new sequencing system that does not fragment the genetic material before sequencing.
- It allows rapid sequencing of genetic material.
- It can be powered by a laptop or phone for use in the field.
- It has been used to identify bacteria and AMR in about 10 hours.
How does nanopore sequencing work?
- It uses an array of pores from bacteria.
- A Dock protein attaches the nucleic acid to the pore.
- A single strand of nucleic acid goes through the pore.
- The pore is embedded in a membrane, and the nucleic acid passes through the membrane via the pore.
- As it crosses the membrane, you measure the changes in electrical resistance and create a trace.
- It uses a protein to slow down the feed to give a more accurate sequence.
- Computers then interpret the electrical output and convert it into sequence information.
What are the key advantages of nanopore sequencing?
- It can sequence RNA directly and see where the exon junctions are.
- It can be used to sequence very long nucleic acid molecules, up to about 1 million bases.
What are the drawbacks of nanopore sequencing?
- Between 1 in 10 and 1 in 20 nucleotides are wrong.
- This usually causes indels (insertions/deletions).
How was nanopore sequencing used to examine the adenovirus genome?
- It was used to study the splicing events, as this is hard to do with RNAseq.
- It gave an accurate picture of the adenovirus genome at different points in infection and was used to rebuild the adenovirus transcriptome.
- It showed many different transcripts that are randomly made and previously unknown.
What is the purpose of the randomly generated adenovirus transcripts?
- They are outside the dominant genes with the essential functions.
- They randomly make other proteins to see if they give a selection advantage or make a useful protein.
How was nanopore sequencing used to examine the SARS-CoV-2 genome?
- It was used to look at the SARS-CoV-2 transcriptome in lots of detail.
- All the SNPs were identified, as well as deletions that led to later variants.