Transcriptomics Flashcards
What is transcriptomics?
- systematic analysis of transcripts
- transcript: a length of RNA or DNA that has been transcribed from a DNA or RNA template, respectively
- identify the genes that are active in a particular condition
- measure the expression level of each gene
- Two main approaches:
- EST sequencing (Expressed Sequence Tags)
- Full length sequencing
EST sequencing
- also called single pass sequencing
- resulting sequences obtained by a single read
- no confirmation of correctness
- random sequencing of cDNA libraries
- allowed the discovery of many transcripts or fragments of transcripts
- often poor quality of reads
- from the number of reads aligned it was possible to estimate the expression level of all the genes
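The counting idea in the last point can be sketched as follows. This is a minimal illustration, assuming each EST read has already been assigned to a gene; the function name and the gene names are hypothetical.

```python
from collections import Counter

def est_expression(aligned_genes):
    """Estimate relative expression as the fraction of EST reads assigned to each gene."""
    counts = Counter(aligned_genes)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical alignment result: one gene name per aligned EST read.
reads = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]
profile = est_expression(reads)
print(profile)
```

Because each EST is a single, often low-quality read, such fractions are only rough estimates.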
Full length sequencing
- more difficult than EST sequencing
- gives the full-length sequence of a transcript
- able to infer the full length of the protein
- make hypotheses about its function
Microarrays
- method to assess expression profiles
- level of expression of all the genes in a single experiment
- using a sequence of nucleic acids as a probe to identify and quantify the complementary strand
- microarray composed of a matrix of micro probes
- cDNA is labelled with fluorescent dyes
- sequence of each spot is known (possible to quantify level)
- after hybridization the microarray can be analyzed by a laser scanner
What are expression profiles?
- indicates overall level of expression of all the genes
- an expression profile is an “object” defined by a multi-variable vector
- each gene is an independent variable
- if all variables have the same value then the two profiles are identical
- we can measure the distance
- between two profiles -> Euclidean distance
- many profiles -> clustering (unsupervised, supervised, hierarchical)
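A small sketch of the Euclidean distance between two expression profiles, treating each profile as a vector with one value per gene. The sample values are made up for illustration.

```python
import math

def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two expression profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

# Hypothetical expression values for three genes in two samples.
sample_1 = [5.0, 0.0, 2.0]
sample_2 = [1.0, 3.0, 2.0]
print(euclidean_distance(sample_1, sample_2))  # 5.0
print(euclidean_distance(sample_1, sample_1))  # 0.0, identical profiles
```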
Hierarchical clustering
- method of cluster analysis which seeks to build a hierarchy of clusters
- Two main types:
- agglomerative, bottom-up, single clusters merged, O(n^3)
- divisive, top-down, whole set divided, O(2^n)
- merges and splits are decided in a greedy manner
- results in dendrogram
- given the complexity, more efficient algorithms are used (SLINK, CLINK), O(n^2)
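The agglomerative (bottom-up) strategy can be sketched as below. This is a naive single-linkage version written for illustration only, not SLINK; the function name, the stopping rule (a target number of clusters) and the toy profiles are assumptions.

```python
import math
from itertools import combinations

def single_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: start with one cluster per profile and
    greedily merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []  # (cluster_a, cluster_b, distance), enough to draw a dendrogram
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(math.dist(profiles[i], profiles[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters, merges

# Hypothetical two-gene expression profiles for five samples.
profiles = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = single_linkage(profiles, 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Each merge step scans all pairs of clusters, which is why the naive approach is costly for many profiles.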
Unsupervised and supervised clustering
- unsupervised -> find hidden structure in unlabeled data
- no error to evaluate
- supervised -> find function from labeled data
- pair sample-label
- generalize to unseen instances
- SVM, cluster algorithms, kernel tricks
RNA-seq analyses, why?
- discover expressed genes in different tissues or conditions
- gene prediction and functional analysis
- comparison is very often the aim
- two main approaches:
- search for individual genes differentially expressed
- evaluate the whole expression profile
Can we discriminate which strand is transcribed?
- we lose information about which strand contains the sequence and which the complement
- directional cloning, method developed by Life Technologies
- RNA Ligase 2
- sticky adaptors
What is Agilent Bioanalyzer?
- a system performing fast and accurate automated electrophoresis
- “on chip” microelectrophoresis
- ribosomal RNA peaks should be very sharp showing very little degradation
- gives RIN (RNA integrity number), estimate of the quality
Affinity purification of polyA+ with magnetic beads
- interested only in mRNA (<4% of the RNA), remove the rest
* afterwards it is good practice to run the Bioanalyzer again
Covaris physical fragmentation of nucleic acids
- libraries must contain short inserts
- emulsion PCR and bridge PCR cannot process long fragments
- used to fragment the DNA (with sonication)
RNA fragmentation by RNase III
- libraries must contain short inserts
- emulsion PCR and bridge PCR cannot process long fragments
- used to fragment the RNA (with enzymatic endonuclease digestion)
- enzyme RNase III
What are the main advantages of RNA-seq?
- single-base resolution (unknown genes or exons)
- Differential splicing
- Gene prediction
- RNA editing
- direct measurement of the number of molecules
- microarrays give approximate values
From RNA-Seq reads to transcripts
- align-then-assemble approach
- aligns reads to the genome
- identify splicing events
- reconstructs transcripts from spliced alignments
- assemble-then-align approach
- assembles transcript sequences from reads [de-novo]
- transcripts are splice-aligned to the genome
- delineate intron and exon structures and variations between transcripts
- align-then-assemble is more sensitive, de novo assembly works well for the most abundant transcripts
- in the figure, reads are colored according to the transcript isoform
- protein-coding regions are shown in dark colours
How to compare data from different experiments?
- FPKM, fragments per kilo-base of transcript per million mapped reads
- data normalized on total reads and length of transcripts
- normalization on the total number of reads is not always OK
- normalization on the length of the transcript is not always OK
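The FPKM definition above can be written as a short calculation. The function and parameter names are illustrative, and the numbers are made up.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

# Hypothetical transcript: 500 fragments, 2,000 bp long, 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
print(value)  # 25.0
```

Doubling either the transcript length or the sequencing depth halves the value, which is the intended normalization.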
RNA editing
- change some bases on the RNA (by specific enzymes)
* RNA-seq data may be very useful for a direct and global comparison between genome and transcriptome sequences
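A minimal sketch of the genome versus transcriptome comparison, assuming the two sequences are already aligned position by position with no gaps. The function name and the sequences are hypothetical. A mismatch is only a candidate editing site, since it could also be a polymorphism or a sequencing error.

```python
def editing_candidates(genome_seq, transcript_seq):
    """Return (position, genomic_base, transcript_base) for every mismatch
    between two gap-free, equal-length aligned sequences."""
    if len(genome_seq) != len(transcript_seq):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(genome_seq.upper(), transcript_seq.upper())
    return [(i, g, t) for i, (g, t) in enumerate(pairs) if g != t]

# Hypothetical aligned sequences; A-to-I editing is read as G in cDNA.
genome = "ATGCAAGT"
cdna = "ATGCAGGT"
sites = editing_candidates(genome, cdna)
print(sites)  # [(5, 'A', 'G')]
```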
What is RNA-seq?
- RNA sequencing, also called whole transcriptome shotgun sequencing (WTSS)
- uses NGS to detect presence and quantity of RNA
- RNA-seq has replaced microarrays for transcriptome analyses because it is more accurate
- easily produces millions of reads per sample