Transcriptomics Flashcards
1
Q
What is transcriptomics?
A
- systematic analysis of transcripts
- a transcript is a length of RNA or DNA that has been transcribed from a DNA or RNA template, respectively
- identify the genes that are active in a particular condition
- measure the expression level of each gene
- Two main approaches:
- EST sequencing (Expressed Sequence Tags)
- Full length sequencing
2
Q
EST sequencing
A
- also called single pass sequencing
- resulting sequences obtained by a single read
- no confirmation of correctness
- random sequencing of cDNA libraries
- allowed the discovery of many transcripts or fragments of transcripts
- often poor quality of reads
- the number of reads aligned to each gene made it possible to estimate the expression level of all the genes
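The last point can be sketched in Python. This is a minimal illustration, assuming every EST read has already been assigned to a gene; the gene names and the `est_assignments` list are hypothetical. Counting reads per gene gives a rough relative expression estimate.

```python
from collections import Counter

def expression_from_est(assignments):
    """Return the fraction of EST reads assigned to each gene."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical gene assignments, one entry per aligned EST read.
est_assignments = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]
est_levels = expression_from_est(est_assignments)
print(est_levels)
```

Because EST reads are single pass and often of poor quality, such counts are only approximate.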
3
Q
Full length sequencing
A
- more difficult than EST sequencing
- gives the full-length sequence of the transcript
- makes it possible to infer the full length of the protein
- and to make hypotheses about its function
4
Q
Microarrays
A
- method to assess expression profiles
- level of expression of all the genes in a single experiment
- using a sequence of nucleic acids as a probe to identify and quantify the complementary strand
- microarray composed of a matrix of micro probes
- cDNA is labelled with fluorescent dyes
- the sequence of each spot is known (so the expression level can be quantified)
- after hybridization the microarray can be analyzed by a laser scanner
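The probe principle above (a known sequence identifies its complementary strand) can be illustrated with a small Python sketch. It assumes perfect base pairing only; real hybridization tolerates some mismatches and depends on experimental conditions. The function names and sequences are hypothetical.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hybridizes(probe, target):
    """True if the probe is perfectly complementary to some part of the target."""
    return reverse_complement(probe) in target

# Hypothetical probe and labelled cDNA fragment.
probe = "ATGC"
cdna = "TTGCATAA"
print(reverse_complement(probe), hybridizes(probe, cdna))
```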
5
Q
What are expression profiles?
A
- indicates overall level of expression of all the genes
- an expression profile is an “object” defined by a multi-variable vector
- each gene is an independent variable
- if all variables have the same value in two profiles, the two profiles are identical
- we can measure the distance between profiles
- between two profiles -> Euclidean distance
- many profiles -> clustering (unsupervised, supervised, hierarchical)
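As a minimal sketch of the distance between two profiles, the Euclidean distance is the square root of the sum of squared per-gene differences. The profile values below are hypothetical.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical expression levels of three genes in two samples.
profile_1 = [1.0, 2.0, 3.0]
profile_2 = [4.0, 6.0, 3.0]
d = euclidean_distance(profile_1, profile_2)
print(d)
```

Identical profiles give a distance of zero, which matches the definition of identity in the card.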
6
Q
Hierarchical clustering
A
- method of cluster analysis which seeks to build a hierarchy of clusters
- Two main types:
- agglomerative, bottom-up, single clusters merged, O(n^3)
- divisive, top-down, whole set divided, O(2^n)
- merges and splits are chosen in a greedy manner
- results in dendrogram
- given this complexity, more efficient algorithms are used (SLINK, CLINK), O(n^2)
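The agglomerative (bottom-up) variant can be sketched as follows. This is a naive single-linkage illustration, not SLINK or CLINK: it rescans all cluster pairs at every step, which is where the roughly cubic cost comes from. The function name and the example profiles are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math

def single_linkage(profiles):
    """Naive agglomerative clustering; returns the merges in the order they happen."""
    clusters = {i: [i] for i in range(len(profiles))}  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = keys[x], keys[y]
                # Single linkage: distance between the closest members.
                d = min(math.dist(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best  # greedy choice: merge the closest pair
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical two-gene expression profiles for five samples.
profiles = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = single_linkage(profiles)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

The ordered list of merges with their heights contains the information needed to draw the dendrogram.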
7
Q
unsupervised and supervised clustering
A
- unsupervised -> find hidden structure in unlabeled data
- no error to evaluate
- supervised -> find function from labeled data
- pair sample-label
- generalize to unseen instances
- SVM, cluster algorithms, kernel tricks
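To make the supervised case concrete, here is a minimal sketch that learns from sample-label pairs and generalizes to unseen instances. It uses a nearest-centroid classifier as a simple stand-in, not the SVM named in the card; the labels and profile values are hypothetical.

```python
import math

def train_centroids(samples, labels):
    """Learn one mean profile (centroid) per label from sample-label pairs."""
    sums, counts = {}, {}
    for sample, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(sample))
        for k, value in enumerate(sample):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, sample):
    """Assign an unseen sample to the label with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))

# Hypothetical labeled training profiles (two genes per sample).
samples = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
labels = ["healthy", "healthy", "tumor", "tumor"]
centroids = train_centroids(samples, labels)
print(predict(centroids, [4.5, 4.9]))
```

In the unsupervised case the `labels` list would not exist, so there would be no prediction error to evaluate.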
8
Q
RNA-seq analyses, why?
A
- discover expressed genes in different tissues or conditions
- gene prediction and functional analysis
- comparison is very often the aim
- two main approaches:
- search for individual genes differentially expressed
- evaluate the whole expression profile
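The first approach, searching for individual differentially expressed genes, can be sketched with a log2 fold change between two conditions. This is only an illustration: it assumes the counts are already comparable between samples, and real analyses also use replicates and statistical tests. The gene names, counts, pseudocount and threshold are hypothetical.

```python
import math

def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2(B / A) per gene; the pseudocount avoids division by zero."""
    genes = counts_a.keys() & counts_b.keys()
    return {g: math.log2((counts_b[g] + pseudocount) / (counts_a[g] + pseudocount))
            for g in genes}

def differentially_expressed(counts_a, counts_b, threshold=1.0):
    """Genes whose absolute log2 fold change reaches the threshold."""
    lfc = log2_fold_changes(counts_a, counts_b)
    return sorted(g for g, v in lfc.items() if abs(v) >= threshold)

# Hypothetical read counts per gene in two conditions.
control = {"geneA": 99, "geneB": 49, "geneC": 9}
treated = {"geneA": 399, "geneB": 49, "geneC": 4}
print(differentially_expressed(control, treated))
```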
9
Q
Can we discriminate which strand is transcribed?
A
- we lose information about which strand contains the sequence and which the complement
- directional cloning, a method developed by Life Technologies
- RNA Ligase 2
- sticky adaptors
10
Q
What is Agilent Bioanalyzer?
A
- a system performing fast and accurate automated electrophoresis
- “on chip” microelectrophoresis
- ribosomal RNA peaks should be very sharp showing very little degradation
- gives the RIN (RNA integrity number), an estimate of RNA quality
11
Q
Affinity purification of polyA+ with magnetic beads
A
- we are interested only in mRNA (< 4% of the RNA); remove the rest
- afterwards it is good practice to run the Bioanalyzer again
12
Q
Covaris physical fragmentation of nucleic acids
A
- libraries must contain short inserts
- emulsion PCR and bridge PCR cannot process long fragments
- used to fragment the DNA (with sonication)
13
Q
RNA fragmentation by RNase III
A
- libraries must contain short inserts
- emulsion PCR and bridge PCR cannot process long fragments
- used to fragment the RNA (by enzymatic endonuclease digestion)
- enzyme RNase III
14
Q
What are the main advantages of RNA-seq?
A
- single-base resolution (unknown genes or exons)
- Differential splicing
- Gene prediction
- RNA editing
- direct measurement of the number of molecules
- microarrays give approximate values
15
Q
From RNA-Seq reads to transcripts
A
- align-then-assemble approach
- aligns reads to the genome
- identify splicing events
- reconstructs transcripts from spliced alignments
- assemble-then-align approach
- assembles transcript sequences from reads [de-novo]
- transcripts are splice-aligned to the genome
- delineate intron and exon structures and variations between transcripts
- more sensitive, de novo works well for most abundant transcripts
- in the figure, reads are colored according to the transcript isoform
- protein-coding regions are shown in dark colours
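The "identify splicing events" step of the align-then-assemble approach can be sketched in Python. The sketch assumes SAM-style spliced alignments, given as a 1-based start position plus a CIGAR string in which `N` marks a skipped reference region (an intron). The `alignments` list and function name are hypothetical, and a real pipeline would read these fields from an aligner's output.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def introns_from_alignment(start, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, implied by N operations."""
    introns = []
    pos = start  # reference position of the next aligned base
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((pos, pos + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            pos += length
    return introns

# Hypothetical spliced alignments: (start position, CIGAR string).
alignments = [(100, "50M200N50M"), (120, "30M200N70M"), (400, "100M")]
junction_support = Counter(
    intron for start, cigar in alignments
    for intron in introns_from_alignment(start, cigar)
)
print(junction_support)
```

Junctions supported by several reads are the evidence from which transcripts are then reconstructed.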