Transcriptomics Flashcards

Question 1

Q

What is transcriptomics?

Answer

A

systematic analysis of transcripts
- transcript: a length of RNA or DNA that has been transcribed from a DNA or RNA template, respectively
- identify the genes that are active in a particular condition
- measure the level of expression of each gene
Two main approaches:
- EST sequencing (Expressed Sequence Tags)
- Full length sequencing

Question 2

Q

EST sequencing

Answer

A

also called single pass sequencing
- resulting sequences obtained by a single read
- no confirmation of correctness
random sequencing of cDNA libraries
allowed the discovery of many transcripts or fragments of transcripts
reads are often of poor quality
from the number of reads aligned to each gene it was possible to estimate its expression level
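
The counting idea above can be sketched in a few lines. This is only an illustration: the gene names and the `est_assignments` list are made up, and it assumes every EST has already been aligned and assigned to exactly one gene.

```python
from collections import Counter

def est_expression(est_assignments):
    """Return {gene: fraction of all ESTs that were assigned to that gene}."""
    counts = Counter(est_assignments)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical outcome of aligning 8 ESTs from one cDNA library.
est_assignments = ["geneA", "geneB", "geneA", "geneC",
                   "geneA", "geneB", "geneA", "geneA"]
profile = est_expression(est_assignments)
for gene, share in sorted(profile.items()):
    print(gene, share)
```

With so few reads per gene the estimate is coarse, which is one reason EST counts were only a rough measure of expression.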

Question 3

Q

Full length sequencing

Answer

A

more difficult than EST sequencing
gives the full-length sequence of a transcript
- able to infer the full length of the protein
- make hypotheses about its function

Question 4

Q

Microarrays

Answer

A

method to assess expression profiles
- level of expression of all the genes in a single experiment
using a sequence of nucleic acids as a probe to identify and quantify the complementary strand
microarray composed of a matrix of micro probes
- cDNA is labelled with fluorescent dyes
- sequence of each spot is known (possible to quantify level)
after hybridization the microarray can be analyzed by a laser scanner

Question 5

Q

What are expression profiles?

Answer

A

indicates the overall level of expression of all the genes
an expression profile is an “object” defined by a multi-variable vector
each gene is an independent variable
- if all variables have the same value in two profiles then the two profiles are identical
we can measure the distance
- of two profiles -> Euclidean distance
- many profiles -> clustering (unsupervised, supervised, hierarchical)
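
A minimal sketch of the distance between two profiles, assuming each profile is a list of expression values with the genes in the same order. The example values are invented.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical expression values for three genes in two samples.
profile_1 = [2.0, 5.0, 1.0]
profile_2 = [2.0, 1.0, 4.0]
d = euclidean_distance(profile_1, profile_2)   # sqrt(0 + 16 + 9)
print(d)
```

Identical profiles give a distance of 0, matching the statement that two profiles with the same value for every variable are identical.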

Question 6

Q

Hierarchical clustering

Answer

A

method of cluster analysis which seeks to build a hierarchy of clusters
Two main types:
- agglomerative, bottom-up, single clusters merged, O(n^3)
- divisive, top-down, whole set divided, O(2^n)
merges and splits are chosen in a greedy manner
results in a dendrogram
given this complexity, more efficient algorithms are used: SLINK (single-linkage), CLINK (complete-linkage), O(n^2)
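
The agglomerative (bottom-up) idea can be sketched as below. This is a deliberately naive single-linkage version written for readability, not SLINK; the function name, the `n_clusters` stopping rule, and the sample profiles are assumptions made for the example.

```python
import math

def single_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: greedily merge the two closest clusters.

    Returns the final clusters (lists of profile indices) and the merge
    history, which is the information a dendrogram is drawn from.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(profiles))]
    merges = []

    def cluster_distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(profiles[i], profiles[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical two-gene profiles for five samples.
profiles = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.0, 9.0), (1.0, 2.5)]
clusters, merges = single_linkage(profiles, n_clusters=2)
print(clusters)
```

Every pass rescans all cluster pairs, which is why the naive approach scales so poorly and why the faster algorithms are preferred for large data sets.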

Question 7

Q

unsupervised and supervised clustering

Answer

A

unsupervised -> find hidden structure in unlabeled data
- no error to evaluate
supervised -> find function from labeled data
- pair sample-label
- generalize to unseen instances
e.g. SVM, cluster algorithms, kernel tricks
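
To make the supervised case concrete, here is a small sketch that learns from sample-label pairs and then labels an unseen sample. It uses a nearest-centroid rule, chosen only because it fits in a few lines; it is not an SVM. The labels and values are invented.

```python
import math

def train_centroids(samples, labels):
    """Compute the mean profile (centroid) of each label from labeled pairs."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign an unseen sample to the label with the closest centroid."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical labeled training data: (sample, label) pairs.
samples = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels = ["healthy", "healthy", "tumor", "tumor"]

centroids = train_centroids(samples, labels)
prediction = predict(centroids, (7.5, 9.0))   # a sample not seen in training
print(prediction)
```

In the unsupervised case the `labels` list would not exist, so there would be no known answer to compare a prediction against.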

Question 8

Q

RNA-seq analyses, why?

Answer

A

discover expressed genes in different tissues or conditions
- gene prediction and functional analysis
comparison is very often the aim
two main approaches:
- search for individual genes differentially expressed
- evaluate the whole expression profile
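
For the first approach, one common first look at a single gene is the log2 fold change between two conditions. The sketch below is only that: the pseudocount, gene names, and values are assumptions, and a real analysis would also need replicates and a statistical test.

```python
import math

def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of condition B over condition A.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no expression in one condition.
    """
    return math.log2((expr_b + pseudocount) / (expr_a + pseudocount))

# Hypothetical normalized expression values in two conditions.
control = {"geneA": 7.0, "geneB": 15.0, "geneC": 3.0}
treated = {"geneA": 31.0, "geneB": 15.0, "geneC": 0.0}

changes = {g: log2_fold_change(control[g], treated[g]) for g in control}
for gene, lfc in changes.items():
    print(gene, lfc)
```

Positive values mean higher expression in the treated sample, negative values lower, and 0 means no change.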

Question 9

Q

Can we discriminate which strand is transcribed?

Answer

A

we lose information about which strand contains the sequence and which the complement
directional cloning, a method developed by Life Technologies
- RNA Ligase 2
- sticky adaptors
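
The loss of strand information can be illustrated with a small sketch. It assumes a short made-up reference and exact substring matching in place of a real aligner. A read matches the reference either directly or as its reverse complement, and without a directional library the match alone does not say which strand was transcribed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def read_orientation(read, reference):
    """Return '+' if the read matches the reference as given, '-' if only
    its reverse complement matches, or None if neither matches."""
    if read in reference:
        return "+"
    if reverse_complement(read) in reference:
        return "-"
    return None

# Hypothetical reference and two reads from the same locus.
reference = "ATGGCCATTGTAATGGGCCGC"
read_forward = "GCCATTG"
read_reverse = reverse_complement(read_forward)

print(read_orientation(read_forward, reference))
print(read_orientation(read_reverse, reference))
```

In a non-directional library both reads could come from the same transcript, so the orientation of a match is not evidence of the transcribed strand.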

Question 10

Q

What is Agilent Bioanalyzer?

Answer

A

a system performing fast and accurate automated electrophoresis
- “on chip” microelectrophoresis
ribosomal RNA peaks should be very sharp, which indicates very little degradation
gives the RIN (RNA integrity number), an estimate of RNA quality

Question 11

Q

Affinity purification of polyA+ with magnetic beads

Answer

A

interested only in mRNA (<4% of the total RNA), remove the rest

* afterwards it is good practice to run the Bioanalyzer again

Question 12

Q

Covaris physical fragmentation of nucleic acids

Answer

A

libraries must contain short inserts
- emulsion PCR and bridge PCR cannot process long fragments
used to fragment the DNA (with sonication)

Question 13

Q

RNA fragmentation by RNase III

Answer

A

libraries must contain short inserts
- emulsion PCR and bridge PCR cannot process long fragments
used to fragment the RNA (with enzymatic endonuclease digestion)
- enzyme RNase III

Question 14

Q

What are the main advantages of RNA-seq?

Answer

A

single-base resolution (unknown genes or exons)
- Differential splicing
- Gene prediction
- RNA editing
direct measurement of the number of molecules
- microarrays give approximate values

Question 15

Q

From RNA-Seq reads to transcripts

Answer

A

align-then-assemble approach
- aligns reads to the genome
- identifies splicing events
- reconstructs transcripts from spliced alignments
assemble-then-align approach
- assembles transcript sequences from reads [de-novo]
- transcripts are splice-aligned to the genome
- delineate intron and exon structures and variations between transcripts
- more sensitive, de novo works well for most abundant transcripts
in the figure, reads are colored according to the transcript isoform
- protein-coding regions in dark colours

Question 16

Q

How to compare data from different experiments?

Answer

A

FPKM, fragments per kilobase of transcript per million mapped reads
- data normalized on total reads and length of transcripts
normalization on the total number of reads is not always ok
normalization on the length of the transcript is not always ok
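
A worked sketch of the FPKM definition above: fragments are divided by the transcript length in kilobases and by the mapped total in millions. The function name and the example numbers are made up.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Equivalent to fragments / (length_bp / 1e3) / (total_mapped / 1e6).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

# Hypothetical numbers: 500 fragments on a 2,000 bp transcript,
# in a sample with 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)   # 500 / 2 kb / 10 M = 25
print(value)
```

The same 500 fragments on a 4,000 bp transcript would give half the FPKM, which shows how the length normalization works.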

Question 17

Q

RNA editing

Answer

A

changes some bases of the RNA (by specific enzymes)

* RNA-seq data may be very useful for a direct and global comparison between genome and transcriptome sequences
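
The genome-versus-transcriptome comparison can be sketched as a position-by-position scan. This assumes the two sequences are already aligned without gaps, are the same length, and use the DNA alphabet; the sequences are invented. A mismatch is only a candidate editing site, since it could also be a sequencing error or a polymorphism.

```python
def find_mismatches(genomic, transcript):
    """List (position, genomic_base, transcript_base) where the two
    pre-aligned, equal-length sequences differ."""
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, g, t)
        for i, (g, t) in enumerate(zip(genomic, transcript))
        if g != t
    ]

# Hypothetical aligned genomic and transcript-derived sequences.
genomic = "ATGCAAGTC"
transcript = "ATGCGAGTC"
candidates = find_mismatches(genomic, transcript)
print(candidates)
```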

Question 18

Q

What is RNA-seq?

Answer

A

RNA sequencing, also called whole transcriptome shotgun sequencing (WTSS)
uses NGS to detect presence and quantity of RNA
RNA-seq has replaced microarrays for transcriptome analyses because it is more accurate
- easily produces millions of reads per sample