Bioinformatics 11: Gene expression and regulation Flashcards
What is the transcriptome and what does it include?
The transcriptional output of a genome
includes ribosomal RNA, messenger RNA, transfer RNA and regulatory RNAs
Translation of the transcriptome generates the proteome
What is gene expression data and how is it obtained?
Expression data refers to RNA:
- distribution
- abundance
Obtained through various experimental means, usually involving reverse transcription of RNA into cDNA
Traditional molecular biological techniques for gene-by-gene analysis?
Northern Blotting
In situ hybridisation
RT-PCR (reverse transcription PCR)
Describe the process of northern blotting
Northern blotting: involves extracting RNA and comparing transcript abundance from different samples
- RNA separated by size via electrophoresis + blotted onto membrane
- Specific genes detected by hybridisation of radio-labelled RNA or DNA probe and developed on film
- Known expressed gene (e.g. Beta-actin) used as loading control
- Relative quantitation -> Comparison to loading control
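The relative quantitation step can be sketched as a simple ratio. This is a minimal illustration only: the band intensities below are made-up densitometry readings, and `relative_abundance` is a hypothetical helper name.

```python
def relative_abundance(target, control):
    """Divide each lane's target band intensity by its loading-control band."""
    if len(target) != len(control):
        raise ValueError("need one control band per target band")
    return [t / c for t, c in zip(target, control)]

# Hypothetical band intensities (arbitrary units) for two lanes
target_bands = [1200.0, 3000.0]   # gene of interest: sample 1, sample 2
actin_bands = [1000.0, 1250.0]    # beta-actin loading control in the same lanes

ratios = relative_abundance(target_bands, actin_bands)
# Expression in sample 2 relative to sample 1, corrected for loading
fold_change = ratios[1] / ratios[0]
print(ratios, fold_change)
```

Without the control, the raw intensities would suggest a 2.5-fold difference; part of that comes from more RNA having been loaded in the second lane.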
Describe the process of in situ hybridisation
In situ hybridisation: allows detection of transcripts directly in tissues
- Sections (thin slices) or permeabilised tissues enable probe access
- Multiplexing is possible via fluorescence detection (with antibody amplification) -> FISH (qualitative)
What are expressed-sequence tags (ESTs)?
Expressed sequence tag = short sub-sequence of a cDNA sequence. Represents a portion of an expressed gene
ESTs may be used to identify gene transcripts, useful in gene sequence determination
What is the purpose of UniGene?
UniGene collates EST data
- all-by-all sequence comparisons identify overlapping ESTs
ESTs are organised into clusters - each cluster represents one unique expressed human gene
Allows comparison of gene expression via Digital Differential Display
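The clustering idea can be illustrated with a toy sketch. It assumes exact suffix/prefix matches of at least `min_len` bases count as an overlap, and the sequences are invented. Real EST clustering needs error-tolerant alignment and reverse-complement handling, so this is not how UniGene itself is built.

```python
from itertools import combinations

def overlaps(a, b, min_len):
    """True if a suffix of `a` exactly matches a prefix of `b` (>= min_len bases)."""
    longest = min(len(a), len(b))
    return any(a[-k:] == b[:k] for k in range(min_len, longest + 1))

def cluster_ests(ests, min_len=5):
    """All-by-all comparison; overlapping ESTs are merged with union-find."""
    parent = list(range(len(ests)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(ests)), 2):
        if overlaps(ests[i], ests[j], min_len) or overlaps(ests[j], ests[i], min_len):
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(ests):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(clusters.values(), key=len, reverse=True)

# Invented toy ESTs: the first three tile across one transcript, the last is unrelated
ests = ["ATGGCGTACGTT", "TACGTTAGCCGA", "AGCCGATTTGCA", "GGGTTTCCCAAA"]
clusters = cluster_ests(ests)
print(clusters)
```

The first and third ESTs do not overlap each other directly; they end up in the same cluster because both overlap the second.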
What is Digital Differential Display (DDD) and what is it used for?
DDD compares the representation of UniGene clusters in multiple cDNA / EST libraries
Allows for analysis of significant differences in gene expression
Particularly useful in disease vs ‘normal’ comparison
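A library-versus-library comparison of this kind can be sketched as a test on a 2x2 count table. The sketch below assumes Fisher's exact test is used as the significance test, which is how DDD is commonly described; treat that as an assumption. The EST counts are invented, and `fisher_exact_two_sided` is a hand-written helper, not a library call.

```python
from math import comb

def fisher_exact_two_sided(a, n_a, b, n_b):
    """Two-sided Fisher's exact test.

    a of n_a ESTs in library A, and b of n_b ESTs in library B,
    belong to the cluster of interest.
    """
    hits = a + b
    denom = comb(n_a + n_b, hits)

    def prob(x):
        # Hypergeometric probability of seeing x of the hits in library A
        return comb(n_a, x) * comb(n_b, hits - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, hits - n_b), min(n_a, hits)
    # Sum over all tables at least as unlikely as the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Invented counts: cluster seen 30 times in 1000 tumour ESTs, 5 times in 1000 normal ESTs
p_diff = fisher_exact_two_sided(30, 1000, 5, 1000)
# Same representation in both libraries: no evidence of a difference
p_same = fisher_exact_two_sided(10, 1000, 10, 1000)
print(p_diff, p_same)
```

With only a handful of ESTs per cluster the test has little power, which is one reason EST-based comparisons detect only large expression differences.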
Pros and cons of EST analysis
Pros
- All genes analysed, including novel transcript variants
- Fast and cheap
- EST clones available to the community (I.M.A.G.E. Consortium)
Cons
- Random sequencing strategy - only useful for high abundance transcripts
- Some data is of poor quality
- Genomic contamination of cDNA can occur
- Statistical tools have low sensitivity and can only detect large differences in expression
What is the purpose of ‘Serial analysis of gene expression’ (SAGE)?
Technique developed to further quantify ESTs
- increases efficiency of EST profiling
cDNA is synthesised from mRNA, cut with restriction enzymes to release short (10-17 bp) tags, and the tags are concatenated (joined together)
Each transcript's tag is included in the concatemers at a rate proportional to the transcript's abundance
Qualitative (presence/absence) and quantitative (count of tags) data
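Tag counting can be sketched as follows. This is a simplified illustration: it assumes 10 bp tags separated by a `CATG` site in each sequenced concatemer, whereas real SAGE reads contain ditags and need more careful parsing. The tag sequences and gene names are invented.

```python
from collections import Counter

def count_tags(concatemers, site="CATG", tag_len=10):
    """Split each sequenced concatemer at the punctuating site and count the tags."""
    counts = Counter()
    for read in concatemers:
        for piece in read.split(site):
            if len(piece) == tag_len:  # ignore fragments of unexpected length
                counts[piece] += 1
    return counts

# Invented tags and a hypothetical tag-to-gene lookup
TAG_A, TAG_B = "AAACCCGGGT", "TTTGGGAAAC"
tag_to_gene = {TAG_A: "geneA", TAG_B: "geneB"}

reads = ["CATG".join([TAG_A, TAG_B, TAG_A]),
         "CATG".join([TAG_A, TAG_A, TAG_B])]

tag_counts = count_tags(reads)
gene_counts = {tag_to_gene[tag]: n for tag, n in tag_counts.items()}
print(gene_counts)
```

The presence of a tag gives the qualitative result; its count gives the quantitative one.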
What database collates SAGE data?
SAGE Genie
What is a microarray? How is gene expression visualised using them?
Patches (features) of DNA molecules on a glass/silicon support
Gene expression visualised via hybridisation of fluorescently labelled cDNA/mRNA
- data collected by fluorescence microscopy scanning
What are the 2 main types of microarray and how do they differ?
Spotted array
- small ‘spots’ of cDNA; the fluorescence intensity of each spot represents the expression level of that gene
- (low density, ~7000 spots per 2 cm^2)
In situ synthesised DNA array
- features are oligonucleotides synthesised directly on the support by photolithography, built up one nucleotide at a time at each feature
- (high density, ~250,000 features per 2 cm^2)
Problems with microarray data - how do scientists make sense of it?
Problems
- Large amounts of information: few samples, many genes (sparse data)
- Changes in gene expression are correlated
Solution
- Cluster genes / samples by expression patterns, interactions, regulation etc. across samples/genes, and visualise the data, e.g. using volcano plots
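Grouping genes by expression pattern can be sketched with a greedy correlation threshold. This is a deliberately simple stand-in for the hierarchical or k-means clustering usually applied to microarray data; the profiles, the 0.9 cutoff and the function names are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (sx * sy)

def group_by_correlation(profiles, threshold=0.9):
    """Put each gene in the first group whose leader it correlates with."""
    groups = []  # the first gene in each group acts as its leader
    for gene, profile in profiles.items():
        for group in groups:
            if pearson(profile, profiles[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups

# Invented expression values across four samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
    "geneD": [8.1, 6.0, 3.9, 2.2],   # falls with geneC
}
groups = group_by_correlation(profiles)
print(groups)
```

Correlation compares the shape of a profile rather than its absolute level, so geneA and geneB group together despite their different magnitudes.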
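What do volcano plots show?
Volcano plots show fold change (typically log2, x-axis) against statistical significance (typically -log10 p-value, y-axis) for each gene, identifying genes with both a large fold change and a high level of statistical significance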
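The coordinates of a volcano plot can be computed as below. This sketch assumes the common convention of log2 fold change and -log10 p-value, with illustrative cutoffs of a 2-fold change and p < 0.05. The expression means and p-values are invented, and no plotting library is used.

```python
from math import log2, log10

def volcano_points(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (gene, log2 fold change, -log10 p, is_hit) for each gene."""
    points = []
    for gene, (mean_control, mean_treated, p_value) in results.items():
        x = log2(mean_treated / mean_control)   # horizontal position
        y = -log10(p_value)                     # vertical position
        is_hit = abs(x) >= fc_cutoff and p_value < p_cutoff
        points.append((gene, x, y, is_hit))
    return points

# Invented per-gene results: (control mean, treated mean, p-value)
results = {
    "geneA": (100.0, 400.0, 0.001),   # up, significant
    "geneB": (100.0, 110.0, 0.5),     # unchanged
    "geneC": (200.0, 50.0, 0.2),      # down, but not significant
    "geneD": (80.0, 20.0, 0.0001),    # down, significant
}
points = volcano_points(results)
hits = [gene for gene, _, _, is_hit in points if is_hit]
print(hits)
```

Genes of interest sit in the upper left and upper right of the plot: far from zero on the x-axis and high on the y-axis.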