MARCO - measuring gene expression Flashcards
Human/ eukaryotic multicellular organisms have …
multiple cell types that differ in morphology, shape, and function, but all contain the same genome.
human genome
3 billion nucleotides, of which about 2% are coding (exons), accounting for 25-30k genes. These genes are not all active at the same time in all cells: each cell type expresses & represses a specific set of genes (cell-type specific gene expression), which produces the observed diversity.
Cell-type specific gene expression
is possible due to regulation of gene expression by non-coding control elements, namely promoters and enhancers. The DNA loops so that an enhancer is brought close to a promoter, and the two interact, either directly or via mediator proteins, with transcription factors. These 3D interactions between promoter, enhancer, and a multi-subunit group of proteins serve to recruit RNA polymerase II to start RNA synthesis, thus expressing the gene.
Measuring Gene Expression
Measuring how much a target gene is expressed in a cell type. Can be done by:
* qPCR
* Microarrays
* Next gen sequencing
o RNA seq
o single cell RNA seq
quantitative PCR (qPCR) with reporter probe
RNA from cells is reverse transcribed into cDNA (complementary DNA): DNA copies that represent the expressed genes & contain only exonic regions.
A fluorescent probe with a sequence complementary to the target gene is used. It carries a fluorophore at the 5’ end and a quencher at the 3’ end.
During qPCR, the fluorescent probe will bind to the target cDNA, if present.
Forward and reverse primers are also added so that Taq polymerase can replicate the cDNA. Once Taq pol reaches the region of the probe, its 5’-3’ exonuclease activity cleaves the probe, separating the fluorophore from the quencher, which leads to the emission of a fluorescent signal.
qPCR - measuring level of gene expression
by quantifying the threshold cycle (Ct): the number of PCR cycles it takes for the fluorescent signal to be detected above background.
* It is the point where the reaction curve intersects a threshold line that is set above the level of background fluorescence. The intersection occurs at the beginning of the exponential phase of the reaction curve.
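As a rough illustration of how a Ct value can be read off a reaction curve, the sketch below finds the first crossing of the threshold line and interpolates between the two neighbouring cycles. The function name, the fluorescence values, and the threshold are made up for the example; real instruments use their own baseline correction and fitting.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the (fractional) cycle at which the curve first reaches the threshold.

    fluorescence[i] is the signal measured after cycle i + 1.
    Returns None if the threshold is never reached.
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0  # already above the threshold after the first cycle
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # Linear interpolation between cycle i (prev) and cycle i + 1 (curr).
            fraction = (threshold - prev) / (curr - prev)
            return i + fraction
    return None


# Hypothetical curve: flat background, then doubling each cycle.
curve = [1, 1, 1, 2, 4, 8, 16, 32]
print(threshold_cycle(curve, threshold=6))  # 5.5
```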
qPCR in multiple samples
Multiple genes/ multiple samples can be tested & compared using different colored dyes.
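* Difference in expression levels between genes/ samples = difference between their Ct values. Assuming the product doubles every cycle, each cycle of difference corresponds to a 2-fold difference in starting cDNA.
The more cDNA of the target gene is present, the earlier the fluorescent signal is observed (lower Ct), corresponding to a higher level of target gene expression.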
Ex. Quantification of TREC in a group of 3 children with Down’s syndrome (DS) and in 4 healthy controls (each measure is performed in triplicate). qPCR shows that DS samples contain much lower expression of TREC.
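The link between Ct and starting amount can be written as a short derivation. It assumes ideal amplification in which the product doubles every cycle (real reactions are usually somewhat less efficient). The symbols N_0 (starting copies) and N_T (copies needed to reach the threshold) are introduced here only for illustration.

```latex
N_T = N_{0} \cdot 2^{C_t}
\quad\Rightarrow\quad
\frac{N_{0,A}}{N_{0,B}} = 2^{\,C_{t,B} - C_{t,A}}
```

For example, under this assumption a sample whose signal crosses the threshold 3 cycles earlier started with about 2^3 = 8 times more target.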
qPCR normalization
Technical differences between runs or initial variations in the amount of mRNA obtained from samples can produce apparent differences in gene expression between samples that are not real.
* Therefore, normalization must be performed to compensate for those differences and measure the actual change in transcription level.
* Normalization uses housekeeping genes, which have stable expression across all cells because they are crucial for cellular functions (ex. actin, tubulin, ribosomal subunits; most common = 18S rRNA).
* Housekeeping genes must be included in qPCR to observe the difference in their Ct values between samples. The correction needed to equalize the Ct values of the housekeeping genes is then applied to normalize the samples’ data for the target gene.
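One common way to apply this normalization is the 2^-ΔΔCt calculation, sketched below. It is not named in these notes, so treat it as one possible approach. It assumes the product doubles every cycle, and all Ct values in the example are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control (2^-ddCt)."""
    # Normalize each target Ct to the housekeeping gene measured in the same sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between sample and control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    # Assumption: 100% efficiency, i.e. the product doubles every cycle.
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target appears 2 cycles later in the sample
# once the housekeeping gene is taken into account.
fold = relative_expression(ct_target_sample=27.0, ct_ref_sample=15.0,
                           ct_target_control=24.0, ct_ref_control=14.0)
print(fold)  # 0.25
```

A result of 0.25 would mean the target gene is expressed at about a quarter of the control level.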
qPCR limitations
- qPCR is relatively fast and cheap
- BUT it is limited as to how many genes can be tested at one time (SO can’t possibly perform PCR for all genes, only for a subset of 5-10 target genes). Therefore, qPCR cannot be used to have an overview of thousands of genes across samples.
Microarrays
A high throughput technology that allows for the detection of thousands of genes simultaneously
* Relies on measuring the base-pairing hybridization with probes for each gene
* Can measure:
o Differing expression of genes over time, between tissues and disease states
o Co-expression of genes
o Identification of complex genetic diseases
A microarray chip (Affy GeneChip) contains multiple cells (spots on the chip, not biological cells). Each cell contains millions of DNA strands complementary to a certain gene, and the gene differs between cells. Each strand is 16-20 bp long.
Microarray Steps
- RNA from cells is reverse transcribed into cDNA (complementary DNA): DNA copies that represent the expressed genes & contain only exonic regions
- The cDNA is labelled with biotin
- Biotin-labelled cDNA is fragmented and loaded onto the microarray chip to hybridize with the probes. Different genes hybridize with probes in different cells.
- The microarray chip is then stained with fluorescently labelled streptavidin, which has a very strong affinity for biotin.
- Streptavidin bound to biotin gives a light signal when the chip is scanned. The amount of light observed in each microarray chip cell corresponds to the amount of cDNA present for that specific gene, indicating the level of gene expression.
Limitations of microarrays
* The data is very “noisy” - expression levels are determined by a spot of light against a noisy background
* Probes are not available for all genes - Affy probes are only present for approx 75-80% of human genes
* Genes with very low expression may not be detected
* The data requires a large degree of statistical manipulation
* Most importantly: the result only shows that a gene is expressed BUT gives no information about which transcript. It cannot differentiate between isoforms of the same gene (transcripts of the same gene that differ in the number of exons present, produced by alternative splicing depending on what kind of protein the cell needs at that time: short or full-length proteins)
microarrays - Results are given as
Signal: Expression measurement for the corresponding probe
Detection: The absolute call for a measurement, which tells if the gene is
A – absent
M – marginal
P – present
Next Generation Sequencing (method)
- Obtain fragments of DNA (or cDNA)
- Add adaptors to both ends of DNA fragments
- Attach fragments to the flowcell via binding of the adaptor at one end to complementary adaptors on the flowcell
- The adaptor at the other end will also bind to the flowcell, bending the fragment into a bridge.
- PCR amplification is performed to increase the amount of each fragment that has attached to different regions of flowcell – forming their own clusters (cluster formation) - can have millions of fragments on a flowcell
- Sequencing is then performed by synthesis: labelled nucleotides (each base with a different colour) are added one at a time, and the signal is detected for each nucleotide added to obtain the Read, the sequence of each DNA fragment.
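The last step, turning detected colour signals into a Read, can be sketched as follows. This is a simplified illustration, not how any particular sequencer's software works: it assumes one intensity value per base per cycle for a single cluster and simply picks the brightest channel. The intensity values are made up.

```python
def call_bases(cycles):
    """Turn per-cycle channel intensities for one cluster into a read.

    cycles is a list of dicts, one per sequencing cycle, mapping each
    base ("A", "C", "G", "T") to its measured signal intensity.
    """
    read = []
    for intensities in cycles:
        # The brightest colour channel indicates which nucleotide was added.
        base = max(intensities, key=intensities.get)
        read.append(base)
    return "".join(read)


# Hypothetical intensities for three cycles of one cluster.
cluster_signal = [
    {"A": 0.91, "C": 0.03, "G": 0.04, "T": 0.02},
    {"A": 0.05, "C": 0.02, "G": 0.88, "T": 0.05},
    {"A": 0.02, "C": 0.04, "G": 0.03, "T": 0.91},
]
print(call_bases(cluster_signal))  # AGT
```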
RNA-seq (method)
Uses next gen seq to obtain Reads of every gene expressed (cDNA)
* Reads are usually 75-150 bp long. The reads obtained can then be aligned with a reference genome to know which gene they belong to.
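* A count profile is then computationally calculated for each gene: the number of reads that align to it. Read counts are roughly proportional to the gene expression level (after accounting for gene length and sequencing depth).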
* Read counts for each gene in different samples (ex. control vs. cancer cell) can be compared to show the difference in gene expression level.
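The counting and comparison steps can be sketched as below. This is a minimal illustration under several assumptions: each read has already been aligned and assigned to a gene, samples are scaled to counts per million (CPM) to correct for sequencing depth, and a pseudocount avoids division by zero. The gene names and read numbers are hypothetical, and real analyses use dedicated statistical tools.

```python
import math
from collections import Counter


def count_reads(read_genes):
    """Count how many aligned reads were assigned to each gene."""
    return Counter(read_genes)


def counts_per_million(counts):
    """Scale raw counts by library size so samples of different depth are comparable."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(cpm_a, cpm_b, gene, pseudocount=1.0):
    """log2 ratio of sample B over sample A for one gene."""
    return math.log2((cpm_b.get(gene, 0.0) + pseudocount)
                     / (cpm_a.get(gene, 0.0) + pseudocount))


# Hypothetical gene assignments for the reads of two samples.
control = count_reads(["GENE1"] * 50 + ["GENE2"] * 50)
cancer = count_reads(["GENE1"] * 150 + ["GENE2"] * 50)

cpm_control = counts_per_million(control)
cpm_cancer = counts_per_million(cancer)
lfc = log2_fold_change(cpm_control, cpm_cancer, "GENE1")
print(round(lfc, 3))
```

A positive value means the gene makes up a larger share of the reads in the second sample, a negative value a smaller share.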