DNA microarrays 1 Flashcards
Transcriptome
A complete set of transcripts encoded in the genome and their relative levels of expression in a particular cell or tissue type under defined conditions
- e.g. the transcriptome of a neuron is going to be different from the transcriptome of a heart cell
What 4 things can characterizing the transcriptome identify?
- Genes exhibiting cell and tissue-specific expression (e.g. lots of a specific mRNA in a kidney cell allows you to infer that the gene that codes for that mRNA is important for kidney function)
- Genes aberrantly expressed in cell and tissue disease (compare normal and disease transcriptomes)
- Genes expressed in response to environmental toxins and pharmaceutical compounds (mode of action and side effects)
- Can infer which molecular pathways the toxin is targeting (which genes are turned on in response to the toxin, e.g. if DNA repair RNA is upregulated in response to a toxin, then you know that the toxin is involved in DNA damage)
- Genes expressed in response to pathogens (mode of infection and virulence)
- helps with therapeutics
Describe the significance of the analysis of the transcriptome of males with autism, and how the transcriptome was analyzed
- Obtain blood transcriptomes of 104 ASD cases and 82 controls (all males)
- Found 55 genes differentially regulated as candidates to diagnose autism
- 68% accuracy for ASD identification with these 55 genes in males; performed poorly in females
- Blood test to detect autism may be possible
Northern blotting and steps (5)
Conventional method to detect RNA transcripts of a cell and tissue
1. RNA extraction and electrophoresis
2. Transfer RNA to membrane (Northern blotting)
3. RNA fixed to membrane with UV or heat
4. Labelled probes (radioactive RNA or DNA, most likely DNA because it’s more stable) hybridize with RNA on membrane
5. Visualization of labelled RNA on X-ray film (more probes bound to RNA = more radioactivity/darker spots = more RNA)
True or false: Northern blotting is always a practical method for characterizing transcriptomes
False.
- To characterize the transcriptome of a human cell or tissue type, you would have to run 25,000 northerns and use 25,000 different probes
What do DNA microarrays allow for?
- Allows the simultaneous monitoring of the expression (mRNA) level of every gene in an organism in response to genetic and environmental perturbation
- In a single experiment (two weeks), you can determine which genes in the genome are transcriptionally turned on or off
What is a DNA microarray?
DNA/gene chip that contains single-stranded probes (25-70 nucleotides) with sequence complementary to a specific gene/mRNA
- Each probe is present in many copies in a spot on the microarray, so that labelled sample cDNA can hybridize (complementary base pairing) to the probes
The intensity of the fluorescence is proportional to what?
The abundance of mRNA/cDNA that binds to the probe
What 3 properties must microarray probes have?
- Specificity: unique for each gene, no cross-hybridization (want probes to be highly specific so they don’t bind to other RNA)
- Homogeneity: all probes bind to their complementary DNA at the same Tm (one annealing temperature is used for the whole array and it has to be lower than Tm, so Tm has to be similar among probes)
- Sensitivity: do not form secondary structures (e.g. hairpins formed by repetitive sequences) that interfere with hybridization
Describe the steps of the microarray procedure (4 steps)
- Isolate total mRNA from wildtype (control) and mutant/drug (experimental)
- Reverse transcribe (because RNA is very unstable and readily degraded by RNases) and label the cDNA with red (Cy5) and green (Cy3) fluorescent dyes, using a different dye for each treatment
- Mix cDNAs of wildtype and mutant and put onto microarray (competitive hybridization)
- Measure relative levels of RNA expression based on the colours of the microarray spots. For an upregulated gene, the spot shows the colour of the dye used for the treatment cDNA.
What are two ways that they get oligonucleotide probes on a matrix at such high densities?
- Ink-jet microarrays (Agilent)
- Ink-jet print-head uniformly deposits small, accurate volumes (picoliters) of nucleic acids, building the 60-mer oligonucleotide probes one base at a time on a 1" × 3" glass slide.
- Photolithographic microarrays (Affymetrix)
- Oligonucleotide probe synthesis on wafer chip using combination of photolithography and chemistry
- Photomask: opaque plate with holes that allow light to shine in specific locations on the silicon wafer
- Light removes blocking compound which prevents base addition to wafer
- Flood with chemical base (e.g. adenine) which attaches to unblocked area of wafer
- Repeat this process with blocking compound and new photomask
What are the 4 common ways to “label” nucleic acids with fluorescence?
- Random priming of double-stranded DNA
- Poly-T primed cDNA synthesis
- Direct labelling of mRNA with fluorescent molecules
- Amplification by transcription
Describe random priming of double-stranded cDNA
Used when you don’t know the sequence
- Just like a PCR reaction but with labelled nucleotides
- One dNTP is labeled because it’s enough for signal detection (and to save costs)
Describe poly-T primed cDNA synthesis
Reverse transcription
- Make poly-T primer that anneals to poly A tail on mRNA
- Use labelled dNTPs when doing reverse transcription
- No amplification using this method.
Describe direct labelling of mRNA with fluorescent molecules
Transcribe a DNA molecule using fluorescent NTPs
Describe amplification by transcription (3 steps)
- mRNA is reverse transcribed. cDNA has a T7 promoter added to it (which helps with transcription)
- “Second strand” synthesis, making double-stranded cDNA. Now the T7 promoter sequence is double-stranded (present on both strands), which is necessary for amplification by T7 transcription.
- Amplification of the double-stranded cDNA. T7 reaction contains labelled nucleotides
Cy5 dye
Excited with a 635 nm RED laser
Cy3 dye
Excited with a 532 nm GREEN laser
How is fluorescence intensity in microarrays detected?
Detected by a photomultiplier tube
How does microarray data initially look?
For each microarray, acquire two 16-bit TIFF images: one scanned in the Cy5 (red) channel and one in the Cy3 (green) channel.
- Channels are then merged
Describe image segmentation of microarrays
“Separating where the spot is and where the spot is not” = partition the image to determine which pixels constitute signal or background
- Use an inner circle to calculate signal value and pixels outside the outer circles as local background
- Background can also be estimated from blank spots or control spots of exogenous DNA
What is the problem with spatial image segmentation?
Sometimes inner circle is not small enough for tiny spots
Describe intensity-based image segmentation
Rank intensity of pixels from highest to lowest and take a cut-off equivalent to the approximate area of the spot = signal
- A single intensity threshold is applied across the entire image to classify pixels as either part of a spot (signal) or background.
- Can use a combination of this segmentation and spatial segmentation
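The ranking step above can be sketched in a few lines of Python. This is only an illustration, assuming the pixels around one spot are given as a flat list and that the approximate spot area (in pixels) is known; the function name `segment_by_intensity` and the pixel values are made up for the example.

```python
def segment_by_intensity(pixels, spot_area):
    """Split pixels into (signal, background) by ranking intensities.

    The `spot_area` brightest pixels are treated as signal and the
    remaining pixels as background.
    """
    ranked = sorted(pixels, reverse=True)  # highest intensity first
    return ranked[:spot_area], ranked[spot_area:]


# Hypothetical 3x3 patch around one spot, flattened to a list.
patch = [120, 130, 125, 900, 950, 110, 870, 115, 905]
signal, background = segment_by_intensity(patch, spot_area=4)

print(signal)      # [950, 905, 900, 870]
print(background)  # [130, 125, 120, 115, 110]
```

Because the cut-off is based only on rank, a bright dust speck outside the spot would also be counted as signal, which is one reason to combine this with spatial segmentation.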
How are mean, median, mode and total intensity of segmented signal (microarray spots) and background pixels determined?
In a text file
Signal intensity formula
Total spot intensity - background intensity
Which of the mean, median and mode is usually used to determine signal intensity? Why?
Median, because it is more robust to outliers
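A small sketch of why the median is preferred, assuming the same summary statistic is applied to the spot pixels and the background pixels before subtracting. The pixel values and the helper name `signal_intensity` are hypothetical.

```python
from statistics import mean, median

# Hypothetical segmented pixels; 4000 is an outlier (e.g. a dust speck).
spot_pixels = [870, 900, 905, 950, 4000]
background_pixels = [110, 115, 120, 125, 130]


def signal_intensity(spot, background, summary=median):
    """Spot intensity minus background intensity, using one summary statistic."""
    return summary(spot) - summary(background)


median_signal = signal_intensity(spot_pixels, background_pixels)
mean_signal = signal_intensity(spot_pixels, background_pixels, summary=mean)

print(median_signal)  # 785  (905 - 120), barely affected by the outlier
print(mean_signal)    # 1405 (1525 - 120), pulled up by the outlier
```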
When comparing relative abundance of gene expression between two samples, what value should you take? Why?
First take the ratio of Cy5/Cy3 values (R/G), then take the log (base 2) of this ratio so that the data are distributed symmetrically around 0 (upregulated and downregulated genes are treated equally)
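A short worked example of the symmetry, assuming a base-2 log and made-up background-corrected intensities. The raw ratios 2.0 and 0.5 are not the same distance from 1, but their logs are the same distance from 0.

```python
import math


def log_ratio(cy5, cy3):
    """Return log2(Cy5/Cy3) for one spot."""
    return math.log2(cy5 / cy3)


# Hypothetical intensities for three spots.
up = log_ratio(2000, 1000)         # raw ratio 2.0 (two-fold up)
down = log_ratio(1000, 2000)       # raw ratio 0.5 (two-fold down)
unchanged = log_ratio(1000, 1000)  # raw ratio 1.0

print(up, down, unchanged)  # 1.0 -1.0 0.0
```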
What is microarray data normalization required for?
Required to correct for variations caused by:
- Unequal amounts of cDNA
- Distinct dye properties (e.g. one of the dyes might be more stable than the other)
- Differences in dye incorporation (e.g. one of the dyes might incorporate better than the other)
- Differences in scanning
What is the assumption with within array/single experiment normalization?
Most genes are not differentially regulated
Why does single microarray experiment have to be normalized?
Because the raw data will likely show a lot of spots shifted away from the no-change line
- This cannot be real, because all the cells would die if this many genes were upregulated or downregulated
- It is likely an error due to non-linear dye properties
What must you assume for global linear normalization?
Assume equal quantities of cDNA and total intensity of Cy3 and Cy5
Describe global linear normalization
- Normalization constant= (sum of Cy3)/(sum of Cy5)
- For each gene, multiply the Cy5 intensity by the normalization constant
- Only works partially because the relationship is not linear
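The two steps above can be sketched as follows. The intensities are invented for illustration and the function name `global_linear_normalize` is not from any particular software package.

```python
def global_linear_normalize(cy5, cy3):
    """Scale Cy5 so that its total intensity equals the Cy3 total.

    Returns the normalization constant and the scaled Cy5 values.
    """
    k = sum(cy3) / sum(cy5)  # normalization constant
    return k, [r * k for r in cy5]


# Hypothetical intensities for four genes; Cy5 is twice as bright overall.
cy5 = [400, 800, 1600, 1200]
cy3 = [250, 500, 1000, 250]

k, cy5_normalized = global_linear_normalize(cy5, cy3)

print(k)               # 0.5
print(cy5_normalized)  # [200.0, 400.0, 800.0, 600.0]
```

Every gene is scaled by the same constant, so any dye bias that changes with spot intensity is left uncorrected.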
What does an M/A (magnitude/amplitude) plot of microarray data allow for?
M = log2(R/G), the log expression ratio
A = (1/2) × log2(R × G), the average log intensity (brightness) of the microarray spot
- M/A plot allows for detection of intensity-dependent effects on log expression ratios
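As a quick worked example of the two formulas, here is a sketch that converts one spot's red and green intensities into its M and A coordinates. The intensities are made up and chosen as powers of two so the arithmetic is easy to check by hand.

```python
import math


def ma_values(r, g):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(r / g)        # log expression ratio
    a = 0.5 * math.log2(r * g)  # average log intensity
    return m, a


# Hypothetical spot: R = 1024 (2^10), G = 256 (2^8).
m, a = ma_values(1024, 256)
print(m, a)  # 2.0 9.0
```

Plotting M against A for every spot puts unchanged genes near M = 0, so a curve in the cloud of points shows an intensity-dependent bias.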
Describe Global Lowess (locally weighted linear regression)
- Performs a series of local regressions in overlapping windows with a weighted average of neighbouring spots (curve fitting and correction)
- Each regression is combined to make the Lowess smooth curve (weighted average values: closer spots have greater weight than far-away spots)
Lowess correction
For each spot, subtracting the deviation of the Lowess curve from the zero axis (at that spot's intensity) from the spot's log ratio
Normalized log (R/G) formula and output
- the corrected log ratio
Formula: log (R/G) - Lowess correction
Output: log ratios that have a mean of 0 at all intensities
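The correction can be sketched in plain Python. This is a simplified Lowess under stated assumptions: tricube weights, one local straight-line fit per spot, a window covering a fraction `frac` of the spots, and no robustness iterations. The function name `lowess_fit` and the toy data are illustrative only; real analyses would normally use an established statistics library.

```python
def lowess_fit(a, m, frac=0.5):
    """Return the smoothed M value at each A (one local linear fit per spot)."""
    n = len(a)
    k = min(n, max(2, int(round(frac * n))))  # spots per local window
    fitted = []
    for x0 in a:
        # Window width = distance to the k-th nearest spot.
        h = sorted(abs(x - x0) for x in a)[k - 1] or 1e-12
        # Tricube weights: closer spots weigh more, spots beyond h weigh 0.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in a]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, a)) / sw
        my = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # weighted line evaluated at x0
    return fitted


# Toy data: no gene truly changes, but M drifts with intensity A (dye bias).
a = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
m = [0.2 * x - 2.0 for x in a]

curve = lowess_fit(a, m)                            # Lowess correction per spot
normalized = [mi - ci for mi, ci in zip(m, curve)]  # log(R/G) - correction

print(max(abs(v) for v in normalized) < 1e-9)  # True: bias removed
```

In this toy case the bias is a straight line, so the fit removes it completely; with real data the curve follows the bulk of the spots and truly changed genes remain away from zero.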
What is the most straight-forward way to identify differentially expressed genes? What is the problem with this approach?
Have a fixed fold-change cut-off (usually two-fold)
- Problem: The variability of the log ratio is greater at lower intensities so at lower intensity spots, genes can be misidentified as differentially expressed. Also, at higher intensity spots, differentially-expressed genes can be missed.
Describe a Z-score transformation and what it’s used for
Measures the number of standard deviations a particular data point is from the mean/median
- Using a sliding window, calculate the local mean and standard deviations within a window surrounding each data point.
- The Z-score allows you to determine a threshold at which spots are differentially regulated at the 95% confidence level
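A minimal sketch of the sliding-window Z-score, with several assumptions that are not from the cards: spots are ranked by intensity A, the window is a fixed number of neighbouring spots (shrinking at the ends), the local mean is used rather than the median, and |Z| > 1.96 is taken as the 95% threshold, which assumes roughly normal log ratios. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def local_z_scores(a, m, window=11):
    """Z-score of each spot's log ratio M relative to spots of similar A."""
    order = sorted(range(len(a)), key=lambda i: a[i])  # rank spots by A
    half = window // 2
    z = [0.0] * len(a)
    for pos, i in enumerate(order):
        lo = max(0, pos - half)
        hi = min(len(order), pos + half + 1)
        neighbours = [m[j] for j in order[lo:hi]]  # spots in the window
        sd = stdev(neighbours)
        z[i] = (m[i] - mean(neighbours)) / sd if sd > 0 else 0.0
    return z


def differentially_expressed(z, threshold=1.96):
    """Indices of spots whose |Z| exceeds the threshold."""
    return [i for i, zi in enumerate(z) if abs(zi) > threshold]


# Toy data: 15 spots with small noise in M, plus one strongly changed gene.
a = [6.0 + 0.5 * i for i in range(15)]
m = [0.1 if i % 2 == 0 else -0.1 for i in range(15)]
m[7] = 3.0

z = local_z_scores(a, m)
hits = differentially_expressed(z)
print(hits)  # [7]
```

The window has to be reasonably wide: when a spot is included in its own window of n spots, its Z-score cannot exceed (n - 1)/sqrt(n), so very small windows can never reach the threshold.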