Gene Expression Flashcards
Gene expression
Which genes are in the genome, and which proteins are produced when they are expressed? What is happening in a particular cell? How does it carry out gene expression? How is expression altered in cancer?
Microarray
Take the mRNA of the sample and convert it to complementary DNA (cDNA) by reverse transcription
The chip has complementary strands (known as probes)
Probe-target hybridization
The cDNA binds to the probes that are complementary to it => it sticks to the features of the corresponding gene
Done with many copies, and each spot measures a different gene
=> thousands of experiments in one go
Black: nothing from the sample bound to the probe
Bright: measured at the maximum, everything matched
The brightness relates to how much match there was between the sample and the gene
There can be 2 samples, e.g. infected and uninfected, labeled red and green; the color pattern on the microarray can then be analyzed
Image analysis (determining intensity of each spot) -> quality control and normalization -> filtering -> statistical analysis -> pattern/pathway analysis -> results
Microarray correction
The brightness of one spot bleeds into the background, which means it influences the measured intensity of the surrounding spots
So we subtract the intensity of the background
intensity = spot intensity - background intensity
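A minimal sketch of the subtraction above. The function name and the choice to floor negative results at 0 are assumptions for illustration, not part of the card.

```python
def background_correct(spot, background):
    """Subtract the local background from each spot intensity.

    Assumption: results below 0 are floored at 0, since a negative
    intensity has no physical meaning.
    """
    return [max(s - b, 0.0) for s, b in zip(spot, background)]


# Hypothetical readings for three spots and their local backgrounds.
corrected = background_correct([100.0, 50.0, 10.0], [20.0, 20.0, 20.0])
```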
Microarray quality check
detect chips with problems
low hybridization or staining problems
Microarray normalization
We can use and compare multiple microarrays
However, different chips may have different overall intensities.
Thus we need to correct and normalize.
Housekeeping genes
Total area
Median Fold Change
Quantile Normalization
Quantile Normalization
Makes all the profiles identical in their distribution of values: the smallest value of every sample is set to x, the second smallest to y, and so on (x, y, ... are typically the mean of that rank across samples). Seems to be the best option
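A small sketch of quantile normalization in plain Python. It assumes the shared value for each rank is the mean of that rank across samples and that ties are broken by position; both choices are illustrative.

```python
def quantile_normalize(samples):
    """samples: list of equal-length lists, one list of gene values per chip."""
    n_genes = len(samples[0])
    # Sort every sample, then average across samples rank by rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # order[r] is the index of the gene holding rank r in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical data: two chips, three genes.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After this step every chip contains exactly the same set of values; only which gene holds which value differs.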
Median fold change
Assume most genes don't change. Find a normalization factor such that most genes are at the same level. Take the median as the reference! Then, for each sample, compute median(sample gene i / reference i) over all genes i. This is the factor to apply to all the data of that sample.
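One way to read the recipe above, as a sketch. It assumes the reference for each gene is its median across samples and that each sample is divided by its factor; the card does not spell out either detail.

```python
from statistics import median


def median_fold_change_normalize(samples):
    """samples: list of equal-length lists, one list of gene values per chip."""
    n_genes = len(samples[0])
    # Assumed reference profile: per-gene median across all samples.
    reference = [median(s[i] for s in samples) for i in range(n_genes)]
    normalized = []
    for s in samples:
        # Fold change of every gene against the reference (skip zero references).
        ratios = [s[i] / reference[i] for i in range(n_genes) if reference[i] > 0]
        factor = median(ratios)
        normalized.append([v / factor for v in s])
    return normalized


# Hypothetical data: the second chip is simply twice as bright.
chips = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [1.0, 2.0, 4.0]]
normalized_chips = median_fold_change_normalize(chips)
```

Because the factor is a median of ratios, a handful of strongly changed genes does not move it much.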
Total area
The total of all variables per sample should equal a set number. Sum the variables, divide each value by that total, then multiply by the set number. Problem: a gene with very large expression can make the other genes seem to change. Will always give the same result
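A sketch of total-area scaling, including the problem case. The function name, the default target of 1,000,000 and the data are illustrative assumptions.

```python
def total_area_normalize(sample, target=1_000_000.0):
    """Scale a sample so that its values sum to `target`."""
    total = sum(sample)
    return [v * target / total for v in sample]


# Hypothetical chips: only the third gene differs between them.
before = total_area_normalize([10.0, 10.0, 80.0], target=100.0)
after = total_area_normalize([10.0, 10.0, 180.0], target=100.0)
# The first two genes did not change, yet their normalized values drop
# because the third gene now takes a larger share of the total.
```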
Housekeeping genes
We find a set of genes that we believe won't change and normalize such that these are at the same level across the chips. Unreliable! Housekeeping genes don't exist or are difficult to identify. Can pipette the same gene across chips, but the procedure should be exact …
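A sketch of the idea, assuming the housekeeping genes are given as a list of indices and that each chip is scaled so their mean matches the average over all chips. Both the interface and the choice of target level are assumptions.

```python
def housekeeping_normalize(samples, housekeeping_idx):
    """Scale each chip so its housekeeping genes sit at a common mean level."""
    hk_means = [
        sum(s[i] for i in housekeeping_idx) / len(housekeeping_idx) for s in samples
    ]
    # Assumed target: the average housekeeping level over all chips.
    target = sum(hk_means) / len(hk_means)
    return [[v * target / m for v in s] for s, m in zip(samples, hk_means)]


# Hypothetical data: genes 0 and 1 are treated as housekeeping genes.
scaled = housekeeping_normalize([[10.0, 20.0, 5.0], [20.0, 40.0, 30.0]], [0, 1])
```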
Filtering
After normalization we remove the low-expression genes, assuming they are noise. Pick a minimum value that samples need to reach.
If a spot is saturated to the max level we won’t know the true level of mRNA
Distribution expression values
There are usually 2 distributions: one of the unexpressed genes and one of the expressed genes
We want to remove the unexpressed genes, so try to place the cutoff where most samples are unexpressed.
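A sketch of such a cutoff filter. The `min_samples` parameter (how many samples must reach the cutoff) and the gene names are hypothetical additions for illustration.

```python
def filter_low_expression(expression, min_value, min_samples=1):
    """Keep genes that reach `min_value` in at least `min_samples` samples.

    expression: dict mapping gene name -> list of values across samples.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if sum(v >= min_value for v in values) >= min_samples
    }


# Hypothetical normalized values for three genes across three samples.
data = {
    "geneA": [0.2, 0.1, 0.3],
    "geneB": [5.1, 6.0, 0.4],
    "geneC": [7.2, 8.1, 7.9],
}
kept = filter_low_expression(data, min_value=1.0, min_samples=2)
```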
RNA sequencing
alternative to microarrays
Bridge amplification
DNA fragments and adaptors
Can measure all genes, not just the known ones on the chip
Accurate measurements that do not suffer from background brightness limitations
More complex and more costly
Statistical Analysis
After the filtering, correction and normalization, we can perform the statistical analysis to find genes whose expression has changed in the experiment.
Can look at fold change: how much more or less they are expressed
Can look at statistical significance: variance of change taken into account
interesting genes: high fold change and statistical significance. -> differentially expressed
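A sketch of both quantities for a single gene. The card names no specific test; Welch's t statistic is used here as an assumed example of a score that takes variance into account, and the data are made up.

```python
from math import log2, sqrt
from statistics import mean, variance


def log2_fold_change(control, treated):
    """log2 of the ratio of mean expression (treated over control)."""
    return log2(mean(treated) / mean(control))


def welch_t(control, treated):
    """Welch's t statistic: difference in means scaled by its standard error."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Same difference in means, but the second pair of groups is noisier.
t_tight = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
t_noisy = welch_t([0.0, 2.0, 4.0], [3.0, 5.0, 7.0])
fold = log2_fold_change([1.0, 1.0, 1.0], [4.0, 4.0, 4.0])
```

The noisier groups give a smaller t statistic for the same mean difference, which is why fold change alone is not enough.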
Should do multiple testing correction to limit false positives!!
Bonferroni correction adjusts the p-value to prevent false positives: adjusted p = calculated p * number of tests done (capped at 1). Tends to be too strict!
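The Bonferroni formula above as a short sketch. The p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


# With three tests, a raw p of 0.04 no longer passes a 0.05 threshold.
adjusted = bonferroni([0.01, 0.04, 0.5])
```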
Benjamini-Hochberg false discovery rate (FDR)
Set the % of false positives we can tolerate among the genes called significant.
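A sketch of the Benjamini-Hochberg step-up procedure: sort the p-values, find the largest rank k whose p-value is at most k / n * FDR, and call everything up to that rank significant. The function name and the example p-values are illustrative.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is called significant."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    # Largest rank k (1-based) with p_(k) <= k / n * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * fdr:
            cutoff_rank = rank
    significant = [False] * n
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for five genes.
calls = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6], fdr=0.05)
```

For these five p-values only the first two are called significant. Bonferroni on the same data would reject the same two here, but in general it rejects no more, and often fewer, tests than this procedure.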