Lecture 15 - Gene Expression Statistics Flashcards
what is accuracy
how close a measurement is to the true value
what is precision
how close your replicates are (consistency)
what is good precision characterized by
reproducible results
what is good accuracy characterized by
measurements that correspond to an independently known result
what is the main goal of data preprocessing
- removing systematic bias and variation in the data due to non-biological factors
- preserving the variation in gene expression that occurs because of biologically relevant changes in transcription
what are two methods of global normalization
- Trimmed Mean of M-values (TMM)
- normalization to housekeeping genes
what is Trimmed Mean of M-values (TMM)
- trim (exclude) genes with a large log fold-change between samples, and genes with very high or very low expression
- scale each sample to a common mean computed from the genes that remain after trimming
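The two TMM steps above can be sketched in a few lines. This is a simplified, unweighted illustration, not the implementation used by any particular package; the function names, the trim fractions (30% by log fold-change, 5% by average expression), and the toy counts are all assumptions chosen for the example.

```python
import math

def _ranks(values):
    """Return the rank (0 = smallest) of each value, keeping input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

def tmm_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Simplified, unweighted TMM-style scaling factor for `sample`."""
    n_s, n_r = sum(sample), sum(reference)
    m_vals, a_vals = [], []
    for s, r in zip(sample, reference):
        if s > 0 and r > 0:  # the log is undefined for zero counts
            m_vals.append(math.log2((s / n_s) / (r / n_r)))        # log fold-change
            a_vals.append(0.5 * math.log2((s / n_s) * (r / n_r)))  # average log expression
    n = len(m_vals)
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    m_rank, a_rank = _ranks(m_vals), _ranks(a_vals)
    # Keep only genes that are not extreme in fold-change or in expression level
    kept = [m_vals[i] for i in range(n)
            if cut_m <= m_rank[i] < n - cut_m and cut_a <= a_rank[i] < n - cut_a]
    return 2 ** (sum(kept) / len(kept))

# Toy data: 19 unchanged genes plus one gene that is strongly induced in the sample
reference = [100] * 20
sample = [100] * 19 + [10_000]
factor = tmm_factor(sample, reference)
effective_size = sum(sample) * factor  # library size after TMM-style correction
```

In this toy case the induced gene is trimmed away, so the effective library size of the sample matches the reference and the 19 unchanged genes are no longer made to look down-regulated.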
why is log fold-change used for TMM [log(A/B)]
- proportional changes are generally more important than absolute changes in biological data
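A small illustration of why the log of a ratio captures proportional change. The expression values are made up for the example, and base 2 is an assumption (any base gives the same symmetry).

```python
import math

def log_fold_change(a, b):
    """log2(A/B): positive when A > B, negative when A < B."""
    return math.log2(a / b)

# The same proportional change gives the same value at any expression level
low_expr = log_fold_change(20, 10)        # doubling of a weakly expressed gene
high_expr = log_fold_change(2000, 1000)   # doubling of a strongly expressed gene

# Up- and down-regulation by the same factor are symmetric around zero
doubled = log_fold_change(200, 100)
halved = log_fold_change(50, 100)
```

A doubling is +1 whether the gene went from 10 to 20 or from 1000 to 2000, and a halving is -1, so increases and decreases of equal proportion sit at equal distances from zero.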
what are housekeeping genes
genes whose expression is typically invariant across samples and conditions
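A minimal sketch of normalizing to housekeeping genes, assuming expression is stored as a gene-to-value dictionary. GAPDH and ACTB are commonly used housekeeping genes; `geneX`, the values, and the function name are hypothetical.

```python
def normalize_to_housekeeping(sample, housekeeping):
    """Divide every value by the mean expression of the housekeeping genes."""
    ref = sum(sample[g] for g in housekeeping) / len(housekeeping)
    return {gene: value / ref for gene, value in sample.items()}

housekeeping = ["GAPDH", "ACTB"]

# Sample B has twice the signal of sample A for every gene,
# as would happen if twice as much RNA had been loaded.
sample_a = {"GAPDH": 1000.0, "ACTB": 3000.0, "geneX": 500.0}
sample_b = {"GAPDH": 2000.0, "ACTB": 6000.0, "geneX": 1000.0}

norm_a = normalize_to_housekeeping(sample_a, housekeeping)
norm_b = normalize_to_housekeeping(sample_b, housekeeping)
```

After normalization `geneX` has the same value in both samples, so the twofold difference is correctly treated as non-biological.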
what is a plot commonly used to represent gene expression values from two samples
scatter plots
how do log transformations change a scatter plot of gene expression
the data points are spread more evenly instead of clustering near the origin, and the variance is more uniform across expression levels
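The effect on variance can be checked with simulated data. This sketch assumes measurement noise is multiplicative (log-normal), which is a modeling assumption for the example and not something stated in the lecture; the expression levels and noise size are arbitrary.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def simulate(true_level, n=2000, noise_sd=0.3):
    """Simulate n noisy measurements of one gene with multiplicative noise."""
    return [true_level * math.exp(random.gauss(0, noise_sd)) for _ in range(n)]

low = simulate(10)        # weakly expressed gene
high = simulate(10_000)   # strongly expressed gene

# On the raw scale the variance grows with the expression level
raw_var_low = statistics.variance(low)
raw_var_high = statistics.variance(high)

# On the log scale the variance is about the same at both levels
log_var_low = statistics.variance([math.log2(x) for x in low])
log_var_high = statistics.variance([math.log2(x) for x in high])
```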
what is inferential statistics used for
used to make inferences about a population from a sample
what type of statistics is hypothesis testing a form of
inferential statistics
what is the null hypothesis for gene expression
there is no difference in gene expression between samples
what is the alternate hypothesis for gene expression
there is a difference in expression between samples
how do we determine whether or not to reject the null hypothesis
with statistical tests
what is type I error of hypothesis testing
a false positive; the null hypothesis is rejected but it is actually true
what is type II error in hypothesis testing
a false negative; we fail to reject the null hypothesis but it is actually false
what can p-values be used to estimate in terms of hypothesis testing
the probability of a type I error (false positive)
what does it mean if the p-value is less than a threshold α
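the result is called statistically significant: if the null hypothesis were true, a result at least this extreme would occur with probability less than α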
what is a common form of hypothesis testing
the t-test
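A sketch of the test statistic for one gene measured in two groups. This uses Welch's unequal-variance form of the t-test, which is an assumption since the card does not say which variant is meant; the replicate values are invented. Converting t to a p-value requires the t distribution, which is not computed here.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical log2 expression of one gene in four replicates per condition
control = [5.1, 4.9, 5.0, 5.2]
treated = [6.1, 5.9, 6.0, 6.2]
t_stat, dof = welch_t(control, treated)
```

A large absolute t (here the group means differ by 1 while replicates vary by only about 0.1) means the difference is large relative to the noise, which corresponds to a small p-value.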
what is the issue with p-values and multiple hypothesis testing
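when many genes are tested, about 5% of the genes with no real difference will still reach p < 0.05 by chance alone (e.g. about 500 false positives among 10,000 unchanged genes)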
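The size of the problem can be seen in a quick simulation. It relies on the fact that, when the null hypothesis is true, p-values from a continuous test are uniformly distributed between 0 and 1; the gene count of 10,000 is an arbitrary choice for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

n_genes = 10_000
alpha = 0.05

# Every gene is truly unchanged, so each p-value is a uniform random number
p_values = [random.random() for _ in range(n_genes)]

false_positives = sum(p < alpha for p in p_values)  # genes called "significant"
expected = alpha * n_genes                          # about 500 by chance alone
```

Every one of the genes counted in `false_positives` is a type I error, since no gene in the simulation has a real difference.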
what is the Bonferroni correction
the significance threshold α divided by the number of tests performed (α/n); each p-value is compared against this stricter threshold
what does the new threshold created by the Bonferroni correction mean
the chance of even one false positive across all of the tests is at most α (e.g. 0.05)
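A minimal sketch of applying the correction to a handful of p-values. The function names and the p-values are hypothetical, and α = 0.05 is assumed as the starting threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the chance of any false positive at or below alpha."""
    return alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Return True for each p-value that passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

p_values = [0.00001, 0.004, 0.03, 0.2, 0.6]
threshold = bonferroni_threshold(0.05, len(p_values))  # 0.05 / 5 tests
calls = bonferroni_significant(p_values)
```

The p-value of 0.03 would pass an uncorrected 0.05 threshold but fails the corrected one, which shows the trade-off: fewer false positives at the cost of more false negatives.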