Statistics Flashcards
Differential gene expression analysis
Differential expression analysis means taking the normalised read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups. For example, we use statistical testing to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation.
Precision
Also called Positive Predictive Value (PPV) = TP/(TP + FP)
Recall
Also called sensitivity, hit rate, or True Positive Rate (TPR) = TP/P = TP/(TP + FN)
Accuracy
T/(T + F) = (TP + TN)/(TP + TN + FP + FN)
Specificity
Also called True Negative Rate: TN/N = TN/(TN + FP)
Overall error rate:
1 - accuracy
TP
True positives: positive cases identified correctly.
TN
True negatives: negative cases identified correctly (correct rejections).
FP
False positives: negative cases identified as positive.
FN
False negatives: positive cases identified as negative.
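As a quick check on the formulas in the cards above, here is a minimal sketch that computes precision, recall, specificity, accuracy and overall error rate from the four counts. The function name and the example counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the metrics defined on the cards from the four confusion-matrix counts.

    No guard against empty denominators: this sketch assumes every class
    and every prediction type occurs at least once.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    return {
        "precision": tp / (tp + fp),    # TP / (TP + FP)
        "recall": tp / (tp + fn),       # TP / (TP + FN)
        "specificity": tn / (tn + fp),  # TN / (TN + FP)
        "accuracy": accuracy,           # (TP + TN) / all cases
        "error_rate": 1 - accuracy,     # overall error rate
    }


# Hypothetical counts, not taken from any real experiment.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, recall is 40/50 and specificity is 45/50, so the two rates can differ even when accuracy looks reasonable.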
ROC
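Receiver operating characteristic curve: a plot of sensitivity (TPR, Y axis) against the false positive rate (FPR = 1 - specificity, X axis) as the decision threshold varies. The curve of the perfect classifier rises vertically from (0, 0) to (0, 1) and then runs horizontally to (1, 1), so it passes through the top-left corner.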
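To make the construction of the curve concrete, the sketch below sweeps a decision threshold over classifier scores and records one (FPR, TPR) point per threshold. The function name, the toy labels and the toy scores are assumptions made for illustration; real analyses would normally rely on a library routine.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    labels: 1 for a positive case, 0 for a negative case.
    scores: higher score means "more likely positive" (an assumption of this sketch).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Threshold above every score: nothing is called positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical toy data in which the scores separate the two classes perfectly.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1]
perfect = roc_points(labels, scores)
```

For this perfectly separated toy data the points climb the Y axis to (0, 1) before moving right to (1, 1), which is the shape of the perfect classifier.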
T-test
In many cases, we analyze microarrays with a “one gene at a time” approach. That is, for each gene we would like to know whether this gene is “different in the two classes”.
A classic analytical method to answer this question is the independent two-sample t-test. Student’s version assumes equal variances in the two groups; Welch’s version (the Welch test) does not.
Null Hypothesis (H0): the means of the two populations are equal, so the two groups come from the same population.
Alternative Hypothesis (H1): the means of the two populations are unequal, so the two groups come from different populations.
The level of significance α is defined a priori; it is the risk we are prepared to take of rejecting H0 when it is in fact true (a Type I error).
Welch’s t-test is used to investigate the significance of the difference between the means of two populations without assuming equal variances. Use the SciPy library to perform the test (after import numpy as np and from scipy import stats), e.g.
t_value, p_value = stats.ttest_ind(np.array(Lum_A), np.array(Lum_B), equal_var=False)
Reject the null hypothesis if the p-value is less than or equal to α.
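To show what the Welch test computes, here is a minimal sketch of the t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name and the sample values are hypothetical; turning the statistic into a p-value still requires the t distribution, which is what the SciPy call with equal_var=False takes care of.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (unbiased sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # Difference of means scaled by the combined standard error.
    t_value = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Degrees of freedom are estimated from the data, not fixed at n_a + n_b - 2.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_value, dof


# Hypothetical values for one gene in two groups, for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_value, dof = welch_t(group_a, group_b)
```

Because the two groups have different variances, the estimated degrees of freedom fall below the pooled value of 8 that Student's version would use.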
Bonferroni adjustment
When we perform multiple statistical tests, the probability of rejecting at least one null hypothesis that is actually true grows with the number of tests. This overall error is called the Family-Wise Error Rate (FWER); it is at most α x G, where G is the number of genes (for independent tests it equals 1 - (1 - α)^G).
The Bonferroni adjustment selects only genes for which p-value ≤ α/G, which keeps the FWER at or below α.
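The selection rule above can be sketched in a few lines. The function names and the p-values are hypothetical, and the FWER helper assumes the tests are independent, which is rarely exactly true for genes.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return the indices of tests whose p-value is at most alpha / G."""
    threshold = alpha / len(p_values)  # G is the number of tests (genes)
    return [i for i, p in enumerate(p_values) if p <= threshold]


def fwer_independent(alpha, n_tests):
    """FWER when n_tests independent tests are each run at level alpha.

    Independence is an assumption of this sketch.
    """
    return 1 - (1 - alpha) ** n_tests


# Hypothetical p-values for G = 5 genes.
p_values = [0.001, 0.02, 0.04, 0.009, 0.3]
selected = bonferroni_select(p_values, alpha=0.05)
uncorrected_fwer = fwer_independent(0.05, len(p_values))
```

Here the per-gene threshold is 0.05/5, so only the genes at indices 0 and 3 are kept, although four of the five p-values are below the unadjusted α of 0.05.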