Statistics Flashcards

Question 1

Q

Differential gene expression analysis

Answer

A

Differential expression analysis means taking the normalised read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups. For example, we use statistical testing to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation.
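A minimal sketch of the per-gene testing step. It assumes the normalised (e.g. log-scale) expression values are already in NumPy arrays with one row per gene and one column per sample. The helper name `per_gene_welch` and the toy values are hypothetical. Welch's t-test stands in here for whatever test a given pipeline actually uses; count-based tools often fit other models instead.

```python
import numpy as np
from scipy import stats


def per_gene_welch(expr_a, expr_b):
    """Run Welch's t-test independently for every gene (row).

    expr_a, expr_b: 2-D arrays of normalised expression, shape (genes, samples).
    Returns arrays of t statistics and two-sided p-values, one entry per gene.
    """
    # axis=1 compares the two groups row by row, i.e. one test per gene
    t_values, p_values = stats.ttest_ind(expr_a, expr_b, axis=1, equal_var=False)
    return t_values, p_values


# Hypothetical toy data: 3 genes, 4 samples per group (values are made up).
group_a = np.array([
    [5.1, 5.3, 4.9, 5.2],   # gene 0: similar in both groups
    [8.0, 8.2, 7.9, 8.1],   # gene 1: higher in group A
    [3.0, 3.4, 2.8, 3.1],   # gene 2: similar in both groups
])
group_b = np.array([
    [5.0, 5.2, 5.1, 5.3],
    [6.1, 6.0, 6.3, 5.9],
    [3.2, 2.9, 3.1, 3.0],
])

t_values, p_values = per_gene_welch(group_a, group_b)
for gene, (t, p) in enumerate(zip(t_values, p_values)):
    print(f"gene {gene}: t = {t:.2f}, p = {p:.3g}")
```

Only gene 1 shows a difference that is large compared with the within-group variation. When thousands of genes are tested this way, the p-values still need a multiple-testing correction.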

Question 2

Q

Precision

Answer

A

Also called Positive Predictive Value (PPV) = TP/(TP + FP)

Question 3

Q

Recall

Answer

A

Also called sensitivity, hit rate or True Positive Rate (TPR) = TP/P = TP/(TP + FN), where P is the number of actual positive cases.

Question 4

Q

Accuracy

Answer

A

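T/(T + F) = (TP + TN)/(TP + TN + FP + FN), where T = TP + TN is the number of correct predictions and F = FP + FN the number of incorrect ones.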

Question 5

Q

Specificity

Answer

A

Also called True Negative Rate (TNR) = TN/N = TN/(TN + FP), where N is the number of actual negative cases.

Question 6

Q

Overall error rate

Answer

A

1 - accuracy = (FP + FN)/(TP + TN + FP + FN)
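The formulas on the cards above can be checked together in a small sketch that computes them from the four confusion-matrix counts. The `ConfusionCounts` class, the `metrics` helper and the example counts are hypothetical. For brevity the sketch does not guard against zero denominators, for example when there are no predicted positives.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # positive cases predicted positive
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive
    fn: int  # positive cases predicted negative


def metrics(c: ConfusionCounts) -> dict:
    p = c.tp + c.fn  # all actual positives
    n = c.tn + c.fp  # all actual negatives
    accuracy = (c.tp + c.tn) / (p + n)
    return {
        "precision": c.tp / (c.tp + c.fp),  # positive predictive value
        "recall": c.tp / p,                 # sensitivity, true positive rate
        "specificity": c.tn / n,            # true negative rate
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,         # equals (FP + FN) / (P + N)
    }


# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN (100 cases in total).
m = metrics(ConfusionCounts(tp=40, tn=45, fp=5, fn=10))
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```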

Question 7

Q

TP

Answer

A

True positives: positive cases identified correctly.

Question 8

Q

TN

Answer

A

True negatives: negative cases identified correctly (correct rejections).

Question 9

Q

FP

Answer

A

False positives: negative cases identified as positive.

Question 10

Q

FN

Answer

A

False negatives: positive cases identified as negative.
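A short sketch that derives the four counts from paired true and predicted labels. It assumes binary labels where `1` marks a positive case; the function name `confusion_counts` and the label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN from paired true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        actual_pos = truth == positive
        predicted_pos = pred == positive
        if actual_pos and predicted_pos:
            tp += 1  # positive case identified correctly
        elif not actual_pos and not predicted_pos:
            tn += 1  # correct rejection
        elif predicted_pos:
            fp += 1  # negative case identified as positive
        else:
            fn += 1  # positive case identified as negative
    return tp, tn, fp, fn


y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
```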

Question 11

Q

ROC

Answer

A

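Receiver operating characteristic (ROC) curve:
a plot of sensitivity (TPR, Y axis) against the false positive rate (FPR = 1 - specificity, X axis) as the classification threshold is varied.
The perfect classifier rises vertically from (0, 0) to (0, 1) and then runs horizontally to (1, 1), i.e. it hugs the top-left corner of the plot.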
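A minimal sketch of how the ROC points arise. It assumes each case has a numeric score, and a case is called positive when its score is at or above the threshold. Every distinct score is used as a threshold in turn, and the area under the curve is computed with the trapezoidal rule. The helper names `roc_points` and `auc` and the example scores are hypothetical. Both classes must be present, otherwise TPR or FPR is undefined.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (FPR, TPR) arrays, one point per distinct score threshold.

    labels: 1 for positive cases, 0 for negative cases.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    p = labels.sum()       # actual positives
    n = len(labels) - p    # actual negatives
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for threshold in np.unique(scores)[::-1]:  # highest to lowest score
        predicted_pos = scores >= threshold
        tpr.append(np.sum(predicted_pos & (labels == 1)) / p)
        fpr.append(np.sum(predicted_pos & (labels == 0)) / n)
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Hypothetical scores that separate the classes perfectly.
fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(list(zip(fpr, tpr)), auc(fpr, tpr))
```

With perfectly separated scores, the points climb the Y axis to (0, 1) before moving right, which is the shape of the perfect classifier, and the area is 1.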

Question 12

Q

T-test

Answer

A

In many cases, we analyze microarray data with a “one gene at a time” approach: for each gene, we would like to know whether it is “different in the two classes”.

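A classic method to answer this question is the independent two-sample t-test. Student's version assumes that the two groups have equal variances. Welch's t-test drops this assumption, so it is the variant to use when the variances may differ.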

Null hypothesis (H0): the means of the two populations are equal, i.e. there is no difference between the two groups for this gene.

Alternative hypothesis (H1): the means of the two populations are unequal, i.e. the two groups come from populations that differ for this gene.

The significance level α is fixed a priori: it is the risk we are prepared to take of rejecting H0 when it is in fact true (a Type I error).

Welch's t-test assesses the significance of the difference between the means of two populations. Use the scipy library to perform the test, e.g. with Lum_A and Lum_B holding one gene's expression values in the two groups:
from scipy import stats
t_value, p_value = stats.ttest_ind(Lum_A, Lum_B, equal_var=False)

Reject the null hypothesis if the p-value is less than or equal to α.
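To see what `stats.ttest_ind(..., equal_var=False)` computes, here is a sketch of the two-sided Welch test written out from its formula. It uses the Welch-Satterthwaite degrees of freedom and is checked against scipy. The helper name `welch_t_test`, the lists `lum_a` and `lum_b`, and their values are hypothetical, as is the choice α = 0.05.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Two-sided Welch's t-test computed from its formula."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)  # squared standard error of mean(a)
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, p, df


# Hypothetical expression values of one gene in the two groups.
lum_a = [7.2, 6.8, 7.5, 7.1, 6.9]
lum_b = [6.1, 6.5, 5.9, 6.4]

t, p, df = welch_t_test(lum_a, lum_b)
t_ref, p_ref = stats.ttest_ind(lum_a, lum_b, equal_var=False)

alpha = 0.05
reject = p <= alpha  # reject H0 when p-value <= alpha
print(f"t = {t:.3f} (scipy {t_ref:.3f}), df = {df:.2f}, p = {p:.4f}, reject H0: {reject}")
```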

Question 13

Q

Bonferroni adjustment

Answer

A

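When we perform multiple statistical tests (one per gene), the probability of rejecting at least one null hypothesis that is actually true grows with the number of tests. This overall error is called the Family-Wise Error Rate (FWER). For G genes each tested at level α, the FWER is at most α x G; if the tests are independent and all null hypotheses are true, it equals 1 - (1 - α)^G.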

The Bonferroni adjustment keeps the FWER at or below α by selecting only genes whose p-value ≤ α/G (equivalently, genes whose p-value multiplied by G is ≤ α).
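A minimal sketch of the Bonferroni rule applied to a vector of per-gene p-values. It returns both the selection at threshold α/G and the equivalent adjusted p-values (p × G, capped at 1). The helper name `bonferroni` and the p-values are hypothetical.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment for G tests.

    Returns adjusted p-values (p * G, capped at 1) and a boolean mask of the
    genes selected because their raw p-value is <= alpha / G.
    """
    p = np.asarray(p_values, dtype=float)
    g = len(p)  # number of genes tested
    adjusted = np.minimum(p * g, 1.0)
    selected = p <= alpha / g
    return adjusted, selected


# Hypothetical p-values for G = 5 genes; the per-gene threshold is 0.05 / 5 = 0.01.
adjusted, selected = bonferroni([0.001, 0.02, 0.004, 0.3, 0.012], alpha=0.05)
print(adjusted, selected)
```

Gene 4 (p = 0.012) would pass an unadjusted α = 0.05 but is not selected after the adjustment.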