Lecture 7 Flashcards

Statistical Assessment

1
Q

What is a test statistic, and how is it used in hypothesis testing?

A

A test statistic is a single number that summarizes the data with respect to the effect of interest. The more extreme the test statistic, the stronger the evidence against the null hypothesis.

Example:
For the yeast dataset, the test statistic could be the difference in median growth rates between the lab and wild genotypes:
T = median(Wild) − median(Lab)
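As a sketch in R with made-up numbers (the growth-rate values below are illustrative, not taken from the yeast dataset), the test statistic is one line of arithmetic:

```r
# Hypothetical yeast measurements (values are illustrative)
growth_rate <- c(2.1, 2.3, 3.8, 4.0)
genotype <- c("Lab", "Lab", "Wild", "Wild")

# Difference in median growth rates: T = median(Wild) - median(Lab)
T_obs <- median(growth_rate[genotype == "Wild"]) -
  median(growth_rate[genotype == "Lab"])
T_obs  # 1.7
```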

2
Q

What is the null hypothesis (H0)?

A

The null hypothesis is a skeptical position that assumes no relationship between the two variables.

Example:
For the yeast dataset, the null hypothesis is: “There is no difference in growth rate between the lab and wild yeast genotypes.”

3
Q

What is a P-value?

A

The P-value is the probability of obtaining a test statistic as extreme as or more extreme than the observed one, assuming the null hypothesis is true.

Right-tail Example:
For a right-tail test:
P = Pr(T ≥ T_obs | H0)

4
Q

What are common misconceptions about P-values?

A
  1. The P-value is NOT the probability that the null hypothesis is true.
  2. The P-value is NOT the probability of the observed test statistic occurring under H0.
  3. “Absence of evidence is not evidence of absence.”
5
Q

What is permutation testing?

A

Permutation testing assesses whether two variables are statistically independent by randomly shuffling the data and recalculating the test statistic, which simulates the null distribution.

Steps:

1. Shuffle the data many times.
2. Calculate the test statistic for each permutation.
3. Compare the observed statistic to the resulting null distribution.

Example R code for a permutation test:

perm_test <- function(data, n_permutations = 1000) {
  observed <- data$growth_rate
  null_distribution <- numeric(n_permutations)

  for (i in 1:n_permutations) {
    # Shuffle growth rates, breaking any link with genotype
    shuffled <- sample(observed)
    # First two values stand in for one genotype, last two for the other
    # (assumes four observations, two per genotype)
    group1 <- shuffled[1:2]
    group2 <- shuffled[3:4]
    null_distribution[i] <- median(group2) - median(group1)
  }

  null_distribution
}
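A self-contained sketch of the full procedure, from shuffling through to a p-value (hypothetical data; the two-per-genotype layout matches the hard-coded group sizes, and replicate replaces the explicit loop):

```r
set.seed(1)
# Hypothetical yeast data: two Lab and two Wild measurements (illustrative)
growth_rate <- c(2.1, 2.3, 3.8, 4.0)
genotype <- c("Lab", "Lab", "Wild", "Wild")

# Observed statistic: difference in median growth rates
T_obs <- median(growth_rate[genotype == "Wild"]) -
  median(growth_rate[genotype == "Lab"])

# Null distribution: shuffle growth rates and recompute the statistic
null_distribution <- replicate(1000, {
  shuffled <- sample(growth_rate)
  median(shuffled[3:4]) - median(shuffled[1:2])
})

# Two-sided p-value: fraction of permutations at least as extreme
p_value <- mean(abs(null_distribution) >= abs(T_obs))
```

With only four observations there are few distinct shuffles, so the p-value cannot be small; real permutation tests need larger samples.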

6
Q

What is a confidence interval (CI)?

A

A confidence interval estimates a range of values that, with a specified probability (e.g., 95%), contains the true population parameter.

Formal Definition:
A 95% CI means that if we repeated the experiment many times, 95% of the intervals would contain the true parameter.
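This repeated-experiment interpretation can be checked by simulation (a sketch using normally distributed data and t.test, not the yeast dataset):

```r
set.seed(123)
true_mean <- 5

# Simulate many experiments; record whether each 95% t-interval
# covers the true parameter
covers <- replicate(2000, {
  x <- rnorm(30, mean = true_mean, sd = 2)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

mean(covers)  # close to 0.95
```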

7
Q

What is bootstrap resampling?

A

Bootstrap resampling involves drawing random samples with replacement from the original data to estimate the variability of a parameter and construct confidence intervals.

Example R Code:

# Bootstrap CI example
bootstrap_ci <- function(data, n_resamples = 1000, alpha = 0.05) {
  n <- nrow(data)
  bootstrap_estimates <- numeric(n_resamples)

  for (i in 1:n_resamples) {
    # Resample rows with replacement
    sample_indices <- sample(1:n, size = n, replace = TRUE)
    bootstrap_sample <- data[sample_indices, ]
    lab_group <- bootstrap_sample[bootstrap_sample$genotype == "Lab", "growth_rate"]
    wild_group <- bootstrap_sample[bootstrap_sample$genotype == "Wild", "growth_rate"]
    bootstrap_estimates[i] <- median(wild_group) - median(lab_group)
  }

  # Percentile confidence interval
  lower <- quantile(bootstrap_estimates, alpha / 2)
  upper <- quantile(bootstrap_estimates, 1 - alpha / 2)
  c(lower, upper)
}
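A self-contained sketch of the same percentile-CI idea on hypothetical data (illustrative values; replicate replaces the explicit loop):

```r
set.seed(7)
# Hypothetical yeast data (values are illustrative)
yeast <- data.frame(
  growth_rate = c(2.1, 2.3, 2.2, 3.8, 4.0, 3.9),
  genotype = rep(c("Lab", "Wild"), each = 3)
)

# Resample rows with replacement and recompute the median difference
boot_est <- replicate(2000, {
  resampled <- yeast[sample(nrow(yeast), replace = TRUE), ]
  median(resampled$growth_rate[resampled$genotype == "Wild"]) -
    median(resampled$growth_rate[resampled$genotype == "Lab"])
})

# 95% percentile confidence interval (na.rm guards against the rare
# resample that happens to contain only one genotype)
ci <- quantile(boot_est, c(0.025, 0.975), na.rm = TRUE)
```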

8
Q

How are confidence intervals related to hypothesis testing?

A

If the confidence interval for a parameter (e.g., a mean or median difference) does not include the null-hypothesis value (e.g., 0), we reject the null hypothesis at the significance level α.

Example:
If the 95% CI for the difference in yeast growth rates is (1.5, 3.0), we can reject the null hypothesis that there is no difference at the 5% significance level.
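The check itself is a single comparison (the interval below is the hypothetical example above, not a computed result):

```r
# Hypothetical 95% CI for the difference in growth rates
ci_lower <- 1.5
ci_upper <- 3.0
null_value <- 0

# Reject H0 at the 5% level iff the null value falls outside the interval
reject_null <- null_value < ci_lower || null_value > ci_upper
reject_null  # TRUE
```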

9
Q

What are some limitations of resampling methods?

A
  1. Computational Intensity: Resampling methods can be computationally expensive.
  2. Independence Assumption: The i.i.d. (independent and identically distributed) assumption may not always hold in practice.
  3. Misuse of P-Values: Researchers may misuse P-values by conducting multiple tests and only reporting significant results (p-hacking).
10
Q

What key concepts should you understand after this lecture?

A

  1. Statistical significance and hypothesis testing concepts (test statistic, null hypothesis, P-value).
  2. Correct interpretation of P-values.
  3. Conducting hypothesis tests using permutation.
  4. Constructing and interpreting bootstrap confidence intervals.
  5. Recognizing potential misuses of statistical tests.
