Confidence Interval for Sample Mean Flashcards

1
Q

Why do sample estimates vary?

A

Sample estimates from the same population vary because different random samples produce different values.

2
Q

How can we assess if a sample represents the population?

A

If estimates from different samples are similar → a particular estimate is likely close to the true parameter.

If estimates differ greatly → it is difficult to assume that any one estimate is close to the true parameter.

3
Q

What does a sample estimate provide about the population parameter?

A

A sample estimate provides a point estimate of the true parameter.

4
Q

How is uncertainty in a sample estimate represented?

A

Standard error (SE): SD of the sampling distribution

Confidence interval (CI): Range where the true parameter likely lies

5
Q

What is a confidence interval?

A

An interval, defined by a lower and an upper limit, within which the true value of the population parameter is stated to lie with a specified level of confidence.

6
Q

How can we transform X ∼ N(μ, σ^2) into a standard normal variable?

A

Z = (X − μ) / σ

This standardises X so that Z follows N(0, 1).
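As a minimal sketch of the standardisation above, the snippet below converts one value to a z-score and checks that the probability below it is the same on both scales. The numbers (μ = 100, σ = 15, x = 130) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def standardise(x: float, mu: float, sigma: float) -> float:
    """Return the z-score of x for a normal distribution with mean mu and SD sigma."""
    return (x - mu) / sigma

# Hypothetical example: X ~ N(mu=100, sigma=15), observed x = 130.
z = standardise(130, mu=100, sigma=15)
print(z)  # 2.0

# P(X <= 130) on the original scale equals P(Z <= z) on the standard normal scale.
p_original = NormalDist(mu=100, sigma=15).cdf(130)
p_standard = NormalDist().cdf(z)
print(round(p_original, 4), round(p_standard, 4))
```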

7
Q

What percentage of Z values lie between −1.96 and 1.96 in a standard normal distribution?

A

95% of randomly generated values lie in this range:

Pr(−1.96 ≤ Z ≤ 1.96) = 0.95
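The 95% figure can be checked numerically. This short sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Probability mass between -1.96 and 1.96.
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the value with 97.5% of the distribution below it.
z_975 = std_normal.inv_cdf(0.975)

print(round(coverage, 4), round(z_975, 4))
```

The multiplier 1.96 is a rounded version of the 97.5th percentile, which is why the coverage is 95% only to about three decimal places.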

8
Q

What happens to the sampling distribution as the sample size increases?

A

The distribution of the sample mean becomes more symmetric (closer to normal) around μ.

Its variability depends on the population variability (σ) and the sample size (n): it shrinks as n grows.
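A small simulation can make the role of σ and n concrete. The sketch below assumes a normal population with hypothetical values μ = 50 and σ = 10, draws many samples of each size, and compares the SD of the resulting sample means with σ / sqrt(n).

```python
import random
from math import sqrt
from statistics import fmean, stdev

def sd_of_sample_means(n, mu=50.0, sigma=10.0, reps=2000, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma^2); return the SD of their means."""
    rng = random.Random(seed)
    means = [fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    return stdev(means)

# Simulated spread of the sample mean for three sample sizes.
results = {n: sd_of_sample_means(n) for n in (5, 25, 100)}
for n, simulated in results.items():
    print(n, round(simulated, 2), round(10.0 / sqrt(n), 2))  # simulated vs sigma/sqrt(n)
```

The simulated values should sit close to σ / sqrt(n) and fall as n rises; the exact figures depend on the seed.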

9
Q

How is the standard error (SE) calculated?

A

SE(μ^) = σ / sqrt(n)

10
Q

What is the formula for the distribution of the sample mean?

A

μ^ ∼ N(μ, σ^2/n)

11
Q

How do we transform μ^ into a standard normal variable?

A

Z = (μ^ − μ) / (σ / sqrt(n))

12
Q

What is the formula for a 95% confidence interval when σ is known?

A

μ^ ± 1.96 × σ / sqrt(n)
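The known-σ interval can be sketched in a few lines. The function name `z_confidence_interval`, the sample values, and σ = 1.0 are all hypothetical; the multiplier is taken from the standard normal quantile rather than hard-coded as 1.96.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, level=0.95):
    """CI for the mean when the population SD (sigma) is known."""
    n = len(sample)
    mean = fmean(sample)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level=0.95
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

# Hypothetical measurements with an assumed known sigma of 1.0.
sample = [9.1, 10.4, 9.8, 10.9, 9.6, 10.2]
low, high = z_confidence_interval(sample, sigma=1.0)
print(round(low, 2), round(high, 2))
```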

13
Q

What does the Central Limit Theorem (CLT) state?

A

If the sample size is sufficiently large, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution (provided the population variance is finite).
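One way to see the CLT at work is to sample from a clearly skewed population and watch the skewness of the sample means fade as the sample size grows. The sketch below assumes an exponential population with mean 1; the helper names and the choice of n = 2 versus n = 40 are illustrative.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, reps=3000, seed=7):
    """Means of `reps` samples of size n from a right-skewed Exponential(1) population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Simple moment-based skewness; 0 for a perfectly symmetric distribution."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

means_small = sample_means(n=2)
means_large = sample_means(n=40)
skew_small = skewness(means_small)
skew_large = skewness(means_large)
print(round(skew_small, 2), round(skew_large, 2))
```

The means from the larger samples should show much less skew, which is the CLT's "approximately normal" in action.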

14
Q

What if the population standard deviation (σ) is unknown?

A

Use the sample standard deviation (s) and the t-distribution instead of the normal distribution.

15
Q

How is the confidence interval calculated when σ is unknown?

A

μ^ ± t(1−α/2, df=n−1) × s / sqrt(n)

16
Q

What are the key properties of the t-distribution?

A

Similar in shape to the normal distribution.

Centered at zero.

Shape depends on the degrees of freedom (df = n − 1).

Shape changes with sample size: for small samples it has heavier tails than the normal distribution.

17
Q

How does the similarity between the t-distribution and the standard normal distribution depend on sample size?

A

The t-distribution is more spread out for small n.

As n increases, the t-distribution approaches the normal distribution.

For n = 1000, about 95% of the t-distribution falls within ±1.96, almost the same as for the normal distribution.
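The convergence towards the normal can be checked without a statistics package. The sketch below integrates the t density numerically (Simpson's rule) to get the coverage of ±1.96, and finds the 95% multiplier by bisection. It is a rough illustration, not a substitute for a library quantile function; the helper names are hypothetical and the bisection bracket assumes df ≥ 2.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_coverage(limit, df, steps=1000):
    """P(-limit <= T <= limit) via Simpson's rule (steps must be even)."""
    h = 2 * limit / steps
    total = t_pdf(-limit, df) + t_pdf(limit, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-limit + i * h, df)
    return total * h / 3

def t_multiplier(df, level=0.95):
    """Smallest limit whose central coverage reaches `level` (bisection on [0, 20])."""
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if central_coverage(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 10, 30, 999):
    print(df, round(central_coverage(1.96, df), 3), round(t_multiplier(df), 3))
```

For small df the coverage of ±1.96 falls noticeably short of 95% and the multiplier is larger than 1.96; by df = 999 both are essentially at their normal values.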

18
Q

What are the key differences between the normal and t-distributions?

A

The normal distribution has a fixed shape.

The t-distribution has heavier tails, especially for small sample sizes.

The t-distribution approaches the normal as n increases.

19
Q

How does the 95% coverage change between the normal and t-distributions?

A

Normal: 95% falls within ±1.96 standard deviations.

t-distribution with df = 10: only about 92% falls within ±1.96, so ±2.23 is required for 95% coverage.

20
Q

What is the formula for a confidence interval using the t-distribution?

A

μ^ ± t(1−α/2, df=n−1) × SE(μ^)

21
Q

What does each term in the confidence interval formula represent?

A

μ^: sample mean (estimate of the true mean)

t(1−α/2, df=n−1): t-value for the desired confidence level

SE(μ^): standard error of the estimate

22
Q

What two factors are needed to calculate a confidence interval?

A

Confidence level (choice of α)

Standard error (SE(μ^), which measures uncertainty)

23
Q

How do we find the t-multiplier for a 95% confidence interval?

A

Choose α = 0.05.

Compute 1 − α/2 = 0.975.

Find t such that P(T ≤ t) = 0.975 for T ∼ t(df = n − 1).

24
Q

How does the choice of confidence level affect the confidence interval?

A

Higher confidence levels (e.g., 99%) → wider intervals → more likely to contain the true parameter.

Lower confidence levels (e.g., 90%) → narrower intervals → less likely to contain the true parameter.

Common choices: 90%, 95%, and 99%.

25
Q

What does "confidence" mean in a confidence interval?

A

Confidence refers to a long-run process, not a single interval.

If we repeatedly take samples and build a 95% CI from each, about 95% of those intervals will contain the true value.

If many researchers independently sample the same population and each builds a 95% CI, about 95% of their intervals are expected to capture the true parameter.
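The long-run reading of "confidence" lends itself to simulation. The sketch below repeatedly samples from a population with a known mean, builds a 95% interval each time, and counts how often the interval contains the true mean. For simplicity it assumes σ is known and uses the normal multiplier; the population values (μ = 10, σ = 2, n = 30) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=2.0, n=30, reps=4000, level=0.95, seed=42):
    """Fraction of simulated known-sigma intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        mean = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if mean - margin <= mu <= mean + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(rate)
```

Any single interval either contains μ or it does not; the 95% describes the procedure, and the simulated rate should land near 0.95.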

26
Q

How do you calculate the 95% confidence interval for a sample?
Sample mean μ^ = 9.56
Sample SD s = 1.13
Sample size n = 20
t-multiplier = 2.09

A

SE = s / sqrt(n) = 1.13 / sqrt(20) = 0.253
CI = 9.56 ± 2.09 × 0.253 = (9.03, 10.09)
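The arithmetic for this card can be reproduced directly from the four given values. The t-multiplier is taken from the card rather than computed.

```python
from math import sqrt

# Values given on the card.
mean, sd, n, t_mult = 9.56, 1.13, 20, 2.09

se = sd / sqrt(n)          # standard error of the mean
margin = t_mult * se       # margin of error
lower, upper = mean - margin, mean + margin

print(round(se, 3), round(lower, 2), round(upper, 2))  # 0.253 9.03 10.09
```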

27
Q

What is an empirical (bootstrap-based) confidence interval?

A

A non-parametric method for estimating confidence intervals when:

The sample size is small

The population is skewed

The normality assumption for sample means is questionable

28
Q

What are the 5 steps for generating a bootstrap confidence interval?

A

1. Randomly select n values from the original sample with replacement.

2. Compute the sample mean for this bootstrap sample.

3. Repeat steps 1 and 2 many times (e.g., 1000 times).

4. Examine the distribution of bootstrap means.

5. Find the central 95% range (2.5th and 97.5th percentiles) to define the confidence interval.
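The five steps map onto a short function. This is a minimal percentile-bootstrap sketch using only the standard library; the function name `bootstrap_ci`, the nearest-rank percentile rule, and the sample data are assumptions made for illustration.

```python
import random
from statistics import fmean

def bootstrap_ci(sample, statistic=fmean, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for `statistic` (the sample mean by default)."""
    rng = random.Random(seed)
    n = len(sample)
    # Steps 1-3: resample with replacement, compute the statistic, repeat.
    boot_stats = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Steps 4-5: read off the central `level` range of the bootstrap distribution.
    alpha = 1 - level
    lower = boot_stats[round((alpha / 2) * (n_boot - 1))]
    upper = boot_stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Hypothetical right-skewed sample.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 9.2, 4.4, 5.6, 4.9, 3.9, 5.0]
lower, upper = bootstrap_ci(sample)
print(round(lower, 2), round(upper, 2))
```

Passing a different `statistic` (for example `statistics.median`) gives an interval for that statistic instead.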

29
Q

What are the 3 advantages of bootstrap confidence intervals?

A

No assumption of normality in the population.

Can estimate CIs for any statistic (mean, median, etc.).

Useful for small or skewed samples.

30
Q

What are the 2 disadvantages of bootstrap confidence intervals?

A

Computationally intensive (requires many resamples).

Coverage issues (may not always capture the true parameter correctly).

31
Q

How do we estimate the difference between two group means?

A

Difference(A − B) = μ^A − μ^B

Same parent population → any observed difference is due to sampling variability.

Different parent populations → a true difference exists beyond sampling variability.

32
Q

How do we determine if the difference between two groups is meaningful?

A

If the difference ≈ 0, it is likely due to sampling variability.

If the difference is large relative to its standard error, it likely reflects a real difference between the populations.

Confidence intervals help quantify uncertainty in the difference.

33
Q

What is the formula for a confidence interval for the difference between two group means?

A

(μ^1 − μ^2) ± t(1−α/2, df) × se(μ^1 − μ^2)

34
Q

What does the standard error of the difference measure?

A

It quantifies the uncertainty in the difference between two group means by combining the variability of both samples

35
Q

What is the formula for the pooled standard deviation sp when assuming equal variances?

A

sp = sqrt[((n1 − 1)s1^2 + (n2 − 1)s2^2) / (n1 + n2 − 2)]

36
Q

How do you calculate the standard error of the difference in means when assuming equal variances?

A

se(μ^1 − μ^2) = sp × sqrt[(1/n1) + (1/n2)]

37
Q

How do you determine the degrees of freedom when assuming equal variances?

A

df = n1 + n2 − 2

38
Q

How is the 95% confidence interval for the difference in means constructed? (6)

A

Compute the difference in sample means: μ^1 − μ^2

Calculate the pooled standard deviation: sp

Find the standard error: se(μ^1 − μ^2)

Get the t-multiplier from the t-distribution.

Compute the margin of error: t × se

Find the CI: (μ^1 − μ^2) ± margin of error

39
Q

Example: What is the 95% CI for the difference in group means given the following data?

Group 1: n1 = 10, μ^1 = 157.2, s1 = 6.349

Group 2: n2 = 15, μ^2 = 171.4, s2 = 6.335

t(0.975, df=23) = 2.069

A

(8.9, 19.6), taking the difference as Group 2 − Group 1
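The pooled calculation can be traced step by step from the summary statistics on this card. The function name `pooled_ci` is hypothetical, and the t-multiplier is the one given on the card. With these rounded inputs the result comes out at roughly (8.84, 19.56), which agrees with the card's answer up to rounding.

```python
from math import sqrt

def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_mult):
    """Equal-variance CI for mean1 - mean2 from summary statistics."""
    # Pooled standard deviation, weighting each variance by its degrees of freedom.
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    # Standard error of the difference in means.
    se = sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    margin = t_mult * se
    return diff - margin, diff + margin

# Group 2 minus Group 1, so that the difference is positive.
lower, upper = pooled_ci(171.4, 6.335, 15, 157.2, 6.349, 10, t_mult=2.069)
print(round(lower, 2), round(upper, 2))
```

Swapping the order of the groups flips the sign of both limits but leaves the width unchanged.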

40
Q

How do we interpret a confidence interval for the difference in means (for example, a 95% CI of (8.9, 19.6))?

A

A 95% CI of (8.9, 19.6) means we are 95% confident that the true difference in population means is between 8.9 and 19.6. Because the interval does not contain 0, it suggests a significant difference between the groups.

41
Q

What formula is used for the standard error of the difference in means when variances are unequal?

A

se(μ^1 − μ^2) = sqrt[(s1^2/n1) + (s2^2/n2)]

42
Q

What is the hand-calculated formula for degrees of freedom when assuming unequal variances?

A

df = min(n1 − 1, n2 − 1)

43
Q

What is the Welch-Satterthwaite formula for degrees of freedom?

A

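df_w = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 − 1) + (s2^2/n2)^2/(n2 − 1) ]

A more precise but more complex estimate than min(n1 − 1, n2 − 1).

df_w is never larger than the df under equal variances (n1 + n2 − 2), so for the same α level the t-multiplier is at least as large.

Usually computed with software such as R.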
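For readers who want to see the unequal-variance quantities computed by hand, here is a minimal sketch using the Welch standard error and the Welch-Satterthwaite degrees of freedom. The function name `welch_se_df` is hypothetical; the inputs are the summary statistics from the two-group example earlier in this deck.

```python
from math import sqrt

def welch_se_df(s1, n1, s2, n2):
    """Unequal-variance standard error and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-group variance of the mean
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

se, df_w = welch_se_df(s1=6.349, n1=10, s2=6.335, n2=15)
pooled_df = 10 + 15 - 2
print(round(se, 3), round(df_w, 2), pooled_df)
```

Here the two sample SDs are nearly equal, so the Welch standard error is close to the pooled one, while the Welch df is somewhat below the pooled df of 23.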

44
Q

What 3 assumptions are made when calculating a confidence interval using the t-distribution?

A

Independence of data (within and between groups).

Normality of data (assess using histograms or Shapiro-Wilk test).

Equal or unequal variances (test using an F-test or variance test).

45
Q

When should we assume equal vs. unequal variances in a two-sample t-test?

A

Assume equal variances if the sample standard deviations are similar.

Assume unequal variances if the standard deviations differ significantly (Welch's t-test is more conservative).

Use an F-test or Levene's test to formally compare variances.

46
Q

What is an empirical (bootstrap) confidence interval for a difference in means? (5)

A

A non-parametric method that does not assume normality. Steps:

Resample each group with replacement.

Compute the sample means for each resample.

Calculate the difference between means.

Repeat many times (e.g., 1000+ iterations).

Find the 2.5 and 97.5 percentiles of the distribution of differences.
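These steps can be sketched as a short function. The name `bootstrap_diff_ci`, the nearest-rank percentile rule, and both data sets are hypothetical and only serve to show the mechanics.

```python
import random
from statistics import fmean

def bootstrap_diff_ci(group_a, group_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement, keeping its own size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(fmean(resample_a) - fmean(resample_b))
    diffs.sort()
    alpha = 1 - level
    lower = diffs[round((alpha / 2) * (n_boot - 1))]
    upper = diffs[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Hypothetical measurements for two independent groups.
group_a = [171, 168, 175, 180, 169, 173, 177, 166]
group_b = [158, 162, 155, 160, 151, 164, 157]
lower, upper = bootstrap_diff_ci(group_a, group_b)
print(round(lower, 1), round(upper, 1))
```

Resampling the groups independently mirrors the assumption that the two samples were drawn independently of each other.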

47
Q

When is an empirical bootstrap CI useful?

A

When sample sizes are small.

When data are highly skewed.

When the normality assumption is questionable.

When computing a confidence interval for non-standard statistics (e.g., median).