Chapter 21: Comparing Means Flashcards
Define ‘Two-sample t methods’.
Two-sample t methods allow us to draw conclusions about the difference between the means of two quantitative populations, based on independent samples. The unpooled two-sample methods make relatively few assumptions about the underlying populations, so they are usually the method of choice for comparing two sample means. However, the Student’s t-models are only approximations for their true sampling distributions, and there is a special rule for estimating degrees of freedom.
Define ‘Two-sample t-interval for the difference between means’.
A confidence interval for the difference between the means of two independent groups found as
(y-bar 1 - y-bar 2) ± t*(df) × SE(y-bar 1 - y-bar 2)
where
SE(y-bar 1 - y-bar 2) = sqrt( (s1^2 / n1) + (s2^2 / n2) )
and the number of degrees of freedom is given by a special formula. (In class, we used the smaller of n1 - 1 and n2 - 1.)
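A minimal sketch of the pieces of this interval, for checking hand calculations. The card does not spell out the "special formula"; the sketch assumes it is the usual Welch-Satterthwaite approximation. The function names are hypothetical, and t_star is a critical value the caller looks up in a t-table for the chosen df.

```python
import math

def two_sample_se(s1, n1, s2, n2):
    """Standard error of (y-bar 1 - y-bar 2) for two independent samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def welch_df(s1, n1, s2, n2):
    """Assumed 'special formula': Welch-Satterthwaite approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """In-class shortcut: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

def t_interval(ybar1, ybar2, se, t_star):
    """(lower, upper) limits of (y-bar 1 - y-bar 2) +/- t_star * se."""
    diff = ybar1 - ybar2
    return diff - t_star * se, diff + t_star * se
```

The Welch df always falls between the in-class shortcut and (n1 - 1) + (n2 - 1), so the shortcut gives a somewhat wider, more cautious interval.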
Define ‘Two-sample t-test for the difference between means’.
A hypothesis test for the difference between the means of two independent groups. It tests the null hypothesis
H0: μ1 - μ2 = Δ0
where the hypothesized difference, Δ0, is almost always 0, using the statistic
t = ((y-bar 1 - y-bar 2) - Δ0) / SE(y-bar 1 - y-bar 2)
with the number of degrees of freedom given by the special formula.
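The test statistic can be sketched from summary statistics as follows. The function name and the sample numbers are made up for illustration; the P-value is not computed here, since it requires a t-model with the chosen degrees of freedom.

```python
import math

def two_sample_t(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Unpooled two-sample t statistic for H0: mu1 - mu2 = delta0."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # SE(y-bar 1 - y-bar 2)
    return ((ybar1 - ybar2) - delta0) / se

# Hypothetical summaries: group 1 (mean 12, s = 3, n = 9),
# group 2 (mean 10, s = 4, n = 16), testing a difference of 0.
t = two_sample_t(12.0, 3.0, 9, 10.0, 4.0, 16)
```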
Define ‘Pooled-t methods’.
Pooled-t (also called pooled two-sample t) methods provide inferences about the difference between the means of two independent populations under the assumption that both populations have the same standard deviation. When the assumption is justified, pooled-t methods generally produce slightly narrower confidence intervals and more powerful significance tests than unpooled two-sample t methods. When the assumption is not justified, they generally produce worse results, giving inaccurate or wrong conclusions.
Define ‘Pooled t-test’.
A hypothesis test for the difference in the means of two independent groups when we are willing and able to assume that the variances of the groups are equal. It tests the null hypothesis
H0: μ1 - μ2 = Δ0
where the hypothesized difference, Δ0, is almost always 0, using the statistic
t = ((y-bar 1 - y-bar 2) - Δ0) / SEpooled(y-bar 1 - y-bar 2)
where the pooled standard error is defined as for the pooled interval and the number of degrees of freedom is (n1 - 1) + (n2 - 1).
Define ‘Pooled t-interval’.
A confidence interval for the difference between the means of two independent groups used when we are willing and able to make the additional assumption that the variances of the groups are equal. It is found as
(y-bar 1 - y-bar 2) ± t*(df) × SEpooled(y-bar 1 - y-bar 2)
where
SEpooled(y-bar 1 - y-bar 2) = s(pooled) × sqrt( (1 / n1) + (1 / n2) )
the pooled variance is
(given in formula sheet…)
and the number of degrees of freedom is (n1 - 1) + (n2 - 1).
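A rough sketch of the pooled interval. The card leaves the pooled variance to the formula sheet; the code assumes the sheet gives the standard definition, a weighted average of the two sample variances with weights n1 - 1 and n2 - 1. Function names are hypothetical, and t_star must be looked up for df = (n1 - 1) + (n2 - 1).

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD (assumed standard definition: df-weighted average of variances)."""
    df = (n1 - 1) + (n2 - 1)
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)

def pooled_se(s1, n1, s2, n2):
    """SEpooled(y-bar 1 - y-bar 2) = s(pooled) * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)

def pooled_interval(ybar1, s1, n1, ybar2, s2, n2, t_star):
    """(lower, upper) limits of the pooled t-interval for mu1 - mu2."""
    diff = ybar1 - ybar2
    margin = t_star * pooled_se(s1, n1, s2, n2)
    return diff - margin, diff + margin
```

When s1 and s2 are equal, the pooled SD is that common value, which makes a quick sanity check on a hand calculation.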
Define ‘Pooling’.
We may sometimes combine, or pool, data from two or more populations to estimate a parameter (such as a common variance) when we are willing to assume that the estimated parameter is the same in both populations. Using more data may lead to a more reliable estimate. However, pooled estimates are appropriate only when the required assumptions are believed to be (nearly) true.
What are the assumptions and conditions?
- Independence Assumption (Randomization Condition, 10%…)
- Normal Population Assumption (Nearly Normal Condition)
- Independent Groups Assumption - most important!
When does a pooled t-test (equal variance) make particular sense?
For randomized experiments in which the randomization has produced groups with equal variance to start with and the null hypothesis is that a treatment under study has no effect.
Are the methods in this chapter appropriate for paired or matched data?
No.
Is the one-sample or two-sample mean test more robust?
Two-sample. (It can handle skewness or other deviations from Normality at smaller sample sizes: about 10 for somewhat skewed data and 20 for more strongly skewed data, vs. ? for one-sample.)
What are the conditions for performing a pooled test (from class)?
- n1 ≈ n2
- n1, n2 > 15
- largest s / smallest s < 2
The text does not like pooled methods…
It says pooling is more precise because it produces a larger df, which helps most with small sample sizes, exactly when it's hardest to decide whether pooling is appropriate.
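The three class conditions for pooling on this card can be checked mechanically, as in the sketch below. The card does not say how close n1 and n2 must be, so the size_tolerance parameter (20% by default) is purely an assumption for illustration, as is the function name. Both standard deviations are assumed to be positive.

```python
def pooling_conditions(s1, n1, s2, n2, size_tolerance=0.2):
    """Check the class rules of thumb for using a pooled test.

    size_tolerance is a hypothetical cutoff for 'n1 is about n2';
    the class notes give no specific number.
    """
    return {
        "similar_sizes": abs(n1 - n2) <= size_tolerance * max(n1, n2),
        "both_over_15": n1 > 15 and n2 > 15,
        "sd_ratio_under_2": max(s1, s2) / min(s1, s2) < 2,
    }

# Hypothetical summaries: s1 = 3 with n1 = 20, s2 = 4 with n2 = 22.
checks = pooling_conditions(3.0, 20, 4.0, 22)
ok_to_pool = all(checks.values())
```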