Reading Quiz 13 Flashcards by Kate Lester

two-sample problem characteristics

the goal of inference is to compare the response to two treatments or to compare characteristics of two populations
we have a separate sample from each treatment or population
the responses in each group are independent of those in the other group

conditions for two-sample problem test comparing two means

two independent SRSs, each drawn from a normally distributed population (both samples may come from the same population, which must then be normally distributed)

significance tests and confidence intervals for the difference between the means μ1 and μ2 of two normal populations start from the difference…

xbar1 - xbar2 between the two sample means
by the central limit theorem, the resulting procedures are approximately correct for other population distributions when the sample sizes are large

two-sample z test for means

draw independent SRSs of sizes n1 and n2 from two normal populations with parameters μ1, σ1, μ2, σ2
the two-sample z statistic then has the standard normal distribution

two-sample z statistic for means

z = ((xbar1 - xbar2) - (μ1 - μ2)) / sqrt(σ1²/n1 + σ2²/n2)

two-sample t statistic for means

t = ((xbar1 - xbar2) - (μ1 - μ2)) / sqrt(s1²/n1 + s2²/n2)
this statistic doesn't exactly have a t distribution; good approximations are available with calculators
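A minimal Python sketch of this statistic, using only the standard library. The function name `two_sample_t` and the scores are made up for illustration; they are not from the quiz.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(x1, x2, null_diff=0.0):
    """Two-sample t statistic for mu1 - mu2 (variances are not pooled)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance is the sample variance (it divides by n - 1).
    standard_error = sqrt(variance(x1) / n1 + variance(x2) / n2)
    return ((mean(x1) - mean(x2)) - null_diff) / standard_error

# Hypothetical scores for two independent groups.
group1 = [12, 15, 11, 14, 13]
group2 = [10, 9, 12, 8, 11]
t = two_sample_t(group1, group2)
print(t)
```

Here both sample variances are 2.5, so the standard error is sqrt(0.5 + 0.5) = 1 and t is just the difference in sample means, 13 - 10 = 3.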

for conservative inference procedures to compare μ1 and μ2

use the two-sample t statistic for means with the t(k) distribution
number of degrees of freedom, k, is the smaller of n1 - 1 and n2 - 1

for more accurate probability values

use the t(k) distribution with degrees of freedom estimated from the data
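Both choices of k can be sketched in Python. The second function uses the Welch-Satterthwaite approximation, the usual formula for estimating degrees of freedom from the data; that the source textbook uses exactly this formula is an assumption. The function names and the summary statistics are hypothetical.

```python
def conservative_df(n1, n2):
    """Conservative choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom estimated from the data (Welch-Satterthwaite)."""
    a = s1 ** 2 / n1  # estimated variance of xbar1
    b = s2 ** 2 / n2  # estimated variance of xbar2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical summary statistics: s1 = 2 with n1 = 10, s2 = 4 with n2 = 20.
k_conservative = conservative_df(10, 20)
k_estimated = welch_df(2.0, 10, 4.0, 20)
print(k_conservative, k_estimated)
```

The estimated value is generally not an integer, and it always lies between the conservative k and n1 + n2 - 2.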

significance tests for Ho: μ1 = μ2 are based on

t = (xbar1 - xbar2) / sqrt(s1²/n1 + s2²/n2)

significance tests for Ho: μ1 = μ2 have a true p-value

no higher than that calculated using the conservative degrees of freedom

the level C confidence interval for μ1 - μ2 given by

(xbar1 - xbar2) ± t* sqrt(s1²/n1 + s2²/n2)

has confidence level at least C if t* comes from the t(k) distribution with the more conservative number of degrees of freedom
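A short Python sketch of this interval. The critical value t* is passed in rather than computed, since it would normally be read from a t table or calculator; the function name and the summary statistics are hypothetical.

```python
from math import sqrt

def two_sample_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    margin = t_star * standard_error
    return diff - margin, diff + margin

# Hypothetical summaries: n1 = 9 and n2 = 16, so the conservative k is
# min(8, 15) = 8. The 95% critical value for t(8) from a t table is 2.306.
low, high = two_sample_t_interval(52.0, 6.0, 9, 47.0, 8.0, 16, t_star=2.306)
print(low, high)
```

With these numbers the standard error is sqrt(36/9 + 64/16) = sqrt(8), so the interval is 5 ± 2.306 sqrt(8).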

when we want to compare the proportions p1 and p2 of successes in two populations, the comparison is based on the difference

phat1 - phat2 between the sample proportions of successes. when the sample sizes n1 and n2 are large enough, we can use z procedures because the sampling distribution of phat1 - phat2 is close to normal

approximate level C confidence interval for p1 - p2

(phat1 - phat2) ± z* sqrt(phat1qhat1/n1 + phat2qhat2/n2), where qhat1 = 1 - phat1 and qhat2 = 1 - phat2
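A minimal Python sketch of this interval, with z* taken from the standard library's `statistics.NormalDist`. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Approximate level C confidence interval for p1 - p2."""
    phat1, phat2 = x1 / n1, x2 / n2
    # z* leaves area (1 - C)/2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
    diff = phat1 - phat2
    return diff - z_star * standard_error, diff + z_star * standard_error

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
low, high = two_proportion_interval(60, 100, 45, 100)
print(low, high)
```

For these counts the interval is roughly 0.013 to 0.287.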

significance tests for Ho: p1 = p2 use the

combined sample proportion and the z statistic for a two-sample z test for proportions
p-values can be determined using the standard normal table

combined sample proportion

phatc = (count of successes in both samples combined) / (count of individuals in both samples combined) = (X1 + X2) / (n1 + n2)

two-sample z test for proportions

z = ((phat1 - phat2) - (pnot1 - pnot2)) / sqrt(phatcqhatc * (1/n1 + 1/n2))

significance tests for Ho: p1 - p2 = 0 are based on

z = (phat1 - phat2) / sqrt(phatcqhatc * (1/n1 + 1/n2))
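A Python sketch of this test, using the combined sample proportion in the standard error and the standard normal distribution for a two-sided p-value. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for Ho: p1 = p2."""
    phat1, phat2 = x1 / n1, x2 / n2
    phat_c = (x1 + x2) / (n1 + n2)  # combined sample proportion
    standard_error = sqrt(phat_c * (1 - phat_c) * (1 / n1 + 1 / n2))
    z = (phat1 - phat2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(z, p_value)
```

Here phatc = 105/200 = 0.525, which gives z of about 2.12 and a two-sided p-value between 0.03 and 0.04.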

how to check normality conditions for two-sample proportion confidence interval

n1phat1, n1qhat1, n2phat2, n2qhat2 are all greater than or equal to 5 (or 10)

how to check normality conditions for two-sample proportion significance test

n1phatc, n1qhatc, n2phatc, n2qhatc are all greater than or equal to 5 (or 10)
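The two checks can be sketched in Python as follows. The function names, the default threshold of 5, and the counts are assumptions chosen for illustration. Note that n1phat1 is just the count of successes X1, and n1qhat1 is the count of failures n1 - X1.

```python
def interval_counts_ok(x1, n1, x2, n2, threshold=5):
    """Confidence interval check: uses each sample's own phat and qhat."""
    counts = [x1, n1 - x1, x2, n2 - x2]
    return all(count >= threshold for count in counts)

def pooled_counts_ok(x1, n1, x2, n2, threshold=5):
    """Significance test check: uses the combined proportion phatc."""
    phat_c = (x1 + x2) / (n1 + n2)
    counts = [n1 * phat_c, n1 * (1 - phat_c), n2 * phat_c, n2 * (1 - phat_c)]
    return all(count >= threshold for count in counts)

# Hypothetical counts: 3 successes out of 20 versus 9 out of 20.
print(interval_counts_ok(3, 20, 9, 20))  # False: only 3 successes in sample 1
print(pooled_counts_ok(3, 20, 9, 20))    # True: phatc = 0.3 gives 6, 14, 6, 14
```

The example shows that the two checks can disagree for the same data, which is why each procedure has its own condition.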

One researcher randomly samples two groups from a population, and gives training to one and not the other. The researcher uses a t procedure to compare the test scores of the two groups. Another researcher samples a group from the population, and gives a test to the group two times, once before training and once after. The researcher uses a t procedure to compare the results after training with those before training. How are these two situations different, and what different statistical procedures should they result in?

A. In the first case, the samples are independent of one another, and in the second, they are not. So in the first case, you use a two-sample t procedure to study the difference in the means. In the second case, you create a new variable, the post-score minus the pre-score, and use a one-sample t procedure to study the mean of the differences (this is a paired t test).
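A small Python sketch of why the choice matters. The scores are made up, and the function names are hypothetical. The same before and after scores give very different t statistics depending on whether the pairing is used.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """One-sample t statistic on the differences (after minus before)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def two_sample_t(x1, x2):
    """Two-sample t statistic, appropriate only for independent samples."""
    se = sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / se

# Hypothetical scores for the same five people before and after training.
pre = [70, 65, 80, 75, 60]
post = [74, 68, 85, 78, 65]

t_paired = paired_t(pre, post)
t_unpaired = two_sample_t(post, pre)  # ignores the pairing: wrong design here
print(t_paired, t_unpaired)
```

The differences are 4, 3, 5, 3, 5 with mean 4 and standard deviation 1, so the paired statistic is 4 sqrt(5), about 8.9. Treating the same data as two independent samples gives a t below 1, because the large person-to-person variation swamps the consistent gain.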

Suppose someone were to draw many pairs of samples from two populations, and compute the difference between the sample means for each pair. What would the mean of this difference approach as the number of samples drawn approached infinity?

the difference in population means

The fact that the mean of the difference in sample means approaches the difference in population means as the number of samples gets larger is a long way of saying that the difference in sample means is a(n) ____ estimator of the difference in population means.

unbiased
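A quick simulation can illustrate this. The sketch below assumes two normal populations with made-up parameters (means 10 and 7, so the true difference is 3), draws many pairs of samples, and averages the differences in sample means.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters and sample sizes.
MU1, SIGMA1, N1 = 10.0, 2.0, 5
MU2, SIGMA2, N2 = 7.0, 3.0, 8

def difference_of_sample_means():
    """Draw one pair of independent samples and return xbar1 - xbar2."""
    sample1 = [random.gauss(MU1, SIGMA1) for _ in range(N1)]
    sample2 = [random.gauss(MU2, SIGMA2) for _ in range(N2)]
    return mean(sample1) - mean(sample2)

differences = [difference_of_sample_means() for _ in range(5000)]
average_difference = mean(differences)
print(average_difference)  # should land close to MU1 - MU2 = 3
```

Any single difference can be well off from 3, but the average over many pairs settles near it, which is what unbiased means.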

True or False: just as the difference in sample means estimates the difference in population means, the difference in sample standard deviations estimates the population standard deviation of the difference between two means.

A. This is a triple false! First, what you would combine would be variances, not standard deviations. Second, to find the variance of the difference between two random variables you add the variances; you don’t subtract them. Third, the sample variances would have to be divided by n to estimate the variance of the sample mean.

True or False: the variance of the difference between two sample means is estimated by s1²/n1 + s2²/n2, where s1 and s2 are the sample standard deviations (and thus s1² and s2² are the sample variances) and where n1 and n2 are the sample sizes.

true

When the standard deviations of the two populations you are sampling from are different, why does the difference of the means of two independent samples not exactly follow the t-distribution?

A. Because there are two population standard deviations replaced by the sample standard deviations in the formula, not just one.

When the sample sizes for the two samples are different, how many degrees of freedom do you use for t procedures? Please explain two options.

A. One option is to use n - 1, where n is the smaller of the two sample sizes. The other (and the one almost always used in research) is to let the calculator compute a non-integer degrees of freedom according to a more complicated formula (page 659), which does not need to be memorized!

What is the formula for calculating a confidence interval for the difference between two means when the population standard deviation is not known?

(xbar1 - xbar2) ± t* sqrt(s1²/n1 + s2²/n2)

What is the formula for calculating the test statistic for the difference between two means when the population standard deviation is not known?

t = ((xbar1 - xbar2) - (μ1 - μ2)) / sqrt(s1²/n1 + s2²/n2) is the general answer; when testing Ho: μ1 = μ2 the hypothesized difference is 0, so t = (xbar1 - xbar2) / sqrt(s1²/n1 + s2²/n2) is perfectly acceptable

For using t procedures with means of independent samples, an excellent approximation is achieved by using the t distribution with a not-necessarily-integer degrees of freedom computed by the formula on page 659. This approximation is quite accurate when the sample size of both samples is what?

five or larger

When software gives you a choice between assuming, or not assuming (pooling or not pooling), equal variances for the two populations whose means you are comparing with a t procedure, which choice should you generally make?

A. Not to assume equal variances, because this assumption is very difficult to check. Aka don’t pool!

Using symbols: when we want to compare the two population means, we either give a confidence interval for ______ or test the hypothesis of _______ (supply the null hypothesis).

A. Confidence interval for μ1 - μ2; test of Ho: μ1 = μ2 (equivalently, Ho: μ1 - μ2 = 0)

Suppose we give a cancer drug to one group and not to another group, and look at the difference in proportions of people who survive for five years in these two conditions. What parameter are we trying to estimate, and what statistic do we use to estimate it?

A. The parameter is the difference between the population proportions of survivors for the two conditions. The statistic is the difference in sample proportions.

How do you find the standard deviation of the difference in sample proportions?

A. The variance of the difference is the sum of the variances of the individual proportions, so the sd of the difference is sqrt(p1q1/n1 + p2q2/n2). Because you don't know the ps and qs in this expression, you substitute the sample statistics for the population parameters (the usual ploy), which gives sqrt(phat1qhat1/n1 + phat2qhat2/n2).

What is the formula for the confidence interval for the difference of two proportions?

A. It's the estimate ± the margin of error, or: (phat1 - phat2) ± z* sqrt(phat1qhat1/n1 + phat2qhat2/n2)

What are the conditions for the confidence interval for a difference in proportions? Be specific.

A. SRS (the two samples are SRSs from their populations), Normality (all 4 np-hat and nq-hat quantities are 5 or more), and Independence (the two populations are at least 10 times as large as the samples).

In doing a hypothesis test for the difference of two proportions, we compute a z statistic. What, in general terms (that is without going into the specific formula), is in the numerator and the denominator of this statistic?

A. The numerator is the difference in sample proportions. The denominator is an estimate of the standard deviation of the difference of sample proportions (a.k.a. the standard error of the difference).

What does phatc represent and in what formula is it used?

A. phatc is the total successes over the total trials for both samples combined. It is used in the two-sample z statistic for proportions.

When working with two-sample problems for proportions, the Normality condition is checked differently depending on whether you are constructing a confidence interval or a significance test. What are the definitions of each? You may write this in symbols or words.

A. Confidence interval: n1phat1, n1qhat1, n2phat2, and n2qhat2 are all greater than or equal to 5 (or 10).
Significance test: n1phatc, n1qhatc, n2phatc, and n2qhatc are all greater than or equal to 5 (or 10).

single sample

comparing one xbar or phat to a known value

paired data

pair to find differences, then have one xbar or phat to compare to a known value

two independent samples

two xbars or phats to compare to one another