Chapter 19: Comparing Two Proportions Flashcards
Define ‘Variances of independent random variables add’.
The variance of a sum or difference of independent random variables is the sum of the variances of those variables.
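A minimal Python sketch of this rule, using a hypothetical helper name. Variances, not standard deviations, add. So the SD of a sum or a difference of independent variables comes from squaring each SD, adding, and taking the square root. Subtracting does not shrink the spread.

```python
import math


def sd_of_sum_or_difference(sd_x: float, sd_y: float) -> float:
    """SD of X + Y or of X - Y, assuming X and Y are independent.

    Var(X + Y) = Var(X - Y) = Var(X) + Var(Y), so square the SDs,
    add them, then take the square root.
    """
    return math.sqrt(sd_x ** 2 + sd_y ** 2)


# SDs of 3 and 4 combine to 5, not 7 (the sum) or 1 (the difference).
print(sd_of_sum_or_difference(3.0, 4.0))  # 5.0
```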
Define ‘Sampling distribution of the difference between two proportions’.
The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is, under appropriate assumptions, modelled by a Normal model with mean $\mu = p_1 - p_2$ and standard deviation $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.
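A small Python sketch of this model's parameters, assuming the true proportions are known. The function name and the numbers in the example are hypothetical. The SD formula is the variance-addition rule applied to the two independent sample proportions.

```python
import math


def diff_sampling_model(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Mean and SD of the Normal model for p-hat1 - p-hat2, given the true p1 and p2."""
    q1, q2 = 1 - p1, 1 - p2
    mean = p1 - p2
    # Variances of the two independent sample proportions add.
    sd = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return mean, sd


# Hypothetical values: p1 = 0.5, p2 = 0.3, with 200 observations in each group.
mean, sd = diff_sampling_model(0.5, 200, 0.3, 200)
print(f"mean = {mean:.4f}, SD = {sd:.4f}")  # mean = 0.2000, SD = 0.0480
```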
Define ‘Two-proportion z-interval’.
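Gives a confidence interval for the true difference in proportions, $p_1 - p_2$, in two independent groups.
The confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$, where $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$ uses the observed proportions in place of the unknown $p_1$ and $p_2$, and $z^*$ is a critical value from the standard Normal model corresponding to the specified confidence level.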
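A minimal Python sketch of the interval, assuming success counts and sample sizes are given for each group. The function name and the counts are hypothetical. `statistics.NormalDist().inv_cdf` from the standard library supplies $z^*$.

```python
import math
from statistics import NormalDist


def two_proportion_z_interval(x1: int, n1: int, x2: int, n2: int,
                              confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for p1 - p2 from x1 of n1 and x2 of n2 successes."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # SE uses the observed proportions because p1 and p2 are unknown.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves (1 - confidence) / 2 in each tail of the standard Normal model.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin


# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
low, high = two_proportion_z_interval(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")  # (0.013, 0.287)
```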
Define ‘Pooling’.
Data from two populations may sometimes be combined, or pooled, to estimate a parameter when the parameter is assumed to be the same in both populations. If our null hypothesis states that two proportions are equal, we pool the data to provide an estimate of the common proportion, and then use that pooled value in SE calculations to make them more precise.
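A short Python sketch of pooling for two proportions, with a hypothetical function name and made-up counts. It combines the successes and sample sizes from both groups into one proportion, then uses that single value for both groups in the SE.

```python
import math


def pooled_proportion_and_se(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled proportion and pooled SE of p-hat1 - p-hat2, assuming p1 = p2."""
    # Combine the success counts and the sample sizes from both groups.
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    # The same pooled proportion stands in for both p1 and p2.
    se_pooled = math.sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    return p_pooled, se_pooled


# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
p_pooled, se_pooled = pooled_proportion_and_se(60, 100, 45, 100)
print(f"pooled p = {p_pooled:.3f}, pooled SE = {se_pooled:.4f}")  # pooled p = 0.525, pooled SE = 0.0706
```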
Define ‘Two-proportion z-test’.
Test the null hypothesis $H_0: p_1 - p_2 = 0$ by referring the statistic
$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$
to a standard Normal model.
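A minimal Python sketch of the test, assuming a two-sided alternative $H_A: p_1 - p_2 \neq 0$. The function name and the counts are hypothetical. The whole difference $\hat{p}_1 - \hat{p}_2$ is divided by the pooled SE, and the P-value comes from `statistics.NormalDist().cdf`.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """z statistic and two-sided P-value for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pool because H0 says the two proportions are equal.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Parenthesize the numerator: the whole difference is divided by the SE.
    z = (p1_hat - p2_hat) / se_pooled
    # Two-sided alternative: count both tails of the standard Normal model.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

For a one-sided alternative in the direction of the observed difference, the P-value would be half of the two-sided value.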
When constructing and interpreting confidence intervals for the difference between proportions of two independent groups, what assumptions and conditions should you check before making inferences?
- Independence Assumption (Randomization Condition, 10% Condition, and Independent Groups Assumption)
- Sample Size Condition: Success/Failure Condition (at least 10 successes and at least 10 failures in each group)
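Of these, only the Success/Failure Condition can be checked from the counts alone. Randomization, the 10% Condition, and independent groups depend on how the data were collected. A small Python sketch, with hypothetical function names, that treats the threshold as at least 10 observed successes and at least 10 observed failures in each group:

```python
def success_failure_ok(successes: int, n: int, minimum: int = 10) -> bool:
    """Success/Failure Condition for one group: enough observed successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def both_groups_ok(x1: int, n1: int, x2: int, n2: int) -> bool:
    """The condition must hold separately in each of the two groups."""
    return success_failure_ok(x1, n1) and success_failure_ok(x2, n2)


print(both_groups_ok(60, 100, 45, 100))  # True
print(both_groups_ok(5, 100, 45, 100))   # False: only 5 successes in group 1
```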
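Why do we pool the counts to find a pooled standard error estimate of the standard deviation when performing and interpreting a two-proportion z-test?
Because the null hypothesis states that the two proportions are equal. If they are, both samples estimate the same proportion, so combining them gives a better estimate of it.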
Are the methods discussed in this chapter appropriate for paired or matched data?
No. These methods assume the two groups are independent, and paired or matched data are not.
For past chapters: highlighted and bookmarked the section on needing large sample sizes for proportions near 0 or 1, on the SE overestimating and the resulting conservative results, and maybe more on problems with results (over- or underestimates) when conditions are not satisfied…
Also, stuff on power vs. precision.