Chapter 15: Sampling Distribution Models Flashcards
Define ‘Sampling distribution model’.
Different random samples give different values for a statistic. The sampling distribution model shows the behaviour of the statistic over all the possible samples of the same size n.
Define ‘Sampling variability (sampling error)’.
The variability we expect to see from one random sample to another. It is sometimes called sampling error, but sampling variability is the better term.
Define ‘Sampling distribution model for a proportion’.
If assumptions of independence and random sampling are met, and we expect at least 10 successes and 10 failures, then the sampling distribution is modelled by a Normal model with a mean equal to the true proportion value, p, and a standard deviation equal to
sqrt( pq/n ), where q = 1 - p.
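The model above can be checked with a small simulation. This is a minimal sketch, not a definitive implementation: the values p = 0.3 and n = 200, the number of repeated samples, the seed, and the function names are all assumptions chosen for illustration. It draws many random samples, records each sample proportion, and compares their mean and standard deviation with p and sqrt( pq/n ).

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

def model_sd_proportion(p, n):
    """Standard deviation given by the Normal model: sqrt(pq/n) with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical values: n*p = 60 and n*q = 140, so both exceed 10.
p, n = 0.3, 200
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

print("mean of simulated proportions:", statistics.mean(p_hats))
print("SD of simulated proportions:  ", statistics.pstdev(p_hats))
print("model SD sqrt(pq/n):          ", model_sd_proportion(p, n))
```

If the conditions hold, the first number should land close to p and the last two should roughly agree.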
Define ‘Sampling distribution model for a mean’.
If assumptions of independence and random sampling are met, and the sample size is large enough, the sampling distribution of the sample mean is modelled by a Normal model with a mean equal to the population mean, μ, and a standard deviation equal to
σ / sqrt(n).
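A similar sketch can illustrate the model for a mean. Everything specific here is an assumption made for the example: the population is taken to be Uniform on [0, 1] (so μ = 0.5 and σ = 1/sqrt(12)), the sample size is 25, and the function name is hypothetical. The simulated sample means should centre near μ with a spread near σ / sqrt(n).

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from Uniform(0, 1); return each sample mean."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(n)) / n
        for _ in range(num_samples)
    ]

# Assumed population: Uniform(0, 1), which has mean 0.5 and SD 1/sqrt(12).
mu = 0.5
sigma = 1 / math.sqrt(12)
n = 25

sample_means = simulate_sample_means(n, num_samples=4000)
model_sd = sigma / math.sqrt(n)

print("mean of sample means:", statistics.mean(sample_means))
print("SD of sample means:  ", statistics.pstdev(sample_means))
print("model SD sigma/sqrt(n):", model_sd)
```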
Define ‘Central Limit Theorem (CLT)’.
States that the sampling distribution model of the sample mean (and proportion) from a random sample is approximately Normal for large n, regardless of the distribution of the population, as long as the observations are independent.
Usually the mean of a sampling distribution is the value of…?
The parameter being estimated. I.e., for the sampling distribution of p̂ the mean is p, and for the sampling distribution of ȳ the mean is μ.
The sampling distribution of the mean is Normal, no matter what the underlying distribution of the data is, however…?
The CLT says that this happens in the limit, as the sample size grows. The Normal model applies sooner when sampling from a unimodal, symmetric population and more gradually when the population is very non-Normal.
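One way to see the "more gradually" part is to sample from a strongly skewed population and watch the skewness of the sample means shrink as n grows. The sketch below assumes an Exponential population with mean 1, which is not mentioned in the cards and is chosen only because it is clearly non-Normal; the sample sizes, seed, and function names are likewise illustrative.

```python
import random

def skewness(values):
    """Moment-based skewness: third central moment divided by SD cubed."""
    count = len(values)
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count
    m3 = sum((v - mean) ** 3 for v in values) / count
    return m3 / m2 ** 1.5

def skewness_of_sample_means(n, num_samples=5000, seed=1):
    """Simulate sample means of size n from an Exponential(1) population; return their skewness."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(num_samples)
    ]
    return skewness(means)

# Skewness near 0 suggests a roughly symmetric (more Normal-looking) shape.
results = {n: skewness_of_sample_means(n) for n in (2, 10, 50)}
for n, s in results.items():
    print(f"n = {n:>2}: skewness of sample means = {s:.2f}")
```

The skewness should fall toward 0 as n increases, but for small n the sampling distribution is still visibly right-skewed.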
Populations with a true proportion, p, close to 0 or 1 can be a problem. What happens in these cases?
When p is close to 0, the sampling distribution of p̂ is skewed to the right, and when p is close to 1, it is skewed to the left.
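The direction of the skew can be confirmed from exact binomial probabilities rather than simulation. This sketch assumes n = 50 and the proportions 0.02, 0.5 and 0.98, all picked for illustration; `binomial_skewness` is a hypothetical helper. The skewness of the count of successes has the same sign as the skewness of p̂, since p̂ is just the count divided by n.

```python
import math

def binomial_skewness(n, p):
    """Skewness of the number of successes in n independent trials, from the exact probabilities."""
    probs = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    variance = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))
    third_moment = sum((k - mean) ** 3 * pr for k, pr in enumerate(probs))
    return third_moment / variance ** 1.5

n = 50  # illustrative sample size
for p in (0.02, 0.5, 0.98):
    # Positive skewness means a long right tail, negative a long left tail.
    print(f"p = {p}: skewness = {binomial_skewness(n, p):+.3f}")
```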
What are the two assumptions needed to use the Normal sampling distribution model for a sample proportion?
- The Independence Assumption (Randomization Condition)
- The Sample Size Assumption (10% Condition: the sample size must be no larger than 10% of the population; Success/Failure Condition: we expect at least 10 successes and 10 failures)
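The numeric parts of these conditions can be written as a small check. This is only a sketch: the function name, its arguments, and the example numbers are assumptions, and the Randomization Condition cannot be verified by arithmetic, so it is deliberately left out.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% Condition and the Success/Failure Condition for a sample proportion.

    Randomization is a property of how the data were collected and is not checked here.
    """
    return {
        # Sample must be no larger than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Expected successes (n*p) and failures (n*(1-p)) must both be at least 10.
        "success_failure": n * p >= 10 and n * (1 - p) >= 10,
    }

# Hypothetical cases: (sample size, true proportion, population size).
print(check_proportion_conditions(100, 0.5, 10_000))    # both conditions hold
print(check_proportion_conditions(100, 0.05, 10_000))   # only 5 expected successes
print(check_proportion_conditions(2_000, 0.5, 10_000))  # sample is 20% of the population
```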
The Success/Failure Condition wants sufficient data. How much depends on p. Explain.
If p is near 0.5, we need a sample of only 20 or so. If p is only 0.01, however, we’d need a sample of 1000. How about if p were 0.99?
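The arithmetic behind these numbers is that the rarer outcome must be expected at least 10 times, so n must be at least 10 divided by the smaller of p and 1 - p. The sketch below works this out; `min_sample_size` is a hypothetical helper, and the small tolerance is an assumption added only to guard against floating-point round-off. By symmetry, p = 0.99 behaves like p = 0.01, with failures as the rare outcome.

```python
import math

def min_sample_size(p, required=10):
    """Smallest n with at least `required` expected successes and failures."""
    rarer = min(p, 1 - p)
    # Subtract a tiny tolerance so round-off (e.g. 1000.0000000001) does not bump n up by one.
    return math.ceil(required / rarer - 1e-9)

for p in (0.5, 0.01, 0.99):
    print(f"p = {p}: need n >= {min_sample_size(p)}")
# p = 0.5: need n >= 20
# p = 0.01: need n >= 1000
# p = 0.99: need n >= 1000
```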
What are the two assumptions needed to use the Normal sampling distribution model for a sample mean (CLT)?
- The Independence Assumption (Randomization Condition)
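- The Sample Size Assumption (Large Enough Sample Condition: no fixed threshold is given; how large is large enough depends mostly on the shape of the population distribution, e.g. a strongly skewed population needs a larger sample than a roughly Normal one)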
When we have categorical data, do we use the sample mean or the sample proportion? What about quantitative data?
Categorical = Proportion; Quantitative = Mean
Sampling distributions arise because …?
Samples vary.
The CLT quantifies…?
Sampling error.
The denominator in the variability of sample means shows that it decreases as the sample size increases. What’s the catch?
The standard deviation of the sampling distribution declines only with the square root of the sample size (it is proportional to 1/sqrt(n)) and not, for example, with 1/n. To cut it in half, we need four times as many observations. This limits how much we can make a sample tell about the population.
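A quick calculation makes the catch concrete. The population standard deviation of 10 and the sample sizes below are made-up values for illustration; each fourfold increase in n only halves σ / sqrt(n).

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (100, 400, 1600):
    print(f"n = {n:>4}: SD of sample mean = {sd_of_sample_mean(sigma, n)}")
# n =  100: SD of sample mean = 1.0
# n =  400: SD of sample mean = 0.5
# n = 1600: SD of sample mean = 0.25
```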