Sampling Distributions Flashcards
Sampling distribution
A sampling distribution is the distribution of all possible values of a statistic for samples of a given size selected from a population.
Sampling Distribution of the Mean
Sampling Distribution of the Proportion
Standard error of the mean
Different samples of the same size from the same population will yield different sample means.
A measure of the variability in the mean from sample to sample is given by the standard error of the mean: σ_X̄ = σ / √n, where σ is the population standard deviation and n is the sample size. Note that the standard error of the mean decreases as the sample size increases.
(This assumes that sampling is done with replacement, or without replacement from a large or infinite population.)
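As a small illustration of how the standard error shrinks with sample size, the sketch below assumes the usual formula σ / √n. The function name and the value σ = 10 are hypothetical, chosen only for the example.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard error of the mean, assuming sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation, for illustration only.
sigma = 10.0
for n in (4, 25, 100):
    # The standard error falls as n grows: 5.0, 2.0, 1.0
    print(n, standard_error_of_mean(sigma, n))
```

Quadrupling the sample size halves the standard error, because n enters through its square root.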
If the population is not normal
We can apply the Central Limit Theorem, which states that regardless of the shape of the population distribution of individual values, as long as the sample size is large enough (generally n ≥ 30) the sampling distribution of X̄ will be approximately normally distributed with:
μ_X̄ = μ and σ_X̄ = σ / √n
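The Central Limit Theorem can be checked informally by simulation. The sketch below is an illustration only: it assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1), and the function name, seed, and number of samples are arbitrary choices.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an exponential population
    and return the mean of each sample."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30
means = simulate_sample_means(n, 5000, rng)

# Exponential(rate=1) has population mean 1 and standard deviation 1,
# so the sample means should centre near 1 with spread near 1 / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):", round(1 / math.sqrt(n), 3))
```

A histogram of `means` would look roughly bell-shaped even though the individual values are far from normal.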
Characteristics of the sampling distribution of the proportion
π is the proportion of items in the population with a characteristic of interest.
p is the sample proportion and provides an estimate of π
Sampling Distribution of the Proportion
Selecting all possible samples of a certain size, the distribution of all possible sample proportions is the sampling distribution of the proportion.
Standard Error of the Proportion
The underlying distribution of the sample proportion is binomial.
It can be approximated by a normal distribution if nπ ≥ 5 and n(1 − π) ≥ 5, with the resulting mean equal to π and standard error equal to:
σ_p = √(π(1 − π) / n)
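A minimal sketch of this check, assuming the common textbook conditions nπ ≥ 5 and n(1 − π) ≥ 5 and the standard error √(π(1 − π) / n). The function name and the example values are hypothetical.

```python
import math

def proportion_sampling_distribution(pi: float, n: int):
    """Return (mean, standard_error, normal_ok) for the sample proportion.

    normal_ok is True when n*pi >= 5 and n*(1 - pi) >= 5, the assumed
    rule of thumb for using the normal approximation to the binomial.
    """
    if not 0 < pi < 1:
        raise ValueError("pi must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("sample size n must be positive")
    standard_error = math.sqrt(pi * (1 - pi) / n)
    normal_ok = n * pi >= 5 and n * (1 - pi) >= 5
    return pi, standard_error, normal_ok

# Illustrative values only.
print(proportion_sampling_distribution(0.5, 100))   # approximation reasonable
print(proportion_sampling_distribution(0.01, 100))  # n*pi = 1, too small
```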
Reasons for taking a sample
Less time-consuming than a census. Less costly to administer than a census. Less cumbersome and more practical to administer than a census of the targeted population.
Two types of samples used
Non-probability sample:
Items included are chosen without regard to their probability of occurrence.
Probability sample:
Items in the sample are chosen on the basis of known probabilities.
Simple random sampling
Every individual or item from the frame (N) has an equal chance of being selected (1/N).
Selection may be with replacement or without replacement.
Samples can be obtained from a table of random numbers or computer random number generators.
Simple to use but may not be a good representation of the population’s underlying characteristics.
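Selection with and without replacement can be sketched with Python's standard random number generator. The frame of 100 item IDs and the seed below are hypothetical.

```python
import random

rng = random.Random(7)       # fixed seed so the run is repeatable
frame = list(range(1, 101))  # hypothetical frame of N = 100 item IDs

# Without replacement: no item can appear twice.
without_replacement = rng.sample(frame, k=10)

# With replacement: the same item may be selected more than once.
with_replacement = rng.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```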
Systematic sampling
Divide frame of N individuals into n groups of k individuals: k = N/n.
Randomly select one individual from the 1st group.
Select every kth individual thereafter.
Like simple random sampling, simple to use but may not be a good representation of the population’s underlying characteristics.
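The three steps above can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and when N is not an exact multiple of n it rounds k down and keeps only the first n selections.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start in the first group, then every kth item."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n                 # group size (rounded down if N/n is not whole)
    start = rng.randrange(k)   # random position within the first group
    return frame[start::k][:n]

frame = list(range(100))       # hypothetical frame of N = 100 items
sample = systematic_sample(frame, 10, random.Random(1))
print(sample)
```

Because only the starting point is random, a periodic pattern in the frame with period k would bias the sample.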
Stratified sampling
Divide population into two or more subgroups (called strata) according to some common characteristic.
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes – called proportionate stratified sampling.
Samples from subgroups are combined into one.
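A minimal sketch of proportionate stratified sampling follows. The strata, their sizes, and the function name are hypothetical, and the allocation simply rounds n × (stratum size / N), so the stratum sample sizes may not sum exactly to n when the proportions are not whole numbers.

```python
import random

def proportionate_stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized in
    proportion to the stratum's share of the population."""
    total = sum(len(items) for items in strata.values())
    combined = []
    for items in strata.values():
        n_h = round(n * len(items) / total)      # proportional allocation
        combined.extend(rng.sample(items, n_h))  # SRS within the stratum
    return combined

# Hypothetical population of 100 items in three strata (60 / 30 / 10).
strata = {
    "A": [("A", i) for i in range(60)],
    "B": [("B", i) for i in range(30)],
    "C": [("C", i) for i in range(10)],
}
sample = proportionate_stratified_sample(strata, 20, random.Random(3))
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)  # {'A': 12, 'B': 6, 'C': 2}
```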
Stratified sampling pros
More efficient than simple random sampling or systematic sampling because items from across the entire population are assured representation.
Homogeneity of items within each stratum provides greater precision in the estimates of underlying population parameters.
Cluster samples
Population is divided into several ‘clusters’, each representative of the population, e.g. postcode areas or electorates.
A simple random sample of clusters is selected:
All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
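One-stage cluster sampling, where every item in each selected cluster is used, can be sketched as below. The cluster labels, their contents, and the function name are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Select a simple random sample of clusters, then take every
    item in the selected clusters (one-stage cluster sampling)."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items

# Hypothetical clusters keyed by made-up postcode labels.
clusters = {
    "3000": ["a1", "a2", "a3"],
    "3001": ["b1", "b2"],
    "3002": ["c1", "c2", "c3", "c4"],
    "3003": ["d1", "d2"],
}
chosen, items = cluster_sample(clusters, 2, random.Random(5))
print(chosen)
print(items)
```

For a two-stage design, the list comprehension would instead draw a sample within each chosen cluster, for example with `rng.sample`.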
Cluster sampling pros and cons
Pro: more cost-effective than simple random sampling, especially if the population is geographically widespread.
Con: often requires a larger sample size than simple random sampling or stratified sampling for the same level of precision.
Evaluating survey worthiness
What is the purpose of the survey?
Is the survey based on a probability or non-probability sample?
Survey errors
Coverage error – occurs when the frame is inappropriate or inadequate, so some groups have no chance of being selected; results in selection bias.
Non-response error – results in non-response bias.
Measurement error – ambiguous wording, halo effect or respondent error.
Sampling error – always exists and is the difference between a sample statistic and the corresponding population parameter.