Chapter 12: Sample Surveys Flashcards
What are the 3 ideas of sampling?
- Examine a part of the whole: A sample can give information about the population.
- Randomize to make the sample representative.
- The sample size is what matters. It’s the size of the sample - and not its fraction of the larger population - that determines the precision of the statistics it yields.
What are some sampling methods?
- Simple random sample (SRS)
- Stratified samples
- Cluster samples
- Systematic samples
- Multistage samples
What are some causes of bias?
- Voluntary response
- Convenience samples
- Bad sampling frames
- Undercoverage
- Nonresponse bias
- Response bias
Define ‘Population’.
The entire group of individuals or instances about whom we hope to learn.
Define ‘Sample’.
A (representative) subset of a population, examined in the hope of learning about the population.
Define ‘Sample survey’.
A study that asks questions of a sample drawn from some population in the hope of learning something about the entire population. Polls taken to assess voter preferences are common sample surveys.
Define ‘Bias’.
Any systematic failure of a sampling method to represent its population. It is almost impossible to recover from bias, so efforts to avoid it are well spent. Common errors include relying on voluntary response, undercoverage of the population, nonresponse bias and response bias.
Define ‘Randomization’.
The best defense against bias is randomization, in which each individual is given a fair, random chance of selection.
Define ‘Sample size’.
The number of individuals in a sample. The sample size, not the fraction of the population sampled, determines how well the sample represents the population.
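The claim that sample size matters more than the sampled fraction can be checked with a small simulation. The sketch below is illustrative only: the population sizes, the 60% "yes" share, and the function name `spread_of_sample_proportions` are assumptions, not part of the flashcards. It measures how much the sample proportion varies across repeated simple random samples.

```python
import random
import statistics

def spread_of_sample_proportions(pop_size, n, true_share=0.6, repeats=500, seed=1):
    """Standard deviation of the sample proportion over repeated SRSs."""
    rng = random.Random(seed)
    supporters = int(pop_size * true_share)  # individuals 0..supporters-1 say "yes"
    proportions = []
    for _ in range(repeats):
        sample = rng.sample(range(pop_size), n)  # SRS without replacement
        proportions.append(sum(i < supporters for i in sample) / n)
    return statistics.stdev(proportions)

# Same n, very different fractions of the population (1% vs 0.01%).
small_pop = spread_of_sample_proportions(pop_size=10_000, n=100)
large_pop = spread_of_sample_proportions(pop_size=1_000_000, n=100)
# Same large population, four times the sample size.
bigger_n = spread_of_sample_proportions(pop_size=1_000_000, n=400)

print(f"n=100 from 10,000:    spread {small_pop:.4f}")
print(f"n=100 from 1,000,000: spread {large_pop:.4f}")
print(f"n=400 from 1,000,000: spread {bigger_n:.4f}")
```

Under these assumptions the first two spreads should come out nearly equal even though the sampled fractions differ a hundredfold, while quadrupling n should roughly halve the spread.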
Define ‘Census’.
A sample that consists of the entire population.
Define ‘Population parameter’.
A numerically valued attribute of a model for a population. We rarely expect to know the true value of a population parameter, but we do hope to estimate it from sampled data. For example, the mean income of all employed people in the country is a population parameter.
Define ‘Statistic, sample statistic’.
Values calculated from sampled data. Those that correspond to, and thus estimate, a population parameter are of particular interest. For example, the mean income of all employed people in a representative sample can provide a good estimate of the corresponding population parameter.
Define ‘Representative’.
A sample is said to be representative if the statistics computed from it accurately reflect the corresponding population parameters.
Define ‘Simple random sample (SRS)’.
A simple random sample of sample size n is a sample in which each set of n elements in the population has an equal chance of selection.
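As a minimal sketch of this definition, Python's standard `random.sample` draws n items without replacement so that every set of n items is equally likely. The frame of 200 student IDs and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame of 200 students.
frame = [f"student_{i:03d}" for i in range(1, 201)]
srs = simple_random_sample(frame, n=10, seed=42)
print(srs)
```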
Define ‘Sampling frame’.
A list of individuals from whom the sample is drawn. Individuals who may be in the population of interest, but who are not in the sampling frame, cannot be included in any sample.
Define ‘Sampling variability’.
The natural tendency of randomly drawn samples to differ, one from another. Sometimes - unfortunately - called sampling error, sampling variability is no error at all, but just the natural result of random sampling.
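Sampling variability is easy to see by drawing several samples from the same population and comparing a statistic across them. In this sketch the population of 5,000 incomes is made up (normally distributed values chosen purely for illustration); the population mean plays the role of the parameter and each sample mean is a statistic.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 5,000 incomes (invented numbers).
population = [rng.gauss(50_000, 12_000) for _ in range(5_000)]
parameter = statistics.mean(population)  # the population parameter

# Five independent simple random samples of 50 people each.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]

print(f"population mean: {parameter:,.0f}")
for m in sample_means:
    print(f"sample mean:     {m:,.0f}")
```

The sample means differ from one another and from the population mean, yet none of the samples was drawn incorrectly; the differences are just the natural result of random sampling.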
Define ‘Stratified random sample’.
A sampling design in which the population is divided into several subpopulations, or strata, and random samples are then drawn from each stratum. If the strata are homogeneous, but are different from each other, a stratified sample may yield more consistent results.
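One way to sketch a stratified random sample is to group the frame by stratum and then draw an SRS inside each group. The proportional allocation used here (the same fraction from every stratum) is one common choice and an assumption of this example, as are the frame of freshmen and seniors and the name `stratified_sample`.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Draw an SRS of the given fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in frame:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 600 freshmen and 400 seniors, stored as (id, class year).
frame = [(i, "freshman" if i < 600 else "senior") for i in range(1000)]
sample = stratified_sample(frame, stratum_of=lambda person: person[1],
                           fraction=0.05, seed=7)
print(len(sample))
```

Because each stratum is sampled separately, the class-year mix of the sample matches the frame exactly instead of varying from sample to sample.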
Define ‘Cluster sample’.
A sampling design in which entire groups, or clusters, are chosen at random. Cluster sampling is usually selected as a matter of convenience, practicality or cost.
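A cluster sample can be sketched by choosing whole groups at random and surveying everyone in the chosen groups. The homerooms, their sizes, and the name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose entire clusters at random; everyone in a chosen cluster is surveyed."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical school: 20 homerooms of 25 students each.
clusters = {
    f"room_{r:02d}": [f"room_{r:02d}_student_{s:02d}" for s in range(25)]
    for r in range(20)
}
chosen = cluster_sample(clusters, n_clusters=3, seed=3)
surveyed = [student for members in chosen.values() for student in members]
print(sorted(chosen), len(surveyed))
```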
Define ‘Multistage sample’.
A sampling scheme that involves multiple stages of random sampling, where at each successive stage, we sample from lists of ever smaller clusters (hierarchical in nature).
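A two-stage version of this idea can be sketched as follows: first pick clusters at random, then draw an SRS within each chosen cluster. The districts, household labels, and the name `multistage_sample` are invented for illustration; real designs may use more stages or different methods at each stage.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: choose clusters at random. Stage 2: SRS within each chosen cluster."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in stage_one}

# Hypothetical city: 12 districts with 100 households each.
clusters = {
    f"district_{d:02d}": [f"d{d:02d}_household_{h:03d}" for h in range(100)]
    for d in range(12)
}
sample = multistage_sample(clusters, n_clusters=4, n_per_cluster=5, seed=9)
for district, households in sample.items():
    print(district, households)
```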
Define ‘Systematic sample’.
A sample drawn by selecting individuals systematically from a sampling frame. When there is no relationship between the order of the sampling frame and the variables of interest, a systematic sample can be representative.
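A systematic sample can be sketched by taking every k-th individual from the frame. Starting from a randomly chosen position is an assumption added here (the definition above does not specify a starting point), and the frame of 500 numbered individuals and the name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th individual, beginning at a random position in the first k."""
    rng = random.Random(seed)
    step = len(frame) // n        # the sampling interval k
    start = rng.randrange(step)   # assumed random start within the first interval
    return frame[start::step][:n]

frame = list(range(1, 501))       # hypothetical frame of 500 numbered individuals
sample = systematic_sample(frame, n=25, seed=11)
print(sample)
```

If the frame were ordered in a way related to the variable of interest (for example, alternating by floor or by shift), a fixed interval could line up with that pattern and the sample would no longer be representative.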
Define ‘Pilot’.
A small trial run of a survey to check whether questions are clear. A pilot study can reduce errors due to ambiguous questions.
Define ‘Voluntary response bias’.
Bias introduced to a sample when individuals can choose on their own whether to participate in the sample. Samples based on voluntary response are always invalid and cannot be recovered, no matter how large the sample size.
Define ‘Convenience sample’.
A sample of individuals who are conveniently available. Convenience samples often fail to be representative because every individual in the population is not equally convenient to sample.
Define ‘Undercoverage’.
A sampling scheme that biases the sample in a way that gives a part of the population less representation than it has in the population suffers from undercoverage.
Define ‘Nonresponse bias’.
Bias introduced when a large fraction of those sampled fails to respond. Those who do respond are likely to not represent the entire population. Voluntary response bias is a form of nonresponse bias, but nonresponse bias may occur for other reasons. For example, those who work during the day won’t respond to a telephone survey during working hours.
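The telephone example can be turned into a rough simulation. Every number below is an assumption made up for illustration: 60% of the population works during the day, daytime workers support some proposal at 30% and everyone else at 70%, and daytime workers never answer the phone.

```python
import random

rng = random.Random(5)

# Hypothetical population of (works_days, supports_proposal) pairs.
population = []
for _ in range(20_000):
    works_days = rng.random() < 0.6
    supports = rng.random() < (0.3 if works_days else 0.7)
    population.append((works_days, supports))

true_share = sum(s for _, s in population) / len(population)

# The people contacted form a proper SRS, but only those at home respond.
contacted = rng.sample(population, 1_000)
respondents = [person for person in contacted if not person[0]]
observed_share = sum(s for _, s in respondents) / len(respondents)

print(f"true support:     {true_share:.2f}")
print(f"observed support: {observed_share:.2f}")
```

Even though the contacted group was chosen at random, the respondents overstate support because the people who fail to respond differ systematically from those who do.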
Define ‘Response bias’.
Anything in a survey design that influences responses. One typical response bias arises from the wording of questions, which may suggest a favoured response.
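Explain the difference between a population, a sampling frame, and a sample.
Population - The entire group of individuals we hope to learn about.
Sampling frame - The list of individuals from whom the sample is drawn.
Sample - The subset of the population that is actually examined, ideally representative of it.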
What does it mean for a sample to be representative of a population?
The statistics computed from the (small) sample accurately reflect what the entire population thinks or does.
What is meant by a biased sample?
A sample that systematically fails to represent its population accurately.
What is the role of randomization in selecting a sample?
It protects against the influences of all the features of a population by giving each individual a fair, random chance of selection.
What is meant by a census? Why is a census often impractical?
Census - A special sample in which everyone is included, so responses come from the entire population.
- Often impractical because the population is constantly changing.
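Explain the difference between a parameter and a statistic.
A parameter is an attribute of the population that we hope to estimate from data. A statistic is a value calculated from the sampled data, often used to estimate the corresponding parameter.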
A Simple Random Sample (SRS) must satisfy what two conditions?
Every subject (or unit) must have an equal chance of being selected, and every combination of subjects (or units) must have an equal chance of being selected.
What is meant by sampling variability?
The natural differences from one randomly chosen sample to another.
When is stratified random sampling useful?
When two or more different groups in the population may bias your results. Split them into strata, then sample and analyze each separately.
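When is cluster sampling useful?
When the required sample is too large to collect one individual at a time, so sampling entire groups is more convenient, practical, or affordable.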
What is meant by multistage sampling?
Combining several sampling methods, applied in successive stages.
When is systematic sampling appropriate?
When there is no relationship between the order of the sampling frame and the variables of interest.
In what way are voluntary response samples often biased?
They are usually biased toward those with strong opinions or those who are strongly motivated.
Why is convenience sampling unreliable?
A sample that includes only the individuals convenient to you is not necessarily representative of the population.
What is meant by undercoverage? Give an example.
Some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population.
ex: in a telephone survey, people who eat out are less likely to answer the telephone and be surveyed.
Explain the difference between nonresponse bias and response bias.
Nonresponse bias - Bias that results from a lack of response; it is impossible to know what the nonrespondents would have said.
Response bias - Refers to anything in the survey design that influences responses.
parameter
numbers in model that have to be chosen to explicitly determine value of model
statistic
any summary found from the data
response bias
Preconceived notions of a person answering [a survey] which may alter the experiment's purpose. One typical example of this arises from the wording of questions, which may suggest a favored response. Voters, for example, are more likely to express support of “the president” than support of the particular person holding that office at the moment.
non response bias
Bias introduced when a large fraction of those sampled fails to respond. Those who do respond are likely to not represent the entire population. Voluntary response bias is one form of this, but it can occur for other reasons. For example, those who are at work during the day won’t respond to a telephone survey conducted only during working hours.