Week 10: Survey Analysis Flashcards
What is the difference between a census and a sample?
Census: Information gathered on every member of the population
Sample: Information gathered on a population subset to represent the whole and make inferences
Why use a sample instead of a census?
- cost-effective
- time-saving
- reduces workload
- useful when the population is large and difficult to survey entirely
What is a sampling frame?
A list of all possible sampling units from which the sample is drawn. Ideally, the frame should match the target population/universe
Example sampling frames: Registered voters; UK residents
Example universes: Voters; UK population
What are the characteristics of an ideal sampling frame?
- adequate (covers all population units)
- accurate (sampling units are listed correctly, organised logically and given numerical identifiers)
- complete (no omissions, duplications or extraneous units)
- up-to-date
What are the main types of sampling methods?
Probability sampling: Each unit has a known, non-zero chance of selection
Non-probability sampling: Selection chances are unknown, and some units may have no chance of being selected
What is the difference between probability and non-probability sampling?
Probability sampling: Reduces selection bias and allows sampling error to be estimated
Non-probability sampling: Easier and cheaper, but prone to selection bias, and sampling error cannot be estimated
What are common non-probability sampling methods?
- convenience/haphazard sampling (units are selected arbitrarily, so representativeness cannot be estimated)
- purposive (judgement) sampling (units are selected subjectively to obtain a sample that appears to represent the population)
- volunteer sampling (respondents volunteer and are screened for the characteristics the survey needs, e.g., individuals with a particular disease; carries large selection biases)
- quota sampling
- expert sampling
- snowball sampling
List types of probability sampling methods
- simple random sampling (SRS)
- systematic sampling
- stratified sampling
- cluster sampling
- multi-stage sampling
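The first two methods in this list can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the numbered frame of 100 units and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """SRS: draw n units without replacement, each with the same chance n/N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Systematic: random start, then every k-th unit, with k = N // n."""
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: 100 units with numerical identifiers
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_sample(frame, 10, seed=1)
```

Systematic sampling relies on the frame being ordered in a way unrelated to the survey variables; a periodic pattern in the list could bias the result.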
Explain stratified sampling
The population is divided into strata (homogeneous, mutually exclusive groups), and independent samples are drawn from each stratum. This ensures better representation of subgroups, allowing meaningful subgroup inferences
- assumes that units are more homogeneous within each stratum than across the population as a whole. This makes the method more efficient: a smaller sample from each stratum is enough to get precise estimates for that stratum
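A rough sketch of stratified sampling in Python. It assumes proportional allocation (the same sampling fraction in every stratum), which is only one possible allocation rule; the function name, the "north"/"south" strata and the 10% fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw an independent simple random sample from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # mutually exclusive groups
    sample = {}
    for name, members in strata.items():
        n = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical population: 60 units in "north", 40 in "south"
population = [("north", i) for i in range(60)] + [("south", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1, seed=0)
```

Every stratum appears in the sample, which is what makes subgroup estimates possible.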
What is cluster sampling and how does it differ from stratified sampling?
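Cluster sampling: Population is divided into clusters, a random selection of clusters is drawn, and every unit in the selected clusters is surveyed
Stratified sampling: A sample is drawn from within every stratum, so all groups are represented, rather than taking a few whole groups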
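For contrast with stratified sampling, a minimal sketch of one-stage cluster sampling: whole clusters are drawn at random and every unit inside them is kept. The school names and pupil identifiers are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep all of their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: pupils grouped by school
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2"],
}
sample = cluster_sample(clusters, n_clusters=2, seed=0)
```

Unselected clusters contribute no units at all, whereas a stratified design would take some units from every group.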
What is multi-stage sampling?
An extension of cluster sampling (often combined with stratification), where units are sampled within the selected clusters instead of surveying whole clusters. With two stages:
- clusters are referred to as primary sampling units (PSUs) and units within clusters as secondary sampling units (SSUs)
- the sample size needed to obtain a given level of precision is still bigger than for an SRS (a less efficient method)
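The two-stage case can be sketched as follows, assuming simple random sampling at both stages (other selection rules are possible). The function name, parameter names and the district data are hypothetical.

```python
import random

def two_stage_sample(clusters, n_psu, n_ssu, seed=None):
    """Stage 1: sample clusters (PSUs). Stage 2: sample units (SSUs) within each."""
    rng = random.Random(seed)
    psus = rng.sample(sorted(clusters), n_psu)           # stage 1
    return {
        psu: rng.sample(clusters[psu], min(n_ssu, len(clusters[psu])))  # stage 2
        for psu in psus
    }

# Hypothetical clusters: 5 districts with 20 households each
clusters = {f"district_{d}": [f"hh_{d}_{i}" for i in range(20)] for d in range(5)}
sample = two_stage_sample(clusters, n_psu=2, n_ssu=5, seed=0)
```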
What factors determine sample size?
- acceptable significance level
- study power
- expected effect size
- population variability (standard deviation)
Also: study structure (descriptive/comparative); study design; resources and finances; expected non-response
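To show how the four main factors interact, here is a sketch using the standard normal-approximation formula for comparing two group means, n per group = 2 * ((z_alpha + z_beta) * sd / effect)^2. The choice of this particular formula (a two-sided test with equal group sizes) and the example numbers are assumptions; other study designs need other formulas.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, effect, sd):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # study power
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

def inflate_for_nonresponse(n, expected_response_rate):
    """Increase the target sample size to allow for expected non-response."""
    return ceil(n / expected_response_rate)

# Illustrative values: 5% significance, 80% power, difference of 5, SD of 10
n = n_per_group(alpha=0.05, power=0.80, effect=5, sd=10)
n_to_invite = inflate_for_nonresponse(n, expected_response_rate=0.8)
```

A smaller expected effect, more variability, a stricter significance level or higher power all push the required sample size up.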
What are common errors associated with sampling?
- coverage/sampling frame error (the sampling frame used does not properly represent the underlying population under study)
- sampling error (unavoidable - degree to which a statistic differs from its “true” value given that the survey was conducted among only one of many possible samples)
- non-response error (selected sampling units are not interviewed)
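Sampling error in particular can be made concrete with a small simulation: draw many samples from the same population and see how much the sample mean varies from one sample to the next. The population here is synthetic (normally distributed with mean 50 and SD 10), and all the numbers are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# Synthetic population of 10,000 values
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = mean(population)

# 500 different simple random samples of size 100 from the same population
sample_means = [mean(rng.sample(population, 100)) for _ in range(500)]

# Spread of the sample means; expected to be roughly sd / sqrt(n) = 10 / 10 = 1
spread = stdev(sample_means)
```

Each individual survey sees only one of these possible samples, so its estimate typically differs from the true value by about this much even with a perfect frame and full response.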
Why is it important to consider sampling frame carefully?
A non-representative sampling frame can lead to sampling frame error (a biased sample), which can cause inaccurate predictions
What are the advantages of non-probability sampling?
- quick and convenient
- cost-effective
- minimal respondent burden
What are the disadvantages of non-probability sampling?
- selection bias (the assumptions needed to make inferences about the population are too strong)
- non-coverage bias
- difficult to assess the quality of estimates
What is the response rate and how is it calculated?
Response rate = (no. of respondents) / (total eligible sample)
- denominator includes non-responders: refusals, language problems, illness, not available
- ineligible people should be removed from the sampling frame (and not included in the response rate)
- a low response rate can affect the study's representativeness
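The calculation can be sketched as a short function, with ineligible units taken out of the denominator and non-responders left in. The function name and the example counts are hypothetical.

```python
def response_rate(respondents, sampled, ineligible=0):
    """Response rate = respondents / total eligible sample."""
    eligible = sampled - ineligible   # non-responders stay in the denominator
    if eligible <= 0:
        raise ValueError("no eligible units in the sample")
    if respondents > eligible:
        raise ValueError("more respondents than eligible units")
    return respondents / eligible

# Hypothetical survey: 1000 sampled, 50 found ineligible, 570 responded
rate = response_rate(respondents=570, sampled=1000, ineligible=50)
```

With these counts the rate is 570 / 950, or 60%; leaving the 50 ineligible units in the denominator would understate it as 57%.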