L4 Statistical techniques and sampling designs Flashcards
Descriptive statistics
Methods of summarizing the data in an informative way
- central tendency: mean, median, mode
- dispersion: range, stdev, variance, interquartile range
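As a small illustration of the measures listed above, the following sketch computes them with Python's standard `statistics` module. The `scores` list is made-up example data, and the quartile values depend on the interpolation method (here the module's default).

```python
import statistics

# Hypothetical sample of exam scores (illustrative values only).
scores = [4, 6, 6, 7, 8, 9, 9, 9, 10]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 denominator)
stdev = statistics.stdev(scores)        # sample standard deviation

# Quartiles; the exact cut points depend on the interpolation method used.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # interquartile range

print(mean, median, mode, value_range, variance, stdev, iqr)
```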
Inferential statistics
Methods to draw conclusions (to make inferences or test hypotheses)
- Mean difference test
- Chi-square test
- Analysis of variance (ANOVA)
- Regression analysis
- Logit analysis
Four types of scales
- Nominal (qualitative)
- Ordinal (qualitative)
- Interval (quantitative)
- Ratio (quantitative)
Nominal scale
allows classifying data into groups/categories
e.g. gender
Ordinal scale
rank orders in a meaningful way
e.g. education level
Interval scale
Meaningful differences between values, but no natural zero point –> zero is arbitrary and still means something (0 degrees Celsius is a temperature, not the absence of temperature)
Ratio scale
Meaningful differences and ratios between values due to a natural zero point –> zero means the quantity is absent (0 dollars is no money)
Choosing between inferential statistics:
IV=nominal/ordinal DV=nominal/ordinal
Chi-square test
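To make the chi-square test concrete, here is a minimal sketch that computes the Pearson chi-square statistic and its degrees of freedom for a contingency table of two categorical variables. The function name and the counts are hypothetical, and the p-value lookup is left out because it needs a statistics library.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = gender, columns = preferred brand.
observed = [[30, 20],
            [15, 35]]
statistic, dof = chi_square_statistic(observed)
print(statistic, dof)
```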
Choosing between inferential statistics:
IV=nominal/ordinal DV=interval/ratio
t-test, ANOVA
Choosing between inferential statistics:
IV=interval/ratio DV=nominal/ordinal
logit analysis
Choosing between inferential statistics:
IV=interval/ratio DV=interval/ratio
regression analysis
When to perform a t-test vs ANOVA
t-test –> compare two means (two levels of the IV)
ANOVA –> compare means across more than two levels of the IV
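The selection rules on the preceding cards can be summarized as a small lookup. This is only a sketch of the rule of thumb from these cards; the function name `choose_test` and its parameters are made up for illustration.

```python
QUALITATIVE = {"nominal", "ordinal"}
QUANTITATIVE = {"interval", "ratio"}


def choose_test(iv_scale, dv_scale, iv_levels=None):
    """Suggest a test from the measurement scales of the IV and the DV."""
    for scale in (iv_scale, dv_scale):
        if scale not in QUALITATIVE | QUANTITATIVE:
            raise ValueError(f"unknown scale: {scale!r}")

    iv_qualitative = iv_scale in QUALITATIVE
    dv_qualitative = dv_scale in QUALITATIVE

    if iv_qualitative and dv_qualitative:
        return "chi-square test"
    if iv_qualitative:
        # DV is interval/ratio: the number of IV levels decides.
        if iv_levels is None:
            return "t-test or ANOVA"
        if iv_levels < 2:
            raise ValueError("the IV needs at least two levels")
        return "t-test" if iv_levels == 2 else "ANOVA"
    if dv_qualitative:
        return "logit analysis"
    return "regression analysis"


print(choose_test("nominal", "ratio", iv_levels=3))
```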
Rating scales
- Likert scale: strongly disagree to strongly agree
- Semantic differential: opposite adjectives, e.g. cold to warm
Usually treated as interval/ratio (quantitative) so that you can use regression
What is a population?
Entire group of people, firms, events, or things of interest for which you would like to make inferences
What is a sample?
A subset of the population of interest
What is a subject?
A single member of the sample
What is low representativeness?
= properties of the population are over- or underrepresented in the sample
= high sampling error
The sampling process
- define population
- determine sampling frame
- determine sampling design
- determine sample size
- define population
e.g. TISEM students, Dutch organ donors
- determine sampling frame
“Physical” representation of the target population
- the list or register through which you can reach population members, e.g. the Donorregister
coverage error
sampling frame ≠ population
• Under-coverage: true population members are excluded
• Mis-coverage: non-population members are included
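Both kinds of coverage error can be pictured as set differences between the population and the sampling frame. The names below are invented for illustration; in practice the true population is usually not fully known, so this is only a conceptual sketch.

```python
def coverage_errors(population, frame):
    """Return (under_coverage, wrongly_included) as sets."""
    under_coverage = population - frame    # population members missing from the frame
    wrongly_included = frame - population  # frame entries that are not in the population
    return under_coverage, wrongly_included


# Hypothetical target population and sampling frame.
population = {"ann", "bob", "cem", "dia", "eva"}
sampling_frame = {"bob", "cem", "dia", "fay"}

under_coverage, wrongly_included = coverage_errors(population, sampling_frame)
print(sorted(under_coverage), sorted(wrongly_included))
```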
solutions to coverage error
- If small, recognize but ignore
- If large, redefine the population in terms of the sampling frame
- determine sampling design
probability vs non-probability sampling
Probability sampling
Each element of the population has a known, nonzero chance of being selected as a subject
–> Results generalizable to population
BUT more time and resource intensive
Nonprobability sampling
The elements of the population do not have a known chance of being selected as a subject
–> less time and resource intensive
BUT results not generalizable to population
Probability sampling techniques
- Simple random sampling (SRS)
- Systematic sampling
- Stratified sampling
- Cluster sampling
Simple random sampling (SRS)
Each population element has an equal chance of being chosen
e.g. out of a hat
–> Highest generalizability
BUT can be costly (requires a complete sampling frame)
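A minimal sketch of simple random sampling using Python's standard `random` module. The sampling frame of student IDs is hypothetical, and the `seed` parameter is only there to make the draw reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements, each with an equal chance of being chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(frame), n)


# Hypothetical sampling frame of 100 student IDs.
frame = [f"student_{i:03d}" for i in range(100)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```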
Systematic sampling
Select random starting point and then pick every nth element
–> simplicity
BUT low generalizability if the list has a periodic pattern that happens to coincide with every nth element
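Systematic sampling can be sketched as follows: compute the sampling interval, pick a random start within the first interval, then take every nth element. The function name and the choice of interval (frame size divided by sample size, rounded down) are assumptions made for this illustration.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then every step-th element, until n are selected."""
    frame = list(frame)
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    step = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(step)   # random start in the first interval
    return frame[start::step][:n]


# Hypothetical ordered sampling frame of 100 elements.
frame = list(range(100))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```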
Stratified sampling
Divide the population into meaningful (homogeneous) groups, then apply SRS within each group
e.g. level of income
–> All groups are adequately sampled, allowing for group comparisons
BUT more time-consuming and requires homogeneous subgroups
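A sketch of stratified sampling: group the units by stratum, then draw a simple random sample within each stratum. Drawing the same number from every stratum is an assumption made here for simplicity (proportional allocation is another option), and the people and income levels are invented.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Apply simple random sampling separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        if n_per_stratum > len(members):
            raise ValueError(f"stratum {name!r} has too few members")
        sample[name] = rng.sample(members, n_per_stratum)
    return sample


# Hypothetical population: (person_id, income_level) pairs.
people = [(i, ("low", "middle", "high")[i % 3]) for i in range(60)]
sample = stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=5, seed=7)
print(sample)
```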
Cluster sampling
Divide the population into heterogeneous groups, randomly select a number of groups, and select every member within these groups
e.g. geographic clusters (areas)
–> Practical when the population is spread over geographic clusters
BUT naturally occurring clusters are typically more homogeneous than heterogeneous
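Cluster sampling, sketched under the assumption that the clusters are already given as a mapping from cluster name to members: randomly select whole clusters and keep every member of the selected ones. The area names and residents are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted for reproducibility
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical geographic clusters (area -> residents).
areas = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1"],
}
sample = cluster_sample(areas, 2, seed=3)
print(sample)
```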
Nonprobability sampling
- Convenience sampling
- Quota sampling
- Judgment sampling
- Snowball sampling
Convenience sampling
Select subjects who are conveniently available
e.g. approaching passers-by on the street
–> Convenient (inexpensive and fast)
BUT lower generalizability
Quota sampling
Fix quota for each subgroup (percentage in population)
–> When minority participation is critical
BUT lower generalizability
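Quota sampling can be sketched as filling a fixed quota per subgroup from whoever happens to be available, in the order they show up. The function name, the arrival list, and the quotas are hypothetical; note that nothing here is random, which is why the result is not a probability sample.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept subjects in arrival order until each subgroup's quota is full."""
    remaining = dict(quotas)
    selected = []
    for subject in arrivals:
        group = group_of(subject)
        if remaining.get(group, 0) > 0:
            selected.append(subject)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return selected


# Hypothetical stream of available subjects: (subject_id, subgroup).
arrivals = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B")]
selected = quota_sample(arrivals, group_of=lambda s: s[1], quotas={"A": 2, "B": 1})
print(selected)
```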
Judgment sampling
Select subjects based on their knowledge/professional judgment
e.g. experts
–> Convenient (inexpensive and fast) when a limited number of people have the info you need
BUT lower generalizability
Snowball sampling
“Do you know people who…”
e.g. people with rare disease
–> For rare characteristics (“experts”)
BUT first participants strongly influence the sample
Rules of thumb for sample size
• Sample size ≥ 75, < 500
• Multivariate research: ≥ 10 × the number of parameters to be estimated
• Subsamples (e.g., male/female): ≥ 30 per subsample
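The three rules of thumb above can be turned into a quick check. This is only a sketch: the function name and its parameters are made up, and the thresholds are taken directly from the card.

```python
def check_sample_size(n, n_parameters=0, subsample_sizes=None):
    """Return a list of rule-of-thumb violations (empty if none)."""
    problems = []
    if n < 75:
        problems.append("sample size below 75")
    if n >= 500:
        problems.append("sample size of 500 or more")
    if n_parameters and n < 10 * n_parameters:
        problems.append("fewer than 10 observations per parameter")
    for name, size in (subsample_sizes or {}).items():
        if size < 30:
            problems.append(f"subsample {name!r} below 30")
    return problems


# Hypothetical study: 60 respondents, 8 parameters, split by gender.
print(check_sample_size(60, n_parameters=8,
                        subsample_sizes={"male": 40, "female": 20}))
```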