lecture 8 - surveys and sampling
e.g. Literary Digest + Gallup poll
both election polls
Literary Digest = sampling bias (coverage error): selected from people with car registrations and newspaper subscriptions + there was high non-response
Gallup poll = smaller quota sample, was more accurate (in later elections it was less accurate)
! the size of the sample does not matter that much: it must be representative
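the Literary Digest point can be sketched in a minimal Python simulation (all numbers hypothetical, not from the lecture): a huge sample drawn from a biased frame is beaten by a small random sample from the full population.

```python
import random

random.seed(1)

# hypothetical population: 55% support candidate A overall,
# but only 30% of the subgroup a biased frame reaches (e.g. car owners)
population = [1] * 5500 + [0] * 4500    # 1 = supports A, true share 0.55
biased_frame = [1] * 3000 + [0] * 7000  # coverage error: 30% support A

big_biased = [random.choice(biased_frame) for _ in range(10_000)]  # n = 10,000
small_random = random.sample(population, 200)                      # n = 200

true_share = sum(population) / len(population)
biased_est = sum(big_biased) / len(big_biased)      # stays near 0.30
random_est = sum(small_random) / len(small_random)  # lands near 0.55
```

the small probability sample ends up closer to the true 55% than the huge biased one, which is stuck around 30% no matter how large it grows.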
survey research: key challenges
measurement
- measurement validity
- measurement error/reliability = do people understand the question + do they understand it in the same way?
representation
- coverage (error) = sampling frame needs to cover the whole population (this is what Literary Digest did wrong)
- sampling error = how we select our sample from the sample frame
- non-response error = non-response can create bias if non-response is not random (e.g. people who are not interested in topic A drop out -> not representative)
types of surveys
cross-sectional = snapshots of one moment of time
- measures everything at once: DV, IV -> hard to establish causality (you can’t determine what is the cause and what is the effect)
longitudinal
- cohort study = pooled cross-sectional time series (use same questionnaire at different times)
- panel study = cross-lagged causal analysis: select a sample once and stick with the same respondents, re-interview them several times
- rolling cross-sectional = 1 big sample, spread out interviews over a long period of time (= expensive)
- trend studies: summarize cross-sectional polls and calculate some average over time
nonscientific and unethical polls (esp. in USA)
all start out as usual polls, but then
- push polls = use to spread negative campaign info (e.g. if you heard that A did B, would you still vote for A)
- ''sugging'' = selling under the guise of research
- ''frugging'' = fund-raising under the guise of research
-> less responses to actual surveys
notation for surveys and experiments
Y = dependent variable
X = independent variable
- multiple factors: X1, X2 (with the numbers written as subscripts)
M(X) = manipulation of X
RS = random sampling
RA = random assignment
cohort studies + APC
each time = new random sample, questions remain the same
useful for: Age-Period-Cohort Models (APC)
- age effects: e.g. between old and young respondents
- period effects: e.g. critical events
- cohort effects: e.g. birth cohorts (generation/decade differences)
panel study: cross-lagged causal analysis
measure cause and effect at the same time at time 1 and time 2
this way you can see whether the IV at time 1 predicted the DV at time 2, or whether it was the other way around (DV at time 1 predicting IV at time 2 = falsification of the assumed causal direction)
*see picture in slides + in notes
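the cross-lagged logic can be sketched with simulated panel data (a hypothetical setup, not the lecture's example): X is built to cause Y, so the lagged correlation corr(X1, Y2) comes out clearly larger than corr(Y1, X2).

```python
import random

random.seed(42)
n = 500

# wave 1: two independent variables measured on the same respondents
x1 = [random.gauss(0, 1) for _ in range(n)]
y1 = [random.gauss(0, 1) for _ in range(n)]

# wave 2: by construction, X causes Y (Y2 depends on X1), but not vice versa
y2 = [0.6 * x + 0.3 * y + random.gauss(0, 0.5) for x, y in zip(x1, y1)]
x2 = [0.7 * x + random.gauss(0, 0.5) for x in x1]

def corr(a, b):
    """Pearson correlation, computed by hand to stay dependency-free."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

cross_lag_xy = corr(x1, y2)  # IV at t1 -> DV at t2: substantial
cross_lag_yx = corr(y1, x2)  # DV at t1 -> IV at t2: near zero
```

comparing the two cross-lagged correlations is what lets the panel design speak to causal direction, which a single cross-section cannot.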
surveys: methodological issues / establishing causality
- surveys don’t give concrete/absolute causality (panel can be seen as exception)
- benefit = statistical control for alternative explanations
goal is more prediction than explanation
! sampling unit / unit of analysis -> conclusions/analysis need to be on the individual level
questionnaire design issues (7)
- operationalization needs to be finalized in advance (questions can’t be changed)
- constraint: length
- reactivity: do surveys measure or create opinions
- close-ended vs open-ended questions
- response scales (neutral point & don’t know options are good to include)
- question order
- question wording
potential respondent issues
- lack of comprehension
- recall problems
- misreporting/social desirability (esp. with sensitive questions)
- acquiescence bias: agreement bias and response sets: people are more likely to agree than disagree with something
what to avoid with wording of surveys
- vague questions
- acronyms
- leading questions: positive or negative connotations (e.g. do you approve of the prime minister's performance despite recent missteps?)
- negative questions (e.g. do you agree that Turkey should not become an EU member?)
- with these questions a ''no'' answer can be interpreted in two ways: no, I don't agree that Turkey shouldn't become a member OR no, Turkey shouldn't become a member
- double-barreled questions = two questions in one (do you favor increased defense spending, or do you think that current defense spending is enough?)
- biased/loaded questions (more assistance for people on welfare? vs more assistance for poor people?)
some solutions - survey problems
- randomization of question order + wording
- balanced questions: provide arguments on both sides so that people get context
- pilot study / pre-testing
- monitoring and verifying
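randomizing question order is simple to implement; a minimal sketch (question texts are made up for illustration):

```python
import random

random.seed(7)

# hypothetical questionnaire items
questions = [
    "Q1: How much do you trust the government?",
    "Q2: How much do you trust the media?",
    "Q3: How much do you trust science?",
]

# give each respondent an independently shuffled copy,
# so order effects average out across the sample
order_for_respondent = questions[:]
random.shuffle(order_for_respondent)
```

each respondent sees the same items, just in a different sequence; over many respondents no single item systematically primes the ones after it.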
! look at slide/notes of costs/benefits of interviewing modes (interviewer vs self-administered)
census
= contacting every member of the population (countries sometimes do this; the NL doesn't, because it considers its population registration good enough)
this is the exception: typical research doesn’t include the whole population
sampling: population elements
- finite population = specific amount, e.g. citizens
- infinite population = no natural limit, e.g. coin toss
- known = e.g. citizens
- unknown = e.g. intrastate conflicts (impossible to establish a complete list of all conflicts that have happened)
the goal of sampling
to say something about the whole population
- parameter = statistic for the whole population
- statistic = based on sample, can be used as prediction of the parameter
population -> sample -> statistic -> parameter
*plus a dashed arrow population …> parameter, closing the square: the parameter is what we actually want to know about the population
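the statistic-estimates-parameter idea in a minimal sketch (hypothetical income numbers): in practice we never observe the full population, only the sample.

```python
import random

random.seed(3)

# hypothetical population of 100,000 incomes (normally unobservable)
population = [random.gauss(35_000, 8_000) for _ in range(100_000)]
parameter = sum(population) / len(population)   # population mean: the target

# probability sample of 1,000 respondents
sample = random.sample(population, 1_000)
statistic = sum(sample) / len(sample)           # sample mean: our estimate
```

with a probability sample the statistic lands close to the parameter, and sampling theory even tells us how close (the standard error shrinks with sample size).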
sampling bias - 2 forms
selection bias = researcher artifact
response bias = participant artifact
- self-selection bias: individual choice to participate -> bias: people who participate are diff from those that don’t
- non-response: uninterviewable, not found, not at home/answering or refusal
*if non-response is random, then there is no bias
solution = probability sampling