Lecture 8 - Surveys and Sampling Flashcards

1
Q

e.g. literary digest + gallup poll

A

both election polls

Literary Digest = sampling bias (coverage error): the sample was drawn from lists such as car registrations, telephone directories and magazine subscribers + there was high non-response

Gallup poll = smaller quota sample, was more accurate (in later elections it was less accurate)

! the size of the sample does not matter that much: the sample must be representative

2
Q

survey research: key challenges

A

measurement

  • measurement validity
  • measurement error/reliability = do people understand the question + do they understand it in the same way?

representation

  • coverage (error) = sampling frame needs to cover the whole population (this is what literary digest did wrong)
  • sampling error = chance error that comes from how we select our sample from the sampling frame
  • non-response error = non-response can create bias if non-response is not random (e.g. people who are not interested in topic A drop out -> not representative)
3
Q

3 types of surveys

A

cross-sectional = snapshots of one moment of time

  • measures everything at once: DV, IV -> hard to establish causality (you can’t determine what is the cause and what is the effect)

longitudinal

  • cohort study = pooled cross-sectional time series (use same questionnaire at different times)
  • panel study = cross-lagged causal analysis: select a sample once and stick with the same respondents, re-interview them several times
  • rolling cross-sectional = 1 big sample, with the interviews spread out over a long period of time (= expensive)
  • trend studies: summarize cross-sectional polls and calculate some average over time

nonscientific and unethical polls (esp. in USA)
all start out as usual polls, but then

  • push polls = used to spread negative campaign info (e.g. if you heard that A did B, would you still vote for A?)
  • "sugging" = selling under the guise of research
  • "frugging" = fund-raising under the guise of research

-> fewer responses to actual surveys

4
Q

notation for surveys and experiments

A

Y = dependent variable
X = independent variable
- multiple factors: X_1, X_2 (the numbers are written as subscripts)

M(X) = manipulation of X

RS = random sampling
RA = random assignment

5
Q

cohort studies + APC

A

each time = new random sample, questions remain the same

useful for: Age-Period-Cohort Models (APC)

  • age effects: e.g. between old and young respondents
  • period effects: e.g. critical events
  • cohort effects: e.g. birth cohorts (generation/decade difference)
6
Q

panel study: cross-lagged causal analysis

A

measure both cause and effect at time 1 and again at time 2

this way you can see whether the IV at time 1 predicts the DV at time 2, or whether it is the other way around (DV1 predicts IV2 = falsification)

*see picture in slides + in notes
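
A minimal Python sketch of the cross-lagged logic, using simulated two-wave panel data. Everything here is invented for illustration: the variable names, the 500 respondents and the effect sizes. The simulation assumes that the IV at time 1 influences the DV at time 2 and not the other way around. A real analysis would normally use regression models that also control for the earlier value of the outcome.

```python
import random
import statistics

def correlation(a, b):
    """Pearson correlation of two equally long lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(42)
n = 500
# Simulated panel: the same respondents measured at time 1 and time 2.
iv1 = [rng.gauss(0, 1) for _ in range(n)]
dv1 = [rng.gauss(0, 1) for _ in range(n)]        # unrelated to the IV at time 1
iv2 = [0.7 * x + rng.gauss(0, 1) for x in iv1]   # the IV is fairly stable over time
# Assumption built into the simulation: IV at time 1 affects DV at time 2.
dv2 = [0.5 * x + 0.3 * y + rng.gauss(0, 1) for x, y in zip(iv1, dv1)]

iv1_to_dv2 = correlation(iv1, dv2)  # hypothesised direction
dv1_to_iv2 = correlation(dv1, iv2)  # reverse direction

print(f"IV1 -> DV2: {iv1_to_dv2:.2f}")
print(f"DV1 -> IV2: {dv1_to_iv2:.2f}")
```

If the first correlation is clearly larger than the second, the pattern is consistent with the IV coming first; if the second were the larger one, the hypothesised direction would be called into question.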

7
Q

surveys: methodological issues / establishing causality

A
  • surveys don’t give concrete/absolute causality (panel can be seen as exception)
  • benefit = statistical control for alternative explanations

goal is more prediction than explanation

! sampling unit / unit of analysis -> conclusions/analysis need to be on the individual level

8
Q

questionnaire design issues (7)

A
  1. operationalization needs to be finalized in advance (questions can’t be changed)
  2. constraint: length
  3. reactivity: do surveys measure or create opinions?
  4. close-ended vs open-ended questions
  5. response scales (neutral point & don’t know options are good to include)
  6. question order
  7. question wording
9
Q

potential respondent issues

A
  • lack of comprehension
  • recall problems
  • misreporting/social desirability (esp. with sensitive questions)
  • acquiescence bias (agreement bias, response sets): people are more likely to agree than disagree with something
10
Q

what to avoid with wording of surveys

A
  1. vague questions
  2. acronyms
  3. leading questions: positive or negative connotations (e.g. do you approve of the prime minister’s performance despite recent missteps?)
  4. negative questions (e.g. do you agree that Turkey should not become an EU member?)
    - with these questions a "no" answer can be interpreted in two ways: they don’t agree that Turkey shouldn’t become a member OR no, Turkey shouldn’t become a member
  5. double-barreled questions = two questions in one (e.g. do you favor increased defense spending or do you think that current defense spending is enough?)
  6. biased/loaded questions (more assistance for people on welfare? vs more assistance for poor people?)
11
Q

some solutions - survey problems

A
  • randomization of question order + wording
  • balanced questions: provide arguments on both sides so that people get context
  • pilot study / pre-testing
  • monitoring and verifying

! look at slide/notes of costs/benefits of interviewing modes (interviewer vs self-administered)

12
Q

census

A

= contacting every member of the population (countries sometimes do this; NL doesn’t, because it considers its population registration good enough)

this is the exception: typical research doesn’t include the whole population

13
Q

sampling populations elements

A
  • finite population = specific amount, e.g. citizens
  • infinite population = no natural limit, e.g. coin toss
  • known = e.g. citizens
  • unknown = e.g. intrastate conflicts (impossible to establish a complete list of all conflicts that have happened)
14
Q

the goal of sampling

A

to say something about the whole population

  • parameter = value that describes the whole population (usually unknown)
  • statistic = value calculated from the sample, can be used as an estimate of the parameter

population -> sample -> statistic -> parameter
*population …> parameter
(it’s a square)

15
Q

sampling bias - 2 forms

A

selection bias = researcher artifact

response bias = participant artifact

  • self-selection bias: individual choice to participate -> bias: people who participate are different from those who don’t
  • non-response: uninterviewable, not found, not at home/answering or refusal
    *if non-response is random, then there is no bias

solution = probability sampling

16
Q

probability sampling
requirements (3) and types

A
  1. every unit in the population has an equal probability of being chosen
  2. the observer cannot predict which units are chosen, other than by chance
  3. every possible combination of units from the sampling frame must be able to end up in the sample

types

SRS: simple random sampling

  • with replacement (people can be chosen multiple times) vs without replacement (people can be chosen only once, so the selection probability changes with each draw)
  • requires a list of the population
  • systematic random sample: more convenient if you have a sampling frame (a list of the population from which you select): start with a random selection and then take e.g. every 10th unit; works when the list has no pattern

stratified (random) sampling = SRS within known subgroups

  • disproportionate sampling is possible -> re-weighting (e.g. when you want to look at a specific group, but also at the whole population)
  • members of groups need to be represented in the sample in the same proportions as in the population (after re-weighting, if the sampling was disproportionate)

(multistage) cluster sampling = subdividing the population into clusters that each resemble a mini-population

  • population -> equivalent and internally heterogeneous groups
  • sampling in stages
  • selection probability of clusters proportionate to size
  • e.g. all adults in household A
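
A rough Python sketch of three of the designs above (SRS without replacement, systematic sampling, proportionate stratified sampling). The sampling frame, the group labels "A"/"B" and the function names are all made up for illustration; cluster sampling is left out to keep it short.

```python
import random

rng = random.Random(1)
# Hypothetical sampling frame: 1,000 people, the first 200 belong to group "B".
frame = [{"id": i, "group": "B" if i < 200 else "A"} for i in range(1000)]

def simple_random_sample(units, n):
    """SRS without replacement: each unit can be drawn at most once."""
    return rng.sample(units, n)

def systematic_sample(units, step):
    """Random start, then every step-th unit of the list."""
    start = rng.randrange(step)
    return units[start::step]

def stratified_sample(units, n_per_group):
    """SRS within each known subgroup (stratum)."""
    sample = []
    for group, n in n_per_group.items():
        stratum = [u for u in units if u["group"] == group]
        sample.extend(rng.sample(stratum, n))
    return sample

srs = simple_random_sample(frame, 100)
systematic = systematic_sample(frame, 10)
# Proportionate allocation: 80 from A and 20 from B mirrors the 80/20 frame.
stratified = stratified_sample(frame, {"A": 80, "B": 20})
```

Changing the allocation to e.g. `{"A": 50, "B": 50}` would give a disproportionate stratified sample, which then needs re-weighting before saying anything about the whole population.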
17
Q

non-probability sampling

A

convenience/volunteers sampling = not representative
- use easily available participants, e.g. students

purposive sampling

  • decision of researcher based on specific characteristics
  • quota sampling / cell sampling = unrepresentative by design: e.g. target 50% male and 50% female
  • snowball/referral/chain/network sampling = informants refer the researcher to other participants

*(theoretical sampling) = has nothing to do with sampling, has to do with data collection in grounded theory

18
Q

response rate

A

= completed interviews / selected (eligible) sample

contact rate = % of selected individuals contacted
cooperation rate = % of individuals participating
surveyed rate = % of respondents surveyed too often

recommendations to increase the response rate = pre-notification mailings + follow-up calls/mailings -> make people more willing to participate
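
A small worked example of these rates in Python. All counts are invented, and the cooperation rate is assumed here to be calculated among the contacted individuals (the card does not spell out the denominator).

```python
def rate(numerator, denominator):
    """Return a share as a percentage."""
    return 100 * numerator / denominator

# Hypothetical fieldwork outcome (all counts invented for illustration).
selected_eligible = 2000   # eligible individuals selected into the sample
contacted = 1500           # of those, successfully contacted
completed = 900            # of those, completed the interview

contact_rate = rate(contacted, selected_eligible)      # 75.0
cooperation_rate = rate(completed, contacted)          # 60.0 (assumed denominator: contacted)
response_rate = rate(completed, selected_eligible)     # 45.0
```

Under this assumption the response rate is the contact rate times the cooperation rate (0.75 x 0.60 = 0.45), so raising either one raises the response rate.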

19
Q

weighting

A

based on information available a priori (esp. with stratified sampling)

or

post hoc corrections (unit non-response): people underrepresented by chance or because of non-response

! weighting does nothing for systematic sampling bias (if a group is not represented at all, it cannot be reweighted)

use of weighting =

  • highly recommended for inference about a whole population
  • optional for testing (patterns that support) causal relationships
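
A simple Python sketch of a post hoc weighting correction. The groups, population shares, sample counts and support figures are all hypothetical; the weight per group is taken as population share divided by sample share, which is one common way to do it.

```python
# Hypothetical shares: the population is 50% young / 50% old,
# but the realised sample is 30% young / 70% old.
population_share = {"young": 0.5, "old": 0.5}
sample_counts = {"young": 300, "old": 700}
# Invented result: share supporting policy A within each sampled group.
support = {"young": 0.60, "old": 0.40}

n = sum(sample_counts.values())
# Weight per group = population share / sample share.
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

unweighted = sum(sample_counts[g] * support[g] for g in sample_counts) / n
weighted = (
    sum(sample_counts[g] * weights[g] * support[g] for g in sample_counts)
    / sum(sample_counts[g] * weights[g] for g in sample_counts)
)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Here the unweighted estimate is pulled towards the overrepresented old group, and the weights undo that. If a group had zero respondents, no weight could be calculated for it, which is why weighting cannot repair a group that is missing entirely.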
20
Q

sampling error and sample size

A

SE= sampling error

  • random/non-systematic
  • decreases with sample size
  • should be reported: point estimate and range
    e.g. 50% A in a survey with 1000 respondents -> we can be about 95% confident that the true value lies between 47% and 53% (-> if a follow-up shows a change of 1%, then it falls within this margin of error)
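
The range in this example can be checked with the usual approximation for a proportion, sketched below in Python. It assumes a simple random sample and a 95% confidence level (z = 1.96); the function name is made up.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.5, 1000)
low, high = 0.5 - moe, 0.5 + moe
print(f"50% +/- {moe:.1%} -> {low:.1%} to {high:.1%}")
# prints: 50% +/- 3.1% -> 46.9% to 53.1%

# Four times as many respondents only halves the margin of error.
moe_4000 = margin_of_error(0.5, 4000)
```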

sample size
depends on homogeneity and needed details

  • homogeneous -> smaller sample okay
  • details -> larger sample necessary

large sample decreases sampling error + increases statistical power

*sampling bias is NOT affected by sample size: a biased sample stays unrepresentative, however large it is

!!! the size of the population (and the fraction of it that is sampled) is irrelevant for the required sample size (esp. irrelevant when there is sampling bias)
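
A small Python simulation of the point about bias and sample size, loosely in the spirit of the Literary Digest case. The population, the 30% car owners and the support rates are all invented; the only claim is that a large sample from a flawed frame can be further off than a small random sample.

```python
import random

rng = random.Random(7)
# Hypothetical population of 100,000 voters. Car owners (about 30%)
# support candidate A more often than non-owners (invented rates).
population = []
for _ in range(100_000):
    owns_car = rng.random() < 0.30
    supports_a = rng.random() < (0.70 if owns_car else 0.40)
    population.append((owns_car, supports_a))

def share_supporting(units):
    """Share of units that support candidate A."""
    return sum(supports for _, supports in units) / len(units)

true_share = share_supporting(population)          # the parameter

car_owners = [u for u in population if u[0]]       # flawed sampling frame
big_biased = rng.sample(car_owners, 20_000)        # large but biased sample
small_random = rng.sample(population, 1_000)       # small random sample

biased_error = abs(share_supporting(big_biased) - true_share)
random_error = abs(share_supporting(small_random) - true_share)

print(f"true share: {true_share:.3f}")
print(f"error of big biased sample:   {biased_error:.3f}")
print(f"error of small random sample: {random_error:.3f}")
```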