Lecture 4 - Experimental vs Observational research Flashcards
Causal Inference
Researchers are often interested in explanatory (causal) questions.
Does social media exposure increase affective polarization?
Does economic wealth shape people’s policy preferences, e.g., on wealth or inheritance tax?
Does democracy spur/hinder economic development?
Does X cause/explain Y?
A high level of education increases the probability that an individual participates in elections
Probabilistic or Deterministic
Probabilistic
Highly educated individuals always participate in elections
Probabilistic or Deterministic
Deterministic
What is the problem with causal inference?
- Tricky
- Cannot directly observe causal effects
- Cannot observe counterfactuals
Causal inference = inferring something we do not know (causal effects) from something we do know (data)
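A minimal sketch of why counterfactuals cannot be observed, using made-up numbers. Each hypothetical person has two potential outcomes (here, a turnout probability with and without the treatment), but the data only ever reveals one of them. The names `y0`, `y1`, and `treated`, and all values, are illustrative assumptions.

```python
# Hypothetical potential outcomes for four people (illustrative numbers only).
# y1 = outcome if treated, y0 = outcome if untreated.
people = [
    {"name": "A", "y0": 0.40, "y1": 0.60, "treated": True},
    {"name": "B", "y0": 0.50, "y1": 0.55, "treated": False},
    {"name": "C", "y0": 0.30, "y1": 0.50, "treated": True},
    {"name": "D", "y0": 0.70, "y1": 0.75, "treated": False},
]


def observed_outcome(person):
    """Return the single outcome the researcher actually sees."""
    return person["y1"] if person["treated"] else person["y0"]


def individual_effect(person):
    """True causal effect; it needs BOTH outcomes, so real data never reveals it."""
    return person["y1"] - person["y0"]


for p in people:
    missing = "y0" if p["treated"] else "y1"
    print(f"{p['name']}: observed={observed_outcome(p):.2f}, "
          f"counterfactual {missing} is unobserved")
```

Only in a simulation like this can `individual_effect` be computed; with real data the second potential outcome is always missing.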
Confounder
- Correlation does not equal causation
- There may be a correlation between X and Y, but how do we know that a third variable Z has not caused both of them?
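A small simulation can illustrate the confounder problem. In this sketch, Z drives both X and Y, and X has no effect on Y at all, yet X and Y end up correlated. The noise levels, sample size, and seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]     # confounder Z
x = [zi + rng.gauss(0, 1) for zi in z]      # X depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]      # Y depends on Z only, not on X

raw_corr = pearson(x, y)

# Remove Z's contribution (known here by construction) and correlate the rest.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
adjusted_corr = pearson(x_resid, y_resid)

print(f"corr(X, Y) ignoring Z:       {raw_corr:.2f}")
print(f"corr(X, Y) after removing Z: {adjusted_corr:.2f}")
```

In real research Z's contribution is not known by construction, which is why ruling out confounders is the hard part.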
Three requirements for establishing causality
- Association between X and Y
- All confounders ruled out
- Reverse causality ruled out
Does economic development cause democracy OR does democracy cause economic development?
Reverse Causality
A research design in which the researcher both controls and randomly assigns values of the independent variable to participants
Randomized Experiments
- The researcher randomly assigns different treatments to participants.
- Gold standard for causal relationships
A/B experiment
Involves a treatment group which receives the treatment we want to investigate and a control group which does not receive the treatment (or a placebo).
Randomized controlled trial
Randomized Experiments: Why Do They Work So Well?
Random assignment ensures that the treatment and the control groups are comparable based on pre-treatment characteristics
This includes any known confounder… and even unknown confounders!
The treatment precedes the outcome, thus also ruling out reverse causality
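The logic of random assignment can be sketched with a simulation. Here an unmeasured trait strongly affects the outcome, but because treatment is assigned by coin flip, the trait is balanced across groups and a simple difference in means lands close to the assumed effect. The effect size, coefficients, and seed are assumptions made for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)
n = 10000
true_effect = 2.0  # assumed treatment effect, chosen for illustration

# An unmeasured trait that also affects the outcome (a potential confounder).
trait = [rng.gauss(0, 1) for _ in range(n)]

# Random assignment: a coin flip that ignores the trait entirely.
treated = [rng.random() < 0.5 for _ in range(n)]

outcome = [
    3.0 * u + (true_effect if t else 0.0) + rng.gauss(0, 1)
    for u, t in zip(trait, treated)
]

# Balance check: the trait should have nearly the same mean in both groups.
trait_gap = (mean(u for u, t in zip(trait, treated) if t)
             - mean(u for u, t in zip(trait, treated) if not t))

# Difference in mean outcomes between treatment and control.
estimate = (mean(y for y, t in zip(outcome, treated) if t)
            - mean(y for y, t in zip(outcome, treated) if not t))

print(f"trait gap between groups: {trait_gap:.3f}")
print(f"estimated effect:         {estimate:.2f}")
```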
Internal Validity
The degree to which we can be confident that a study identifies the causal effect of the independent on the dependent variable
External Validity
The degree to which findings can be generalized to other contexts
Ecological validity
Behaviour observed in artificial experimental settings may not generalize to the real world
Population Validity
Experiments often involve unrepresentative subject pools (e.g., UG students) and it can therefore be questionable whether experimental findings generalize from the study sample to the population of interest
Reactivity
People may change their behaviour when they know they are being observed
Laboratory Experiments
- High level of control on what subjects are exposed to
- Concerns about population validity, ecological validity, and reactivity
- High internal validity
Field Experiments
- Higher ecological validity, lower reactivity.
- Higher population validity
- Researchers have less control over how the treatment is applied.
Survey experiments
- Highest population validity: diverse or representative samples
Why not always experiments?
- Sometimes it is hard for political scientists to manipulate variables.
- External validity concerns
A research design in which the researcher does not have control over values of the independent variable
Observational Research Design
- Good for description, questions regarding distributions, and questions regarding characteristics and meaning.
- Used for explanation
The values of the independent variable arise naturally in such a way that we can speak of true or, more realistically, “as if” random assignment
Natural experiments
- For causal questions only
- High internal and external validity
A good example is Joshua Angrist’s study of the effects of military conscription on earnings. Angrist leveraged the fact that there was a lottery in 1970-1972 to draft soldiers to the Vietnam War, and compared the earnings of those who were drafted with those who were not. The study also illustrates that natural experiments are rarely perfect. One thorny issue is that there were ways to evade the draft, such as getting a medical exemption. Young men from rich backgrounds in particular (e.g., Donald Trump) were successful in this, which leads to bias.
A good example of what?
Natural Experiments
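Why evasion leads to bias can be sketched with a simulation. This is not Angrist's data or method: the shares and evasion rates below are hypothetical. The lottery draw itself is independent of wealth, so groups defined by the draw are balanced, but groups defined by who actually served are not.

```python
import random
from statistics import mean

rng = random.Random(2)
n = 20000

rich = [rng.random() < 0.3 for _ in range(n)]      # assumed 30% from rich backgrounds
selected = [rng.random() < 0.5 for _ in range(n)]  # lottery draw, independent of wealth


def serves(is_rich, is_selected):
    """Assumed evasion: rich men avoid service far more often once selected."""
    if not is_selected:
        return False
    evade_prob = 0.6 if is_rich else 0.1
    return rng.random() >= evade_prob


served = [serves(r, s) for r, s in zip(rich, selected)]


def share_rich(flags, value):
    """Share of rich men among units whose flag equals `value`."""
    return mean(1.0 if r else 0.0 for r, f in zip(rich, flags) if f == value)


gap_by_lottery = share_rich(selected, True) - share_rich(selected, False)
gap_by_service = share_rich(served, True) - share_rich(served, False)

print(f"rich-share gap, selected vs not selected: {gap_by_lottery:+.3f}")
print(f"rich-share gap, served vs did not serve:  {gap_by_service:+.3f}")
```

Under these assumptions, those who served are noticeably less wealthy than those who did not, so a naive comparison of their earnings would mix the effect of service with the effect of background.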
Different data structures - Cross sectional studies
Examine a cross-section of social reality, focusing on variation between individual units, such as citizens, countries, etc.
Typical example is an election study. Or a snapshot of democracy levels across the world at a given point in time. An explanatory study would leverage cross-sectional variation in the independent variable between units.
Different data structures - Time series
Examine evolutions of a single unit over time. A typical example would be studies looking at trends in economic performance in a single country.
Different data structures - Repeated cross-sections
Cross-sectional studies which are repeated at different points in time. For example, election studies are repeated after every election. Note that most likely the election study will involve different people; however, at a higher level of aggregation (the country as a whole), the study is repeated.
Different data structures - Panels
Same people are observed over time. For example, there are surveys which interview the same people multiple times.
Different data structures - TSCS
These studies are basically like panels, but when it is countries or regions within a country that are observed at multiple points over time, some people use the term “time-series cross-sectional studies”.
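The data structures above differ mainly in how many units and time points they contain and whether the same units reappear. A rough sketch with made-up records of the form `(unit, year, value)`; the `describe` helper is a hypothetical simplification, not a standard function.

```python
# Each record is (unit, year, value); all values are made up for illustration.
cross_section = [("UK", 2020, 8.5), ("France", 2020, 8.0), ("Hungary", 2020, 6.5)]
time_series = [("UK", 2000, 8.1), ("UK", 2010, 8.2), ("UK", 2020, 8.5)]
# Repeated cross-section: same survey, different respondents in each wave.
repeated_cross_section = [("resp_001", 2015, 1), ("resp_002", 2015, 0),
                          ("resp_103", 2019, 1), ("resp_104", 2019, 1)]
# Panel: the same respondents reappear in every wave.
panel = [("resp_001", 2015, 1), ("resp_002", 2015, 0),
         ("resp_001", 2019, 1), ("resp_002", 2019, 1)]
# TSCS has the same shape as a panel, with countries or regions as units.
tscs = [("UK", 2010, 8.2), ("France", 2010, 7.9),
        ("UK", 2020, 8.5), ("France", 2020, 8.0)]


def describe(records):
    """Classify records by number of units, time points, and unit overlap."""
    units = {unit for unit, _, _ in records}
    times = sorted({year for _, year, _ in records})
    if len(times) == 1:
        return "cross-section"
    if len(units) == 1:
        return "time series"
    waves = [{unit for unit, year, _ in records if year == t} for t in times]
    if all(wave == waves[0] for wave in waves):
        return "panel"
    return "repeated cross-section"


for name, data in [("cross_section", cross_section), ("time_series", time_series),
                   ("repeated_cross_section", repeated_cross_section),
                   ("panel", panel), ("tscs", tscs)]:
    print(f"{name}: {describe(data)}")
```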
Types of data - Large C
- Data expressed as number
- Quantitative
- Number & Statistics
- High standardization
Types of data - Scientific Realism SMALL C
- Data expressed as Number and meaning
- Quantitative and qualitative
- Numbers, stats, processes, words and symbols
- High or average standardization
Types of data - interpretivist SMALL C
- Data expressed as Meaning
- Qualitative
- Words and symbols
- Low standardization
Measurement Levels - relevant for scientific Realism - NOMINAL
- Data classified into categories without a natural order
- Type of political system (democracy, authoritarianism), party affiliation (Conservative, Labour, Lib Dem, Green, etc.)
Measurement Levels - relevant for scientific Realism - Ordinal
- Data arranged in a meaningful order, but intervals between rankings may not be equal.
- Education, level of interest in politics (not interested, somewhat interested, very interested).
Measurement Levels - relevant for scientific Realism - Interval
- Numeric scales with equal intervals between values but no true zero point.
- Ideology constructed as: “Place yourself on a 0-10 Liberal-Conservative scale, where 0 is very liberal and 10 is very conservative.”
Measurement Levels - relevant for scientific Realism - Ratio
- Numeric scales with equal intervals and a meaningful zero point.
- Government spending in GBP, number of political protests attended in a year (zero indicates none attended).
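One practical consequence of measurement levels is which summary statistics make sense. A short sketch with invented responses; the variable names and values are assumptions, loosely following the examples on the cards above.

```python
from statistics import mean, mode

party = ["Labour", "Conservative", "Labour", "Green"]        # nominal
interest_levels = ["not interested", "somewhat interested", "very interested"]
interest = ["somewhat interested", "very interested", "not interested"]  # ordinal
ideology = [2, 5, 8, 5]                                      # interval (0-10 scale)
protests = [0, 2, 4, 0]                                      # ratio

# Nominal: categories can only be counted, so the mode is the natural summary.
most_common_party = mode(party)

# Ordinal: order is meaningful, so rank-based summaries such as the median work.
ranks = sorted(interest_levels.index(v) for v in interest)
median_interest = interest_levels[ranks[len(ranks) // 2]]

# Interval: equal spacing makes differences and means meaningful.
mean_ideology = mean(ideology)

# Ratio: a true zero makes statements like "twice as many" meaningful.
ratio = protests[2] / protests[1]

print(most_common_party, "|", median_interest, "|", mean_ideology, "|", ratio)
```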
Face Validity
Is there broad agreement that the indicator is directly relevant to the concept?
Survey question “What is your age?” measuring the concept of age.
Content Validity
Do the indicators cover the full range of the concept?
Democracy - free and fair elections, protected civil liberties, the rule of law, checks and balances, etc.
Construct Validity
Convergent validity
- Is the indicator similar to other indicators it theoretically should be similar to?
- Does signing petitions correlate with contacting politicians? Both should indicate the concept of political participation.
Discriminant validity
- Is the indicator different from other indicators it theoretically should be different from?
- Signing petitions is not necessarily correlated with watching TV.
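Convergent and discriminant validity are typically checked with correlations between indicators. A sketch on simulated data, assuming a latent "participation" trait that drives both petition signing and contacting politicians, while TV watching is unrelated. All variables and parameters are invented for illustration.

```python
import random


def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    den_a = sum((ai - mean_a) ** 2 for ai in a) ** 0.5
    den_b = sum((bi - mean_b) ** 2 for bi in b) ** 0.5
    return num / (den_a * den_b)


rng = random.Random(3)
n = 3000

# Latent concept; in real research this is never observed directly.
participation = [rng.gauss(0, 1) for _ in range(n)]

petitions = [p + rng.gauss(0, 1) for p in participation]   # indicator 1
contacting = [p + rng.gauss(0, 1) for p in participation]  # indicator 2
tv_hours = [rng.gauss(0, 1) for _ in range(n)]             # unrelated measure

convergent = correlation(petitions, contacting)   # expected to be clearly positive
discriminant = correlation(petitions, tv_hours)   # expected to be near zero

print(f"petitions vs contacting: {convergent:.2f}")
print(f"petitions vs TV hours:   {discriminant:.2f}")
```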
Criterion validity
Predictive validity
- Does the indicator predict an outcome based on some criterion?
- Attitude towards a politician in a survey: can it predict voting for the politician?
Concurrent validity
Does my measure correlate with another established measure of the same concept?
Party ideology based on expert survey responses correlates with another measure of polarization.
Primary Data
- The researcher carries out the data collection (collected by you).
- Full control over the collection process
- More expensive
- More difficult
Secondary Data
Others have collected the data; the researcher only analyses it (someone else collected it).
- No control over the collection process
- Less expensive and faster
- Need to check quality
- Qualitative case-studies