lecture 5 & 6 Flashcards
causation
Assessing the impact of a factor (X) on an outcome (Y)
two types of causation
Two Types:
1. Deterministic relationship: every time x occurs, y occurs
2. Probabilistic relationship: if x, then y is more likely
problem of causal inference
What would have happened without the intervention?
To know, we would have to observe the same entity both getting the treatment and not getting it.
But → the same entity cannot both get the treatment and not get it, so the counterfactual is never observed.
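A minimal sketch of this problem, using made-up numbers. The variable names and the effect size of 2.0 are assumptions chosen only for illustration: each unit has two potential outcomes, but a real dataset would only ever contain the "observed" column.

```python
import random

rng = random.Random(2)
ASSUMED_EFFECT = 2.0  # hypothetical treatment effect, chosen for illustration

rows = []
for unit in range(6):
    y_without = rng.gauss(10, 1)         # potential outcome with no treatment
    y_with = y_without + ASSUMED_EFFECT  # potential outcome with treatment
    treated = unit % 2 == 0              # alternate so both groups appear
    rows.append({
        "unit": unit,
        "treated": treated,
        # Real data contain only the outcome matching what actually happened.
        "observed": y_with if treated else y_without,
        # The other potential outcome is the counterfactual; only a
        # simulation can store it.
        "counterfactual": y_without if treated else y_with,
    })

for row in rows:
    print(f"unit {row['unit']}: treated={row['treated']}, "
          f"observed={row['observed']:.2f}, counterfactual=unobserved")
```

Because the counterfactual column never exists in practice, causal effects have to be estimated by comparing different entities rather than the same entity with itself.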
five main hurdles to causation
- A credible causal mechanism
  Why would the independent variable cause the dependent variable? A plausibility test.
- Reverse causality (endogeneity)
  Does x cause y, or does y cause x? Sometimes it is easy to rule out:
  * Gender → political views
  Often hard:
  * Economic development → democracy
- Do x and y covary?
  They should covary: when one variable moves in one direction, the other moves in the same or the opposite direction.
  But the covariation is not always visible. Taking a third variable into account (e.g. plotting the results separately by its values) can reveal a relationship between x and y.
- Confounding variables
  There could be a third variable Z (or a set of variables), the confounding variable, that affects both the dependent variable (Y) and the independent variable (X).
  If we do not account for this confounding, we risk drawing biased inferences from our study.
  We would falsely conclude that there is a causal relationship between X and Y when in fact the relationship is spurious.
- Selection bias
  The way we select our observations is crucial. Ideally, they should be selected randomly.
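The confounding hurdle can be illustrated with a small simulation. This is a sketch under assumed, made-up data: Z drives both X and Y, while X has no effect on Y at all. The helper `pearson` and all numbers are hypothetical choices for the example.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)
n = 20_000

# Z is the confounder: it affects both X and Y. X never enters Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# Ignoring Z, X and Y look clearly related (a spurious relationship).
r_all = pearson(x, y)

# Holding Z roughly fixed (only units with Z close to 0), the
# relationship largely disappears.
near = [i for i in range(n) if abs(z[i]) < 0.1]
r_fixed_z = pearson([x[i] for i in near], [y[i] for i in near])

print(f"correlation ignoring Z:  {r_all:.2f}")
print(f"correlation with Z held near 0: {r_fixed_z:.2f}")
```

The first correlation is sizeable even though X has no causal effect on Y; once Z is held approximately constant, it shrinks towards zero.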
controlling for confounding variables
- Design of the research: engineer randomness (e.g. experiments and quasi-experiments)
  * Intervention/treatment
  * Experimental or treatment group: receives the treatment
  * Control group: does not receive the treatment
- Statistical control: include the confounding variables in the model.
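Why engineered randomness helps can be sketched with simulated data. Everything here is an assumption for illustration: the true effect of 2.0, the confounder weight of 3.0, and the hypothetical helper `diff_in_means`. When units select into treatment based on a confounder, the simple difference between group means is biased; when treatment is assigned by coin flip, it is close to the true effect.

```python
import random
import statistics

rng = random.Random(1)
TRUE_EFFECT = 2.0  # assumed treatment effect in this simulation

def diff_in_means(randomized, n=20_000):
    """Treatment-group mean minus control-group mean of the outcome."""
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # confounder that also raises the outcome
        if randomized:
            t = rng.random() < 0.5        # coin flip, unrelated to z
        else:
            t = z + rng.gauss(0, 1) > 0   # high-z units select into treatment
        outcome = TRUE_EFFECT * t + 3.0 * z + rng.gauss(0, 1)
        (treated if t else control).append(outcome)
    return statistics.fmean(treated) - statistics.fmean(control)

observational = diff_in_means(randomized=False)
experimental = diff_in_means(randomized=True)

print(f"self-selected groups: {observational:.2f}")
print(f"randomized groups:    {experimental:.2f}")
```

Random assignment makes the treatment and control groups similar on Z on average, so the confounder no longer distorts the comparison.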
measures of central tendency
Median → value that splits the distribution in half
Mean → average written as x̅
Mode → value with the greatest frequency
Choosing measures of central tendency:
▶ Nominal data: Mode (mean and median CANNOT be used)
▶ Ordinal data: Mode, median (mean only if the categories are coded as numbers)
▶ Quantity data: Mean or median
Median is more representative of the data when the distribution is skewed or contains outliers
Mean is more sensitive to extreme values
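A short sketch of the three measures using Python's standard `statistics` module. The data are made up for illustration; adding one extreme value moves the mean a lot and the median only slightly.

```python
import statistics

scores = [1, 2, 2, 3, 4]        # made-up quantity data
with_outlier = scores + [100]   # same data plus one extreme value

mean_before = statistics.fmean(scores)
median_before = statistics.median(scores)
mode_before = statistics.mode(scores)

mean_after = statistics.fmean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
print(f"mode:   {mode_before}")

# Nominal data: only the mode is meaningful.
colours = ["red", "blue", "red", "green"]
modal_colour = statistics.mode(colours)
print(f"modal colour: {modal_colour}")
```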
measures of dispersion
Dispersion → how spread out the data are, narrow or wide
Narrow (values are all close to one another)
Wide (values are spread out)
quantifying the spread
the range (difference between the largest and smallest values)
Limitation: the range depends only on the two most extreme values, so it does not show whether the values in between are narrow or wide
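A small sketch of the range and its limitation. The two datasets are made up, and `value_range` is a hypothetical helper name: both sets have the same range even though one is clustered around the middle and the other sits at the extremes.

```python
def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)

clustered = [0, 5, 5, 5, 5, 10]     # most values close together
polarised = [0, 0, 0, 10, 10, 10]   # values pushed to the extremes

range_clustered = value_range(clustered)
range_polarised = value_range(polarised)

print(range_clustered, range_polarised)
```

Since the range cannot tell these two distributions apart, a measure that uses every value is needed.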
other name for the mean
average of deviation
the mean of squared deviations
variance σ²
standard deviation
Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.
σ = √(σ²) (and s = √(s²) when we talk about the sample standard deviation).
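The calculation can be sketched step by step on made-up data and checked against Python's `statistics` module. One assumption beyond the cards above: the sample variance s² divides by n − 1 rather than n, which is the usual convention and the one `statistics.variance` follows.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values
n = len(data)

mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: the mean of the squared deviations.
population_variance = sum(squared_deviations) / n
# Standard deviation: the square root of the variance.
population_sd = math.sqrt(population_variance)

# Sample versions divide by n - 1 instead of n.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"variance = {population_variance}, sd = {population_sd}")
print(f"s^2 = {sample_variance:.3f}, s = {sample_sd:.3f}")

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

Unlike the variance, the standard deviation is expressed in the same units as the data.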
causal inference
the process of determining whether an observed association truly reflects a cause-and-effect relationship.