17. Understanding Linear Models Flashcards
What is causality?
One event directly leads to another event
(Different from covariation, where two variables simply change together)
What are the different conditions for causality?
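Covariation
- The cause and the effect must vary together (be correlated)
- Covariation alone is not enough: two factors can occur together without being causally related
- E.g. ice cream sales and shark attacks rise together, but both are driven by hot weather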
Plausibility
- Is the causation actually plausible to occur?
Temporal precedence
- The cause (A) must happen before the effect (B), not the other way round
No reasonable alternatives
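- Other explanations for the relationship (e.g. confounding variables) must be ruled out, which is hard to establish
- Failing to account for alternative explanations may lead to spurious correlations being mistaken for causation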
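The ice cream and shark attack example can be simulated to show covariation without causation. This is a minimal sketch on made-up data: the temperature distribution and all coefficients are hypothetical assumptions, and temperature is the only thing driving both variables. Removing the part of each variable explained by temperature (a partial correlation) makes the association disappear, which is what "no reasonable alternatives" asks us to check.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def residualize(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 10_000
temp = [rng.gauss(25, 5) for _ in range(n)]            # hypothetical daily temperature (the confounder)
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temp]  # sales depend only on temperature
sharks = [0.3 * t + rng.gauss(0, 2) for t in temp]     # attacks depend only on temperature

r = pearson(ice_cream, sharks)  # clear covariation, but no causal link
r_within = pearson(residualize(ice_cream, temp), residualize(sharks, temp))  # near 0 once temperature is removed
print(f"raw correlation: {r:.2f}, after controlling for temperature: {r_within:.2f}")
```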
How can causality be tested?
Identifying causal relationships = examined through study design rather than statistical testing
E.g. experimental vs observational designs (an experiment manipulates one variable and observes its effect on the other)
- Requires a well-designed study in the first place, as many studies are poorly designed
OR
Statistical methods for observational data, e.g. propensity score matching (uses statistics to simulate a control group) or instrumental variable analysis
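Why design matters can be shown with a small simulation comparing an experimental design with an observational one. This is a sketch under hypothetical assumptions: a made-up confounder ("motivation") raises the outcome, the true treatment effect is set to 2.0, and the treatment effect is estimated by a simple difference in group means. It does not implement propensity score matching or instrumental variables; it only shows why random assignment recovers the causal effect while self-selection does not.

```python
import random
import statistics

def difference_in_means(outcome, treated):
    """Mean outcome of the treated group minus mean outcome of the untreated group."""
    t = [y for y, d in zip(outcome, treated) if d]
    c = [y for y, d in zip(outcome, treated) if not d]
    return statistics.fmean(t) - statistics.fmean(c)

rng = random.Random(1)
n = 40_000
TRUE_EFFECT = 2.0
motivation = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical confounder

def outcomes(treated):
    # Outcome depends on treatment (true effect 2.0) and on motivation (effect 3.0)
    return [TRUE_EFFECT * d + 3.0 * m + rng.gauss(0, 1) for d, m in zip(treated, motivation)]

# Experimental design: treatment assigned by coin flip, so it is unrelated to motivation
randomized = [rng.random() < 0.5 for _ in range(n)]
# Observational design: more motivated people tend to choose the treatment themselves
self_selected = [m + rng.gauss(0, 1) > 0 for m in motivation]

est_experiment = difference_in_means(outcomes(randomized), randomized)
est_observational = difference_in_means(outcomes(self_selected), self_selected)
print(f"experiment: {est_experiment:.2f}, observational: {est_observational:.2f}, true: {TRUE_EFFECT}")
```

The observational estimate absorbs the effect of motivation, so it overstates the treatment effect; the randomized estimate lands close to 2.0.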
What is a marginal distribution?
The distribution of one variable's values on its own, ignoring (summing over) the values of other variables
What is a conditional distribution?
The distribution of one variable's values given a particular value of another variable
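A worked example can make the difference between marginal and conditional distributions concrete. The joint probabilities below (weather vs carrying an umbrella) are hypothetical. The marginal distribution sums the joint table over the other variable; the conditional distribution keeps only the rows matching the given value and rescales them to sum to 1.

```python
from fractions import Fraction

# Hypothetical joint distribution P(weather, umbrella)
joint = {
    ("rain", "umbrella"): Fraction(3, 10),
    ("rain", "no umbrella"): Fraction(1, 10),
    ("dry", "umbrella"): Fraction(1, 10),
    ("dry", "no umbrella"): Fraction(5, 10),
}

def marginal(joint, index):
    """Marginal distribution of the variable at position `index`, summing over the other one."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0) + p
    return out

def conditional(joint, given_index, given_value):
    """Distribution of the other variable, given the variable at `given_index` equals `given_value`."""
    other = 1 - given_index
    subset = {k[other]: p for k, p in joint.items() if k[given_index] == given_value}
    total = sum(subset.values())
    return {k: p / total for k, p in subset.items()}

print(marginal(joint, 1)["umbrella"])             # P(umbrella) -> 2/5
print(conditional(joint, 0, "rain")["umbrella"])  # P(umbrella | rain) -> 3/4
print(conditional(joint, 0, "dry")["umbrella"])   # P(umbrella | dry) -> 1/6
```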
What is endogeneity?
Formally: the marginal distribution of the predictor variable is not independent of the conditional distribution of the outcome variable given the predictor
In practice: occurs when a predictor variable x is correlated with the error term ϵ, which biases the β estimates
Cov(x, ϵ) ≠ 0
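Why Cov(x, ϵ) ≠ 0 biases the estimate can be seen from the ordinary least squares slope in a single-predictor model. This is a standard derivation, sketched here for the simplest case (one predictor, sample covariance and variance written with hats):

```latex
y_i = \beta_0 + \beta_1 x_i + \epsilon_i

\hat{\beta}_1
  = \frac{\widehat{\operatorname{Cov}}(x, y)}{\widehat{\operatorname{Var}}(x)}
  = \frac{\widehat{\operatorname{Cov}}(x, \beta_0 + \beta_1 x + \epsilon)}{\widehat{\operatorname{Var}}(x)}
  = \beta_1 + \frac{\widehat{\operatorname{Cov}}(x, \epsilon)}{\widehat{\operatorname{Var}}(x)}

\operatorname*{plim}_{n \to \infty} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{Cov}(x, \epsilon)}{\operatorname{Var}(x)}
```

If x is exogenous (Cov(x, ϵ) = 0) the second term vanishes and the estimate converges to β1; if x is endogenous, the estimate converges to the wrong value no matter how much data is collected.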
What is an endogenous variable?
An endogenous variable is any variable in the regression model that is correlated with the error term.
Its value is determined within the model
What is an exogenous variable?
An exogenous variable is an explanatory variable that is not correlated with the error term
Its value is determined outside the model, not by the model
What are the problems with endogeneity?
- Can't easily test whether variables are endogenous
- If a model has both endogenous and exogenous variables, the endogenous variable biases the model's estimates (including the estimated error), and can also bias the coefficients of the exogenous variables
- Even if you detect endogeneity, you must still determine why it's there to solve the issue
What are the different sources of endogeneity? (Name only)
Simultaneity bias
Omitted/Confounding variables
Measurement Error
What is simultaneity bias?
x causes y, which in turn causes x
E.g. Farmer’s income <-> crop yield
(y = β0 + β1x1 + β2x2 + ϵ, where x1 is exogenous and x2 is endogenous)
If the endogeneity is due to simultaneity (x and y are determined at the same time), then a change in the exogenous x1 changes y, which in turn changes the endogenous x2, because x2 is itself linked to the DV within the model
The more endogenous variables in the model, the more pronounced the effect
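Simultaneity bias can be reproduced with a two-equation system in which x affects y and y feeds back into x. This is a sketch under hypothetical assumptions: y = BETA·x + u and x = GAMMA·y + v with independent standard normal shocks u and v, and BETA = GAMMA = 0.5 chosen arbitrarily. Solving the two equations together shows that x contains u, so x is correlated with the error term of the y equation.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)
n = 20_000
BETA, GAMMA = 0.5, 0.5  # hypothetical: y = BETA*x + u and x = GAMMA*y + v
x, y = [], []
for _ in range(n):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    # Solve the two simultaneous equations for their joint (equilibrium) values
    yi = (BETA * v + u) / (1 - BETA * GAMMA)
    xi = (GAMMA * u + v) / (1 - BETA * GAMMA)
    x.append(xi)
    y.append(yi)

ols_estimate = ols_slope(x, y)
print(f"true effect of x on y: {BETA}, OLS estimate: {ols_estimate:.2f}")
```

With these settings OLS settles around 0.8 instead of 0.5, because the feedback from y to x is attributed to the effect of x.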
How do we solve simultaneity bias?
Use statistical methods developed for this situation, e.g. two-stage least squares (2SLS) regression
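Two-stage least squares can be sketched on the same kind of simultaneous system, assuming an instrument is available. Here z is a hypothetical instrument: it shifts x (x = GAMMA·y + DELTA·z + v) but affects y only through x, which is true by construction in this simulation. Stage 1 predicts x from z; stage 2 regresses y on that prediction. This sketch only computes the point estimate; the standard errors from a naive second-stage regression are not valid.

```python
import random
import statistics

def ols_fit(x, y):
    """Intercept and slope from regressing y on x by least squares."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def two_stage_least_squares(y, x, z):
    """2SLS point estimate of the effect of x on y using a single instrument z."""
    a1, b1 = ols_fit(z, x)              # Stage 1: regress x on the instrument
    x_hat = [a1 + b1 * zi for zi in z]  # the part of x driven only by z
    return ols_fit(x_hat, y)[1]         # Stage 2: regress y on the predicted x

rng = random.Random(3)
n = 20_000
BETA, GAMMA, DELTA = 0.5, 0.5, 1.0  # hypothetical: y = BETA*x + u, x = GAMMA*y + DELTA*z + v
x, y, z = [], [], []
for _ in range(n):
    u, v, zi = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    # Joint solution of the two equations
    y.append((BETA * (DELTA * zi + v) + u) / (1 - BETA * GAMMA))
    x.append((GAMMA * u + DELTA * zi + v) / (1 - BETA * GAMMA))
    z.append(zi)

ols_estimate = ols_fit(x, y)[1]
iv_estimate = two_stage_least_squares(y, x, z)
print(f"true: {BETA}, OLS: {ols_estimate:.2f}, 2SLS: {iv_estimate:.2f}")
```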
How do omitted/confounding variables explain endogeneity?
In a perfectly exogenous model, the effect of x on y is separate from the error term
When x is correlated with both the outcome and an omitted variable z (which itself affects the outcome), the variance explained by z falls into ϵ, so x becomes correlated with ϵ
What is the solution for endogeneity when omitted/confounding variables cause it?
Ensure confounds are measured and included in the model; this is no small task and requires thorough knowledge of the topic
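The omitted-variable problem and its fix can both be shown in one simulation. This is a sketch on hypothetical data: a confounder z drives x, and y depends on both x (true effect 1.0) and z (effect 1.5). Regressing y on x alone pushes the effect of z into the error term and inflates the slope; including z, by solving the two-predictor normal equations on mean-centred data, recovers the true effect.

```python
import random
import statistics

def centered_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

rng = random.Random(4)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]                               # hypothetical confounder
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]                          # x depends on z
y = [1.0 * xi + 1.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]  # true effect of x is 1.0

sxx, szz, sxz = centered_cross(x, x), centered_cross(z, z), centered_cross(x, z)
sxy, szy = centered_cross(x, y), centered_cross(z, y)

# Short model y ~ x: z is omitted, so its effect ends up in the error term
b_short = sxy / sxx
# Long model y ~ x + z: solve the 2x2 normal equations for the coefficient on x
b_long = (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)
print(f"omitting z: {b_short:.2f}, including z: {b_long:.2f}, true: 1.0")
```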
How does measurement error cause endogeneity?
Instead of measuring x, you measure x∗ = x + r, i.e. a measurement of x with error r included
E.g. Reporting errors and coding errors
As with omitted variables, measurement error becomes part of the error term but is associated with the observed x∗, leading to endogeneity
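The effect of measurement error can be simulated by regressing y on a noisy x∗ instead of the true x. This sketch assumes classical measurement error (r is random noise independent of x and of the regression error) with hypothetical variances; other kinds of error, such as systematic reporting bias, behave differently. Under these assumptions the slope is attenuated towards zero by roughly Var(x) / (Var(x) + Var(r)), here 1 / (1 + 1) = 0.5.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(5)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]       # true predictor (unobserved in practice)
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # true slope is 2.0
x_star = [xi + rng.gauss(0, 1) for xi in x]   # x* = x + r, with random error r (variance 1)

slope_true = ols_slope(x, y)
slope_noisy = ols_slope(x_star, y)
print(f"using true x: {slope_true:.2f}, using x*: {slope_noisy:.2f}")
```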