Endogeneity and Multilevel Analysis Flashcards
Endogeneity
Endogeneity refers to a situation where an explanatory variable is correlated with the error term in a regression model. This correlation can lead to biased and inconsistent estimates of the coefficients, making it difficult to draw valid inferences about causal relationships.
Endogeneity can arise from several sources, including omitted variables, measurement error, simultaneity, reverse causality, common-method bias, and single-source bias.
Omitted variables in endogeneity
An uncontrolled confounding variable that is correlated both with the independent variable (IV) and with the dependent variable (DV), and therefore ends up in the error term of the regression.
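A small pure-Python simulation (variable names and numbers are illustrative, not from the notes) makes the omitted-variable mechanism concrete: leaving the confounder z out of the model biases the OLS slope on x away from its true direct effect.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression, no intercept needed here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

rng = random.Random(0)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]          # confounder, omitted from the model
x = [zi + rng.gauss(0, 1) for zi in z]           # IV, correlated with z
y = [2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

naive_slope = ols_slope(x, y)   # z is left out of the regression
# True direct effect of x is 2; omitted-variable bias pushes the estimate
# toward 2 + 3 * cov(x, z) / var(x) = 2 + 3 * (1/2) = 3.5.
print(round(naive_slope, 2))
```

Here the bias is positive because z affects y positively and is positively correlated with x; with a negative correlation the naive slope would be biased downward instead.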
Simultaneity in endogeneity
This occurs when the dependent variable also influences the independent variable. In such cases, it is unclear which variable is the cause, and which is the effect.
Reverse causality in endogeneity
The dependent variable affects the independent variable, i.e., the causal arrow runs (at least partly) from DV to IV.
Measurement error in endogeneity
When the independent variable is measured with error, it can lead to a correlation between the IV and the error term. This is particularly problematic when the measurement error is systematic.
Example: If income is self-reported and individuals underreport their income, this measurement error can bias the estimated effect of income on consumption.
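The income/consumption example can be sketched in pure Python for the classical (random) measurement-error case, where noise in the IV attenuates the estimated slope toward zero; systematic under-reporting, as in the note above, can bias the estimate in other ways. All names and numbers are illustrative.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

rng = random.Random(1)
n = 20_000
income = [rng.gauss(0, 1) for _ in range(n)]                 # true IV
consumption = [2 * x + rng.gauss(0, 1) for x in income]      # true slope = 2
reported = [x + rng.gauss(0, 1) for x in income]             # noisy self-report

slope_true_x = ols_slope(income, consumption)     # close to 2
slope_noisy_x = ols_slope(reported, consumption)  # attenuated toward 0:
# beta * var(x) / (var(x) + var(u)) = 2 * 1 / (1 + 1) = 1
print(round(slope_true_x, 2), round(slope_noisy_x, 2))
```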
Common-method bias in endogeneity
Variance is attributable to the measurement method (e.g., a survey with 5-point Likert scales) rather than to the constructs the measures are assumed to represent.
Single-source bias in endogeneity
Variance is attributable to the IV and DV being collected from a single source (e.g., self-rated motivation and performance).
Confounding variables
A confounding variable is an extraneous variable that is related to both the independent variable and the dependent variable, potentially leading to a spurious association between them. It can create a false impression of a relationship or mask a true relationship.
Example: Age confounding exercise and weight loss.
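Taking the age example, a sketch (illustrative names and coefficients; the positive signs are arbitrary) shows how a confounder can create a purely spurious association: exercise has no true effect on weight loss here, yet the naive regression finds one, and adjusting for age makes it vanish.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

rng = random.Random(2)
n = 20_000
age = [rng.gauss(0, 1) for _ in range(n)]               # confounder
exercise = [a + rng.gauss(0, 1) for a in age]           # driven by age
weight_loss = [2 * a + rng.gauss(0, 1) for a in age]    # driven by age only

spurious = ols_slope(exercise, weight_loss)   # looks like a real effect (~1)

# Adjust for the confounder by residualising both variables on age;
# the remaining association is essentially zero.
b_xa = ols_slope(age, exercise)
b_ya = ols_slope(age, weight_loss)
rx = [x - b_xa * a for x, a in zip(exercise, age)]
ry = [y - b_ya * a for y, a in zip(weight_loss, age)]
adjusted = ols_slope(rx, ry)
print(round(spurious, 2), round(adjusted, 2))
```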
Multilevel analysis
Multilevel analysis, also known as hierarchical linear modelling or mixed-effect modelling, is used when the variable of interest has explanatory factors on more than one level.
This type of analysis is particularly useful when dealing with nested data structures—such as students within classrooms, patients within hospitals, or repeated measures within individuals—where observations are not independent.
In multilevel models, data is structured in levels:
Level 1: Individual-level variables (e.g., student characteristics).
Level 2: Group-level variables (e.g., classroom characteristics).
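A nested data structure can be sketched as follows (all names and values are made up for illustration): each level-1 row carries a group key, and level-2 variables attach to every row of that group.

```python
# Level-1 rows: one per student, with a key to the level-2 unit.
students = [
    {"student": "s1", "classroom": "A", "score": 72, "hours": 5},
    {"student": "s2", "classroom": "A", "score": 80, "hours": 7},
    {"student": "s3", "classroom": "B", "score": 65, "hours": 4},
    {"student": "s4", "classroom": "B", "score": 70, "hours": 6},
]

# Level-2 rows: one per classroom (group-level predictors).
classrooms = {"A": {"class_size": 20}, "B": {"class_size": 35}}

# Every student inherits the variables of the classroom they are nested in,
# so classroom-level predictors repeat across students of the same group.
merged = [dict(s, **classrooms[s["classroom"]]) for s in students]
print(merged[0])
```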
Requirements for multilevel analysis
- Sufficient observations (>15) on the higher level (“level 2”) of independent variables.
- The dependent variable resides on the lower level (“level 1”). [Exceptions possible!]
- Nested data structure.
Intraclass correlation coefficient (ICC)
This statistic indicates the proportion of total variance that is attributable to the group level. An ICC close to 1 suggests that most variance is between groups, while an ICC close to 0 suggests that most variance is within groups.
Interpretation: an ICC of 0.25 means that 25% of the total variance in the outcome is due to differences between groups. A common rule of thumb is that multilevel analysis is warranted when ICC > 0.1.
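The ICC formula itself is simple arithmetic on the two variance components; a minimal sketch (the variance values are illustrative):

```python
def icc(var_between, var_within):
    """Proportion of total variance attributable to the group level."""
    return var_between / (var_between + var_within)

# E.g. between-group variance 5, within-group variance 15:
value = icc(5, 15)            # 5 / (5 + 15) = 0.25
use_multilevel = value > 0.1  # rule-of-thumb cutoff from the notes
print(value, use_multilevel)
```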
Random intercept / fixed slope models
This is what we normally assume.
These models allow the intercept to vary across groups but assume that the effect of predictors is consistent across groups.
Interpretation: The average outcome can differ across groups, but the relationship between predictors and outcome is the same.
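The random-intercept, fixed-slope idea can be sketched by generating groups that share one slope but differ in their intercepts, then fitting each group separately (illustrative numbers; a real analysis would fit one mixed model, e.g. with statsmodels' MixedLM, rather than per-group regressions):

```python
import random

def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

rng = random.Random(3)
true_slope = 1.5
intercepts = {"A": 0.0, "B": 3.0, "C": 6.0}   # group-specific intercepts

fits = {}
for g, a in intercepts.items():
    x = [rng.gauss(0, 1) for _ in range(2_000)]
    y = [a + true_slope * xi + rng.gauss(0, 1) for xi in x]
    fits[g] = ols(x, y)

slopes = [b for _, b in fits.values()]   # nearly identical across groups
ints = [a for a, _ in fits.values()]     # clearly different across groups
print(fits)
```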
Random intercept / random slope models
These models allow both the intercept and the slope of a predictor to vary across groups.
Interpretation: The relationship between a predictor and the outcome can differ across groups, reflecting more complex interactions.
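A model in which both intercepts and slopes vary by group can be sketched the same way: per-group fits recover different slopes (illustrative numbers; in practice a single mixed model with a random slope term would be fitted instead):

```python
import random

def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

rng = random.Random(4)
# Both the intercept and the slope differ by group: (intercept, slope).
groups = {"A": (0.0, 1.0), "B": (2.0, 2.0), "C": (4.0, 3.0)}

est_slopes = {}
for g, (a, b) in groups.items():
    x = [rng.gauss(0, 1) for _ in range(2_000)]
    y = [a + b * xi + rng.gauss(0, 1) for xi in x]
    est_slopes[g] = ols(x, y)[1]

# The predictor-outcome relationship now genuinely differs across groups.
print(est_slopes)
```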