Intro to Multi-level Data Flashcards
What is the difference between level-1 & level-2 variables?
Level-1 variables vary between level-1 units (e.g., students differ in SES).
Level-2 variables cannot vary between level-1 units within the same cluster (e.g., every student in a school has the same student-to-teacher ratio, school type, school size, etc.).
How are multi-level and panel datasets similarly structured?
- PISA, SOEP, etc. (e.g., students nested in schools)
- longitudinal data in general (timepoints nested in individuals)
Why do we need special methods for multi-level data?
1) Correct statistical inference
“Dependence as nuisance” (Snijders & Bosker): basic assumptions of regression/inferential statistics are violated, so we no longer have:
- Independent observations
- Independent error terms
- Homoscedastic errors
- Normal distribution of errors
Examples
- Exam scores are more similar within classes
- Political attitudes cluster in regions
- Measurements of body weight are correlated over time
Observations in multi-level/longitudinal data are highly correlated within clusters,
which means we cannot evaluate statistical uncertainty appropriately: our standard errors become too small. The standard error shrinks as the sample grows, but here the sample size is artificially inflated by non-independent observations. The denominator of the standard error is therefore larger than it should be, which makes the SE too small.
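The size of this problem can be sketched with the design effect, 1 + (n - 1) * ICC, which applies to the variance of a mean under equal cluster sizes n and intraclass correlation ICC. The function names and the numbers (2000 students, classes of 25, ICC of 0.2) are hypothetical and chosen only for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation factor for a mean when observations are clustered."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)


def corrected_se(naive_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a naive standard error that ignored the clustering."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical example: 2000 students in classes of 25, ICC = 0.2
deff = design_effect(25, 0.2)
n_eff = effective_sample_size(2000, 25, 0.2)
print(f"design effect: {deff:.2f}, effective sample size: {n_eff:.0f}")
```

With no within-cluster correlation (ICC = 0) the design effect is 1 and the naive standard error is already correct; the stronger the clustering, the more the naive standard error understates the uncertainty.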
Why do we need special methods for multi-level data?
2) Substantive questions
- Dependencies/correlations within clusters as a subject matter in their own right (e.g., how much of the variance in grades can we attribute to differences between schools?)
Why do we need special methods for multi-level data?
3) Dealing with unobserved heterogeneity
Due to the hierarchical data structure, there is unobserved heterogeneity at different levels, which can affect the relationships between variables at those levels.
By capturing this heterogeneity through a random effect (RE), researchers gain a deeper understanding of how group-level factors influence individual outcomes and obtain more accurate estimates.
Possible level-2 variables influencing math performance of students?
How can the meaning of a level-1 variable change when it is aggregated?
Male -> Math performance
- male at level 1: pressure from parents and teachers, socialization → positive influence on math performance
- male at level 2 (% of boys in class): more disturbance in class → negative influence on math performance
What could be level-1 and level-2 confounders of motivation -> math performance?
How can we split variance?
How do we model the mean?
What is in the residuals/error term?
Model for the mean
Unobserved heterogeneity/omitted variables: the error term contains all factors that influence the outcome but are not in the model.
These factors are also assumed to be random.
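Written out, the model for the mean and its error term might look as follows. This is a sketch in standard single-level regression notation; the symbols (y_i, x_i, beta_0, beta_1, e_i, sigma^2) are assumptions, since the cards do not fix a notation.

```latex
% Single-level regression for individual i
y_i = \underbrace{\beta_0 + \beta_1 x_i}_{\text{model for the mean}}
      + \underbrace{e_i}_{\text{error term}},
\qquad e_i \sim N(0, \sigma^2)
```

Everything that affects y_i but is not captured by x_i ends up in e_i, which is treated as a random draw.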
What is special about the residuals?
Model for the mean
How do we need to modify the model for the mean for multi-level data?
Split the variance into two error terms, one per level.
Result: unobserved heterogeneity on both levels
* Unobserved level-1 factors
* Unobserved level-2 factors
Both are assumed to be random.
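The split into two error terms can be sketched as a two-level random-intercept model. The notation is an assumption (i indexes level-1 units such as students, j indexes level-2 units such as schools); the cards themselves do not give the formula.

```latex
% Two-level random-intercept model: unit i in cluster j
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```

Here u_j collects the unobserved level-2 factors shared by everyone in cluster j, and e_{ij} collects the unobserved level-1 factors.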
What is the variance partition coefficient and how do you calculate it?
Variance Partition Coefficient (VPC), also known as the Intraclass Correlation Coefficient (ICC)
A statistical measure of the proportion of variance in a dependent variable that is attributable to each level of the data hierarchy (e.g., the share of variance due to differences between clusters)
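For a two-level model the calculation is usually based on the variance components of the empty model (intercept only, no predictors). The sketch below assumes the notation sigma_u^2 for the level-2 (between-cluster) variance and sigma_e^2 for the level-1 (within-cluster) variance; the numbers in the worked line are hypothetical.

```latex
% Empty model: y_{ij} = \beta_0 + u_j + e_{ij}
\mathrm{VPC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}

% Hypothetical values: \sigma_u^2 = 20, \sigma_e^2 = 80
\mathrm{VPC} = \frac{20}{20 + 80} = 0.20
```

In this hypothetical case, 20% of the variance in the outcome would be attributable to differences between clusters and 80% to differences within them.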
Empty model mscore: Stata code