Intro to multi-level models (MLM) Flashcards
Clustering
What is clustered data?
= when our observations have some sort of grouping that is NOT something we are interested in studying
This is important as the observations in one cluster are likely to be more similar to each other than to observations in different clusters
For example:
- children within schools
- observations within individuals
This can quickly become complicated as the number of levels increases
Clustering
What is the intra-class correlation coefficient (ICC)?
It is a ratio of:
variance between groups : total variance
i.e. ICC = σ²_between / (σ²_between + σ²_within)
ICC can take values between 0 and 1. The larger the ICC, the lower the variability within clusters compared to the variability between clusters
For example:
ICC of 0.48 means that 48% of our variance is attributable to by-cluster differences
Clustering
Why and how is clustering a problem?
WHY:
It is something systematic that our model should (arguably) take into account
HOW:
standard errors are often underestimated (too small), which also means:
- CIs will be too narrow
- t-statistics will be too large
- p-values will be too small
Clustering
What is wide data?
observations are written across separate columns (not the preferred format for modelling)
Clustering
What is long data?
there is one column for all observations, and other columns tell us which cluster each observation belongs to
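A minimal sketch of reshaping wide data into long with tidyr (all column names here are hypothetical):
library(tidyr)
wide <- data.frame(school = c("A", "B"),
                   score_child1 = c(10, 12),
                   score_child2 = c(11, 14))
long <- pivot_longer(wide, cols = starts_with("score_"),
                     names_to = "child", values_to = "score")
# long now has one row per observation, with school identifying the cluster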
Clustering
Dealing with Clustering
What is complete pooling?
Ignoring clustering
= information from all clusters is pooled together to estimate one overall effect of x
- basically, it takes everyone’s data, pools it together and then plots it as one line on a graph
This is not the best method as clusters can show different patterns which this method does not account for
- also residuals are NOT independent
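As a sketch (df, y and x are hypothetical names), complete pooling is just an ordinary regression that ignores the grouping variable entirely:
mod_pooled <- lm(y ~ 1 + x, data = df)   # one line for everyone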
Clustering
Dealing with Clustering:
What is no pooling?
Fixed effects models
= information from a cluster contributes to an estimate for that cluster (and ONLY that cluster)
- information is not pooled
This method is good as it gives a separate estimate for each cluster
BUT it is flawed: each cluster is estimated from its own data alone, so anomalous clusters are taken at face value
it also has less statistical power due to having many more parameters
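A sketch of no pooling via fixed effects for the grouping factor g (all names hypothetical):
mod_nopool <- lm(y ~ 1 + g + x:g, data = df)   # a separate intercept and slope per cluster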
Clustering
Dealing with Clustering:
What is partial pooling?
Random Effects models
= cluster-level variance in intercepts and slopes is modelled as randomly distributed around fixed parameters.
- Effects are free to vary by cluster but information from all clusters contributes to an overall fixed parameter
Rather than estimating differences for each cluster, we are estimating the variation (or spread of distributions) of intercept points
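A minimal sketch of partial pooling with lme4 (df, y, x and the grouping factor g are all hypothetical names):
library(lme4)
mod_partial <- lmer(y ~ 1 + x + (1 + x | g), data = df)   # cluster intercepts and slopes vary, but are pulled towards the fixed estimates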
MLM
What is multilevel regression?
used for observation j in group i
used for data structures when we’ve observed things that happen at multiple levels
e.g. children in classes in schools etc
MLM
Multilevel regression equation
It looks similar to simple regression but we now need a 2-level equation
Level 1:
yij = β0i + β1ixij + εij
Level 2:
β0i = γ00 + ζ0i
β1i = γ10 + ζ1i
Where:
- γ00 is the population intercept and ζ0i is the deviation of group i from γ00
- γ10 is the population slope and ζ1i is the deviation of group i from γ10
Basically
γ means there is a fixed number for the whole population
ζ accounts for the deviation of each individual group (random effects)
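Substituting level 2 into level 1 gives the combined form:
yij = γ00 + γ10xij + ζ0i + ζ1ixij + εij
i.e. one fixed line (γ00 + γ10xij) plus each group’s random deviations from it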
MLM
Assumptions of multilevel regression:
we now assume ζ0, ζ1 and ε to be normally distributed with a mean of 0
MLM
What are fixed effects?
Items that do not vary by cluster are fixed effects
e.g. γ00 or γ10
If we repeated the experiment we would use the same levels
Desired inference:
the conclusions refer to the levels used
MLM
What are random effects?
common definition = “we allow [the effect] to vary by [the cluster/grouping]”
ζ is treated as random because the clusters are considered a random sample from a larger population
If we repeated the experiment different levels would be used
desired inference:
the conclusions refer to a population from which the levels used are just a (random) sample
In R, the random effects part of the model formula looks like:
…. (random intercept + random slope | grouping structure)
MLM
Random intercept vs Random slope (in R)
Random intercept:
lmer(y ~ 1 + x + (1|g), data = df)
Random intercept and slope:
lmer(y ~ 1 + x + (1 + x |g), data = df)
MLM
Advantages of MLM
MLM can be used to answer multi-level questions, for example:
- Do phenomena at level X predict outcomes at level Y?
e.g. “does left vs right handedness predict variation in reaction times?”
- Do phenomena at level X influence effects at level Y?
e.g. “does being mono vs bilingual influence grades over the duration of schooling?”
- Do random variances covary?
e.g. “do people who have higher cognitive scores at the start of the study show less decline over the duration of the study than those who started with lower scores?”
MLM
lmer output:
Fixed effects = one fixed model line
Random effects = random (individual) deviations around the fixed line (assumed to be normal)
Residual = captures the final step from individual line to individual observations
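A sketch of pulling these pieces out of a fitted lme4 model m (the model name is hypothetical):
fixef(m)    # fixed effects: the single fixed model line
ranef(m)    # random effects: each cluster's deviations from that line
VarCorr(m)  # variances of the random effects and of the residual
resid(m)    # residuals: observation-level deviations from the cluster lines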
MLM
ICC in lmer
Obtained by fitting an intercept-only model in lmer, as the ICC is conditional on random intercepts (so the inclusion of random slopes would change it)
From the random effects in your model summary:
ICC = intercept variance / (intercept variance + residual variance)
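A sketch of the calculation (df, y and grouping factor g are hypothetical; assumes lme4 is loaded):
m0 <- lmer(y ~ 1 + (1 | g), data = df)   # intercept-only model
vc <- as.data.frame(VarCorr(m0))
icc <- vc$vcov[1] / sum(vc$vcov)         # intercept variance / total variance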
MLM
what is marginal R squared?
= variance explained due to fixed effects
MLM
what is conditional R squared?
= variance explained due to fixed and random effects
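One way to obtain both, as a sketch (MuMIn is one package option among several; m is a hypothetical fitted lmer model):
library(MuMIn)
r.squaredGLMM(m)   # R2m = marginal (fixed only); R2c = conditional (fixed + random)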
MLM
Model estimation
MLMs are too complicated to solve with a closed-form solution, so instead we estimate all the parameters using an iterative maximum likelihood procedure
MLM
Maximum Likelihood estimation (MLE)
Aim = find the values for the unknown parameters that maximise the probability of obtaining the observed data
How = done by finding values that maximise the log-likelihood function
treats fixed effects as KNOWN when estimating the variance components at each iteration
- this can lead to biased estimates of variance components
MLM
Restricted maximum likelihood estimation (REML)
separates the estimation of the variance components from the estimation of the fixed effects, estimating the variance components first
- this leads to less biased estimates of the variance components
better for small sample sizes
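In lme4, REML is the default; a sketch of fitting each way (all variable names hypothetical):
m_reml <- lmer(y ~ x + (1 | g), data = df)                 # REML (the default)
m_ml   <- lmer(y ~ x + (1 | g), data = df, REML = FALSE)   # maximum likelihood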
MLM
What are convergence warnings?
They appear when the optimiser we are using either can’t find a suitable maximum, or gets stuck in a singularity (think of it like a black hole of likelihood, signalling that there is not enough variation in our data to support such a complex model)
MLM Inference
p-values in lmer?
In simple lm we test the reduction in SSresidual which follows an F-distribution with known df.
only in very specific conditions in lmer will we have known df.
Parameter estimates in MLM are MLE/REML estimates, which means it is:
- unclear how to calculate denominator degrees of freedom (DDF)
- also unclear whether the test statistics would even follow an F distribution
We need other options for inference
MLM Inference
Options for inference:
Approximating DDF
- Kenward-Roger
Kenward-Roger:
- models must be fitted with REML
- adjusts SEs to avoid small sample bias
- approximated denominator df (may not be whole number)
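A sketch using the lmerTest package (it relies on pbkrtest behind the scenes; df, y, x and g are hypothetical):
library(lmerTest)
m <- lmer(y ~ x + (1 + x | g), data = df)   # REML is the default
summary(m, ddf = "Kenward-Roger")           # t-tests with KR-approximated df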
MLM Inference
Options for inference:
Likelihood based methods
- profile likelihood confidence interval
Models need to be fitted with MLE
Evaluates the curvature of the likelihood surface at the estimate
- sharp curve = more certainty in estimate
- gradual curve = less certainty
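In lme4 this is available via confint(), as a sketch (m is a hypothetical fitted model):
confint(m, method = "profile")   # profile likelihood CIs for all parameters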
MLM Inference
Options for inference:
Likelihood based methods
- likelihood ratio tests
Models need to be fitted with MLE
- not good for small sample sizes
Uses anova()
Compares the log-likelihoods of two competing (nested) models
- minus twice the log of the likelihood ratio is asymptotically (as n increases towards infinity) chi-squared distributed
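A sketch comparing two nested models fitted with ML (all names hypothetical):
m0 <- lmer(y ~ 1 + (1 | g), data = df, REML = FALSE)
m1 <- lmer(y ~ 1 + x + (1 | g), data = df, REML = FALSE)
anova(m0, m1)   # chi-squared test on the change in log-likelihood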
MLM Inference
Options for inference:
Bootstrap
Parametric bootstrap:
- confidence interval
- likelihood ratio test
Case based bootstrap
- confidence interval
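A sketch of the parametric options (m0 and m1 are hypothetical nested lmer models; pbkrtest supplies the bootstrap LRT):
confint(m1, method = "boot", nsim = 500)   # parametric bootstrap CIs
library(pbkrtest)
PBmodcomp(m1, m0)                          # parametric bootstrap likelihood ratio test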
MLM inference
MLE vs REML
Fit with ML if:
- models differ in fixed effects only
- models differ in BOTH fixed and random effects
- you want to use anova()
Fit with REML if:
- models differ in random effects only
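In lme4 you can switch an REML fit to ML without re-specifying the model, as a sketch (m_reml is hypothetical):
m_ml <- refitML(m_reml)   # refit with ML before using anova()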