Intro to multi-level models (MLM) Flashcards
Clustering
What is clustered data?
= when our observations have some sort of grouping that is not, in itself, what we are interested in studying
This is important as the observations in one cluster are likely to be more similar to each other than to observations in different clusters
For example:
- children within schools
- observations within individuals
This can quickly become complicated the more levels of clustering you have
Clustering
What is the intra-class correlation coefficient (ICC)?
It is a ratio of:
variance between groups : total variance
ICC can take values between 0 and 1. The larger the ICC, the lower the variability is within the clusters compared to the variability between clusters
For example:
ICC of 0.48 means that 48% of our variance is attributable to by-cluster differences
Clustering
Why and how is clustering a problem?
WHY:
It is something systematic that our model should (arguably) take into account
HOW:
if clustering is ignored, standard errors will be too small (underestimated), which also means:
- CIs will be too narrow
- t-statistics will be too large
- p-values will be too small
Clustering
What is wide data?
each cluster's observations are spread across separate columns (not the preferred format for modelling)
Clustering
What is long data?
there is one column containing all the observations, and other columns tell us which cluster each observation belongs to (the preferred format)
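A minimal sketch of reshaping wide data into long data with tidyr (the data frame and column names here are hypothetical, purely for illustration):

library(tidyr)

# wide: one row per participant, one column per measurement occasion
df_wide <- data.frame(
  ppt      = c("p1", "p2"),
  score_t1 = c(10, 12),
  score_t2 = c(11, 14)
)

# long: one row per observation, with a column identifying the cluster (ppt)
df_long <- pivot_longer(
  df_wide,
  cols      = starts_with("score_"),
  names_to  = "occasion",
  values_to = "score"
)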
Clustering
Dealing with Clustering
What is complete pooling?
Ignoring clustering
= information from all clusters is pooled together to estimate one overall effect
- basically, takes everyone's data, pools it together, and plots it as one line on a graph
This is not the best method as clusters can show different patterns which this method does not account for
- also residuals are NOT independent
Clustering
Dealing with Clustering:
What is no pooling?
Fixed effects models
= information from a cluster contributes to an estimate for that cluster (and ONLY that cluster)
- information is not pooled
This method is useful in that it gives a separate estimate for every cluster
BUT it is flawed: anomalous clusters get estimates that are not tempered by information from the other clusters
also it has less statistical power due to having many more parameters
Clustering
Dealing with Clustering:
What is partial pooling?
Random Effects models
= cluster-level variation in intercepts and slopes is modelled as randomly distributed around fixed parameters.
- Effects are free to vary by cluster but information from all clusters contributes to an overall fixed parameter
Rather than estimating differences for each cluster, we are estimating the variation (or spread of distributions) of intercept points
MLM
What is multilevel regression?
notation: yij denotes observation j in group i
used for data structures when we’ve observed things that happen at multiple levels
e.g. children in classes in schools etc
MLM
Multilevel regression equation
It looks similar to simple regression but we now need a 2 level equation
Level 1:
yij = β0i + β1i xij + εij
Level 2:
β0i = γ00 + ζ0i
β1i = γ10 + ζ1i
Where:
- γ00 is the population intercept and ζ0i is the deviation of group i from γ00
- γ10 is the population slope and ζ1i is the deviation of group i from γ10
Basically
γ means there is a fixed number for the whole population
ζ accounts for the deviation of each individual group (random effects)
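Substituting the level 2 equations into level 1 gives the combined single-equation form (written here in LaTeX, same notation as above):

y_{ij} = \underbrace{\gamma_{00} + \gamma_{10} x_{ij}}_{\text{fixed}} + \underbrace{\zeta_{0i} + \zeta_{1i} x_{ij} + \varepsilon_{ij}}_{\text{random}}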
MLM
Assumptions of multilevel regression:
we now assume ζ0, ζ1 and ε to be normally distributed with a mean of 0
MLM
What are fixed effects?
Items that do not vary by cluster are fixed effects
e.g. γ00 or γ10
If we repeated the experiment we would use the same levels
Desired inference:
the conclusions refer to the levels used
MLM
What are random effects?
common definition = "we allow [the intercepts/slopes] to vary by [cluster]"
ζ is considered random as it is considered a random sample from a larger population
If we repeated the experiment different levels would be used
desired inference:
the conclusions refer to a population from which the levels used are just a (random) sample
This whole thing is random effects in R:
…. ( random intercept + random slope | grouping structure)
MLM
Random intercept vs Random slope (in R)
Random intercept:
lmer(y ~ 1 + x + (1|g), data = df)
Random intercept and slope:
lmer(y ~ 1 + x + (1 + x |g), data = df)
MLM
Advantages of MLM
MLM can be used to answer multi-level questions, for example:
- Do phenomena at level X predict outcomes at level Y?
e.g. "does left vs right handedness predict variation in reaction times?"
- Do phenomena at level X influence effects at level Y?
e.g. "does being mono vs bilingual influence grades over the duration of schooling?"
- Do random variances covary?
e.g. "do people who have higher cognitive scores at the start of the study show less decline over the duration of the study than those who started with lower scores?"
MLM
lmer output:
Fixed effects = one fixed model line
Random effects = random (individual) deviations around the fixed line (assumed to be normal)
Residual = captures the final step from individual line to individual observations
MLM
ICC in lmer
Obtained by fitting an intercept-only model in lmer, as the ICC is conditional on random intercepts (so including random slopes would change it)
From the random effects in your model summary:
ICC = intercept variance / ( intercept variance + residual variance )
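A minimal sketch of computing the ICC from an intercept-only lmer model (the outcome y, grouping factor g and data frame df are hypothetical names):

library(lme4)

m0 <- lmer(y ~ 1 + (1 | g), data = df)        # intercept-only model

vc  <- as.data.frame(VarCorr(m0))             # variance components
icc <- vc$vcov[vc$grp == "g"] / sum(vc$vcov)  # intercept variance / total variance

# the performance package offers the same calculation: performance::icc(m0)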
MLM
what is marginal R squared?
= variance explained due to fixed effects
MLM
what is conditional R squared?
= variance explained due to fixed and random effects
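A hedged sketch of obtaining both R squared values for a fitted lmer model (assumed here to be stored as model):

library(performance)

r2(model)   # reports marginal R2 (fixed effects only) and conditional R2 (fixed + random)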
MLM
Model estimation
MLMs are too complicated to calculate using a closed-form solution, so instead we estimate all the parameters using an iterative maximum likelihood procedure
MLM
Maximum Likelihood estimation (MLE)
Aim = find the values for the unknown parameters that maximise the probability of obtaining the observed data
How = done by finding values that maximise the log-likelihood function
treats fixed effects as KNOWN when estimating the variance components at each iteration
- this can lead to biased estimates of variance components
MLM
Restricted maximum likelihood estimation (REML)
estimates the variance components first and then separates the estimation of fixed and random effects
- this leads to less biased estimates of the variance components
better for small sample sizes
MLM
What are convergence warnings?
They appear when the optimiser we are using either can't find a suitable maximum, or gets stuck in a singularity (think of it like a black hole of likelihood, which signifies that there is not enough variation in our data to support such a complex model)
MLM Inference
p-values in lmer?
In simple lm we test the reduction in SSresidual which follows an F-distribution with known df.
only in very specific conditions in lmer will we have known df.
Parameter estimates in MLM are MLE/REML estimates, which means it is:
- unclear how to calculate denominator degrees of freedom (DDF)
- also unclear whether the t-statistics would even follow a t-distribution
We need other options for inference
MLM Inference
Options for inference:
Approximating DDF
- Kenward-Roger
Kenward-Roger:
- models must be fitted with REML
- adjusts SEs to avoid small sample bias
- approximated denominator df (may not be whole number)
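A sketch of the Kenward-Roger approach, using hypothetical variables y, x and grouping factor g (both models fitted with REML):

library(lme4)
library(lmerTest)   # provides lmer() with t-tests and approximated df
library(pbkrtest)

m1 <- lmer(y ~ 1 + x + (1 + x | g), data = df, REML = TRUE)
m0 <- lmer(y ~ 1 + (1 + x | g), data = df, REML = TRUE)

summary(m1, ddf = "Kenward-Roger")   # t-tests with Kenward-Roger approximated df
KRmodcomp(m1, m0)                    # Kenward-Roger adjusted F test comparing the two models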
MLM Inference
Options for inference:
Likelihood based methods
- profile likelihood confidence interval
Models need to be fitted with MLE
Evaluates the curvature of the likelihood surface at the estimate
- sharp curve = more certainty in estimate
- gradual curve = less certainty
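A minimal sketch of profile likelihood confidence intervals for a fitted lmer model (assumed to be stored as model and fitted with MLE, i.e. REML = FALSE):

confint(model, method = "profile")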
MLM Inference
Options for inference:
Likelihood based methods
- likelihood ratio tests
Models need to be fitted with MLE
- not good for small sample sizes
Uses anova()
Compares loglikelihood of two competing models
- minus twice the log of the likelihood ratio (the difference in deviances) is asymptotically (as n increases towards infinity) chi-squared distributed
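A sketch of a likelihood ratio test comparing two nested models that differ only in their fixed effects (variable names are hypothetical; both models fitted with REML = FALSE):

library(lme4)

m0 <- lmer(y ~ 1 + (1 + x | g), data = df, REML = FALSE)
m1 <- lmer(y ~ 1 + x + (1 + x | g), data = df, REML = FALSE)

anova(m0, m1)   # chi-squared test on the change in deviance (-2 x log likelihood ratio)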
MLM Inference
Options for inference:
Bootstrap
Parametric bootstrap:
- confidence interval
- likelihood ratio test
Case based bootstrap
- confidence interval
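A hedged sketch of a parametric-bootstrap confidence interval using lme4's built-in bootMer (via confint); a parametric-bootstrap likelihood ratio test is available through pbkrtest::PBmodcomp:

confint(model, method = "boot", nsim = 1000)   # parametric bootstrap CIs for the model parameters

# pbkrtest::PBmodcomp(m1, m0)                  # parametric bootstrap LRT for nested models m1, m0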
MLM inference
MLE vs REML
Fit with ML if:
- models differ in fixed effects only
- models differ in BOTH fixed and random effects
- you want to use anova()
Fit with REML if:
- models differ in random effects only
Assumptions and Diagnostics
Assumptions in lm
The general assumptions in lm are
“mean of 0 and constant variance”
Remember LINE? (Linearity, Independence, Normality, Equal variance)
Assumptions and Diagnostics
Assumptions in MLM
Similar to lm - the general idea is the same of:
“error is random”
but now we have residuals at multiple levels!
- we have our overall (fixed effects) line
- then random effects lines showing how much each group's line differs from the overall line (in intercept and slope)
- then around the random effects lines we have the residuals of individual points
Assumptions and Diagnostics
Assumption plots:
plotting residual vs fitted values
plot(model, type = c("p", "smooth"))
Assumptions and Diagnostics
Assumption plots:
QQplots
Used to check normality
- we want the dots to follow the line
qqnorm(resid(model))
qqline(resid(model))
Assumptions and Diagnostics
Assumption plots:
Scale-location plots
Measure of spread
- we just want the line to be horizontal (it doesn’t matter where it sits)
plot(model,
     form = sqrt(abs(resid(.))) ~ fitted(.),
     type = c("p", "smooth"))
Assumptions and Diagnostics
Assumption plots:
Plotting by cluster
Used just to look for systematic patterns
Assumptions and Diagnostics
Assumption plots:
Quick assumption check
performance::check_model(model)
this gives us an overview but isn’t used in formal write-ups
Assumptions and Diagnostics
Troubleshooting:
Model mis-specification?
if assumptions look violated, check the model is correct
e.g.
- are the interaction terms needed
- does that variable vary by cluster
Assumptions and Diagnostics
Troubleshooting:
Transformations?
(not massively recommended)
Transforming your outcome variable may help satisfy model assumptions
- but this may come at the expense of interpretability
there are many methods
e.g. Box-Cox = finds the 'best' transformation
- after this we can only refer to y as Box-Cox transformed y, and we have no way of knowing whether the transformed y is meaningful
Assumptions and Diagnostics
Troubleshooting:
Bootstrap?
Same basic principles as in lm
if we are concerned our errors are non-normal or heteroskedastic and we have a LARGE sample size, it might be a good option
BUT if there are effects with mis-specification (e.g. the effect is non-linear) bootstrapping won’t help
Assumptions and Diagnostics
Troubleshooting:
Types of Bootstrapping
Parametric bootstrap
= resample based on the estimated distribution of parameters
assumes explanatory variables are fixed and that the model specification and distributions are correct
- not very helpful in assumption violations
Assumptions and Diagnostics
Troubleshooting:
Types of Bootstrapping
Resample Residuals
y* = ŷ + ε̂*
- where ε̂* is a residual sampled with replacement from the fitted residuals
assumes explanatory variables are fixed and that the model specification and distributions are correct
- not very helpful in assumption violations
Assumptions and Diagnostics
Troubleshooting:
Types of Bootstrapping
Case based bootstrapping
= resample cases
- minimal assumptions other than that we have specified the hierarchical structure of our data
BUT this presents us with the issue of do we sample individual observations? do we sample clusters? or both?
For example, in R to bootstrap participants (clusters) but not their observations we include
resample = c(TRUE, FALSE)
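A hedged sketch of a case-based bootstrap with the lmeresampler package (resampling participants/clusters with replacement but keeping each participant's observations intact):

library(lmeresampler)

boot_res <- bootstrap(model, .f = fixef, type = "case",
                      B = 2000, resample = c(TRUE, FALSE))

confint(boot_res, type = "perc")   # percentile bootstrap CIs for the fixed effects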
Assumptions and Diagnostics
Influence
What are influential cases?
high leverage cases = cases that are able to pull our model line in a particular direction
outliers = points that fall far from our model line and from the other observed data points
high influence = high leverage + outlier
- the case is far from our other observed data points and pulls the model line in a direction that misrepresents the rest of the observed data
- measured with Cook's distance
Both observations (level 1) and clusters (level 2) can be influential
Assumptions and Diagnostics
Influence
Level 1: Influential points
plot QQplot of model
Diagnostics package
library(HLMdiag)
infl1 <- hlm_influence(model, level = 1)
- dot plot (of cook’s distance) = points beyond the red line can be considered influential
dotplot_diag(infl1$cooksd, cutoff = "internal")
Assumptions and Diagnostics
Influence
Level 2: Influential clusters
If we have multiple observations for each participant, each participant can be considered a cluster
So to determine an influential cluster:
infl2 <- hlm_influence(model, level = "ppt")
dotplot_diag(infl2$cooksd, cutoff = "internal", index = infl2$ppt)
This will provide a graph of clusters (participants) scaled on influence
Assumptions and Diagnostics
Influence
Sensitivity analysis
Would our whole conclusion change if we excluded an influential case?
We make a model with and without the influential case and compare them
- if your conclusion does not change, don’t mention it
- if your conclusion changes, mention it in the discussion and look closer at the influential case to determine why it is so influential
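A minimal sketch of a sensitivity analysis; the participant ID "p12" and the variable names are purely illustrative:

library(lme4)

m_full <- lmer(y ~ 1 + x + (1 + x | ppt), data = df)
m_excl <- lmer(y ~ 1 + x + (1 + x | ppt), data = subset(df, ppt != "p12"))

cbind(full = fixef(m_full), excluded = fixef(m_excl))   # compare the fixed effects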
Centring Predictors in MLM
What is centring?
We can recentre our data so that any value forms the new 0 point
common versions of this are:
- mean centring
- centring on the minimum, so our data start at 0 (so our model is not trying to predict an unobserved 0 point)
Centring Predictors in MLM
What is scaling?
Suppose we have a variable for which the mean is 0 and the sd is 15 - we can change the scale of our data so that a 1 unit change in the scaled x is equivalent to a change of 1 sd
Centring Predictors in MLM
Group mean centring
in MLM we have multiple means to work with:
Grand mean = mean of all observations (regardless of cluster)
Group means = mean of each cluster
Group mean centring = take each individual observation in a group and subtract the group mean
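A sketch of computing group means and group-mean-centred values with dplyr (the variable names x and g are hypothetical):

library(dplyr)

df <- df %>%
  group_by(g) %>%
  mutate(
    x_gm  = mean(x),   # group mean (used for between effects)
    x_gmc = x - x_gm   # group-mean-centred value (used for within effects)
  ) %>%
  ungroup()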
Centring Predictors in MLM
Within effects
When looking at the graph, you are looking at the differences between individual observations that contribute to one line
For example, in a study of how anxiety affects drinking, with participants as clusters:
"is being more nervous (than you usually are) associated with higher alcohol consumption?"
To answer this, we group mean centre anxiety, so each observation is expressed relative to that participant's average anxiety level
Centring Predictors in MLM
Between effects
When looking at the graph, you look at the differences between the cluster lines
For example, in a study of how anxiety affects drinking, with participants as clusters:
"is being generally more nervous (than other participants) associated with higher alcohol consumption?"
To answer this, we plot the group means and then compare them
Centring Predictors in MLM
When do we need to consider within and between effects?
- when we have a predictor that varies within a cluster
- when we have different average levels of x across clusters (typically occurs in observational studies rather than experiments)
- when our RSQ concerns x
Random Effect Structures
What is a nested structure?
Lower-level units belong to one, and only one, higher-level unit
e.g. children will belong to one class, in one school, in one district etc
Random Effect Structures
Nested random effects structures:
Imagine the children in classes in schools example
We can write their random effects structures in R a few ways:
( 1 | school ) + ( 1 | class )
( 1 | school ) + ( 1 | class:school )
( 1 | school / class ) = "group by school and within that group by class"
- the first form can only be used if the class labels are unique across schools
- for example, if every school labels its classes 1, 2, 3, ... then ( 1 | class ) would wrongly treat "class 1" in different schools as the same cluster, so use one of the other two forms
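A small sketch of handling repeated class labels (hypothetical data frame df with columns school and class): either use the nested syntax above, or create globally unique class labels first:

library(lme4)

df$classID <- interaction(df$school, df$class)   # e.g. "schoolA.1", "schoolB.1"

m <- lmer(y ~ 1 + x + (1 | school) + (1 | classID), data = df)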
Random Effect Structures
What is a crossed structure?
basically the opposite of nested structures
the things in one cluster can also be in other clusters
e.g. we have multiple participants complete 5 tasks multiple times
- observations here can be clustered by participant and by task
Random Effect Structures
Crossed random effects structures:
Imagine the multiple participants complete 5 tasks multiple times example
We write the crossed random effects structure in R as follows:
… + ( 1 | ppt ) + ( 1 | task )
Model Building
Maximal structures
What is a maximal model?
maximal model = the most complex structure that you can fit to the data
- everything that can be modelled as a random effect is done so
- everything in the fixed effects is ‘up for grabs’ for the random effects
- requires sufficient variance (which is often not the case)
in R:
we use isSingular(maxmodel) to check whether the fit is singular
- if the output is TRUE the fit is singular (the random effects structure is too complex for the data) and we need to simplify our maximal model
Model Building
Maximal structures
What do you do if a model won’t converge?
Don’t report results from a model that won’t converge! you can’t trust its estimates
Instead:
- check the model specification = do your random effects make sense?
- try a different optimizer
- adjust the max iterations
- adjust the stopping tolerances (a sketch of the optimizer and iteration controls follows below)
In most cases our model is too complex and we just have to simplify it
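A sketch of changing the optimizer and raising the iteration limit via lmerControl (the variable names are hypothetical; 'bobyqa' is the optimizer used throughout DAPR3):

library(lme4)

m <- lmer(y ~ 1 + x + (1 + x | g), data = df,
          control = lmerControl(optimizer = "bobyqa",
                                optCtrl = list(maxfun = 2e5)))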
Model Building
Maximal structures
What is an optimizer?
= the method by which the computer finds the best fitting model for our MLM
we can try all optimizers at once in R using:
summary(allFit(model))
this may help us choose an optimizer that allows our model to converge
throughout DAPR3, 'bobyqa' is used almost all of the time
Model Building
Maximal structures
Deciding on random effect structures
Selection based
Use a criterion for model selection (e.g. LRT, AIC, BIC etc.) to choose a random effect structure that is supported by the data
- these are parsimony corrected so have more power
Model Building
Maximal structures
Deciding on random effect structures
Keep it maximal
start with the maximal model
remove random effects with the least variance until the model converges
this means we’re trying to fit the most complicated model that we can given the variance in our data (risks overfitting)
Model Building
Maximal structures
Simplification
- extract the random effects
in R = VarCorr(maxmodel)
Look for:
- small variances / sds
- correlations of 1 or -1
- Consider removing the more complex random effects first (e.g. interaction term)
- categorical predictors with more than 2 levels are 'more complex' (as they require more parameters)
- remove higher level random effects if needed (if you have multiple levels of nesting, you’ll have fewer groups as you go up the levels)
- subjective choice = which simplification can you most easily accept
Model Building
Random effects correlation
Removing random effects correlations simplifies the model
Correlations between our random effects can alter our model results
we can remove the correlations but still keep the random effects by using || instead of | in our model, for example:
… + ( 1 + x || g )
RSQs with MLM
Model specification
lmer ( outcome ~ fixed effects + ( random effects | grouping structure ) , data = data)
for each variable, think about how it is measured
before fitting a model think: do we need to centre or scale?
RSQs with MLM
Model specification
lmer ( OUTCOME ~ FIXED EFFECTS + ( random effects | grouping structure ) , data = data)
OUTCOME
what are we interested in explaining and predicting
FIXED EFFECTS
- what variables are we interested in using to explain the outcome?
- are our questions about the effects of our predictors specifically in reference to group means of the predictors? etc.
RSQs with MLM
Model specification
lmer ( outcome ~ fixed effects + ( RANDOM EFFECTS | grouping structure ) , data = data)
RANDOM EFFECTS
which of our fixed effects can vary for our random groups?
- does a single group have multiple distinct values of x?
- what can we imagine a slope for?
RSQs with MLM
Model specification
lmer ( outcome ~ fixed effects + ( random effects | GROUPING STRUCTURE ) , data = data)
GROUPING STRUCTURE
in what different ways can we group our data?
- of the ways we can group our data, which are of specific inferential interest?
- of the ways we can group our data, which groupings do we think of as a random sample of a general population?
- are these groupings nested?
- are the labels unique? etc.
RSQs with MLM
Model fitting
model issues = check for convergence and singular fit (adjust accordingly)
model assumptions
- can use check_model to check everything looks right
- normality of residuals = QQplot
- influence = cook’s distance / dot plot
plots are preferable as statistical tests can be overly sensitive
RSQs with MLM
Interpretation and Inference
Fixed effects
use: fixef(model) to get fixed effect values
Interpretation = using plot_model (from the sjPlot package) to plot your fixed effects may help you make sense of the fixef(model) output
Inference
Tests
- model comparison
- parameter estimates
Methods
- df approximations (e.g. Kenward-Roger)
- Likelihood ratio tests
- Bootstrap
We can use any of these to make inferences about our fixed effects
RSQs with MLM
Interpretation and Inference
Random effects
we use random effects to add context to our results
For example, adding random intercepts and slopes to a graph can help provide context of the actual trends in the data - helping us answer our RSQ
RSQs with MLM
Reporting
The Analytical Process
1) data cleaning/removal/transformations are performed prior to analysis
2) unplanned transformations or removals performed in order to meet assumptions
3) specify all fixed effects (explanatory variables and covariates) linking to the RSQ/hypothesis
4) plan a structure of random effects to be fitted and the procedure used to decide final random effects structure if the model does not converge
5) state clearly relevant test / comparison and link this to RSQ / hypothesis
RSQs with MLM
Reporting
Results
1) software packages used to fit models
2) estimation method
3) optimizer used
4) if the proposed model failed to converge, the steps used to reach final model
For final model
5) all parameter estimates for fixed effects (e.g. coefficients, SEs, CIs, t-stats/df/p-values if used)
Random effects
6) variance/sd for each random effect, residual variance, correlations/covariances if modelled
RSQs with MLM
Reporting
tests and tables
Tables help a lot, but the results given in a table must also be included within the written interpretation to provide the reader with the context of each number's meaning