Intro to multi-level models (MLM) Flashcards

1
Q

Clustering

What is clustered data?

A

= when our observations have some sort of grouping that is NOT something we are interested in studying

This is important as the observations in one cluster are likely to be more similar to each other than to observations in different clusters

For example:
- children within schools
- observations within individuals

This can quickly become complicated the more levels you have

2
Q

Clustering

What is the intra-class correlation coefficient (ICC)?

A

It is a ratio of:
variance between groups : total variance

ICC can take values between 0 and 1. The larger the ICC, the lower the variability is within the clusters compared to the variability between clusters

For example:
ICC of 0.48 means that 48% of our variance is attributable to by-cluster differences
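In code, the ICC is just this ratio of variance components. A minimal sketch in base R, using made-up variance components that reproduce the 0.48 example:

```r
# ICC from variance components (hypothetical values)
var_between <- 1.2                       # between-cluster (intercept) variance
var_within  <- 1.3                       # within-cluster (residual) variance
icc <- var_between / (var_between + var_within)
icc                                      # 0.48
```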

3
Q

Clustering

Why and how is clustering a problem?

A

WHY:
It is something systematic that our model should (arguably) take into account

HOW:
standard errors will often be too small, which also means:
- CIs will be too narrow
- t-statistics will be too large
- p-values will be too small

4
Q

Clustering

What is wide data?

A

observations are written across separate columns (not the preferred method)

5
Q

Clustering

What is long data?

A

there is one column for all observations and the different columns tell us which cluster an observation belongs to
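As a sketch, wide data can be reshaped to long with base R's reshape() (the data and column names here are hypothetical):

```r
# hypothetical wide data: one row per child, one column per test occasion
wide <- data.frame(child = 1:3,
                   t1 = c(10, 12, 9),
                   t2 = c(11, 14, 10))

# long data: one score column, plus columns saying which child/occasion
long <- reshape(wide, direction = "long",
                varying = c("t1", "t2"), v.names = "score",
                timevar = "occasion", idvar = "child")
```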

6
Q

Clustering

Dealing with Clustering
What is complete pooling?

A

Ignoring clustering
= information from all clusters is pooled together to estimate x

  • basically, takes everyone’s data, pools it together and then plots it as one line on a graph

This is not the best method as clusters can show different patterns which this method does not account for
- also residuals are NOT independent

7
Q

Clustering

Dealing with Clustering:
What is no pooling?

A

Fixed effects models
= information from a cluster contributes to an estimate for that cluster (and ONLY that cluster)

  • information is not pooled

This method is good as it gives a separate estimate for every cluster
BUT it is flawed as anomalous clusters are carried through unchecked (no information is shared between clusters)
it also has less statistical power due to estimating more parameters

8
Q

Clustering

Dealing with Clustering:
What is partial pooling?

A

Random Effects models
= cluster level variance in intercepts and slope is modelled as randomly distributed around fixed parameters.
- Effects are free to vary by cluster but information from all clusters contributes to an overall fixed parameter

Rather than estimating differences for each cluster, we are estimating the variation (or spread of distributions) of intercept points

9
Q

MLM

What is multilevel regression?

A

used for observation j in group i

used for data structures when we’ve observed things that happen at multiple levels
e.g. children in classes in schools etc

10
Q

MLM

Multilevel regression equation

A

It looks similar to simple regression but we now need a 2 level equation

Level 1:
yij = β0i + β1i xij + εij

Level 2:
β0i = γ00 + ζ0i
β1i = γ10 + ζ1i

Where:
- γ00 is the population intercept and ζ0i is the deviation of group i from γ00
- γ10 is the population slope and ζ1i is the deviation of group i from γ10

Basically
γ means there is a fixed number for the whole population
ζ accounts for the deviation of each individual group (random effects)
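The two-level equation can be made concrete by simulating from it; all parameter values below are made up:

```r
# simulate y_ij = (gamma00 + zeta0_i) + (gamma10 + zeta1_i) * x_ij + e_ij
set.seed(1)
n_groups <- 20; n_per <- 10
gamma00 <- 2; gamma10 <- 0.5                 # fixed intercept and slope
zeta0 <- rnorm(n_groups, 0, 1)               # group deviations in intercept
zeta1 <- rnorm(n_groups, 0, 0.3)             # group deviations in slope
g <- rep(1:n_groups, each = n_per)           # group index i for each observation
x <- rnorm(n_groups * n_per)
y <- (gamma00 + zeta0[g]) + (gamma10 + zeta1[g]) * x +
     rnorm(n_groups * n_per, 0, 1)           # epsilon_ij
```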

11
Q

MLM

Assumptions of multilevel regression:

A

we now assume ζ0, ζ1 and ε to be normally distributed with a mean of 0

12
Q

MLM

What are fixed effects?

A

Items that do not vary by cluster are fixed effects
e.g. γ00 or γ10

If we repeated the experiment we would use the same levels
Desired inference:
the conclusions refer to the levels used

13
Q

MLM

What are random effects?

A

common definition = “we allow (?) to vary by (?)”

ζ is considered random as it is considered a random sample from a larger population

If we repeated the experiment different levels would be used
desired inference:
the conclusions refer to a population from which the levels used are just a (random) sample

This whole thing is random effects in R:
…. ( random intercept + random slope | grouping structure)

14
Q

MLM

Random intercept vs Random slope (in R)

A

Random intercept:
lmer(y ~ 1 + x + (1|g), data = df)

Random intercept and slope:
lmer(y ~ 1 + x + (1 + x |g), data = df)

15
Q

MLM

Advantages of MLM

A

MLM can be used to answer multi-level questions, for example:

  1. Do phenomena at level X predict outcomes at level Y?
    e.g “does left vs right handedness predict variation in reaction times?”
  2. Do phenomena at level X influence effects at level Y?
    e.g. “does being mono vs bilingual influence grades over the duration of schooling?”
  3. Do random variances covary?
    e.g. “do people who have higher cognitive scores at the start of the study show less decline over the duration of the study than those who started with lower scores?”
16
Q

MLM

lmer output:

A

Fixed effects = one fixed model line

Random effects = random (individual) deviations around the fixed line (assumed to be normal)

Residual = captures the final step from individual line to individual observations

17
Q

MLM

ICC in lmer

A

Obtained by fitting an intercept-only model in lmer, as the ICC is conditional on random intercepts (so the inclusion of random slopes would influence it)

From the random effects in your model summary:
ICC = intercept variance / ( intercept variance + residual variance )
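If lme4 isn't to hand, a rough ICC can be sketched from a one-way ANOVA decomposition in base R; this moment-based estimator is not identical to lmer's ML/REML estimate, and the data below are simulated:

```r
# rough ICC via one-way ANOVA variance components (balanced design)
set.seed(42)
n_per <- 20
g <- rep(1:10, each = n_per)                        # 10 clusters of 20
y <- rnorm(10, sd = 1)[g] + rnorm(200, sd = 1)      # clustered data

a <- anova(lm(y ~ factor(g)))
ms_between <- a$`Mean Sq`[1]
ms_within  <- a$`Mean Sq`[2]
var_between <- max(0, (ms_between - ms_within) / n_per)
icc <- var_between / (var_between + ms_within)
```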

18
Q

MLM

what is marginal R squared?

A

= variance explained due to fixed effects

19
Q

MLM

what is conditional R squared?

A

= variance explained due to fixed and random effects

20
Q

MLM

Model estimation

A

MLMs are too complicated for a closed-form solution, so instead we estimate all the parameters using an iterative maximum likelihood procedure

21
Q

MLM

Maximum Likelihood estimation (MLE)

A

Aim = find the values for the unknown parameters that maximise the probability of obtaining the observed data

How = done by finding values that maximise the log-likelihood function

treats fixed effects as KNOWN when estimating the variance components at each iteration
- this can lead to biased estimates of variance components
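The idea of maximising a log-likelihood can be sketched in base R for a simple normal model (the data and starting values are arbitrary):

```r
# find the mean/sd that maximise the normal log-likelihood of the data
set.seed(2)
y <- rnorm(50, mean = 3, sd = 2)

negll <- function(p) {
  # optim minimises, so return the NEGATIVE log-likelihood;
  # log-sd parameterisation keeps sd > 0
  -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
}
fit <- optim(c(0, 0), negll)     # iterative search, as in MLM estimation
est_mean <- fit$par[1]
est_sd   <- exp(fit$par[2])
```

For a normal model the MLE of the mean is the sample mean, so est_mean should land very close to mean(y).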

22
Q

MLM

Restricted maximum likelihood estimation (REML)

A

separates out the fixed effects and estimates the variance components from what remains after removing them

  • this leads to less biased estimates of the variance components
    better for small sample sizes
23
Q

MLM

What are convergence warnings?

A

They come into play when the optimiser we are using either can’t find a suitable maximum, or gets stuck in a singularity (think of it like a black hole of likelihood, which signifies that there is not enough variation in our data to support such a complex model)

24
Q

MLM Inference

p-values in lmer?

A

In simple lm we test the reduction in SSresidual which follows an F-distribution with known df.

only in very specific conditions in lmer will we have known df.
Parameter estimates in MLM are MLE/REML estimates, which means it is:
- unclear how to calculate denominator degrees of freedom (DDF)
- also unclear whether the test statistics would even follow an F distribution

We need other options for inference

25
Q

MLM Inference

Options for inference:

Approximating DDF
- Kenward-Roger

A

Kenward-Roger:
- models must be fitted with REML
- adjusts SEs to avoid small sample bias
- approximated denominator df (may not be a whole number)

26
Q

MLM Inference

Options for inference:

Likelihood based methods
- profile likelihood confidence interval

A

Models need to be fitted with MLE

Evaluates the curvature of the likelihood surface at the estimate
- sharp curve = more certainty in estimate
- gradual curve = less certainty

27
Q

MLM Inference

Options for inference:

Likelihood based methods
- likelihood ratio tests

A

Models need to be fitted with MLE
- not good for small sample sizes

Uses anova()
Compares the log-likelihoods of two competing models
- twice the difference in log-likelihoods is asymptotically (as n increases towards infinity) chi-squared distributed
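The mechanics can be sketched with two nested lm models in base R (with lmer models you would pass both to anova() instead); the data are simulated:

```r
# likelihood ratio test of nested models
set.seed(7)
x <- rnorm(100)
y <- 1 + 0.4 * x + rnorm(100)

m0 <- lm(y ~ 1)                  # restricted model
m1 <- lm(y ~ x)                  # full model

lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))   # test statistic
p   <- pchisq(lrt, df = 1, lower.tail = FALSE)     # compare to chi-squared(1)
```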

28
Q

MLM Inference

Options for inference:

Bootstrap

A

Parametric bootstrap:
- confidence interval
- likelihood ratio test

Case based bootstrap
- confidence interval

29
Q

MLM inference

MLE vs REML

A

Fit with ML if:
- models differ in fixed effects only
- models differ in BOTH fixed and random effects
- you want to use anova()

Fit with REML if:
- models differ in random effects only

30
Q

Assumptions and Diagnostics

Assumptions in lm

A

The general assumptions in lm are
“mean of 0 and constant variance”

Remember: LINE?

31
Q

Assumptions and Diagnostics

Assumptions in MLM

A

Similar to lm - the general idea is the same of:
“error is random”

but now we have residuals at multiple levels!
- we have our overall (fixed effects) line
- then random effects lines of how much the group line is different from the overall line slope
- then around the random effects lines we have the residuals of individual points

32
Q

Assumptions and Diagnostics

Assumption plots:
plotting residual vs fitted values

A

plot(model, type = c("p", "smooth"))

33
Q

Assumptions and Diagnostics

Assumption plots:
QQplots

A

Used to check normality
- we want the dots to follow the line

qqnorm(resid(model))
qqline(resid(model))

34
Q

Assumptions and Diagnostics

Assumption plots:
Scale-location plots

A

Measure of spread
- we just want the line to be horizontal (it doesn’t matter where it sits)

plot(model,
     form = sqrt(abs(resid(.))) ~ fitted(.),
     type = c("p", "smooth"))

35
Q

Assumptions and Diagnostics

Assumption plots:
Plotting by cluster

A

Used just to look for systematic patterns

36
Q

Assumptions and Diagnostics

Assumption plots:
Quick assumption check

A

performance::check_model(model)

this gives us an overview but isn’t used in formal write-ups

37
Q

Assumptions and Diagnostics

Troubleshooting:
Model mis-specification?

A

if assumptions look violated, check the model is correct
e.g.
- are the interaction terms needed
- does that variable vary by cluster

38
Q

Assumptions and Diagnostics

Troubleshooting:
Transformations?

A

(not massively recommended)

Transforming your outcome variable may help satisfy model assumptions
- but this may come at the expense of interpretability

there are many methods
e.g. BoxCox = finds the ‘best’ transformation
- after this we can only refer to y as BoxCox transformed y and we have no way of knowing if our transformed y is meaningful

39
Q

Assumptions and Diagnostics

Troubleshooting:
Bootstrap?

A

Same basic principles as in lm

if we are concerned our errors are non-normal or heteroskedastic and we have a LARGE sample size, it might be a good option
BUT if there are effects with mis-specification (e.g. the effect is non-linear) bootstrapping won’t help

40
Q

Assumptions and Diagnostics

Troubleshooting:
Types of Bootstrapping
Parametric bootstrap

A

= resample based on the estimated distribution of parameters

assumes explanatory variables are fixed and that the model specification and distributions are correct
- not very helpful in assumption violations

41
Q

Assumptions and Diagnostics

Troubleshooting:
Types of Bootstrapping
Resample Residuals

A

y* = y hat + ε hat
- sampled with replacement

assumes explanatory variables are fixed and that the model specification and distributions are correct
- not very helpful in assumption violations

42
Q

Assumptions and Diagnostics

Troubleshooting:
Types of Bootstrapping
Case based bootstrapping

A

= resample cases
- minimal assumptions other than that we have specified the hierarchical structure of our data

BUT this presents us with the issue of do we sample individual observations? do we sample clusters? or both?

For example, in R to bootstrap participants (clusters) but not their observations we include

resample = c(TRUE, FALSE)
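Resampling clusters but not their observations can also be sketched by hand in base R; the ppt column and data here are hypothetical:

```r
# one case-bootstrap draw: resample participants, keep their rows intact
set.seed(3)
df <- data.frame(ppt = rep(1:5, each = 4), y = rnorm(20))

ids  <- sample(unique(df$ppt), replace = TRUE)                 # resample clusters
boot <- do.call(rbind, lapply(ids, function(i) df[df$ppt == i, ]))
```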

43
Q

Assumptions and Diagnostics

Influence
What are influential cases?

A

high leverage cases = are able to direct our model line in a certain way

high outlier = points that fall far from our model line and other observed data points

high influence = high leverage + high outlier
- the case is far from our other observed data points and pulls the model line in a direction that misrepresents the rest of the observed data
- cook’s distance

Both observations (level 1) and clusters (level 2) can be influential

44
Q

Assumptions and Diagnostics

Influence
Level 1: Influential points

A

plot QQplot of model

Diagnostics package
library(HLMdiag)
infl1 <- hlm_influence(model, level = 1)

  • dot plot (of Cook’s distance) = points beyond the red line can be considered influential
    dotplot_diag(infl1$cooksd, cutoff = "internal")
45
Q

Assumptions and Diagnostics

Influence
Level 2: Influential clusters

A

If we have multiple observations for each participant, each participant can be considered a cluster
So to determine an influential cluster:

infl2 <- hlm_influence(model, level = "ppt")
dotplot_diag(infl2$cooksd, cutoff = "internal", index = infl2$ppt)

This will provide a graph of clusters (participants) scaled on influence

46
Q

Assumptions and Diagnostics

Influence
Sensitivity analysis

A

Would our whole conclusion change if we excluded an influential case?

We make a model with and without the influential case and compare them
- if your conclusion does not change, don’t mention it
- if your conclusion changes, mention it in the discussion and look closer at the influential case to determine why it is so influential
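A minimal sketch of the with/without comparison, using plain lm and a fabricated influential point:

```r
# refit with and without a suspect case and compare the slope
set.seed(5)
d <- data.frame(x = rnorm(30))
d$y <- 0.5 * d$x + rnorm(30)
d[31, ] <- c(5, -5)                      # hypothetical influential case

m_all  <- lm(y ~ x, data = d)
m_drop <- lm(y ~ x, data = d[-31, ])     # sensitivity refit
c(coef(m_all)["x"], coef(m_drop)["x"])   # compare the two slopes
```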

47
Q

Centring Predictors in MLM

What is centring?

A

We can recentre our data so that any value forms the new 0 point

common versions of this are:
- mean centring
- centring so our data starts at 0 (so our model is not trying to predict the 0 point)

48
Q

Centring Predictors in MLM

What is scaling?

A

Suppose we have a variable for which the mean is 0 and the sd is 15 - we can change the scale of our data so a 1 unit change in x corresponds to a change of 1 sd
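In R, scale() does both centring and scaling; z-scoring is one common version (the values below are arbitrary):

```r
x <- c(70, 85, 100, 115, 130)     # hypothetical scores
z <- as.numeric(scale(x))         # centre on the mean, divide by the sd
# a 1-unit change in z is now a 1-sd change in x
```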

49
Q

Centring Predictors in MLM

Group mean centring

A

in MLM we have multiple means to work with:

Grand mean = mean of all observations (regardless of cluster)

Group means = mean of each cluster

Group mean centring = take each individual observation in a group and subtract the group mean
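A base R sketch of grand vs group mean centring (the data are made up):

```r
df <- data.frame(g = rep(c("a", "b"), each = 3),
                 x = c(1, 2, 3, 10, 11, 12))

df$x_grandc <- df$x - mean(df$x)   # grand-mean centred
df$x_gm     <- ave(df$x, df$g)     # group means
df$x_gmc    <- df$x - df$x_gm      # group-mean centred (within component)
```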

50
Q

Centring Predictors in MLM

Within effects

A

When looking at the graph, you are looking at the differences between individual observations that contribute to one line

For example, in a study of how anxiety affects drinking across (clusters of) participants:
“is being more nervous (than you usually are) associated with higher alcohol consumption?”

To answer this, we would group mean centre anxiety to plot the clusters (participants) against their average anxiety levels

51
Q

Centring Predictors in MLM

Between effects

A

When looking at the graph, you look at the differences between the cluster lines

For example, in a study of how anxiety affects drinking across (clusters of) participants:
“is being generally more nervous (than other participants) associated with higher alcohol consumption?”

To answer this, we plot group means and then compare them

52
Q

Centring Predictors in MLM

When do we need to consider within and between effects?

A
  • when we have a predictor that varies within a cluster
  • when we have different average levels of x (typically occurs in observational studies rather than experiments)
  • when our RSQ concerns x
53
Q

Random Effect Structures

What is a nested structure?

A

Things in that structure belong only to that structure

e.g. children will belong to one class, in one school, in one district etc

54
Q

Random Effect Structures

Nested random effects structures:

A

Imagine the children in classes in schools example
We can write their random effects structures in R a few ways:

( 1 | school ) + ( 1 | class )
- this only works if the class labels are unique across schools

( 1 | school ) + ( 1 | class : school )
( 1 | school / class ) = “group by school and within that group by class”
- these work even when labels are re-used; for example, if the classes in each school are labelled 1-30, class : school tells R which school’s class 1 is which
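The label-uniqueness issue can be checked and fixed in base R with interaction(), which builds a combined school-by-class label (the data and names are hypothetical):

```r
# classes labelled 1-2 in every school: 'class' alone can't tell them apart
df <- data.frame(school = rep(c("s1", "s2"), each = 2),
                 class  = c(1, 2, 1, 2))

df$class_id <- interaction(df$school, df$class, drop = TRUE)  # unique labels
length(unique(df$class))     # ambiguous: only 2 labels
length(unique(df$class_id))  # one label per actual class
```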

55
Q

Random Effect Structures

What is a crossed structure?

A

basically the opposite of nested structures
the things in one cluster can also be in other clusters

e.g. we have multiple participants complete 5 tasks multiple times
- observations here can be clustered by participant and by task

56
Q

Random Effect Structures

Crossed random effects structures:

A

Imagine the multiple participants complete 5 tasks multiple times example

We write the crossed random effects structure in R as follows:
… + ( 1 | ppt ) + ( 1 | task )

57
Q

Model Building

Maximal structures
What is a maximal model?

A

maximal model = the most complex structure that you can fit to the data

  • everything that can be modelled as a random effect is done so
  • everything in the fixed effects is ‘up for grabs’ for the random effects
  • requires sufficient variance (which is often not the case)

in R:
we use isSingular(maxmodel) to check for a singular fit
- if the output is TRUE the fit is singular (e.g. a variance component estimated as 0) and we need to simplify our maximal model

58
Q

Model Building

Maximal structures
What do you do if a model won’t converge?

A

Don’t report results from a model that won’t converge! You can’t trust its estimates

Instead:
- check the model specification = do your random effects make sense?
- try a different optimizer
- adjust the max iterations
- loosen the convergence tolerances
In most cases our model is too complex and we just have to simplify it

59
Q

Model Building

Maximal structures
What is an optimizer?

A

= the method by which the computer finds the best fitting model for our MLM

we can try all optimizers at once in R using:
summary(allFit(model))
this may help us choose an optimizer that allows our model to converge

throughout DAPR3 they’ve used ‘bobyqa’ like literally all the time

60
Q

Model Building

Maximal structures
Deciding on random effect structures
Selection based

A

Use a criterion for model selection (e.g. LRT, AIC, BIC etc.) to choose a random effect structure that is supported by the data
- these are parsimony corrected so have more power

61
Q

Model Building

Maximal structures
Deciding on random effect structures
Keep it maximal

A

start with the maximal model

remove random effects with the least variance until the model converges

this means we’re trying to fit the most complicated model that we can given the variance in our data (risks overfitting)

62
Q

Model Building

Maximal structures
Simplification

A
  • extract the random effects
    in R = VarCorr(maxmodel)
    Look for:
  • small variances / sd
  • correlations of 1 or -1
  • Consider removing the more complex random effects first (e.g. interaction term)
  • categorical predictors with 2+ levels are ‘more complex’ (as they require more parameters)
  • remove higher level random effects if needed (if you have multiple levels of nesting, you’ll have fewer groups as you go up the levels)
  • subjective choice = which simplification can you most easily accept
63
Q

Model Building

Random effects correlation

A

Removing random effects correlations simplifies the model

Correlations between our random effects can alter our model results

we can remove the correlations but still keep the random effects by using || in our model instead of |, for example:
… + ( 1 + x || g )

64
Q

RSQs with MLM

Model specification
LMER ( outcome ~ fixed effects + ( random effects | grouping structure ) , data = data)

A

LMER = how the outcome is measured (lmer assumes a continuous outcome; other outcome types need other functions, e.g. glmer)

before fitting a model think: do we need to centre or scale

65
Q

RSQs with MLM

Model specification
lmer ( OUTCOME ~ FIXED EFFECTS + ( random effects | grouping structure ) , data = data)

A

OUTCOME
what are we interested in explaining and predicting

FIXED EFFECTS
- what variables are we interested in explaining by this
- are our questions about the effect of our predictors specifically in reference to group means of predictors? etc.

66
Q

RSQs with MLM

Model specification
lmer ( outcome ~ fixed effects + ( RANDOM EFFECTS | grouping structure ) , data = data)

A

RANDOM EFFECTS
which of our fixed effects can vary for our random groups?

  • does a single group have multiple distinct values of x?
  • what can we imagine a slope for?
67
Q

RSQs with MLM

Model specification
lmer ( outcome ~ fixed effects + ( random effects | GROUPING STRUCTURE ) , data = data)

A

GROUPING STRUCTURE
in what different ways can we group our data?

  • of the ways we can group our data, which are of specific inferential interest?
  • of the ways we can group our data, which groupings do we think of as a random sample of a general population?
    - are these groupings nested?
    - are the labels unique? etc.
68
Q

RSQs with MLM

Model fitting

A

model issues = check for convergence and singular fit (adjust accordingly)

model assumptions
- can use check_model to check everything looks right
- normality of residuals = QQplot
- influence = cook’s distance / dot plot

plots are preferable as statistical tests can be overly sensitive

69
Q

RSQs with MLM

Interpretation and Inference
Fixed effects

A

use: fixef(model) to get fixed effect values

Interpretation = using plot_model to plot your fixed effect may help you make sense of the fixef(model) output

Inference
Tests
- model comparison
- parameter estimates
Methods
- df approximations (e.g. Kenward-Roger)
- Likelihood ratio tests
- Bootstrap

We can use any of these to make inferences about our fixed effects

70
Q

RSQs with MLM

Interpretations and Inference
Random effects

A

we use random effects to add context to our results
For example, adding random intercepts and slopes to a graph can help provide context of the actual trends in the data - helping us answer our RSQ

71
Q

RSQs with MLM

Reporting
The Analytical Process

A

1) data cleaning/removal/transformations are performed prior to analysis

2) unplanned transformations or removals performed in order to meet assumptions

3) specify all fixed effects (explanatory variables and covariates) linking to the RSQ/hypothesis

4) plan a structure of random effects to be fitted and the procedure used to decide final random effects structure if the model does not converge

5) state clearly relevant test / comparison and link this to RSQ / hypothesis

72
Q

RSQs with MLM

Reporting
Results

A

1) software packages used to fit models
2) estimation method
3) optimizer used

4) if the proposed model failed to converge, the steps used to reach final model

For final model
5) all parameter estimates for fixed effects (e.g. coefficients, SEs, CIs, t-stats/df/p-values if used)

Random effects
6) variance/sd for each random effect, residual variance, correlations/covariances if modelled

73
Q

RSQs with MLM

Reporting
tests and tables

A

Tables help a lot but results given in the table must be included within the written interpretations to provide the reader with the context of each number’s meaning