M7 - MLM Flashcards
Part B - Question:
Which of the following is a potential cluster and why?
Models of cars by Holden.
Schools that students attend.
Health ratings at Time 1 of each individual within one’s sample, health ratings at Time 2, health ratings at Time 3, etc.
Schools that students attend.
Part C - Question: Dr Pangloss has collected data on student experience of bullying at school (intended independent variable) and their level of engagement in classroom activities (intended dependent variable). Data were collected from students across 15 classes in the same school.
Which of the following statements is correct?
- Obtaining class-level averages for both variables would allow Dr Pangloss to evaluate whether average level of bullying in a classroom is predictive of average level of student engagement in a classroom.
- Dr Pangloss does not need to use multilevel modelling because all students were recruited from the same school.
- Dr Pangloss does not need to use multilevel modelling because the IV and DV are both Level 1 variables.
Obtaining class-level averages for both variables would allow Dr Pangloss to evaluate whether average level of bullying in a classroom is predictive of average level of student engagement in a classroom.
Part D - Question: The following graph depicts…
[Graph: three parallel lines rising diagonally from bottom left to top right, each starting at a different point on the Y axis at the start of the X axis]
- Random intercept but fixed slopes.
- Random slopes but fixed intercepts.
- An instance where multilevel modelling is unnecessary.
- Random slopes and random intercepts
- Random intercept but fixed slopes.
Part G - Question: The intra-class correlation indicates…
- How much of the variance in the DV is due to within-group variance.
- How much of the variance in the DV is due to between-group variance.
- The amount of variance in the slope for the relationship between an IV and DV
- How much of the variance in the DV is due to between-group variance.
What is multilevel modelling (MLM) and when/why would it be used?
A flexible framework to model relationships between variables with clustered data. Does the relationship we are interested in work consistently across the groups?
Can be used to
1) check for variance across clusters
2) correct for differences if needed
3) ponder why these differences exist - depending on our variables, we might be able to test hypotheses as to why these groups differ
Multiple IVs, 1 DV - both can be continuous or categorical, although categorical can be difficult to work with
- use with around 30 clusters or more
Clusters can vary on the DV
Clusters can vary on the IV-DV relationship
- Either sort of variance makes running standard MR problematic
Can be
- multilevel regression
- multilevel path analysis
- multilevel factor analysis
Define “clustering”.
Observations are grouped within higher-level units, and different groups may perform differently from one another
- also referred to as nesting or hierarchies
eg classes within schools
eg schools within state
eg cases presided over by judges
eg time points within participants
Define “intra-class correlations (ICCs)”
The intra-class correlation tests the suitability of / need for a random intercept in MLM
ICC = between cluster variance / (between + within cluster variance)
= between cluster variance / total variance
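As a quick illustration of the formula above, here is a minimal Python sketch. The function name `icc` and the variance values are made up for illustration; in practice the two variance components would come from an intercept-only multilevel model.

```python
def icc(between_var: float, within_var: float) -> float:
    """Share of the total DV variance that lies between clusters."""
    total_var = between_var + within_var
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total_var


# Hypothetical variance components (not from any real data set)
print(icc(between_var=2.0, within_var=6.0))  # 0.25
```

Here a quarter of the variance in the DV sits between clusters, which would usually be read as a sign that clustering should not be ignored.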
Define “centering”.
Centering of L1 predictors is a way to separate within- from between cluster variance in the IV
Allows TWO questions to be asked of our data, because we know both the within-cluster (L1) variance and the between-cluster (L2, cluster averages) variance
Group mean centering is the best approach for L1 predictors
No need to centre the DV as the random intercept does this for us
Centering at L2 needs to be grand mean centering (unless L2 is part of a L3 hierarchy), ie the top level needs to be grand mean centred
Grand mean centering - take the mean score across all participants on an IV and subtract that mean from each original score
Group mean centering - take the mean score across participants in their respective groups and subtract that group mean from each original score
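The two centering definitions above can be sketched in Python as follows. The function names, the `motivation` scores and the class labels are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def grand_mean_center(scores):
    """Subtract the mean across ALL participants from each score."""
    grand = mean(scores)
    return [x - grand for x in scores]


def group_mean_center(scores, groups):
    """Subtract each participant's own group mean from their score."""
    by_group = {}
    for x, g in zip(scores, groups):
        by_group.setdefault(g, []).append(x)
    group_means = {g: mean(values) for g, values in by_group.items()}
    return [x - group_means[g] for x, g in zip(scores, groups)]


# Made-up L1 predictor for six students in two classes
motivation = [2, 4, 6, 8, 10, 12]
classes = ["A", "A", "A", "B", "B", "B"]

print(grand_mean_center(motivation))           # deviations from the grand mean of 7
print(group_mean_center(motivation, classes))  # deviations from class means 4 and 10
```

After group mean centering, both classes have the same centred scores (-2, 0, 2), so the difference between the class averages has been removed from the L1 predictor and can be modelled separately at L2.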
What are Level 1 variables? What are Level 2 variables? How do they differ?
Level 1 variables occur at the bottom of the hierarchy
- eg student performance, student motivation
Level 2 variables describe the cluster that L1 units are nested in
- eg average student performance, teacher experience, class size
Level 3 variables describe the cluster that L2 units are nested in
- eg average school performance, school size, principal experience
Different levels will have different variables and different amounts of information. More information at lower levels
When testing performance, Level 1 will have the greatest number of units. Eg 1 state with 20 schools, each school has 10 classes and each class has 25 students:
Total students = 1 x 20 x 10 x 25 = 5000
Total classes = 1 x 20 x 10 = 200
Total schools = 1 x 20 = 20
Total states = 1
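The same counts can be reproduced with a short Python sketch; the dictionary name `units_per_parent` is just an illustrative label for the numbers in the example above.

```python
# Number of units nested inside each unit of the level above (top level first)
units_per_parent = {"state": 1, "school": 20, "class": 10, "student": 25}

totals = {}
running_total = 1
for level, n in units_per_parent.items():
    running_total *= n  # multiply down the hierarchy
    totals[level] = running_total

print(totals)  # {'state': 1, 'school': 20, 'class': 200, 'student': 5000}
```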
For repeated measures design with people
Level 1 might be repeated estimates of mood per individual at multiple time points throughout the day
Level 2 might be a person-level measure, eg general mood tendency on an extroversion/neuroticism scale
Explain random vs. fixed effects.
A fixed effect is when a parameter (either the intercept b0 or the slope b1) is consistent across clusters.
Can also think of it as the average effect across all clusters. Assumes 1 equation is sufficient to explain the data
A fixed slope is useful where clusters start at different levels of the DV but the IV–>DV relationship has the same magnitude in each cluster (ie different intercepts, same slope)
Also useful where the clusters do not differ much in intercept or slope, so you can get away with averaging
Random effect
The intercept and/or slope varies across groups. The more the clusters differ, the more likely we need to use random effects
Every model will have fixed effects. The fixed effects give us the basis to determine statistically whether we need to use a random effect.
Test need for random intercept - ICC test for cluster difference on DV
Test need for random slope - significance test of random effect for L1 relationship
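To make the fixed vs. random distinction concrete, here is a minimal Python sketch. It assumes a cluster's line can be written as (b0 + u0) + (b1 + u1) * X, where u0 and u1 are that cluster's deviations from the fixed intercept and slope; the names `predict`, `u0`, `u1` and all numbers are hypothetical.

```python
def predict(x, b0, b1, u0=0.0, u1=0.0):
    """Predicted DV for one cluster: (b0 + u0) + (b1 + u1) * x."""
    return (b0 + u0) + (b1 + u1) * x


b0, b1 = 10.0, 2.0  # fixed (average) intercept and slope

# Random intercept, fixed slope: each cluster only shifts the intercept
u0_by_cluster = {"A": -3.0, "B": 0.0, "C": 3.0}

for cluster, u0 in u0_by_cluster.items():
    line = [predict(x, b0, b1, u0=u0) for x in (0, 1, 2)]
    print(cluster, line)
```

Each cluster's line rises by 2 per unit of X but starts at a different height, which gives parallel lines. Passing a non-zero `u1` for a cluster would tilt its line, which is the random slope case.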
Explain cross-level interactions.
A cross-level interaction is when you incorporate an L2 predictor to explain the differences in a random intercept or slope, where the intercept or slope is treated as the DV
eg random intercept
IV–>DV is L2 predictor –>b0
Y = b0
eg random slope
IV–>DV is L2 predictor –>b1
Y= b0 + b1*X1
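One common way to write this out in full is sketched below. The subscripts i (L1 unit) and j (cluster), the L2 predictor W, the gamma coefficients and the residual terms e and u are standard textbook notation assumed here for illustration; they do not appear in these notes.

```latex
\begin{align}
\text{Level 1:}\quad & Y_{ij} = b_{0j} + b_{1j} X_{1ij} + e_{ij} \\
\text{Level 2 (random intercept as DV):}\quad & b_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
\text{Level 2 (random slope as DV):}\quad & b_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}
\end{align}
```

Substituting the slope equation into Level 1 produces the product term \gamma_{11} W_j X_{1ij}, which is why an L2 predictor of a random slope is described as a cross-level interaction.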
How do relationships among variables differ for same or different levels?
Relationship can be L1 to L1
eg student motivation –> student test performance
L2 –> L1
eg does teacher experience predict student test performance?
Will need to look at teacher experience –> average student test performance (between-group differences, ie between classes) to get a comparable number of data points to make predictions
L2 and L1 –> L1
eg effect of student motivation and teacher experience on student test performance?
Two questions are asked
Level 1 is predicting the within-cluster version of the DV
- Does student motivation on average predict their performance?
Level 2 is predicting the between cluster version of the DV
- Does teacher experience predict the average performance of a class?
L2 IVs can also be moderators of L1 IV–>DV relationships
Snijders and Bosker's (2012) formula produces non-negative estimates:
What happens if you don't properly deal with clustered data?
Ignoring clusters between groups
- just treating as single level regression
- increased Type 1 error rate (false positive)
- N overestimated
Recognise clusters, aggregating to higher level
- changes research question to one about averages and between groups
- N reduces (as taken from higher level), and so does power, so Type 2 error rate increases
- ignores variability within class and assumes the group average represents individual performance (ecological fallacy). This means we could find a positive result when the opposite is true, or vice versa
What is ecological and atomistic fallacy?
Ecological fallacy is when you assume that the group's average performance is representative of an individual's performance
Atomistic fallacy is when the individual’s performance is assumed to represent the group’s average performance (eg treating a L2 predictor as if it were an L1 variable, falsely inflating n to L1 amount)
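A small made-up example shows how the group-level and individual-level pictures can disagree. In this Python sketch (all data and names are hypothetical), the relationship between class averages is positive while the relationship inside every class is negative.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: (IV scores, DV scores) for the students in three classes
classes = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Within-class slopes use the individual students in each class
within_slopes = {name: slope(xs, ys) for name, (xs, ys) in classes.items()}

# Between-class slope uses only the class averages
between_slope = slope(
    [mean(xs) for xs, _ in classes.values()],
    [mean(ys) for _, ys in classes.values()],
)

print(within_slopes)  # slope of -1.0 inside every class
print(between_slope)  # slope of 1.0 across class averages
```

Reading the between-class slope as if it described individual students would be the ecological fallacy; reading a within-class slope as if it described class averages would be the atomistic fallacy.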
Describe different ways of dealing appropriately with clustered data.
1) Adjustment to standard error approach
eg the Huber-White sandwich estimator adjusts our standard errors to recognise the level of clustering in the data (addresses Type 1 error inflation)
Test statistic = effect/SE - tells us whether the effect is significant
The smaller the clustering effect, the closer the effective sample size will head towards L1 n (6)
The larger the clustering effect, the closer the effective sample size will head towards L2 n (2)
- Doesn’t change RQ to one of averages like aggregating to higher level does
- Less likely to find significant effect
- more accurate
Problems
–>correct differences but doesn’t allow it to be a focus
–>Doesn’t permit L2 predictors or interactions between L1 and L2 predictors
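The earlier point about the sample size heading towards L1 n or L2 n can be illustrated with the design effect, 1 + (cluster size - 1) x ICC. This formula is not in these notes and is swapped in here only as an illustration; a sandwich estimator adjusts the standard errors directly rather than computing it. The sketch also assumes that the 6 and 2 above refer to 6 L1 units in 2 clusters of 3, and the function name `effective_n` is hypothetical.

```python
def effective_n(n_total, cluster_size, icc):
    """Effective sample size after allowing for clustering (design effect)."""
    design_effect = 1 + (cluster_size - 1) * icc
    return n_total / design_effect


# Assumed layout: 6 L1 units in 2 clusters of 3
print(effective_n(6, 3, icc=0.0))  # 6.0 -> no clustering effect, behaves like L1 n
print(effective_n(6, 3, icc=0.5))  # 3.0 -> somewhere in between
print(effective_n(6, 3, icc=1.0))  # 2.0 -> complete clustering, behaves like L2 n
```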