Multilevel Modelling Flashcards

1
Q

What is multilevel data?

A

Data in which units are clustered at different hierarchical levels, e.g. pupils nested within schools, which are in turn nested within areas.

2
Q

What are 3 types of multilevel data?

A

Geographic: e.g. data on a country will contain data from different regions

Relational: relationships between units at different levels, e.g.
Schools, classrooms, pupils
Countries, regions, towns, families
Company, office, team, individuals

Temporal/Longitudinal: different time points

3
Q

Where do we find multilevel data?

A
  1. Naturally occurring, e.g. schools
  2. Due to sample design (e.g. clustered, multi-stage)
  3. Repeated measures
4
Q

What is one way of differentiating between data that is multilevel and data that is not?

A

If you can identify distinct hierarchical levels, with units at one level nested within units at the next, the data are multilevel.

If the data sit at a single level with subgroups, e.g. an office with staff of different ethnicities, they would not be considered multilevel data.

5
Q

What are two reasons for computing multilevel analysis?

A

  1. Clustering as a problem: sometimes our data are clustered and we must take this into account
  2. Clustering as a substantive interest: answering multilevel hypotheses
6
Q

Why can clustering be a problem?

A
  1. Clustered data violate the assumptions of simple random sampling.
  2. Traditional multiple regression techniques treat the units of analysis (e.g. individuals) as independent observations.
  3. If we fit hierarchical data with simple ordinary least-squares (OLS) models (i.e. ignoring any clustering), the standard errors of our regression coefficients will be underestimated, leading to an overstatement of statistical significance.
7
Q

Why can clustering be a substantive interest?

A

We are often interested in estimating the amount of variation between groups, and the extent to which it can be explained by group-level explanatory variables.

E.g., Is there between-school variability in students’ academic progress?
Do health outcomes vary across areas?
Are between-area variations in health explained by differences in access to health services?
Is the amount of variation between areas different for rural and urban areas?

8
Q

What are alternative data analysis strategies if clustering is not of substantive interest?

A

Fixed effects regression

Design-based modelling

Adjusted standard errors: the approach used most often when we only need to adjust for potential clustering

9
Q

According to Richard McElreath, why should multilevel regression be the default approach?

A
  1. Improved estimates for repeat sampling
  2. Improved estimates for imbalance in sampling
  3. Estimates of variation
  4. Avoid averaging, retain variation
10
Q

What software can be used for multilevel analysis?

A

Stata

R

Julia

Python

Stan

brms

11
Q

What terms are classified under multilevel modelling?

A

Multilevel model
Random effects model
Mixed model
Random coefficient model
Random parameter models
Hierarchical model
Nested models
Split-plot designs
Subject specific models
Variance component models

12
Q

What is variance a measure of and how is it defined?

A

Variance is a measure of how spread out your data is.

Defined as the average of the squared differences from the mean.
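
The definition above can be checked by hand. The following is a minimal sketch using only Python's standard library; the `scores` values are made up for illustration and do not come from any real dataset.

```python
from statistics import mean, pvariance

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)

# Average of the squared differences from the mean (population variance).
variance_by_hand = sum((x - m) ** 2 for x in scores) / len(scores)

# The standard library's population variance should agree with the hand calculation.
variance_from_library = pvariance(scores)

print(variance_by_hand, variance_from_library)
```

Note that this divides by n, matching the "average of squared differences" definition. The sample variance, which divides by n - 1, is available as `statistics.variance`.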

13
Q

What can we think of a residual as?

A

As a measure of error, or how far off your predicted value was from the actual value.

14
Q

In a single-level ordinary least-squares regression, what does $\beta_0$ represent?

A

$\beta_0$ is the intercept coefficient: the predicted value of the outcome when the predictors equal 0.

15
Q

What does a least squares regression select?

What is the value called?

A

The line with the lowest total sum of squared prediction errors.

Sum of Squares of Error, or SSE.
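
As a rough illustration of what "lowest SSE" means, the sketch below fits a line to a few made-up points using the closed-form least-squares estimates for a single predictor, then shows that nudging the line increases the SSE. The data and the helper name `sse` are assumptions for this example only.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def sse(b0, b1):
    """Sum of squared prediction errors for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares estimates for one predictor.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

best_sse = sse(b0, b1)
print(b0, b1, best_sse)

# Moving the intercept or the slope away from the estimates gives a larger SSE.
print(sse(b0 + 0.5, b1), sse(b0, b1 + 0.1))
```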

16
Q

What does the notation $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ mean?

A

It represents a single-level model.
The subscript $i$ indexes the individuals in the sample ($i = 1, \dots, n$).

17
Q

What does the notation $y_{ij} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij}$ mean?

A

Two-level notation: $y_{ij}$ is the outcome for individual $i$ in cluster $j$, where $j$ ranges from 1 to however many clusters we have. A three-level model would, by convention, add a third subscript $k$.

18
Q

With clustered data, the residuals within each cluster are likely to be correlated. What does this mean?

A

Individuals within a cluster are likely to be more similar to one another than they are to individuals from other clusters.

19
Q

Explain what the formula $y_{ij} = \beta_0 + u_j + \varepsilon_{ij}$ means.

A

$u_j$ represents the distance from the overall intercept $\beta_0$ to the intercept of group $j$.

$\varepsilon_{ij}$ represents the distance from the group-level mean to the observed data point at the individual level.
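
To make the two distances concrete, here is a minimal sketch that splits made-up scores into an overall mean, a group deviation, and an individual deviation. It is a simplification: it uses raw group means, whereas a fitted multilevel model would estimate the group effects (and typically shrink them towards zero). The school names and scores are hypothetical.

```python
# Hypothetical test scores for pupils in three schools (illustration only).
scores = {
    "school_a": [52.0, 55.0, 58.0],
    "school_b": [60.0, 63.0, 66.0],
    "school_c": [70.0, 72.0, 77.0],
}

all_scores = [y for ys in scores.values() for y in ys]

# Overall mean, standing in for the overall intercept.
b0 = sum(all_scores) / len(all_scores)

u = {}  # group-level deviations: group mean minus overall mean
e = {}  # individual-level deviations: observation minus group mean
for school, ys in scores.items():
    group_mean = sum(ys) / len(ys)
    u[school] = group_mean - b0
    e[school] = [y - group_mean for y in ys]

# Each observation is recovered as overall mean + group deviation + individual deviation.
for school, ys in scores.items():
    rebuilt = [b0 + u[school] + e_ij for e_ij in e[school]]
    print(school, round(u[school], 2), [round(v, 2) for v in rebuilt])
```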

20
Q

What is meant by partitioning variance?

A

Splitting the total variance into two sources: between-group variance and within-group variance.

21
Q

What do we assume about the epsilons?

A

They are normally distributed with a mean of 0 and a variance of $\sigma^2_e$, i.e. $\varepsilon_{ij} \sim N(0, \sigma^2_e)$.

22
Q

What does the variance partition coefficient measure?

A

The proportion of total variance that is due to differences between groups.

23
Q

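What is the formula for the variance partition coefficient?

A

Between-group variance divided by total variance (between-group variance plus within-group variance):

$\text{VPC} = \dfrac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}$

where $\sigma^2_u$ is the between-group variance and $\sigma^2_e$ is the within-group variance.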
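
A small worked example of the ratio described above, assuming hypothetical variance estimates. The function name `vpc` and the numbers are illustrative only.

```python
def vpc(between_var, within_var):
    """Variance partition coefficient: share of total variance lying between groups."""
    total_var = between_var + within_var
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total_var


# Hypothetical estimates: between-group variance 20, within-group variance 80.
example_vpc = vpc(20.0, 80.0)
print(example_vpc)
```

With these made-up values, a fifth of the total variance would be attributed to differences between groups.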

24
Q

What is the variance partition coefficient sometimes but not always equivalent to?

A

Intraclass correlation coefficient

25
Q

What are two caveats of fitting a two-level model, and what is a possible solution?

A
  • The random intercepts model introduces additional complexity.
  • Does it result in a significant improvement in model fit? We can test this with a likelihood-ratio test.
26
Q

What does a likelihood ratio test examine?

A

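Group effects: whether allowing for differences between groups gives a significantly better fit than the single-level model.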

27
Q

What is the formula for a likelihood ratio test?

A

$D = -2\log L_1 - (-2\log L_2) = 2(\log L_2 - \log L_1)$

where $L_1$ is the likelihood of the single-level model and $L_2$ is the likelihood of the multilevel model.

The deviance statistic ($D$) is compared against a $\chi^2$ distribution with d.f. degrees of freedom.

28
Q

What are the degrees of freedom?

A

The difference in the number of parameters between the multilevel model and the individual-level (single-level) model.
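
The likelihood-ratio test described above can be sketched in a few lines. This is a minimal example under stated assumptions: the two models differ by exactly one parameter (the between-group variance of a random intercepts model), so the deviance difference is compared against a chi-squared distribution with 1 degree of freedom, and the log-likelihood values are hypothetical rather than taken from a real fit. The function name `likelihood_ratio_test` is illustrative.

```python
import math


def likelihood_ratio_test(loglik_single, loglik_multi):
    """Compare a single-level model with a random intercepts model.

    Assumes the models differ by one parameter, so the deviance
    difference is referred to a chi-squared distribution with 1 d.f.
    """
    # Deviance of the single-level model minus deviance of the multilevel model.
    d = (-2.0 * loglik_single) - (-2.0 * loglik_multi)
    if d < 0:
        raise ValueError("the multilevel model should not fit worse than the single-level model")
    # For 1 degree of freedom, P(chi-squared > d) = erfc(sqrt(d / 2)).
    p_value = math.erfc(math.sqrt(d / 2.0))
    return d, p_value


# Hypothetical log-likelihoods, for illustration only.
d, p = likelihood_ratio_test(loglik_single=-1250.0, loglik_multi=-1240.0)
print(d, p)
```

A small p-value here would suggest that the added complexity of the two-level model is justified by the improvement in fit. For more than one extra parameter, the chi-squared tail probability needs a general routine from a statistics library rather than the 1 d.f. shortcut used in this sketch.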