Path Analysis Flashcards

1
Q

Diagram conventions
Square

A

= observed / measured variables

2
Q

Diagram conventions
Circle

A

= latent / unobserved variables

3
Q

Diagram conventions
Double-headed arrow

A

= covariance

4
Q

Diagram conventions
Single-headed arrow

A

= regression path

5
Q

What issues do path models solve?

A

A path model allows us to test several linear models together as a set (= multiple non-nested equations)

Path models are based on the correlation matrix of the measured variables in your study

6
Q

Exogenous variables

A

have directed (single-headed) arrows going out → but none coming in

they are essentially independent variables
- nothing in the model affects these variables
- they affect outcomes

I like to remember them as an ex: no one likes them, so no arrows go towards them, but they're still chasing other people (arrows going out)

7
Q

Endogenous variables

A

have directed arrows coming in ← (can also have arrows going out)

they are dependent variables in at least one part of the model (hence the arrow coming in)
- they are predicted by something, but can also predict something else

in a standard linear model there is only one endogenous variable (the outcome), but in path models we can have several

8
Q

Endogeneity bias

A

= bias caused by a hidden variable we haven't accounted for that still affects our study (e.g. an omitted variable that influences both a predictor and the outcome)
e.g. leaving out a measure of intelligence in a study of school test scores
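A minimal base-R simulation of this kind of omitted-variable bias. The variables (intelligence, study_hours, test_score) and effect sizes are made up for illustration: intelligence drives both study hours and test score, and study hours have no true effect. Leaving intelligence out makes study hours look important; putting it back removes the bias.

```r
# Hypothetical simulated data: intelligence affects both study hours and
# test score; study hours have NO true effect on test score.
set.seed(1)
n <- 5000
intelligence <- rnorm(n)
study_hours  <- 0.6 * intelligence + rnorm(n)
test_score   <- 0.8 * intelligence + rnorm(n)

# Omitting intelligence: the study_hours coefficient absorbs its effect (biased)
b_omitted <- unname(coef(lm(test_score ~ study_hours))["study_hours"])

# Including intelligence: the study_hours coefficient is close to the true 0
b_full <- unname(coef(lm(test_score ~ study_hours + intelligence))["study_hours"])

round(c(omitted = b_omitted, full = b_full), 2)
```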

9
Q

Basic structure of path modelling

A

Input of path models (study results)

Correlation matrix

Define a model that explains the relationship

How well can our model reproduce the observed correlation matrix?

10
Q

Lavaan

A

lavaan = LAtent VAriable ANalysis
this is the package in R which we use to fit path models (it has sensible defaults, so most of the time we just give it our specified model and our dataset)

it requires 3 steps:
1) specify the model and create a model object
2) run the model using the sem() function
3) evaluate the model
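A minimal sketch of the three steps, assuming lavaan is installed and using simulated data with hypothetical variable names X, M and Y (not data from the course):

```r
library(lavaan)

# Simulated data; X, M and Y are hypothetical variable names
set.seed(42)
n <- 300
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.2 * X + rnorm(n)
dat <- data.frame(X, M, Y)

# 1) specify the model and store it as a model object (a syntax string)
model <- '
  M ~ X
  Y ~ M + X
'

# 2) run (estimate) the model with sem(), relying on lavaan's defaults
fit <- sem(model, data = dat)

# 3) evaluate the model: estimates, fit measures and standardised solution
summary(fit, fit.measures = TRUE, standardized = TRUE)
```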

11
Q

Lavaan

Model statements

A

observed variable = use the name given in the dataset

latent variable = give it a new name (defined with =~)

covariance = use ~~

regression path = use ~
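A sketch of all these statements in one lavaan model string. The names item1 to item3, stress and sleep stand for hypothetical dataset columns, and wellbeing is a hypothetical new latent name. lavaanify() parses the syntax into a parameter table without needing any data, which is a quick way to check how each line is read:

```r
library(lavaan)

model <- '
  # latent variable: a new name, measured by observed items (=~)
  wellbeing =~ item1 + item2 + item3
  # regression path: outcome ~ predictor
  wellbeing ~ stress
  # covariance between two variables
  stress ~~ sleep
'

# Parse the syntax only (no fitting); user == 1 marks lines we wrote ourselves
pt <- lavaanify(model)
pt[pt$user == 1, c("lhs", "op", "rhs")]
```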

12
Q

Model specification

What is specification?

A

Specification concerns which variables relate to which others and in what ways

it is also where we formally set out our theory and hypotheses

for path analysis, this is where we write out our model (in lavaan syntax) before estimating it with the sem() function
This basically means writing down the paths that are included in your theoretical model.

13
Q

Model specification

Path model standard rules

A

1) all exogenous variables correlate

2) for endogenous variables, we correlate the residuals, not the variables

3) endogenous variable residuals do NOT correlate with exogenous variables (we hope)

4) all paths are recursive (i.e. we can’t have loops)

14
Q

Model identification

what is identification?

A

Identification concerns the number of knowns vs the number of unknowns (the difference between them gives the degrees of freedom)

15
Q

Model identification

The Knowns

A
  • variances of measured variables
  • covariances between the variables
    - i.e. the unique values in the correlation (or covariance) matrix
    - these are the values on the diagonal and below it
16
Q

Model identification

The Unknowns

A
  • the parameters we want to estimate
    = all the lines (arrows) we include in our diagram
    = the variances of exogenous variables, the residual variances of endogenous variables, covariances and regression paths
17
Q

Model identification

Degrees of Freedom (of path models)

A

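= difference between the knowns and unknowns (df = knowns - unknowns)

df must not be negative for the model to be estimated, and ideally df should be positive = more knowns than unknowns = our model simplifies our data (and its fit can be tested)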

18
Q

Model identification

t-rule

A

The t-rule: the number of unknowns (t) must not exceed the number of knowns, which are calculated as:

[ k * (k+1) ] / 2
Where:
k = number of observed variables

e.g. k = 5
[ 5 * (5+1)] / 2 = (5*6)/2 = 30/2 = 15 knowns
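A small base-R sketch of the counting rules. The helpers n_knowns() and model_df() are hypothetical, not from any package, and the parameter counts assume a simple mediation model with three observed variables:

```r
# Hypothetical helper functions (not from any package)

# t-rule knowns: unique variances + covariances among k observed variables
n_knowns <- function(k) k * (k + 1) / 2

# degrees of freedom = knowns - unknowns (free parameters)
model_df <- function(k, n_unknowns) n_knowns(k) - n_unknowns

n_knowns(5)      # 15, matching the worked example above

# Simple mediation with 3 observed variables (X, M, Y) and all paths:
# unknowns = 3 regression paths (a, b, c) + variance of X
#            + residual variances of M and Y = 6
model_df(3, 6)   # 0 -> just identified

# Same model without the direct path X -> Y: 5 unknowns
model_df(3, 5)   # 1 -> over identified
```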

19
Q

Model identification

levels of identification
Under identified models

A

Have < 0 df (more unknowns than knowns), so the model cannot be estimated

20
Q

Model identification

levels of identification
Just identified models

A

Have 0 df

  • all standard lms are just identified
21
Q

Model identification

levels of identification
Over identified models

A

Have > 0 df

= more knowns than unknowns, so the model simplifies the data and we can test how well it fits

22
Q

Model Estimation

estimating path models

A

Model estimation = finding the ‘best’ values for the unknown parameters

path model estimation = finds values for parameters that minimise the difference between the observed correlation matrix and the model correlation matrix

maximum likelihood estimation is the most common method used
- it is an iterative process that terminates when altering model values no longer improves the model = convergence has been reached
- if the model fails to converge follow the same steps as MLM

23
Q

Model Evaluation

A

If a simplified model can reproduce the relationships in the data, it is a good model

comparing the observed correlation matrix with the model implied correlation matrix is key to evaluating how good our model is

24
Q

Model evaluation

path tracing

A

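path tracing = once we have specified and estimated a model, we use the parameter estimates to recalculate the correlations/covariances, by multiplying the coefficients along each route connecting two variables and adding the routes together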
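A small base-R check of path tracing for the mediation model X → M → Y plus the direct path X → Y, assuming simulated data that are standardised first (so the lm() coefficients are standardised betas). With a = X → M, b = M → Y and c' = X → Y, tracing gives r(X,Y) = c' + a*b and r(M,Y) = b + a*c'. Because this model is just identified, the reproduced correlations match the observed ones exactly:

```r
# Simulated data with hypothetical variables X, M, Y
set.seed(7)
n <- 500
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.3 * M + 0.4 * X + rnorm(n)
dat <- as.data.frame(scale(cbind(X, M, Y)))   # standardise all variables

a     <- unname(coef(lm(M ~ X, data = dat))["X"])
fit_y <- coef(lm(Y ~ M + X, data = dat))
b     <- unname(fit_y["M"])
cp    <- unname(fit_y["X"])                   # direct path c'

# Path tracing: add up every route between the two variables
implied_rxy  <- cp + a * b      # direct route + route via M
implied_rmy  <- b + a * cp      # direct route + route via the common cause X
observed_rxy <- cor(dat$X, dat$Y)
observed_rmy <- cor(dat$M, dat$Y)

rbind(implied = c(rxy = implied_rxy, rmy = implied_rmy),
      observed = c(rxy = observed_rxy, rmy = observed_rmy))
```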

25
Q

Model fit

in path models

A

in path models we tend not to focus on variance explained in the outcome (as we would for MLM)

instead we ask does our model fit the data? if so, what are the parameter estimates?

‘fitting the data’ refers to how well our model-implied correlation matrix reproduces the observed correlation matrix
- if it does this well = it fits (but this is a continuum so some fit better than others)

just-identified models will always fit perfectly

If we have positive df we can calculate model fit indices

26
Q

Model fit

model fit indices

Global Fit (chi squared)

A

Statistically significant chi squared = POOR FIT

when we use MLE we obtain a chi squared value for the model which can be compared to a chi squared distribution with the same dfs as our model to determine significance

BUT this does not work well in practice, as (especially with large samples) it leads to the rejection of models that are only trivially mis-specified

27
Q

Model fit

model fit indices

Absolute fit (SRMR)

A

values < 0.05 = GOOD FIT

SRMR = standardised root mean-squared residual

measures the discrepancy between the observed correlation matrix and the model-implied correlation matrix

lower is better: 0 = perfect fit and larger values = worse fit (the opposite direction to CFI/TLI, which is easy to confuse)

28
Q

Model fit

model fit indices

Parsimony Corrected (RMSEA)

A

values < 0.05 = good fit

RMSEA = root mean-squared error of approximation

this corrects for the complexity of the model, rewarding simpler models by penalising those with more estimated parameters (i.e. fewer df)

lower is better: 0 = perfect fit and larger values = worse fit (again the opposite direction to CFI/TLI)

29
Q

Model fit

model fit indices

Incremental fit indices

A

Comparative fit index (CFI) > 0.95 = good fit
- ranges from 0 to 1 where 1 = perfect fit

Tucker-Lewis index (TLI) > 0.95 = good fit
- includes a parsimony correction

Compares the model to a more restricted baseline model - usually an ‘independence’ model where all observed variable covariances are fixed to 0
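A sketch of extracting these indices in lavaan with fitMeasures(), assuming simulated data and a hypothetical full-mediation model that leaves out the X → Y path, so that df = 1 (fit indices need positive df):

```r
library(lavaan)

# Simulated data in which X also has a direct effect on Y
set.seed(3)
n <- 300
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.3 * X + rnorm(n)
dat <- data.frame(X, M, Y)

# Over-identified model: the direct path X -> Y is omitted
model <- '
  M ~ X
  Y ~ M
'
fit <- sem(model, data = dat)

# Global, absolute, parsimony-corrected and incremental fit indices
fm <- fitMeasures(fit, c("chisq", "df", "pvalue", "srmr", "rmsea", "cfi", "tli"))
round(fm, 3)
```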

30
Q

Model fit

model fit indices

Local Fit

A

it is possible to examine local areas of mis-fit

Modification indices = estimate the improvement in chi squared that could be expected from including an additional parameter

Expected parameter changes = estimate the value the parameter would take if it were included

31
Q

Model modifications

A

= they indicate how much your model would improve if you added a path to your model

modification indices and expected parameter changes can be helpful for identifying how to improve a model but this is purely EXPLORATORY

they can be extracted in R using:
modindices(fit)   # fit = the fitted model returned by sem()

HOWEVER:
- modifications should be made one at a time (iteratively)
- they might just be capitalising on chance
- must ensure modifications can be justified
- ideally, we would need to replicate the new model in an independent sample
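A sketch of inspecting local misfit with modindices(), assuming simulated data where the true model has a direct X → Y path that our hypothetical model leaves out. sort. = TRUE orders the candidate parameters by modification index (mi), shown alongside the expected parameter change (epc). Any change suggested here is exploratory and still needs theoretical justification:

```r
library(lavaan)

# Simulated data: X affects Y both directly and via M
set.seed(3)
n <- 300
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.3 * X + rnorm(n)
dat <- data.frame(X, M, Y)

# Hypothesised model without the direct path X -> Y
model <- '
  M ~ X
  Y ~ M
'
fit <- sem(model, data = dat)

# Candidate extra parameters, largest expected chi-squared improvement first
mi <- modindices(fit, sort. = TRUE)
head(mi[, c("lhs", "op", "rhs", "mi", "epc")])
```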

32
Q

Interpreting path models

A

If our specified model fits the data, we can interpret the parameter estimates

Recall these are just correlation and regression paths so we interpret them the same way we would r and β coefficients

33
Q

What is mediation?

A

Mediation is when a predictor X has an effect on outcome Y via the mediating variable M

The mediator transmits the effect of X to Y

In reality there is no such thing as direct effects - everything occurs via mediation

e.g.
- anxiety (X) decreases physical health (Y) due to lack of sleep (M)

34
Q

Path model mediation

A

traditional approaches to mediation were based on comparing coefficients across separate linear models, but these suffer from low power and are very cumbersome

path model mediation is better than traditional methods but should only really be used with longitudinal data as mediation occurs over time

35
Q

Path model mediation (on cross-sectional data)

Indistinguishable models

A

mediation is possible to do on cross-sectional data but there is a big conceptual problem:

we are modelling correlations → with cross-sectional data there are multiple indistinguishable (equivalent) models → so nothing in the data can demonstrate that one model is better than another (e.g. X → M → Y fits exactly as well as Y → M → X)

36
Q

What is moderation?

A

moderation is when a moderator z modifies the effect of x on y
- e.g. the effect of x on y is stronger at higher levels of z
- also known as an interaction between x and z
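A base-R sketch of moderation as an interaction term, assuming simulated data in which the (made-up) true effect of x on y grows with z. In lm(), y ~ x * z expands to x + z + x:z, and the slope of x at a given value of z is b_x + b_xz * z:

```r
# Simulated data: the effect of x on y depends on z
set.seed(11)
n <- 1000
x <- rnorm(n)
z <- rnorm(n)
y <- 0.2 * x + 0.3 * z + 0.5 * x * z + rnorm(n)

fit <- lm(y ~ x * z)   # same as y ~ x + z + x:z
b <- coef(fit)
b

# Simple slope of x at a chosen value of the moderator z
slope_x <- function(zval) unname(b["x"] + b["x:z"] * zval)

# z is standard normal here, so -1 and +1 are roughly -1 SD and +1 SD
c(low_z = slope_x(-1), high_z = slope_x(1))
```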

37
Q

Path Mediation

what are total effects?

A

= the overall effect of a predictor on the outcome is known as the total effect

total effect = indirect + direct effect

They can be interpreted as:
the unit increase in Y expected to occur when X increases by one unit

38
Q

Path Mediation

what are direct effects?

A

= The effect of x on y (NOT via the mediator)

In a path model it would look like this:
X → Y

They can be interpreted as:
the unit increase in Y expected to occur with a unit increase in X over and above the increase transmitted by M

NOTE: the direct effect may not be truly direct in real life - it could be transmitted via other mediators we haven't included in our model

39
Q

Path Mediation

what are indirect effects?

A

= the effects of X on Y transmitted VIA the mediator

To estimate indirect effects we multiply the paths
( X → M) by ( M → Y)

They can be interpreted as:
the unit increase in Y expected to occur via M when X increases by one unit

40
Q

Path mediation

Testing Mediation

A

Demonstrating mediation will usually rely on:

  • evaluating the significance of direct, total and indirect effects
  • considering the proportion of the total effects which is due to the mediated path
    Proportion mediated = indirect / total
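A base-R sketch of the decomposition using lm() on simulated data (hypothetical X, M, Y). For ordinary least squares on the same data, a*b + c' equals the slope from regressing Y on X alone, which is a handy check that total = indirect + direct:

```r
# Simulated data with hypothetical variables X, M, Y
set.seed(5)
n <- 400
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.2 * X + rnorm(n)

a        <- unname(coef(lm(M ~ X))["X"])   # X -> M
fit_y    <- coef(lm(Y ~ M + X))
b        <- unname(fit_y["M"])             # M -> Y
c_direct <- unname(fit_y["X"])             # X -> Y (direct effect)

indirect      <- a * b
total         <- indirect + c_direct
prop_mediated <- indirect / total

# Check: the slope of Y on X alone equals the total effect
total_lm <- unname(coef(lm(Y ~ X))["X"])

round(c(indirect = indirect, direct = c_direct, total = total,
        total_lm = total_lm, prop_mediated = prop_mediated), 3)
```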
41
Q

Path Mediation

Testing a path mediation model in lavaan

A

1) Specification
= create a lavaan syntax object

2) Estimation
= e.g. using maximum likelihood

3) Evaluation / interpretation
= inspect the model to judge how good it is
= interpret the parameter estimates

We can constrain some of the paths in our model to 0 (saying there is no effect along that path) and test how well the restricted model reproduces our observed correlation matrix
e.g. basically: drop some arrows from the diagram and see how well the model fits, then try a different set of arrows and see whether it fits better, etc.
- we can choose specific paths to answer specific research questions

42
Q

Path Mediation

coding effects

A

to calculate the indirect effects of X on Y in path mediation, we first need to create some new parameters

We label these from our path model:
a = regression coefficient for M ~ X
b = regression coefficient for Y ~ M
c = regression coefficient for Y ~ X

In R (lavaan) we attach these labels to the paths in the model syntax (e.g. M ~ a*X) and then use := to create a new parameter, e.g.
indirect := a*b
total := a*b + c
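A sketch of the full labelled model in lavaan, assuming simulated data with hypothetical variable names. The labels a, b and c are attached with *, and := defines the new parameters from them:

```r
library(lavaan)

# Simulated data with hypothetical variables X, M, Y
set.seed(5)
n <- 400
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.2 * X + rnorm(n)
dat <- data.frame(X, M, Y)

model <- '
  M ~ a*X            # a path
  Y ~ b*M + c*X      # b path and direct effect c
  indirect := a*b    # indirect effect
  total    := a*b + c
'
fit <- sem(model, data = dat)

pe <- parameterEstimates(fit)
pe[pe$op == ":=", c("lhs", "est", "se", "pvalue")]
```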

43
Q

Path Mediation

Model evaluation

A

We want to see:
- model estimates
- model fit
- standardised solutions
- (possibly modification indices)

44
Q

Path Mediation

Model Output

A

Things to note:

1) significant effects = look at p-values

2) degrees of freedom = if they are positive we can assess model fit

45
Q

Path Mediation

Significance of Indirect effects

A

An indirect effect is not estimated directly from the data; it is a product of estimated parameters (a*b), so its standard error cannot be obtained in the usual way

The default method of assessing the statistical significance of indirect effects (the delta method) assumes a normal sampling distribution
BUT this may not hold for indirect effects, because the product of two regression coefficients often has a skewed sampling distribution

instead we use bootstrapping (if the 95% CI includes 0, the indirect effect is not significant at the 0.05 level)
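To show what bootstrapping does under the hood, here is a hand-rolled percentile bootstrap in base R. It assumes simulated data, and indirect_effect() is a hypothetical helper. Rows are resampled with replacement, a*b is re-estimated each time, and the middle 95% of those estimates forms the CI:

```r
# Simulated data with hypothetical variables X, M, Y
set.seed(5)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.2 * X + rnorm(n)
dat <- data.frame(X, M, Y)

# Hypothetical helper: indirect effect a*b estimated with two regressions
indirect_effect <- function(d) {
  a <- coef(lm(M ~ X, data = d))["X"]
  b <- coef(lm(Y ~ M + X, data = d))["M"]
  unname(a * b)
}

B <- 1000
boot_ab <- replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)   # resample rows with replacement
  indirect_effect(dat[idx, ])
})

est <- indirect_effect(dat)
ci  <- quantile(boot_ab, c(0.025, 0.975))    # percentile 95% CI
c(estimate = est, ci)
# If the CI excludes 0, the indirect effect is significant at the .05 level
```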

46
Q

Path Mediation

Significance of Indirect effects
Bootstrapping CIs in lavaan

A

1) run the model
- using se = "bootstrap" in sem()

2) view the output with CIs
- e.g. summary(fit, ci = TRUE)

3) (if needed) standardise parameters (e.g. if the measurements don't have easy interpretations)
- using std = T
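A sketch of the same workflow in lavaan, assuming simulated data with hypothetical variable names. bootstrap = 500 is kept small here so it runs quickly (lavaan's default is 1000, and more draws are sensible in practice), and boot.ci.type = "perc" requests percentile CIs:

```r
library(lavaan)

# Simulated data with hypothetical variables X, M, Y
set.seed(5)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.4 * M + 0.2 * X + rnorm(n)
dat <- data.frame(X, M, Y)

model <- '
  M ~ a*X
  Y ~ b*M + c*X
  indirect := a*b
  total    := a*b + c
'

# 1) run the model with bootstrapped standard errors
fit <- sem(model, data = dat, se = "bootstrap", bootstrap = 500)

# 2) view the output with CIs (3: standardized = TRUE adds standardised estimates)
summary(fit, ci = TRUE, standardized = TRUE)

# Percentile bootstrap CIs for the defined parameters
pe <- parameterEstimates(fit, boot.ci.type = "perc")
pe[pe$op == ":=", c("lhs", "est", "ci.lower", "ci.upper")]
```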

47
Q

What if the model doesn’t fit?

A

REMEMBER the goal is not to achieve model fit

if model fit is poor we should not draw substantive conclusions from it but we can assess why fit is poor.

48
Q

Path mediation

Model modification

A

you may want to modify your initially hypothesised model e.g. non-significant paths to remove, include some other paths etc.
BUT as soon as we make a modification we are no longer testing the model in a confirmatory way
Our analysis shifts to being led by the data rather than theory and this is not preferred

Similar to MLM modification

49
Q

Reporting path mediation models

Method/analysis strategy

A

mention:

  • the model being tested
    e.g. Y was regressed on both X and M, and M was regressed on X
  • the estimator used
    e.g. maximum likelihood
  • the method used to test significance of indirect effects
    e.g. bootstrapped 95% CIs
50
Q

Reporting path mediation models

Results

A

*model fit (for over identified models)

*parameter estimates for path mediation and their statistical significance
- can be useful to present in a SEM diagram BUT the diagrams in R are not considered publication quality

51
Q

Reporting path mediation models

SEM diagram

A
  • include key parameter estimates
  • include statistically significant paths (indicated with *)
  • basically just add numbers to our path diagram

*include figure note that explains how statistically significant paths are identified and at what level

52
Q

Reporting path mediation models

Visualising the model

A

There are a number of R packages that will produce path diagrams
BUT the presentation of these is not always clear and it can be difficult to refine them

John used PowerPoint to make the diagrams

53
Q

Reporting path mediation models

The indirect effects

A
  • results = the coefficients for the indirect effect (significance, direction of effect +/- etc.) and the bootstrapped 95% CIs
  • common to report proportion mediated - but this should always be interpreted within the context of the study
  • interpretation can be tricky if there’s a mix of + and - effects involved
54
Q

Other path analysis models

A

Anything that can be expressed in terms of regressions between observed variables can be tested as a path model
- can include ordinal and binary data
- can include moderation