Path Analysis Flashcards

1
Q

Diagram conventions
Square

A

= observed / measured variables

2
Q

Diagram conventions
Circle

A

= latent / unobserved variables

3
Q

Diagram conventions
Double-headed arrow

A

= covariance

4
Q

Diagram conventions
Single-headed arrow

A

= regression path

5
Q

What issues do path models solve?

A

A path model allows us to test several linear models together as a set (= multiple non-nested equations)

Path models are based on the correlation matrix of the measured variables in your study

6
Q

Exogenous variables

A

have direct arrows going out → but none going in

they are essentially independent variables
- nothing in the model affects these variables
- they affect outcomes

I like to remember them as an ex: no one likes them, so no one goes in, but they're still chasing people (arrows going out)

7
Q

Endogenous variables

A

have direct arrows going in ← (can also have them going out)

they are dependent variables in at least one part of the model (hence the arrow going in)
- they predict something but can also be predicted by something else

in linear models there is only one endogenous variable but in path models we can have multiple

8
Q

Endogeneity bias

A

= bias caused by a hidden variable we haven't accounted for that still affects the variables in our study
e.g. leaving out a measure of intelligence in a study of school test scores

9
Q

Basic structure of path modelling

A

1) Input of path models = study results (the data)

2) Compute the correlation matrix of the measured variables

3) Define a model that explains the relationships

4) Ask: how well can our model reproduce the observed correlation matrix?

10
Q

Lavaan

A

lavaan = LAtent VAriable ANalysis
this is the R package we use to fit path models (it has sensible defaults, so most of the time we just give it our specified model and our dataset)

it requires 3 steps:
1) specify the model and create a model object
2) fit the model using the sem() function
3) evaluate the model

11
Q

Lavaan

Model statements

A

observed variable = use the name given in the dataset

latent variable = give a new name

covariance = use ~~

regression path = use ~

12
Q

Model specification

What is specification?

A

Specification concerns which variables relate to which others and in what ways

it is also where we formally set out our theory and hypotheses

for path analysis this is where we write out our model before passing it to the sem() function
This basically means writing down the paths that are included in your theoretical model.

13
Q

Model specification

Path model standard rules

A

1) all exogenous variables are allowed to correlate

2) for endogenous variables, we correlate the residuals, not the variables themselves

3) endogenous variable residuals do NOT correlate with exogenous variables (we hope)

4) all paths are recursive (i.e. effects flow in one direction only, so we can't have loops)

14
Q

Model identification

what is identification?

A

Identification concerns the number of knowns vs the number of unknowns (the difference between them = degrees of freedom)

15
Q

Model identification

The Knowns

A
  • variances of measured variables
  • covariances between the variables
    - the unique values in a correlation matrix
    - in the correlation matrix this is the values on the diagonal and below
16
Q

Model identification

The Unknowns

A
  • the parameters we want to estimate
    = all the lines we include in our diagram
    = the variances of exogenous variables, the residual variances of endogenous variables, the covariances and the regression paths
17
Q

Model identification

Degrees of Freedom (of path models)

A

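= the number of knowns minus the number of unknowns

df must not be negative for the model to be estimated (we can't have more unknowns than knowns)

df must be positive (more knowns than unknowns) for us to test model fit = our model simplifies our data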

18
Q

Model identification

t-rule

A

Used to calculate the number of knowns (the number of parameters we estimate must not exceed this):

[ k * (k+1) ] / 2
Where:
k = number of observed variables

e.g. k = 5
[ 5 * (5+1) ] / 2 = (5*6)/2 = 30/2 = 15 knowns
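
The t-rule arithmetic can be checked with a few lines of code. This is a minimal Python sketch for illustration only (the function name `count_knowns` is made up here; it is not part of lavaan):

```python
def count_knowns(k: int) -> int:
    """Number of unique variances and covariances among k observed variables."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # k variances on the diagonal plus k*(k-1)/2 covariances below it
    return k * (k + 1) // 2


# Worked example from the card: 5 observed variables
print(count_knowns(5))
```

With k = 5 this reproduces the 15 knowns worked out by hand above.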

19
Q

Model identification

levels of identification
Under identified models

A

Have <0 df

= more unknowns than knowns, so the parameters cannot be uniquely estimated

20
Q

Model identification

levels of identification
Just identified models

A

Have 0 df

= the same number of knowns and unknowns

  • all standard lms are just identified
21
Q

Model identification

levels of identification
Over identified models

A

Have >0 df

= more knowns than unknowns, so the parameters can be estimated and model fit can be tested
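
The three levels of identification follow directly from the df calculation. A minimal Python sketch, assuming we count the free parameters by hand (the function name `identification_status` and the example parameter counts are illustrative assumptions, not lavaan output):

```python
def identification_status(k: int, n_free_params: int):
    """Return (df, label) for a model with k observed variables."""
    knowns = k * (k + 1) // 2          # t-rule: unique variances + covariances
    df = knowns - n_free_params        # knowns minus unknowns
    if df < 0:
        label = "under-identified"
    elif df == 0:
        label = "just-identified"
    else:
        label = "over-identified"
    return df, label


# Hypothetical mediation model X -> M -> Y plus direct path X -> Y:
# 3 regression paths + 1 variance (X) + 2 residual variances (M, Y) = 6 unknowns
print(identification_status(3, 6))
# Same model with the direct path X -> Y fixed to 0: 5 unknowns
print(identification_status(3, 5))
```

The first model uses up all 6 knowns (0 df), while dropping the direct path frees one df so fit can be assessed.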

22
Q

Model Estimation

estimating path models

A

Model estimation = finding the ‘best’ values for the unknown parameters

path model estimation = finds values for the parameters that minimise the difference between the observed correlation matrix and the model-implied correlation matrix

maximum likelihood estimation is the most common method used
- it is an iterative process that terminates when altering the parameter values no longer improves the model = convergence has been reached
- if the model fails to converge, follow the same steps as for MLM

23
Q

Model Evaluation

A

If a simplified model can reproduce the relationships in the data, it is a good model

comparing the observed correlation matrix with the model implied correlation matrix is key to evaluating how good our model is

24
Q

Model evaluation

path tracing

A

path tracing = once we have specified and estimated a model, we use the parameter estimates to recalculate (trace) the correlations/covariances the model implies between each pair of variables
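
As a rough illustration of path tracing, the sketch below recalculates the implied correlations for a simple mediation model (X → M → Y plus X → Y). It assumes all variables are standardised, and the labels a (M ~ X), b (Y ~ M) and c (Y ~ X) and the function name are illustrative choices rather than lavaan output:

```python
def implied_correlations(a: float, b: float, c: float) -> dict:
    """Model-implied correlations for standardised X, M, Y.

    a = path X -> M, b = path M -> Y, c = direct path X -> Y.
    Each correlation is the sum, over all routes linking the two
    variables, of the product of the paths along that route.
    """
    return {
        "X~~M": a,            # single route: X -> M
        "X~~Y": c + a * b,    # direct route plus the route via M
        "M~~Y": b + a * c,    # direct route plus the route back through X
    }


print(implied_correlations(a=0.5, b=0.4, c=0.2))
```

Comparing these implied values with the observed correlations is what model evaluation is based on.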

25
Q

Model fit in path models

A

in path models we tend not to focus on variance explained in the outcome (as we would for MLM)
instead we ask: does our model fit the data? if so, what are the parameter estimates?

'fitting the data' refers to how well our model-implied correlation matrix reproduces the observed correlation matrix
- if it does this well = it fits (but this is a continuum, so some models fit better than others)

just-identified models will always fit perfectly

if we have positive df we can calculate model fit indices

26
Q

Model fit

model fit indices
Global Fit (chi squared)

A

Statistically significant chi squared = POOR FIT

when we use MLE we obtain a chi squared value for the model, which can be compared to a chi squared distribution with the same df as our model to determine significance

BUT this does not work well in practice, as it leads to the rejection of models that are only trivially mis-specified

27
Q

Model fit

model fit indices
Absolute fit (SRMR)

A

values < 0.08 = GOOD FIT

SRMR = standardised root mean-squared residual

measures the discrepancy between the observed correlation matrix and the model-implied correlation matrix

ranges from 0 (perfect fit) to 1 (terrible fit), so lower is better (confusingly, the opposite direction to CFI/TLI)

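
To make the 'discrepancy' idea concrete, here is a minimal Python sketch of an SRMR-style calculation. It assumes both inputs are correlation matrices (so the residuals are already standardised) and uses made-up numbers; lavaan's own computation may differ in detail:

```python
import math


def srmr(observed, implied):
    """Root mean square of the residual correlations.

    Averages the squared residuals over the unique elements
    (diagonal and below) of the two k x k correlation matrices.
    """
    k = len(observed)
    total = 0.0
    for i in range(k):
        for j in range(i + 1):  # lower triangle, including the diagonal
            total += (observed[i][j] - implied[i][j]) ** 2
    return math.sqrt(total / (k * (k + 1) / 2))


# Hypothetical observed correlations among X, M, Y
observed = [[1.0, 0.5, 0.4],
            [0.5, 1.0, 0.5],
            [0.4, 0.5, 1.0]]
# Implied by a model with no direct X -> Y path (a = b = 0.5, so r_XY = 0.25)
implied = [[1.0, 0.5, 0.25],
           [0.5, 1.0, 0.5],
           [0.25, 0.5, 1.0]]

print(round(srmr(observed, implied), 3))
```

Only the X-Y correlation is misfitted here, so a single residual of 0.15 drives the whole index.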
28
Q

Model fit

model fit indices
Parsimony Corrected (RMSEA)

A

values < 0.05 = good fit

RMSEA = root mean-squared error of approximation

this corrects for the complexity of the model and rewards simpler models by adding a penalty for having more parameters (i.e. fewer df)

0 = perfect fit and larger values = worse fit, so lower is better (confusingly, the opposite direction to CFI/TLI)

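
The parsimony correction can be seen in one common textbook form of the RMSEA formula, sketched below in Python. The N - 1 denominator is an assumption (some software uses N), and the chi squared, df and sample size values are invented for illustration:

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA from the model chi squared, its df and the sample size n."""
    if df <= 0:
        raise ValueError("RMSEA needs positive df (an over-identified model)")
    # Misfit beyond what is expected by chance (chi2 - df), per df and per case
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical model: chi squared = 12 on 4 df, n = 201
print(rmsea(12.0, 4, 201))
```

Because the excess misfit is divided by df, a model that reaches the same chi squared with fewer parameters (more df) gets a lower RMSEA.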
29
Q

Model fit

model fit indices
Incremental fit indices

A

Comparative fit index (CFI) = >0.95 = good fit
- ranges from 0 to 1 where 1 = perfect fit

Tucker-Lewis index (TLI) = >0.95 = good fit
- includes a parsimony correction

Both compare the model to a more restricted baseline model
- usually an 'independence' model where all observed variable covariances are fixed to 0

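
The comparison with the baseline model can be sketched using the usual textbook formulas for CFI and TLI. This is a minimal Python illustration with invented chi squared values, not a reproduction of lavaan's internals:

```python
def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: model (m) versus baseline (b)."""
    d_m = max(chi2_m - df_m, 0.0)   # excess misfit of our model
    d_b = max(chi2_b - df_b, 0.0)   # excess misfit of the baseline model
    denom = max(d_m, d_b)
    return 1.0 if denom == 0 else 1.0 - d_m / denom


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index: compares chi squared / df ratios (parsimony correction)."""
    ratio_m = chi2_m / df_m
    ratio_b = chi2_b / df_b
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


# Hypothetical values: model chi2 = 12 on 4 df, baseline chi2 = 210 on 10 df
print(cfi(12.0, 4, 210.0, 10), tli(12.0, 4, 210.0, 10))
```

In this made-up case the CFI passes the 0.95 cut-off but the TLI does not, because the TLI works with chi squared per df.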
30
Q

Model fit

model fit indices
Local Fit

A

it is possible to examine local areas of mis-fit

Modification indices = estimate the improvement in chi squared that could be expected from including an additional parameter

Expected parameter changes = estimate the value of the parameter, were it to be included

31
Q

Model modifications

A

= modification indices indicate how much your model would improve if you added a path to your model

modification indices and expected parameter changes can be helpful for identifying how to improve a model, but this is purely EXPLORATORY

they can be extracted in R using: modindices(model)

HOWEVER:
- modifications should be done iteratively
- they might just be capitalising on chance
- must ensure modifications can be justified
- ideally, we would need to replicate the new model in an independent sample

32
Q

Interpreting path models

A

If our specified model fits the data, we can interpret the parameter estimates

Recall these are just correlations and regression paths, so we interpret them the same way we would r and β coefficients

33
Q

What is mediation?

A

Mediation is when a predictor X has an effect on outcome Y via the mediating variable M

The mediator transmits the effect of X to Y

In reality there is no such thing as direct effects
- everything occurs via mediation

e.g. anxiety (X) decreases physical health (Y) due to lack of sleep (M)

34
Q

Path model mediation

A

traditional approaches to mediation were based on comparing across linear models, but these suffer from low power and are very cumbersome

path model mediation is better than the traditional methods, but should only really be used with longitudinal data, as mediation occurs over time

35
Q

Path model mediation (on cross-sectional data)

Indistinguishable models

A

mediation is possible to do on cross-sectional data, but there is a big conceptual problem:
we are modelling correlations
→ cross-sectional data means we have multiple indistinguishable models
→ so there is nothing to demonstrate whether one model is better than another

36
Q

What is moderation?

A

moderation is when a moderator z modifies the effect of x on y
- e.g. the effect of x on y is stronger at higher levels of z
- also known as an interaction between x and z

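
One way to picture moderation is as a slope for x that changes with z. The Python sketch below assumes a regression of the form y = b0 + b_x*x + b_z*z + b_xz*x*z; the coefficient names and values are hypothetical:

```python
def simple_slope(b_x: float, b_xz: float, z: float) -> float:
    """Effect of a one-unit increase in x on y at a given level of z."""
    # The interaction coefficient b_xz shifts the slope of x as z changes
    return b_x + b_xz * z


# Hypothetical coefficients: main effect 0.3, interaction 0.2
for z in (-1.0, 0.0, 1.0):
    print(z, simple_slope(0.3, 0.2, z))
```

With a positive interaction coefficient, the effect of x on y grows as z increases.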
37
Q

Path Mediation

what are total effects?

A

= the overall effect of a predictor on the outcome

total effect = indirect effect + direct effect

They can be interpreted as: the unit increase in Y expected to occur when X increases by one unit

38
Q

Path Mediation

what are direct effects?

A

= the effect of X on Y (NOT via the mediator)

In a path model it would look like this: X → Y

They can be interpreted as: the unit increase in Y expected to occur with a unit increase in X, over and above the increase transmitted by M

NOTE: the direct effect may not be direct in real life
- it could be transmitted by other mediators we haven't included in our model

39
Q

Path Mediation

what are indirect effects?

A

= the effects of X on Y transmitted VIA the mediator

To estimate indirect effects we multiply the paths (X → M) by (M → Y)

They can be interpreted as: the unit increase in Y expected to occur via M when X increases by one unit

40
Q

Path mediation

Testing Mediation

A

Demonstrating mediation will usually rely on:
- evaluating the significance of the direct, total and indirect effects
- considering the proportion of the total effect which is due to the mediated path

Proportion mediated = indirect / total

41
Q

Path Mediation

Testing a path mediation model in lavaan

A

1) Specification = create a lavaan syntax object
2) Estimation = e.g. using maximum likelihood
3) Evaluation / interpretation
   = inspect the model to judge how good it is
   = interpret the parameter estimates

We constrain some of the paths in our model to 0 (saying there is no relationship along that path) so we can test how well our model reproduces our observed correlation matrix given the restricted paths

e.g. basically, pick 2 arrows on the diagram and see how well they predict, then pick different arrows and see if they predict better, etc.
- we can choose specific paths to answer specific research questions

42
Q

Path Mediation

coding effects

A

to calculate the indirect effect of X on Y in path mediation, we first need to create some new parameters

We label these from our path model:
a = regression coefficient for M ~ X
b = regression coefficient for Y ~ M
c = regression coefficient for Y ~ X

In R we then use := to create a new parameter
e.g.
indirect := a*b
total := (a*b) + c

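
The arithmetic behind these defined parameters is simple enough to check by hand. A minimal Python sketch, using made-up coefficient values and an illustrative function name (this mirrors the a, b, c labels above but is not lavaan code):

```python
def mediation_effects(a: float, b: float, c: float) -> dict:
    """Indirect, direct and total effects from the labelled paths.

    a = M ~ X, b = Y ~ M, c = Y ~ X (direct effect).
    """
    indirect = a * b
    total = indirect + c
    # Proportion mediated is undefined when the total effect is exactly 0
    proportion = indirect / total if total != 0 else float("nan")
    return {"indirect": indirect, "direct": c, "total": total,
            "proportion_mediated": proportion}


# Hypothetical estimates
print(mediation_effects(a=0.5, b=0.4, c=0.2))
```

Here half of the total effect of X on Y runs through M.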
43
Q

Path Mediation

Model evaluation

A

We want to see:
- model estimates
- model fit
- standardised solutions
- (possibly modification indices)

44
Q

Path Mediation

Model Output

A

Things to note:
1) significant effects = look at p-values
2) degrees of freedom = if they are positive we can assess model fit

45
Q

Path Mediation

Significance of Indirect effects

A

Indirect effects are computed from other parameters (a product of regression coefficients) rather than estimated directly from the data, so their standard errors have to be derived

The default method of assessing the statistical significance of indirect effects assumes a normal sampling distribution

BUT this may not hold for indirect effects, because they are the product of regression coefficients

instead we use bootstrapping (if the 95% CI includes 0, the indirect effect is not significant at the 0.05 sig level)

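
To show what bootstrapping the indirect effect involves, here is a rough pure-Python sketch of a percentile bootstrap. The data are simulated, the function names are invented for illustration, and the regressions are fitted with closed-form least squares rather than lavaan's estimator:

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a*b, where a is the slope of M ~ X and b the slope of M in Y ~ M + X."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=500, level=0.95, seed=1):
    """Percentile CI: resample cases with replacement and re-estimate a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Simulated data with a true indirect effect of 0.5 * 0.5 = 0.25
sim = random.Random(42)
x = [sim.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.2 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]

lower, upper = bootstrap_ci(x, m, y)
print(lower, upper)
```

If the interval printed at the end excludes 0, the indirect effect would be judged significant at the 0.05 level.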
46
Q

Path Mediation

Significance of Indirect effects
Bootstrapping CIs in lavaan

A

1) run the model
   - using " se = 'bootstrap' "
2) view the output with CIs
3) (if needed) standardise parameters (e.g. if measurements don't have easy interpretations)
   - using "std = T"

47
Q

What if the model doesn't fit?

A

REMEMBER the goal is not to achieve model fit

if model fit is poor we should not draw substantive conclusions from the model, but we can assess why fit is poor

48
Q

Path mediation

Model modification

A

you may want to modify your initially hypothesised model
e.g. remove non-significant paths, include some other paths etc.

BUT as soon as we make a modification we are no longer testing the model in a confirmatory way

Our analysis shifts to being led by the data rather than theory, and this is not preferred

Similar to MLM modification

49
Q

Reporting path mediation models

Method/analysis strategy

A

mention:
* the model being tested
  e.g. Y was regressed on both X and M, and M was regressed on X
* the estimator used
  e.g. maximum likelihood
* the method used to test the significance of indirect effects
  e.g. bootstrapped 95% CIs

50
Q

Reporting path mediation models

Results

A

* model fit (for over identified models)
* parameter estimates for path mediation and their statistical significance
  - can be useful to present in a SEM diagram, BUT the diagrams in R are not considered publication quality

51
Q

Reporting path mediation models

SEM diagram

A

* include key parameter estimates
* include statistically significant paths (indicated with *)
  - basically just add numbers to our path diagram
* include a figure note that explains how statistically significant paths are identified and at what level

52
Q

Reporting path mediation models

Visualising the model

A

There are a number of R packages that will produce path diagrams

BUT the presentation of these is not always clear and it can be difficult to refine them

John used PowerPoint to make the diagrams

53
Q

Reporting path mediation models

The indirect effects

A

* results = the coefficients for the indirect effect (significance, direction of effect +/- etc.) and the bootstrapped 95% CIs
* common to report the proportion mediated
  - but this should always be interpreted within the context of the study
* interpretation can be tricky if there's a mix of + and - effects involved

54
Q

Other path analysis models

A

Anything that can be expressed in terms of regressions between observed variables can be tested as a path model
- can include ordinal and binary data
- can include moderation