Path Analysis Flashcards
Diagram conventions
Square
□
= observed / measured variables
Diagram conventions
Circle
◯
= latent / unobserved variables
Diagram conventions
Double-headed arrow
↔
= covariance
Diagram conventions
Single-headed arrow
→
= regression path
What issues do path models solve?
A path model lets us test several linear models together as one set (= multiple non-nested equations)
They are based on the correlation matrix of the measured variables in your study
Exogenous variables
have direct arrows going out → but none going in
they are essentially independent variables
- nothing in the model affects these variables
- they affect outcomes
I like to remember them as an ex: no one likes them, so no arrows go in, but they're still chasing people, so arrows go out
Endogenous variables
have direct arrows going in ← (can also have them going out)
they are dependent variables in at least one part of the model (hence the arrow going in)
- they are predicted by something, and can also predict something else
in linear models there is only one endogenous variable but in path models we can have multiple
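The arrow rule for exogenous and endogenous variables can be sketched in a few lines of Python. This is only an illustration, not lavaan: the variable names and the paths list are made up, and each path is written as a (predictor, outcome) pair.

```python
# Hypothetical example paths, written as (predictor, outcome) pairs.
paths = [
    ("age", "stress"),
    ("support", "stress"),
    ("stress", "wellbeing"),
    ("support", "wellbeing"),
]


def classify_variables(paths):
    """Split variables into exogenous (no arrows in) and endogenous (at least one arrow in)."""
    predictors = {src for src, _ in paths}
    outcomes = {dst for _, dst in paths}
    endogenous = outcomes               # anything with an arrow going in
    exogenous = predictors - outcomes   # arrows going out, none going in
    return sorted(exogenous), sorted(endogenous)


exogenous, endogenous = classify_variables(paths)
print("Exogenous:", exogenous)
print("Endogenous:", endogenous)
```

Here stress is endogenous even though it also predicts wellbeing, because it has at least one arrow going in.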
Endogeneity bias
= bias caused by a hidden variable we haven’t accounted for that still affects our study
e.g. leaving out a measure of intelligence on a study of school test scores
Basic structure of path modelling
Input of path models (study results)
↓
Correlation matrix
↓
Define a model that explains the relationship
↓
How well can our model reproduce the observed correlation matrix?
Lavaan
Latent variable analysis
this is the package in R which we use to fit path models (it has sensible defaults so most of the time we just give it our specified model and our dataset)
it requires 3 steps:
1) specify the model and create a model object
2) run the model using the sem() function
3) evaluate the model
Lavaan
Model statements
observed variable = use the name given in the dataset
latent variable = give a new name
covariance = use ~~
regression path = use ~
Model specification
What is specification?
Specification concerns which variables relate to which others and in what ways
it is also where we formally set out our theory and hypothesis
for path analysis this is where we outline our model and then use the sem() function
This means basically writing down the paths that are included in your theoretical model.
Model specification
Path model standard rules
1) all exogenous variables correlate
2) for endogenous variables, we correlate the residuals, not the variables
3) endogenous variable residuals do NOT correlate with exogenous variables (we hope)
4) all paths are recursive (i.e. we can’t have loops)
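Rule 4 (no loops) can be checked mechanically. A minimal Python sketch, assuming paths are written as (predictor, outcome) pairs; the function name is_recursive and the example variables are made up for illustration and are not part of lavaan.

```python
def is_recursive(paths):
    """Return True if the regression paths contain no loops (no variable can reach itself)."""
    children = {}
    for src, dst in paths:
        children.setdefault(src, []).append(dst)

    def reaches(start, target, seen):
        # Follow arrows forward from `start` and report whether `target` is reached.
        for nxt in children.get(start, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return not any(reaches(v, v, set()) for v in children)


no_loop = [("x", "m"), ("m", "y"), ("x", "y")]
with_loop = [("x", "m"), ("m", "y"), ("y", "x")]
print(is_recursive(no_loop))
print(is_recursive(with_loop))
```

The first model only has arrows flowing forward, so it passes; in the second, y points back to x, which creates a loop.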
Model identification
what is identification?
Identification concerns the number of knowns vs the number of unknowns ( = degrees of freedom)
Model identification
The Knowns
- variances of measured variables
- covariances between the variables
- the unique values in a correlation matrix
- in the correlation matrix this is the values on the diagonal and below
Model identification
The Unknowns
- the parameters we want to estimate
= all the lines we include in our diagram
= the variances of all variables (estimated), covariances and regression paths
Model identification
Degrees of Freedom (of path models)
= difference between the knowns and unknowns
df must not be negative = we need at least as many knowns as unknowns; with more knowns than unknowns (df > 0) our model simplifies our data
Model identification
t-rule
The number of parameters to estimate (t) must not exceed the number of knowns, which is calculated as:
[ k * (k+1) ] / 2
Where:
k = number of observed variables
e.g. k = 5
[ 5 * (5+1)] / 2 = (5*6)/2 = 30/2 = 15 knowns
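The same calculation as a small Python helper (the function name count_knowns is just for illustration):

```python
def count_knowns(k):
    """Number of unique variances and covariances among k observed variables."""
    return k * (k + 1) // 2


print(count_knowns(5))  # 15, matching the worked example
```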
Model identification
levels of identification
Under identified models
Have <0 df
= more unknowns than knowns, so there is no unique solution
Model identification
levels of identification
Just identified models
Have 0 df
- all standard linear models (lm) are just identified
Model identification
levels of identification
Over identified models
Have >0 df
= more knowns than unknowns, so model fit can be tested
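The three levels of identification can be sketched as a df calculation in Python. This is an illustration under assumptions: the function names are made up, and the unknowns are counted by hand for a hypothetical three-variable model (x predicts m, m predicts y).

```python
def count_knowns(k):
    """Unique variances and covariances among k observed variables."""
    return k * (k + 1) // 2


def identification(k, n_unknowns):
    """Return (df, level) for a model with k observed variables and n_unknowns parameters."""
    df = count_knowns(k) - n_unknowns
    if df < 0:
        level = "under-identified"
    elif df == 0:
        level = "just-identified"
    else:
        level = "over-identified"
    return df, level


# Hypothetical model x -> m -> y with 3 observed variables (6 knowns).
# Unknowns: 2 regression paths + variance of x + 2 residual variances = 5.
print(identification(3, 5))
# Adding the direct path x -> y gives 6 unknowns.
print(identification(3, 6))
```

Dropping the direct path from x to y is what frees up the single degree of freedom in the first model.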
Model Estimation
estimating path models
Model estimation = finding the ‘best’ values for the unknown parameters
path model estimation = finds values for parameters that minimise the difference between the observed correlation matrix and the model correlation matrix
maximum likelihood estimation is the most common method used
- it is an iterative process that terminates when altering model values no longer improves the model = convergence has been reached
- if the model fails to converge follow the same steps as MLM
Model Evaluation
If a simplified model can reproduce the relationships in the data, it is a good model
comparing the observed correlation matrix with the model implied correlation matrix is key to evaluating how good our model is
Model evaluation
path tracing
path tracing = once a specified model has been estimated, we use the parameter estimates to recalculate the correlations/covariances the model implies
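A minimal Python sketch of path tracing, assuming a standardised model where x predicts m and m predicts y, with no direct path from x to y. All numbers are made up for illustration; they are not results from any dataset.

```python
# Hypothetical standardised path estimates for the model x -> m -> y.
a = 0.5  # path x -> m
b = 0.4  # path m -> y

# Path tracing: multiply the paths along the route connecting two variables.
implied = {
    ("x", "m"): a,      # one direct path
    ("m", "y"): b,      # one direct path
    ("x", "y"): a * b,  # only route is x -> m -> y
}

# Hypothetical observed correlations.
observed = {
    ("x", "m"): 0.5,
    ("m", "y"): 0.4,
    ("x", "y"): 0.35,
}

# Residual = observed minus model-implied; values near 0 mean the model reproduces the data.
residuals = {pair: observed[pair] - implied[pair] for pair in observed}

for pair in observed:
    print(pair, "implied:", round(implied[pair], 3), "residual:", round(residuals[pair], 3))
```

In this made-up case the model reproduces the x-m and m-y correlations exactly but under-predicts the x-y correlation, which would suggest the missing direct path matters.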