SEM (structural equation modelling) Flashcards
What is path analysis?
• Path analysis is a very simple form of Structural Equation Modeling (SEM).
• We would typically use the term ‘path analysis’ when we are modeling observed variables.
• This means we have a single measure of the construct e.g. word vocabulary test.
• The term SEM is more often used when we have multiple indicators of a construct and create latent variables.
• In its most basic form, path analysis is a simple extension of multiple regression.
• Path analysis is typically used to:
o Examine the size and direction of direct and indirect effects between multiple variables
o Examine the goodness of model fit between the researcher’s hypothesised model and the observed data
o Compare the observed model fit of competing theoretical models
Software used for SEM?
AMOS: simplest program to begin with; has a graphical module which allows relatively easy specification of models
What is a mediated multiple regression?
- In a mediation model, the relationship between an IV and outcome is accounted for or ‘mediated’ by a third variable i.e. a mediator variable.
- Mediation implies a ‘causal chain’ series of relationships between the three variables i.e. IV – Mediator - DV.
- The researcher must have clear theoretical or logical grounds for choosing the mediator and IV variables.
What are the requirements for a mediation?
- predictor (X) must predict mediator (Z)
- mediator (Z) must predict criterion (Y)
- predictor (X) must predict criterion (Y)
- the X,Y relationship must ‘shrink’ in the presence of Z
How to assess mediation?
- The predictor –> outcome beta weight should be 0 (or at least nonsignificant) for full mediation when the mediator is in the model, i.e. the relationship between IV and DV should be fully accounted for by the indirect effect via the mediator.
- Researchers often make a case for partial mediation if the beta weight drops substantively but does not reach 0.
- Sobel test of the indirect effect (more on this later).
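The requirements above can be checked with a short series of regressions. The following is a minimal sketch in Python, assuming a tiny made-up dataset (the x, z and y scores are illustrative, not real data) and unstandardised OLS slopes computed from sums of squares and cross-products; the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cross_products(u, v):
    """Sum of cross-products of deviations from the means (a covariance up to a constant)."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


# Illustrative (made-up) scores for predictor X, mediator Z and criterion Y.
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [2, 2, 4, 4, 6, 6]

sxx, szz = cross_products(x, x), cross_products(z, z)
sxz, sxy, szy = cross_products(x, z), cross_products(x, y), cross_products(z, y)

a = sxz / sxx                             # X -> Z: predictor predicts mediator
c = sxy / sxx                             # X -> Y without the mediator in the model
det = sxx * szz - sxz ** 2                # determinant for the two-predictor regression
b = (szy * sxx - sxy * sxz) / det         # Z -> Y, controlling for X
c_prime = (sxy * szz - szy * sxz) / det   # X -> Y, controlling for Z

print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}, c' = {c_prime:.3f}")
print(f"indirect effect a*b = {a * b:.3f}")
```

In this toy data the X –> Y slope shrinks once Z is in the model but stays above 0, which is the pattern described above as partial mediation. With OLS the slope without the mediator always equals c' + a*b, so the amount of shrinkage is the indirect effect.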
What are the advantages of SEM over running multiple ordinary least squares regressions?
• While we can run a series of OLS (ordinary least squares) regression models to examine structural path models, the analysis can become very complex when testing a large number of effects.
• Modern SEM software programs use maximum likelihood based methods to calculate effects simultaneously.
o Simpler and quicker estimation of model effects
o Obtain global model fit indices that can confirm or disconfirm whether your model fits the data.
o Encourage researcher to specify causal relations between variables beforehand
o More direct and easier tests of alternative theoretical models (model trimming and building) and their fit
o Parameters are better estimated simultaneously where possible, because estimating them in multiple separate steps can introduce unnecessary bias.
Steps in path analysis
- Specify the model
- Model identification
- Model estimation
- Evaluate model fit
- Interpret model effects
- Modify the model
- Examine alternative models
How should you specify a path model?
- You should use theory and/or previous research, as well as logical relations between variables, to justify your path model
- (so this is a confirmatory rather than an exploratory technique).
Circles, squares, single-headed arrows and double-headed arrows in path diagrams
o In path diagrams, observed variables are typically represented by squares.
o Latent (unobserved) variables are typically represented by circles (or an ellipse).
o Single-headed arrows represent causal relationships between variables.
o Curved double-headed arrows represent correlations between two or more exogenous variables.
What are recursive and non-recursive models?
o Recursive models
Models where all causal pathways are moving in the same direction i.e. effects are uni-directional.
The most common form of model
Always identified
o Non-recursive models
Models where there are reciprocal relationships between variables (not referring to correlations!)
Complex to analyse
Identification issues can be very problematic in complex nonrecursive models
Not as common in the psychology literature
What are exogenous variables?
These variables are considered as IVs in the model
They have no specified predicted cause in the model, hence they have no single-headed arrow input.
You can have multiple exogenous variables in the model; these are usually free to correlate with each other, although you can specify that they be uncorrelated.
Correlations between two or more exogenous variables are represented by a curved double-headed arrow between variables.
What are endogenous variables?
o Endogenous variables
These variables are considered DVs in the model.
Each will have a directional arrow coming in, and may also have one or more directional arrows moving away if it is a mediator variable.
Downstream variables caused by the exogenous variables
Each endogenous variable will also typically have an error or disturbance term associated with it.
This reflects that there are also unmeasured and unspecified causal effects on these variables
What are latent variables?
o Latent variables
Each endogenous variable will also typically have an error or disturbance term associated with it.
This reflects that there are also unmeasured and unspecified causal effects on these variables
These disturbance terms are usually modeled as latent variables, hence they are represented by circles
What is model identification?
- In SEM a model is specified, then parameters (variances and covariances of IVs and regression coefficients) for the model are estimated using sample data, and the parameters are used to produce the estimated population covariance matrix.
- However, in order to be estimated, a path model must be ‘identified’.
- This means there needs to be sufficient unique pieces of information (i.e. correlations in the observed data) to allow mathematical estimation of the model that has been specified.
- A model is said to be identified if it is possible to estimate each of the unknown parameters, i.e. there must be at least as many known pieces of information as unknown parameters.
How can you check model identification for observed variable path models?
o Calculate the number of ‘data points’ available (in SEM, data points are the non-redundant sample variances and covariances).
o Simple formula: v(v + 1)/2, where v = no. of observed variables
o Count all of the parameters to be estimated in the model (the paths drawn, plus the variances and covariances)
o Compare these two numbers
o The difference (data points minus estimated parameters) is the df of the model
o Basic rule: the number of data points must equal or exceed the number of parameters estimated in the model. This is the same as saying that df must be greater than or equal to 0.
When explaining, always state what just-identified (df = 0), over-identified (df > 0) and under-identified (df < 0) mean.
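The counting check can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the parameter count in the example assumes a three-variable mediation model in which every path, the exogenous variance and both disturbance variances are freely estimated.

```python
def data_points(n_observed):
    """Number of non-redundant sample variances and covariances: v(v + 1)/2."""
    return n_observed * (n_observed + 1) // 2


def model_df(n_observed, n_free_parameters):
    """Degrees of freedom = data points minus freely estimated parameters."""
    return data_points(n_observed) - n_free_parameters


def identification_status(n_observed, n_free_parameters):
    df = model_df(n_observed, n_free_parameters)
    if df < 0:
        return "under-identified"
    if df == 0:
        return "just-identified"
    return "over-identified"


# Assumed example: X -> Z, Z -> Y and X -> Y (3 paths),
# plus 1 exogenous variance and 2 disturbance variances = 6 free parameters.
print(data_points(3), identification_status(3, 6))

# Dropping the direct X -> Y path leaves 5 free parameters and 1 df.
print(model_df(3, 5), identification_status(3, 5))
```

Note that this counting rule only checks that there is enough information in principle; as the card on non-recursive models says, identification can still be problematic in more complex models.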
What is model estimation?
1) Once you have specified and constructed your model, you are ready to estimate your model
2) We are primarily interested in two facets of the model:
1) The direct and indirect effects between variables
2) Global model fit
(The analogue in OLS regression would be interest in: 1) regression coefficients for individual predictors; 2) a test of overall regression model fit, i.e. ANOVA for R².)
What are the direct effects in a path model?
o Can test the significance of (unstandardised) direct effects
o Should consider the magnitude of direct effects, not just significance. Use past research as a guide.
o Use rules of thumb e.g. Cohen: 0.10 small, 0.30 medium, 0.50 large.
o The path coefficients (the standardised regression coefficients) reflect direct relations between one variable and another (controlling for the effect of any other variable also affecting the endogenous variable).
o These are the same as beta weights in normal MR. We can obtain these by simply running separate OLS regression models.
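As a rough illustration of how an unstandardised weight (B) relates to a standardised path coefficient (beta), and of applying the rule-of-thumb cut-offs, here is a small Python sketch. The B value and standard deviations are made up, and the function names are hypothetical.

```python
def standardised_beta(b_unstandardised, sd_predictor, sd_outcome):
    """Convert an unstandardised weight B to a standardised beta: B * SD(x) / SD(y)."""
    return b_unstandardised * sd_predictor / sd_outcome


def effect_size_label(beta):
    """Label a standardised coefficient using the 0.10 / 0.30 / 0.50 rules of thumb."""
    magnitude = abs(beta)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "below small"


# Made-up values: B = 2.0, SD of predictor = 1.5, SD of outcome = 6.0.
beta = standardised_beta(2.0, 1.5, 6.0)
print(beta, effect_size_label(beta))
```

The labels are only a guide; past research in the area remains the better benchmark for judging magnitude.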
What are the indirect effects in a path model?
o Effects of one variable on another variable via a mediator variable.
o In a standard one-mediator mediated regression there is one indirect effect – the effect of the IV on the DV via the mediator.
o The strength of an indirect effect is obtained by multiplying its constituent direct paths (the two direct paths that make up the indirect path), e.g. if N –> Avoid = 0.34 and Avoid –> Dep = 0.50, the indirect effect of N –> Dep via Avoid is 0.34 * 0.50 = 0.17.
How can we fully interpret indirect effects in a path model?
Imagine the indirect effect of Neuroticism (N) on Depression (Dep) via Avoid is (0.34 * 0.50) = 0.17.
What this means is that N has a 0.34 direct effect on Avoid, but only 0.5 of this is transmitted to Dep via Avoid i.e. 0.17.
This means we can expect an increase in Dep of 0.17 SD units for every 1 SD unit increase in N, via the effects on Avoid.
We would also mention how much the direct effect shrinks when the indirect effect is taken into account: is it a full or partial mediation?
Can be difficult to calculate statistical tests with two or more mediators
What is the Sobel test?
The Sobel test is often used to test the significance of the indirect effect with one mediator.
It is a z test on the ratio of the unstandardised indirect effect to its standard error, and is only useful with fairly large samples.
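A minimal Python sketch of the Sobel z test follows, using the commonly cited first-order standard error sqrt(b² * SE_a² + a² * SE_b²). The path values and standard errors in the example are hypothetical and are treated as unstandardised estimates; the function name is illustrative.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return (z, two-tailed p) for the indirect effect a*b.

    a, se_a: unstandardised IV -> mediator path and its standard error
    b, se_b: unstandardised mediator -> DV path (controlling for the IV) and its standard error
    """
    indirect = a * b
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical estimates: a = 0.34 (SE 0.08), b = 0.50 (SE 0.10).
z, p = sobel_test(0.34, 0.08, 0.50, 0.10)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because the test relies on a normal approximation for the product a*b, it is only trustworthy with fairly large samples, as noted above.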
How do we calculate total effects in a path model?
o Total effects represent the total causal effect of one variable on another.
o This is calculated by summing all of the direct and indirect effects
o In our earlier example: the total effect of N on Dep
= direct effect + indirect effect
= 0.22 + 0.17 = 0.39
o Total effect of Avoid on Dep is simply its direct effect (0.50 in the earlier example), as there are no indirect effects in this pathway.
(If you have AMOS output, you also get a total effects table here)
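The same sums can be reproduced for any recursive path model by adding up the products of coefficients along every route. The sketch below is one way to do this in Python, assuming the path values used in the earlier cards (N –> Avoid = 0.34, Avoid –> Dep = 0.50, N –> Dep = 0.22); the matrix layout and function names are illustrative choices, not AMOS output.

```python
def mat_mul(left, right):
    """Multiply two square matrices given as lists of rows."""
    n = len(left)
    return [[sum(left[i][k] * right[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_effects(direct):
    """Total effects for a recursive model: direct + all indirect routes.

    direct[i][j] is the direct effect of variable j on variable i.
    With n variables, no route in a recursive model is longer than n - 1 arrows.
    """
    n = len(direct)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in direct]
    for _ in range(n - 1):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = mat_mul(power, direct)
    return total


# Variable order: 0 = N, 1 = Avoid, 2 = Dep (rows are outcomes, columns are causes).
direct = [
    [0.0, 0.0, 0.0],    # N is exogenous
    [0.34, 0.0, 0.0],   # N -> Avoid
    [0.22, 0.50, 0.0],  # N -> Dep, Avoid -> Dep
]

total = total_effects(direct)
indirect_n_dep = total[2][0] - direct[2][0]
print(f"indirect N -> Dep = {indirect_n_dep:.2f}, total N -> Dep = {total[2][0]:.2f}")
```

Subtracting the direct effect from the total effect leaves the indirect effect, which matches the hand calculation of 0.17 and the total of 0.39.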
Where are the unstandardised regression weights in AMOS output?
o The ‘Estimate’ column of the Regression Weights table gives you the unstandardised regression weights, corresponding to the ‘B’ column in SPSS.