SEM: Structural Equation Modelling Flashcards

1
Q

What is path analysis?

A

• Path analysis is a very simple form of Structural Equation Modeling (SEM).
• We would typically use the term ‘path analysis’ when we are modeling observed variables.
• This means we have a single measure of each construct, e.g. a word vocabulary test.
• The term SEM is more often used when we have multiple indicators of a construct and we create latent variables.
• In its most basic form, path analysis is a simple extension of multiple regression.
• Path analysis is typically used to:
o Examine the size and direction of direct and indirect effects between multiple variables
o Examine the goodness of model fit between the researcher’s hypothesised model and the observed data
o Compare the observed model fit of competing theoretical models

2
Q

Software used for SEM?

A

AMOS: simplest program to begin with; has a graphical module which allows relatively easy specification of models

3
Q

What is a mediated multiple regression?

A
  • In a mediation model, the relationship between an IV and outcome is accounted for or ‘mediated’ by a third variable i.e. a mediator variable.
  • Mediation implies a ‘causal chain’ series of relationships between the three variables i.e. IV – Mediator – DV.
  • The researcher must have clear theoretical or logical grounds for choosing the mediator and IV variables.
4
Q

What are the requirements for a mediation?

A
  1. predictor (X) must predict mediator (Z)
  2. mediator (Z) must predict criterion (Y)
  3. predictor (X) must predict criterion (Y)
  4. the X,Y relationship must ‘shrink’ in the presence of Z
5
Q

How to assess mediation?

A
  • The predictor –> outcome beta weight should be 0 (or at least nonsignificant) for full mediation when the mediator is in the model, i.e. the relationship between IV and DV should be fully accounted for by the indirect effect via the mediator.
  • Researchers often make a case for partial mediation if the beta weight drops substantively but does not reach 0.
  • Sobel test of the indirect effect (more on this later).
6
Q

What are the advantages of SEM over running multiple ordinary least squares regressions?

A

• While we can run a series of OLS (ordinary least squares) regression models to examine structural path models, the analysis can become very complex when testing a large number of effects.
• Modern SEM software programs use maximum likelihood based methods to calculate effects simultaneously.
o Simpler and quicker estimation of model effects
o Global model fit indices can be obtained that confirm or disconfirm whether your model fits the data.
o Encourages the researcher to specify causal relations between variables beforehand
o More direct and easier tests of alternative theoretical models (model trimming and building) and their fit
o Parameters are better estimated in one step where possible, because unnecessary multiple-step estimation introduces bias.

7
Q

Steps in path analysis

A
  1. Specify the model
  2. Model identification
  3. Model estimation
  4. Evaluate model fit
  5. Interpret model effects
  6. Modify the model
    • Examine alternative models
8
Q

How should you specify a path model?

A
  • You should use theory and/or previous research, as well as logical relations between variables, to justify your path model (so this is a confirmatory rather than an exploratory technique).
9
Q

Circles, squares, single-headed arrows and double-headed arrows in path diagrams

A
o	In path diagrams, observed variables are typically represented by squares.
o	Latent (unobserved) variables are typically represented by circles (or an ellipse).
o	Single-headed arrows represent causal relationships between variables.
o	Curved double-headed arrows represent correlations between two or more exogenous variables.
10
Q

What are recursive and non-recursive models?

A

o Recursive models
▪ Models where all causal pathways are moving in the same direction i.e. effects are uni-directional.
▪ The most common form of model
▪ Always identified
o Non-recursive models
▪ Models where there are reciprocal relationships between variables (not referring to correlations!)
▪ Complex to analyse
▪ Identification issues can be very problematic in complex non-recursive models
▪ Not as common in the psychology literature

11
Q

What are exogenous variables?

A

▪ These variables are considered as IVs in the model
▪ They have no specified predicted cause in the model, hence they have no single-headed arrow input.
▪ You can have multiple exogenous variables in the model; these are usually free to correlate with each other, although you can specify that they be uncorrelated.
▪ Correlations between two or more exogenous variables are represented by a curved double-headed arrow between variables.

12
Q

What are endogenous variables?

A

o Endogenous variables
▪ These variables are considered DVs in the model
▪ Will have a directional arrow coming in, and may also have one or more directional arrows moving away if it is a mediator variable.
▪ Downstream variables caused by the exogenous variables
▪ Each endogenous variable will also typically have an error or disturbance term associated with it.
▪ This reflects that there are also unmeasured and unspecified causal effects on these variables

13
Q

What are latent variables?

A

o Latent variables
▪ Each endogenous variable will also typically have an error or disturbance term associated with it.
▪ This reflects that there are also unmeasured and unspecified causal effects on these variables
▪ These disturbance terms are usually modeled as latent variables, hence they are represented by circles

14
Q

What is model identification?

A
  • In SEM a model is specified, then parameters (variances and covariances of IVs and regression coefficients) for the model are estimated using sample data, and the parameters are used to produce the estimated population covariance matrix.
  • However, in order to be estimated, a path model must be ‘identified’.
  • This means there needs to be sufficient unique pieces of information (i.e. correlations in the observed data) to allow mathematical estimation of the model that has been specified.
  • A model is said to be identified if it is possible to estimate each of the unknown parameters, i.e. there must be at least as many known as unknown parameters.
15
Q

How can you check model identification for observed variable path models?

A

o Calculate the maximum number of possible pathways between variables (this is the number of ‘data points’ in the SEM, since data points in SEM are the non-redundant sample variances and covariances).
o Simple formula: v * (v + 1)/2, where v = no. of variables
o Count all of the pathways specified in the model
o Subtract the number of specified pathways from the number of data points
o The number that you get is the df of the model
o Basic rule: the maximum number of possible pathways between observed variables must equal or exceed the number of paths specified (drawn/included) in the model. This is the same as saying that df must be greater than or equal to 0.
When explaining, always state what just-identified (df = 0), over-identified (df > 0) and under-identified (df < 0) mean.
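
The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts (4 observed variables, 8 estimated pathways) are assumptions made up for the example, not values from any model in these cards.

```python
def path_model_df(n_variables: int, n_estimated_paths: int) -> int:
    """Model df = data points minus the pathways estimated in the model."""
    # Data points: non-redundant variances and covariances, v * (v + 1) / 2.
    data_points = n_variables * (n_variables + 1) // 2
    return data_points - n_estimated_paths


def identification_status(df: int) -> str:
    """Label a model from its df."""
    if df < 0:
        return "under-identified"
    if df == 0:
        return "just-identified"
    return "over-identified"


# Hypothetical model: 4 observed variables, 8 estimated pathways.
example_df = path_model_df(4, 8)
print(example_df, identification_status(example_df))
```

With 4 variables there are 10 data points, so a model estimating 8 pathways has 2 df and is over-identified.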

16
Q

What is model estimation?

A

1) Once you have specified and constructed your model, you are ready to estimate your model
2) We are primarily interested in two facets of the model:
o The direct and indirect effects between variables
o Global model fit
(The analogue in OLS regression would be interest in: 1) regression coefficients for individual predictors; 2) a test of overall regression model fit, i.e. ANOVA for R²)

17
Q

What are the direct effects in a path model?

A

o Can test the significance of (unstandardised) direct effects
o Should consider the magnitude of direct effects, not just significance. Use past research as a guide
o Use rules of thumb e.g. Cohen: 0.10 small, 0.30 medium, 0.50 large.
o The path coefficients (the standardised regression coefficients) reflect direct relations between one variable and another (controlling for the effect of any other variable also affecting the endogenous variable).
o These are the same as beta weights in normal MR. We can obtain these by simply running separate OLS regression models.

18
Q

What are the indirect effects in a path model?

A

o Effects of one variable on another variable via a mediator variable.
o In a standard one-mediator mediated regression there is one indirect effect – the effect of the IV on the DV via the mediator.
o The strength of an indirect effect is obtained by multiplying the constituent direct paths (i.e. the two direct paths that make up the indirect path), e.g. N –> Avoid = .34 and Avoid –> Dep = .50, so the indirect effect of N –> Dep via Avoid is .34 * .50 = .17

19
Q

How can we fully interpret indirect effects in a path model?

A

▪ Imagine the indirect effect of Neuroticism (N) on Depression (Dep) via Avoid is (0.34 * 0.50) = 0.17.
▪ What this means is that N has a 0.34 direct effect on Avoid, but only 0.50 of this is transmitted to Dep via Avoid, i.e. 0.17.
▪ This means we can expect an increase in Dep of 0.17 SD units for every 1 SD unit increase in N, via the effects on Avoid.
▪ We would also mention how much the direct effect shrinks when the indirect effect is taken into account: is it a full or partial mediation?
▪ Statistical tests can be difficult to calculate with two or more mediators

20
Q

What is the Sobel test?

A

The Sobel test is often used to test the significance of the indirect effect with one mediator
▪ It is a z test on the ratio of the unstandardised indirect effect to its standard error, and is only useful with fairly large samples
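
A minimal sketch of that ratio, assuming the common first-order Sobel standard error, sqrt(b²·SEa² + a²·SEb²), where a is the unstandardised IV –> mediator path and b is the unstandardised mediator –> DV path. The coefficient and standard error values below are hypothetical, chosen only to show the arithmetic.

```python
from math import erfc, sqrt


def sobel_test(a: float, se_a: float, b: float, se_b: float):
    """Return (z, two-tailed p) for the indirect effect a * b."""
    indirect = a * b
    # First-order Sobel standard error of the indirect effect.
    se_indirect = sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    # Two-tailed p value from the standard normal distribution.
    p = erfc(abs(z) / sqrt(2))
    return z, p


# Hypothetical unstandardised paths and standard errors.
z, p = sobel_test(a=0.40, se_a=0.10, b=0.50, se_b=0.10)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Because the test relies on a normal approximation for the product a * b, it is best treated as a large-sample check, which matches the caution in the card.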

21
Q

How do we calculate total effects in a path model?

A

o Total effects represent the total causal effect of one variable on another.
o This is calculated by summing all of the direct and indirect effects
o In our earlier example: the total effect of N on Dep
▪ = direct effect + indirect effect
▪ = 0.22 + 0.17 = 0.39
o Total effect of Avoid on Dep is simply -0.03, as there are no indirect effects in this pathway.
(If you have AMOS output, you also get a total effects table here)
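
The same arithmetic as a short Python sketch, using the standardised coefficients quoted in these cards for the one-mediator example (N –> Avoid = 0.34, Avoid –> Dep = 0.50, direct N –> Dep = 0.22). The variable names are made up for the illustration.

```python
from math import prod

# Standardised paths from the one-mediator example in the cards.
direct_n_to_dep = 0.22
indirect_path = [0.34, 0.50]  # N -> Avoid, Avoid -> Dep

# Indirect effect: product of the constituent direct paths.
indirect_n_to_dep = prod(indirect_path)

# Total effect: direct effect plus all indirect effects.
total_n_to_dep = direct_n_to_dep + indirect_n_to_dep

print(f"indirect = {indirect_n_to_dep:.2f}, total = {total_n_to_dep:.2f}")
```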

22
Q

Where are the unstandardised regression weights in AMOS output?

A

o The ‘estimate’ column of the regression weights table gives you the unstandardised regression weights, corresponding to the ‘B’ column in SPSS

23
Q

Where are the standardised regression weights in AMOS output?

A

o Standardised regression weights table gives you the values from beta column that go on the diagram. You also get a different table called standardised direct effects, which gives you the same numbers.
p values are not given here; they are in the regression weights table with the unstandardised weights.

24
Q

Indirect effects in more complex models: tracing rules

A

o There are several ways N can affect Dep indirectly in this model – we need to trace these through the model
o Tracing rules:
▪ You cannot enter and exit a variable on an arrowhead (I think this means, for example, that when you go “N –> Avoid –> Cog Inflex –> Dep”, you can’t then go “Trait Anx –> Cog Inflex –> Avoid –> Dep”, because you would be reversing the way that you analyse the relationship between Avoid and Cog Inflex, i.e. in the first one it’s “Avoid –> Cog Inflex” and in the second one it’s “Cog Inflex –> Avoid”).

25
Q

Total effects in more complex models

A

1 direct effect = 0.30
2 indirect effects
▪ N – Avoid – Dep: 0.34 * -0.03 = -0.01
▪ N – Avoid – Cog Inflex – Dep: 0.34 * 0.15 * 0.45 = 0.02

o The total causal effect for N on Dep is the sum of the direct and indirect effects
▪ Direct effect = 0.30
▪ Sum of indirect effects = (-0.01 + 0.02) = 0.01
▪ Total effect = 0.31
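
The bookkeeping above can be sketched in Python by listing each indirect route as a sequence of path coefficients. The coefficients are the ones quoted in this card; the dictionary layout and names are assumptions made for the illustration. The sketch sums the unrounded products, which agrees with the card's figures after rounding to two decimals.

```python
from math import prod

direct_effect = 0.30  # N -> Dep

# Each indirect route is the list of direct paths traced along it.
indirect_routes = {
    "N -> Avoid -> Dep": [0.34, -0.03],
    "N -> Avoid -> Cog Inflex -> Dep": [0.34, 0.15, 0.45],
}

# Indirect effect of a route = product of its constituent paths.
indirect_effects = {name: prod(paths) for name, paths in indirect_routes.items()}
sum_indirect = sum(indirect_effects.values())
total_effect = direct_effect + sum_indirect

for name, value in indirect_effects.items():
    print(f"{name}: {value:.2f}")
print(f"Sum of indirect effects: {sum_indirect:.2f}")
print(f"Total effect: {total_effect:.2f}")
```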

26
Q

What is the best support for a causal path/SEM model?

A

4 empirical conditions must be met to support causal inference (e.g. Kline, 2005/11)
o Relationship: X should be correlated with Y
o Temporal precedence: X must precede Y in time
o Non-spuriousness: X-Y relationship should hold after controlling for other variables experimentally or statistically e.g. third variable issue
o Correct effect priority: there are no reciprocal relationships between X and Y, or reversals of this relationship

▪ The better specified a model is in terms of theory, logic etc., the more persuasive the case that it is a real model if it turns out to ‘fit’ the data. If the direction of each relationship is specified a priori, this strengthens the plausibility of the model. Path weights can also be specified a priori.

Omitting paths from the model can also help with plausibility (see parsimony).

27
Q

What is a parsimonious model? Why is it good?

A

• The df is the difference between the full and reduced model
• The reduced model is simpler and more parsimonious
• Parsimonious models (if plausible) have several advantages:
o (i) Simplest (but sufficient) models preferred in science
▪ Occam’s razor - ‘all other things being equal, the simplest model is the most preferred’
o (ii) It is easy for a reduced model to be a statistically worse fit than the full model, so if it survives this test of fit it has more credibility as a plausible model

28
Q

What is meant by model fit?

A

o Bad model fit – model doesn’t explain data as well as other models might (e.g. a model with paths dropped/added) - refine or discard model
o Good model fit – fails to disconfirm your model, the data is well explained by the paths specified in the model.
But remember, even if you have a good model, ‘fit’ is only with reference to the variables in your model:
(i) an alternative model with a different specification of paths might be even better, so it is still worth testing alternative models
(ii) there may be a more complete model (more variables)

29
Q

How is model fit typically assessed?

A

o Remember that correlation = direct+indirect+unanalysed effects; i.e. summing all effects will give original sample correlation
o Sample correlation - If all possible paths (i.e. effects) are included in model (saturated model) they will sum to original sample correlation
o Implied correlation - if only some paths estimated (reduced model), sum of effects will not automatically equal sample correlation – but give a predicted or implied correlation
o Most measures of model fit are based on the discrepancy between sample and implied correlations (residual correlations)
• If the correlations implied by the reduced model are not that different from those of the saturated model (i.e. the sample correlations), then you have a ‘good’ model

30
Q

What are reduced/saturated models?

A

• Saturated model
o all paths are estimated in the model (0 df)
o the sample correlation matrix can therefore be reproduced perfectly (by adding up all effects)
• Reduced model (called ‘default’ in AMOS)
o not all paths are included (e.g. earlier example)
o implied or predicted correlations therefore usually different from sample correlations

31
Q

4 common measures of model fit

A

1) Chi-square test
2) SRMR (Standardised Root Mean Square Residual)
3) RMSEA (Root Mean Square Error of Approximation)
4) GFI (Goodness of Fit Index)

32
Q

How to interpret chi-square model fit test

A
  • The χ² value is in a column called ‘CMIN’ in AMOS. You want to look at the row for the default model; this is the reduced model. A non-significant result indicates good model fit.
  • χ²M (model chi-square) = 0 for a saturated model with 0 df (all paths have been estimated); it is analogous to ‘error variance’
  • χ²M (‘error’) increases when more paths are omitted
  • If χ²M is significant, the reduced model is significantly worse than a saturated model, i.e. it is a ‘bad fit’ to the data
  • Look for non-significance, which indicates good fit (i.e. not significantly different from the saturated model despite fewer paths)
  • Problem: with large samples, your model is likely to be significantly worse even when differences in fit are substantively small
33
Q

What is the SRMR?

A

A model fit index
Standardised Root Mean Square Residual
• a residual correlation is the difference between a sample correlation and the implied correlation
• the SRMR is based on the average absolute value of the residual correlations
• an SRMR of zero would equal perfect fit (no residuals: the correlations implied by the reduced model are identical to the sample correlations reproduced by the full model). When the SRMR is small, the reduced model is not meaningfully worse than the full model, so we opt for the reduced model as it is more parsimonious
SRMR < .10 indicates good fit
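
A rough sketch of the idea in Python. It assumes the usual root-mean-square form of the index (the square root of the average squared residual over the non-redundant elements of the correlation matrix), which is close to, but not the same as, a plain average of absolute residuals. Both matrices are hypothetical and exist only to show the calculation.

```python
from math import sqrt


def srmr(sample_corr, implied_corr):
    """Root mean square of residual correlations (lower triangle + diagonal)."""
    n = len(sample_corr)
    squared_residuals = []
    for i in range(n):
        for j in range(i + 1):
            residual = sample_corr[i][j] - implied_corr[i][j]
            squared_residuals.append(residual ** 2)
    return sqrt(sum(squared_residuals) / len(squared_residuals))


# Hypothetical 3-variable example: the model reproduces every
# correlation except one, which it under-predicts by 0.05.
sample = [
    [1.00, 0.34, 0.30],
    [0.34, 1.00, 0.20],
    [0.30, 0.20, 1.00],
]
implied = [
    [1.00, 0.34, 0.25],
    [0.34, 1.00, 0.20],
    [0.25, 0.20, 1.00],
]

value = srmr(sample, implied)
print(f"SRMR = {value:.3f}")
```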

34
Q

What is RMSEA?

A
A model fit index
Root Mean Square Error of Approximation
•	popular fit measure designed to assess the approximate fit of a model while rewarding parsimony
•	of two models with similar explanatory power, the simpler model (fewer paths, hence more df) will be favoured
•	Browne and Cudeck (1993) suggested:
o	RMSEA < .05 – good fit
o	RMSEA < .08 – reasonable fit
o	RMSEA > .10 – poor fit
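
A hedged sketch of how the index relates to the model chi-square, assuming the common formula RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))); some programs divide by N rather than N − 1. The chi-square values are the one-factor and two-factor figures quoted in the model-building card, while the sample size of 200 is purely hypothetical.

```python
from math import sqrt


def rmsea(chi_square: float, df: int, n_cases: int) -> float:
    """RMSEA from the model chi-square, its df and the sample size."""
    if df <= 0:
        raise ValueError("RMSEA needs a model with positive df")
    # Misfit beyond what is expected by chance, floored at zero.
    excess = max(chi_square - df, 0.0)
    return sqrt(excess / (df * (n_cases - 1)))


def browne_cudeck_label(value: float) -> str:
    """Apply the cut-offs listed in the card."""
    if value < 0.05:
        return "good fit"
    if value < 0.08:
        return "reasonable fit"
    if value > 0.10:
        return "poor fit"
    return "between the listed cut-offs"


# N = 200 is an assumed sample size for illustration only.
for chi_square, df in [(3.25, 4), (51.08, 5)]:
    value = rmsea(chi_square, df, n_cases=200)
    print(f"chi-square = {chi_square}, df = {df}: "
          f"RMSEA = {value:.3f} ({browne_cudeck_label(value)})")
```

Dividing by df is what rewards parsimony: for the same amount of excess misfit, a model with more df gets a smaller RMSEA.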
35
Q

What is the GFI?

A

A model fit index
Goodness of Fit Index
• different approach to model fit
• compares researcher’s model with the independence model
o independence model predicts all variables are independent (i.e. zero correlations)
o analogous to R² – estimates total variance accounted for by our model
• Hu & Bentler (1999) guidelines:
o GFI >.95 = good fit
o GFI >.90 = adequate fit

36
Q

What is a full SEM?

A

• A full SEM is simply a combination of a measurement model (i.e. a CFA model) and a structural model (i.e. a path model, observed variables)
• Full SEM extends observed variable path analysis (structural model) by creating a latent variable measurement model, and then examining relationships between these latent variable factors
• This essentially involves a two-step process (nb. some approaches break these steps down further):
o Specify and estimate a candidate measurement model (aka Confirmatory Factor Analysis)
o Once you have a viable measurement model (CFA model), you re-specify the model as a structural model and examine the relationships between the latent factors

37
Q

Differences between CFA and EFA

A
  • The typical difference between the two is that in CFA we constrain factor loadings (usually to be 0), i.e. we do not allow all observed items/indicators to load freely on all of the factors
  • We can also specify the number of factors we want to extract
  • So the CFA model is a more constrained version of the EFA model
38
Q

What are factors in CFA

A

• Factors
o are latent (or unmeasured) variables e.g. extraversion, impulsivity, verbal IQ etc.
o represented by circles in path notation
o Factors are typically assumed to cause variation in the indicators, so the single-headed directional arrow moves from the factor to the indicator e.g. general verbal IQ causes one to think of more words on a specific vocabulary test.

39
Q

What are indicators in CFA

A

•  Indicators
o are measured or indicator variables
o are represented by squares (like any observed variable)
o represent the actual items or measures directly assessed

40
Q

What are factor loadings in CFA

A

• Factor Loadings
o estimate the relationship between the factor and the observed indicator
o can be thought of as the correlation between the factor and the indicator in standard CFA models
o we would typically like these to be >.50

41
Q

What are factor covariances in CFA

A

• Factor Covariances
o estimate the relationship between the latent factors
o we can use this information to examine the convergent and discriminant validity of the factors

42
Q

What are error terms in CFA?

A

• Error terms
o these model variation in the indicator variable not accounted for by the factor e.g. anything that accounts for word vocabulary excluding verbal IQ
o these error terms are usually uncorrelated with each other, but you could model error correlations if you expected that response across indicators would be caused by something other than the factors e.g. method effects (?)

43
Q

8 steps in CFA

A
  1. Refer to theory/previous research to ascertain appropriate model
  2. Specify the model
  3. Model identification
  4. Model estimation
  5. Testing model fit
  6. Interpret model effects
  7. Modifying models
  8. Reporting results
44
Q

2 important steps in specifying CFA model

A

• Error terms
o cannot know the variance of unmeasured variables
o fix the error variances to 1 in model specification
o or, fix raw error loadings to 1 (AMOS default) – sets error variance based on indicator variance
o important for model identification
• Factor variance
o factors unmeasured so variance also unknown
o fix factor variance to 1 or set raw factor loading to 1
o only need one factor loading to be set to 1 per factor
o again, important for identification of model

45
Q

Calculation for model identification in CFA/measurement model

A

• CFA uses known values in the variance/covariance matrix to estimate unknown values e.g. factor loadings
• To find if model is identified:
o knowns: calculate number of observed covariances and variances e.g. v * (v + 1)/2, where v equals number of variables
o unknowns: count up number of paths in the model (excluding the 1 factor loading for each factor that has been set to 1) and the error variances (count only the circles, their arrows have been set to 1 so you don’t need to count their small arrows)
o calculate knowns – unknowns for model df
o If model df greater than or equal to 0 then proceed
o If not, need to re-specify model

46
Q

2 simple heuristics for CFA model identification

A

• Simple heuristics for standard CFA models e.g. models with uncorrelated error terms and where each indicator loads on just one factor
o (A) If a model with a single factor has 3 or more indicators it will be identified
o (B) If a model with 2 or more factors has 2 or more indicators per factor it will be identified

47
Q

Interpreting factor loadings and factor correlations

A

• factor correlations > .75-.80 suggest that the model is ‘overfactored’ – a more plausible model might involve collapsing the factors into one and re-estimating the model
o this is where you also need to rely on what theory and previous research suggest
• factor loadings >.5 are good
o this suggests each indicator is doing a good job of representing the factor
o if factor loadings are low, it may suggest you should remove this indicator from your measurement in the future

48
Q

What is model building?

A

Way of refining CFA model
o Model building
▪ Starts with a bare-bones model then adds path(s)
▪ If extra paths significantly improve fit, these are added to the model

49
Q

What is model trimming?

A

Way of refining CFA model
o Model trimming
▪ Typically (but not necessarily) starts with a saturated model and simplifies it by eliminating paths
▪ If the model fit does not significantly deteriorate then paths can be removed (the model is no worse but is simpler; parsimonious models are more favourable)

50
Q

How to determine if model is improved by building using M.I.

A
  • Modification indices (MI) can be used to add individual paths to the model
  • The larger the MI the greater the improvement in model fit.
  • Usual convention is that MI > 4 suggests an improvement in model fit and the path should be added.
  • In the case below, neither of the added paths improves the model fit.
51
Q

How to determine if model is improved by building using chi-square

A

• A chi-square difference test can be conducted using chi-square values and degrees of freedom from any two ‘nested’ models.
• A nested model is a model that uses the same variables (and cases!) as another model but specifies at least one additional parameter (path?) to be estimated.
A. Calculate the X² (chi-square) for the first model (X² m1)
B. Calculate the X² for the second model with paths added (X² m2)
C. Calculate the difference X² D (i.e. X² m1 - X² m2)
• If X² D is significant, then the model is significantly improved by adding paths and these can be retained in your refined model (see below for explanation of X² D significance)
• e.g. the X² for the 2 factor model was 3.25 (4 df).
• the X² for the 1 factor model was 51.08 (5 df)
• so, to test the difference in fit we calculate the difference in X² between the models (X² D = 51.08 - 3.25 = 47.83) and evaluate this against the chi-square critical value for 1 df (the difference between the 2 factor and 1 factor model df, i.e. one path removed), which is 3.84 at p = .05. If X² D exceeds the critical value, then the test is significant.
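
The worked example can be checked with a short Python sketch. It covers only the 1 df case, where the chi-square tail probability has the closed form erfc(sqrt(x / 2)); a difference with more df would need a general chi-square function. The function names are made up for the illustration.

```python
from math import erfc, sqrt

CRITICAL_VALUE_1DF = 3.841  # chi-square critical value, 1 df, alpha = .05


def chi_square_difference(chi2_simple, df_simple, chi2_complex, df_complex):
    """Return (X2 D, df D) for two nested models."""
    return chi2_simple - chi2_complex, df_simple - df_complex


def p_value_1df(chi_square: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 df."""
    return erfc(sqrt(chi_square / 2))


# Values from the card: 1 factor model 51.08 (5 df), 2 factor model 3.25 (4 df).
x2_d, df_d = chi_square_difference(51.08, 5, 3.25, 4)
significant = x2_d > CRITICAL_VALUE_1DF

print(f"X2 D = {x2_d:.2f} on {df_d} df, p = {p_value_1df(x2_d):.2e}")
print("significant" if significant else "not significant")
```

Here X² D is far above 3.84, so the two-factor model fits significantly better than the one-factor model.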

52
Q

How to determine if model is improved by trimming using chi-square

A

A. Calculate the X² (chi-square) for the first model (X² m1)
B. Calculate the X² for the second model with paths removed (X² m2)
C. Calculate the difference X² D (i.e. X² m2-X² m1)
• If X² D is significant, then the model is significantly worse with fewer paths (i.e. significantly better with more paths), so the paths should be retained. If it is nonsignificant, the model is not significantly worse with fewer paths, so we accept the trimmed/reduced model for parsimony.

53
Q

Theoretical vs empirical model re-specification

A

• Theoretical approach
o model trimming/building guided by theoretical a priori considerations
• Empirical approach
o Paths are added or deleted from model purely on basis of statistical criteria
o In model building, MIs for all paths are examined to see which ones significantly improve model
o can capitalise on chance correlations
o this type of SEM is more exploratory (cannot claim you are ‘confirming’ theory)
o credibility of model improved if model structure replicated in another sample

54
Q

What is multiple group SEM?

A
  • Test an SEM across a categorical variable e.g. gender, cultural group etc
  • We might want to look at model estimates in different groups, or see whether a particular model holds across groups, i.e. it is invariant across groups
  • This can be done for a CFA model or a full SEM
  • Uses the principle of iteratively constraining parameters in the model to equality across the groups (implying they are the same in each group), and then looking to see if this produces a significant decrement in model fit
  • If a significant decrease in model fit occurs, you then have to identify which parameter/s have caused this problem i.e. you can iteratively free parameters to identify the source of the misfit
55
Q

Assumptions of SEM

A

• The assumptions largely follow from those for correlation/regression analyses (see the appropriate lecture).
• Linearity- dependent (endogenous) variables should be linearly related to independent variables
• SEM programmes can handle continuous and categorical variables, but check for coding of categorical variables and make sure programme knows what codes are being used
• Normality- residuals should be normally distributed and homoscedastic
• Identification- models cannot be under-identified
• Adequate sample size
o Kline recommends at least 10 times as many cases as parameters (paths) – ideally 20 times
• Proper Model Specification
o specification error occurs when common causal variables are left out of the model
• Disturbances uncorrelated with exogenous variables (same as MR – errors uncorrelated with independent variables)
• No multicollinearity
• Exogenous variables are reliably measured
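
The sample size rule of thumb above is simple arithmetic; a minimal Python sketch follows. The function name and the example of 12 estimated parameters are hypothetical.

```python
def recommended_sample_sizes(n_parameters: int):
    """Return (minimum, ideal) cases: 10x and 20x the parameter count."""
    return 10 * n_parameters, 20 * n_parameters


# Hypothetical model with 12 estimated parameters (paths).
minimum_n, ideal_n = recommended_sample_sizes(12)
print(f"at least {minimum_n} cases, ideally {ideal_n}")
```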