SEM Flashcards
What is SEM a combination of?
Confirmatory factor analysis (the measurement model)
Path analysis (the structural model)
What is path analysis?
An extension of multiple regression
what is the aim of path analysis?
Its aim is to provide estimates of the magnitude and significance of hypothesised causal connections between sets of variables.
in path analysis, we are interested in….
Interested in the size and direction of the direct and indirect effects between multiple variables.
so simple path models are essentially
mediated regression.
Mediation implies a…
causal chain, as there is a series of relationships.
but path analysis…
Path analysis makes a causal claim, but that claim is separate from the statistical analysis: it comes from theory.
When specifying a model, we must…
• Use theory, previous research and logical relations between variables to justify the path model
• Then draw the model as a path diagram
Name the components of the path diagram
– Observed variables = Squares
– Unobserved (latent) variables = Oval / Circle
– Single headed arrows = causal relationships (direct paths)
– Double-headed arrows = correlation
– Error terms / disturbances
– Endogenous variables
– Exogenous variables
what are endogenous variables?
• Considered the DVs
• Have directional arrows pointing into them
• Can also have arrows pointing out of them, which turns the DV into a mediator
• ‘Downstream’ variables caused by exogenous variables
• Each also receives an extra input accounting for other possible causes not specified in the model (there will always be unspecified, unmeasured variables having an effect; we model these with an ERROR or DISTURBANCE term)
• Error terms or disturbance terms are drawn as ovals because they are latent variables
what are exogenous variables?
• They are the IVs in the model
• No arrows point into them: their causes are not specified in the model
• You can have multiple exogenous variables (they can be correlated)
In order for analysis to be run, the model needs…
to be identified
what is model identification?
• REQUIRES sufficient UNIQUE pieces of information (i.e. correlations in the observed data); this allows mathematical estimation of the model
• Tricky with more complex models, but there is a rule of thumb (below)
what is the rule of thumb?
The maximum number of single connections (correlations) between observed variables must equal or exceed the number of paths specified in the model
how to calculate model identification?
Calculate the knowns with v(v + 1)/2, where v = the number of observed variables (this counts their variances and covariances)
then compare this with the unknowns (the free parameters to be estimated).
These include:
FOR CFA / SEM: error variances, factor variances, factor loadings (not counting paths fixed to 1, e.g. the error-term paths) and covariances between factors
what are the three types of identified models?
• Over-identified model = more correlations than free paths in the model
• Just-identified model (saturated model) = correlations equal to the number of free paths in the model
• Under-identified model = fewer correlations than free paths; the model cannot be estimated
what can you also calculate with the v(v + 1)/2 formula?
degrees of freedom (knowns minus unknowns)
Recursive models are… and are always… in comparison to reciprocal models
Recursive models = those with all connections moving in the same direction, with no feedback loops (always identified)
Reciprocal (non-recursive) models = more complex, with more complex identification; not common in psychology
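One way to read "connections moving in the same direction" is that the single-headed arrows contain no feedback loop. The Python sketch below checks that with a topological sort (Kahn's algorithm). The variable names are hypothetical, and it does not look at disturbance correlations, which the stricter definition of a recursive model also rules out.

```python
def is_recursive(paths):
    """Return True if the directed paths (cause, effect) contain no feedback loop.

    Sketch only: checks single-headed arrows; correlated disturbances are not checked.
    """
    nodes = {v for edge in paths for v in edge}
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for cause, effect in paths:
        children[cause].append(effect)
        indegree[effect] += 1
    # Repeatedly remove variables that have no remaining incoming arrows.
    queue = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any variable never removed sits on a feedback loop.
    return visited == len(nodes)


# Hypothetical models.
chain = [("family_background", "motivation"), ("motivation", "grades")]
loop = chain + [("grades", "motivation")]  # reciprocal: grades feed back into motivation
print(is_recursive(chain), is_recursive(loop))
```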
when estimating the model, what three types of effects do we look at?
• Direct effects
• Indirect effects
• Global model fit
what are direct and indirect effects analogous to?
Analogous to the regression coefficients for individual predictors
what is global model fit analogous to?
Analogous to the ANOVA test of R² in regression
formal definition of direct effect?
The path regression coefficients reflect DIRECT relations between one variable and another (controlling for the effect of any other variables also affecting the endogenous variable)
• Same as beta weights in MR
formal definition of indirect effect?
The effects of one variable on another variable via a mediator
to calculate indirect effects we….
simply multiply the standardised beta weights along the mediated path together
• Becomes more difficult with two or more mediators
• Significance: again the Sobel test (a z test on the ratio of the unstandardised indirect effect to its standard error; needs large samples) and bootstrapping (MacKinnon)
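A minimal Python sketch of the indirect effect and the Sobel z for a single mediator (X → M → Y). The coefficients and standard errors are made-up numbers, and the standard error uses the usual first-order (Sobel) approximation; bootstrapping is not shown.

```python
import math

# Hypothetical unstandardised estimates (not from these notes).
a, se_a = 0.50, 0.10  # path X -> M and its standard error
b, se_b = 0.40, 0.08  # path M -> Y (controlling for X) and its standard error


def indirect_effect(a, b):
    """Indirect effect of X on Y via M: the product of the two paths."""
    return a * b


def sobel_z(a, se_a, b, se_b):
    """Sobel z: indirect effect divided by its first-order standard error."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return a * b / se_ab


ab = indirect_effect(a, b)
z = sobel_z(a, se_a, b, se_b)
# Two-tailed p from the standard normal: P(|Z| > z) = erfc(|z| / sqrt(2)).
p = math.erfc(abs(z) / math.sqrt(2))
print(f"indirect = {ab:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Because the Sobel test assumes the product a·b is normally distributed, which holds only in large samples, bootstrapping is the alternative the notes mention.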
total effects are…
• These represent the total causal effect, DIRECT AND INDIRECT, of one variable on another
Calculated by adding up all direct and indirect effects
what are the “Tracing rules”?
• You cannot enter and exit a variable on an arrowhead
• You cannot enter a variable twice on the same trace
even if you find good model fit, it is important to remember ….
It is important to remember that finding a good model fit does not exclude the possibility that another model will explain the data better
if a path weight is set to 0, what does it mean?
The path is omitted. Omitted paths
are as important to the model as included paths. Their absence makes a theoretical statement (even if not explicitly expressed), e.g. ‘I hypothesise there are no direct effects of ethnicity and family background on grades’
why would we omit paths?
for model parsimony
The model is therefore simpler, or more parsimonious, than a full model in which all possible paths must be estimated
Parsimonious models (if plausible) have several advantages:
• The simplest (but sufficient) models are preferred in science. Occam's razor: ‘all other things being equal, the simplest model is the most preferred’
• It is easy for a reduced model to fit statistically worse than the full model, so if it survives this test of fit it gains credibility as a plausible model
Explaining more with less: a saturated model would explain everything, but a model using just 2 variables that explained 85% of the variance would be a very parsimonious model.
take home message ….
Constrain the model in various ways as a way of testing theory or a particular set of research questions: we want to test the model against our data as rigorously as we can by using the most parsimonious model.
recap… error terms reflect the
unmeasured causes (variables not included in the model)
Be wary of models that are
close to saturation
what is a basic summary of quantifying model fit?
The basic notion: the difference between the observed correlations (which the saturated, full model reproduces exactly) and the correlations implied by the reduced model is the RESIDUAL, and we want this to be as small as possible
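To make the residual concrete: in a hypothetical chain X → M → Y with the direct X → Y path omitted, the model-implied correlation between X and Y is the product of the two standardised paths (tracing X to M to Y). The correlations below are invented, and for this simple chain each standardised path equals the corresponding observed correlation.

```python
# Hypothetical observed correlations (not from these notes).
r_xm, r_my, r_xy = 0.50, 0.60, 0.40

# Reduced model X -> M -> Y, direct X -> Y path omitted (set to 0).
a = r_xm  # standardised X -> M (X is M's only predictor)
b = r_my  # standardised M -> Y (M is Y's only predictor in this model)
implied_xy = a * b  # trace X -> M -> Y

# Residual: observed minus implied. The saturated model would make this 0.
residual_xy = r_xy - implied_xy
print(f"implied r_XY = {implied_xy:.2f}, residual = {residual_xy:.2f}")
```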
what are the four ways of quantifying model fit in path analysis / SEM / confirmatory factor analysis?
come in sr. mr. Ramseay: go fuck it
1) Chi-square test (as a minimum – CMIN in AMOS)
2) Standardised Root Mean Square residual (SRMR)
3) Root mean square error of approximation (RMSEA)
4) Goodness of fit index
a χ² of 0 would mean
• a perfect fit: a χ² of 0 is what the saturated model gives
as the number of paths is reduced, χ²… and why?
goes up: fewer free paths mean more degrees of freedom, and the error (residual) increases
what do we want the χ² value to be?
low relative to its df, i.e. with a high (non-significant) p value
Significance of χ² is a measure of bad fit.
So we are looking for non-significance as we are looking to show the results are not significantly different from the saturated model despite having fewer paths.
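A stdlib-only Python sketch of the χ² test p value, using the closed-form upper tail of the chi-square distribution for whole-number df. The χ² and df values are hypothetical; in practice the software (e.g. CMIN in AMOS) reports the p value for you.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * series


# Hypothetical fitted model: chi-square of 4.20 on 3 df.
chi2_value, df = 4.20, 3
p = chi2_sf(chi2_value, df)
if p < .05:
    print(f"chi2({df}) = {chi2_value}, p = {p:.3f}: significant misfit")
else:
    print(f"chi2({df}) = {chi2_value}, p = {p:.3f}: not significantly worse than the saturated model")
```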
what is the Standardised Root Mean Square Residual (SRMR)?
• a residual correlation is the difference between a sample correlation and the implied correlation
• the SRMR is based on the average absolute value of the residual correlations
what value are we looking for?
• an SRMR of zero would equal perfect fit (no residual)
• SRMR
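A Python sketch of one common SRMR computation from an observed and a model-implied correlation matrix: square the residuals over the non-redundant elements (lower triangle including the diagonal), average them, and take the square root. The matrices are hypothetical, and software packages differ slightly in which elements they include.

```python
import math


def srmr(observed, implied):
    """Root mean square of (observed - implied) over the p(p + 1)/2 non-redundant elements."""
    p = len(observed)
    squares = [
        (observed[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)  # lower triangle including the diagonal
    ]
    return math.sqrt(sum(squares) / len(squares))


# Hypothetical matrices for X, M, Y: only the X-Y correlation is misfitted.
observed = [[1.00, 0.50, 0.40],
            [0.50, 1.00, 0.60],
            [0.40, 0.60, 1.00]]
implied = [[1.00, 0.50, 0.30],
           [0.50, 1.00, 0.60],
           [0.30, 0.60, 1.00]]
print(round(srmr(observed, implied), 4))
```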
what is the Root Mean Square Error of Approximation (RMSEA)?
• popular fit measure
• designed to assess the approximate fit of a model while rewarding parsimony: of two models with similar explanatory power, the simpler model (fewer paths, so more df) will be favoured
what value are we looking for?
• Browne and Cudeck (1993) suggested:
  - RMSEA < .05: good fit
  - RMSEA < .08: reasonable fit
  - RMSEA > .10: poor fit
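A Python sketch of the usual RMSEA point estimate, sqrt(max(χ² − df, 0) / (df × (N − 1))), with the Browne and Cudeck (1993) labels from these notes. The fit values are hypothetical, and some programs use N instead of N − 1.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate (N - 1 version; some programs use N)."""
    if df <= 0:
        raise ValueError("RMSEA is undefined for a model with df = 0 (saturated)")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def browne_cudeck_label(value):
    """Labels from the Browne and Cudeck (1993) cut-offs given in these notes."""
    if value < .05:
        return "good fit"
    if value < .08:
        return "reasonable fit"
    if value > .10:
        return "poor fit"
    return "between .08 and .10 (no label in these notes)"


# Hypothetical model: chi2 = 48.0 on 20 df, N = 301.
value = rmsea(48.0, 20, 301)
print(f"RMSEA = {value:.3f}: {browne_cudeck_label(value)}")
```

Because χ² − df is divided by df, a model that reaches a similar χ² with more df (fewer paths) gets a lower RMSEA, which is the parsimony reward described above.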
what is the goodness of fit index (GFI)?
• Analogous to R² (estimates the total variance accounted for by the model)
what values are good?
• Values closer to 1 indicate better fit
• Hu & Bentler (1999): > .95 = good, > .90 = adequate
theoretical issues to remember with model fit…
The purpose of model fitting is to rule out bad models; it cannot prove a model is good
Bad model fit: the model doesn’t explain the data as well as other models might (e.g. a model with paths dropped/added), so refine or discard the model
Good model fit: fails to disconfirm your model, so you may have a good model. But ‘fit’ is with reference to the variables in your model; your model is not the one and only model:
(i) an alternative model with a different specification of paths might be even better, so it is still worth testing alternative models; (ii) there may be a more complete model (more variables). But the status of ‘not yet disconfirmed’ is powerful in science.
the FIRST step toward building a full SEM is?
Confirmatory Factor Analysis
what does CFA do?
• Helps to confirm a structure and test a theoretically driven model of psychological measures
• i.e. once we have an EFA-derived measure, we can administer it to a new sample and see if we can confirm the original measurement model
• Provides important information on how a measurement tool is structured and/or how latent factors are related to each other
principal difference between EFA and CFA?
• In CFA we CONSTRAIN factor loadings (usually to 0), i.e. we do NOT allow observed items/indicators to load freely on all of the other factors (each item is cut off from some factors). EFA is saturated, as items can load freely on all factors without any constraints; CFA is about CONFIRMING
• So the CFA model is more constrained than the EFA model
what shows the strength of the relationships between factors and their indicators?
Factor loadings: estimate the relationship between each indicator and its factor. Standardised loadings can be thought of as correlations. They need to be > .50
what are factor covariances?
Factor covariances: estimate the relationships between latent factors. USED to examine the convergent and discriminant validity of factors
what do error terms represent?
• They model variation in the indicator variable not accounted for by the factor, e.g. anything that affects a vocabulary score other than verbal IQ
These error terms are usually uncorrelated with each other, but you could model error correlations if you expected responses across indicators to be caused by something other than the factors (e.g. method effects)
7 Steps to setting up a CFA model?
1) Specify the model
2) Model identification
3) Model estimation
4) Testing model fit
5) Interpret model effects
6) Modifying models
7) Reporting results
sperm molesting inmates effect modest reporter
when we specify the model, we are generally
(SETTING UP STRUCTURE)
• we cannot know the variances of unmeasured variables
• so we fix the path from each error term to its indicator to 1 in model specification
• Factors are also unmeasured, so their variances are again unknown
• Set to 1 again, but only one factor loading per factor needs to be fixed
• IMPORTANT FOR IDENTIFICATION
• Software does this for you
how to calculate model identification?
o knowns: calculate number of observed covariances and variances
e.g. v * (v + 1)/2, where v equals the number of observed variables
o unknowns: count up the number of free paths and free variances
o calculate knowns – unknowns for model df
o If model df greater than or equal to 0 then proceed
If not, need to re-specify model
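The counting steps above, sketched in Python for a hypothetical two-factor CFA with six indicators (three per factor). The scaling choices are assumptions that follow the specification step: one loading per factor and each error-term path are fixed to 1, so they are not counted as unknowns.

```python
def knowns(v):
    """Observed variances and covariances for v observed variables: v(v + 1)/2."""
    return v * (v + 1) // 2


def model_df(n_observed, free_loadings, error_variances, factor_variances, factor_covariances):
    """Knowns minus unknowns. Only free parameters count; anything fixed to 1 is excluded."""
    unknowns = free_loadings + error_variances + factor_variances + factor_covariances
    return knowns(n_observed) - unknowns


def identification_status(df):
    if df > 0:
        return "over-identified"
    if df == 0:
        return "just-identified (saturated)"
    return "under-identified: re-specify the model"


# Hypothetical two-factor CFA: 6 indicators, 3 per factor.
# 6 loadings minus 1 fixed per factor = 4 free loadings; 6 error variances;
# 2 factor variances; 1 covariance between the two factors.
df = model_df(n_observed=6, free_loadings=4, error_variances=6,
              factor_variances=2, factor_covariances=1)
print(df, identification_status(df))
```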
model estimation is really 2 things….
• Estimate model parameters (factor loadings and covariances)
• Test global model fit (and against alternative models)
how do we test model fit?
o we use the same model fit indices as earlier, e.g. model chi-square, RMSEA, etc.
o there is no gold-standard fit index
o there is a lot of debate about golden rules (cut-offs) for the various fit indices
o need to consider and report a range of fit indices
o think about the fit indices in the context of your specific model, rather than blindly apply rules of thumb
what is the Standardised Root Mean Square Residual (SRMR)?
• a residual correlation is the difference between a sample correlation and the implied correlation
• the SRMR is based on the average absolute value of the residual correlations
• an SRMR of zero would equal perfect fit (no residual)
• SRMR
what is χ²?
• A χ² of 0 means a perfect fit, as in the saturated model
• Error increases as paths are removed, and χ² goes up
• Significance of χ² is a measure of bad fit, so we are looking for non-significance: we want to show the results are not significantly different from the saturated model despite having fewer paths.
what is the goodness of fit index (GFI)?
• Analogous to R² (estimates the total variance accounted for by the model)
• Values closer to 1 indicate better fit
• Hu & Bentler (1999): > .95 = good, > .90 = adequate
what is Root Mean Square Error of Approximation (RMSEA)?
• popular fit measure
• designed to assess the approximate fit of a model while rewarding parsimony: of two models with similar explanatory power, the simpler model (fewer paths, so more df) will be favoured
• Browne and Cudeck (1993) suggested:
  - RMSEA < .05: good fit
  - RMSEA < .08: reasonable fit
  - RMSEA > .10: poor fit
after testing all this - what can we do?
We can test different factor models against each other – the hunt for parsimony
and the model can be refined by
building or trimming - i.e. adding or deleting paths to or from original model
model building is….
• Starts with a bare-bones model, then adds path(s)
• If extra paths significantly improve fit, these are added to the model
model trimming is…
• Typically starts with a saturated model and simplifies it by eliminating paths
• If the model fit does not significantly deteriorate, then paths can be removed (the model is no worse but is simpler)
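Building and trimming usually compare nested models with a χ² difference test: the difference in χ² is itself tested as a χ² on the difference in df. A self-contained Python sketch with hypothetical fit values; it assumes ordinary maximum-likelihood χ² values (robust, scaled χ² statistics need a corrected difference test).

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(chi / math.sqrt(2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * series


def chi2_difference_test(chi2_simple, df_simple, chi2_complex, df_complex):
    """Nested-model test: the simpler (trimmed) model has fewer paths, so more df."""
    delta_chi2 = chi2_simple - chi2_complex
    delta_df = df_simple - df_complex
    if delta_df <= 0:
        raise ValueError("the simpler model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical trimming step: dropping 2 paths moves chi2 from 3.1 (df = 4) to 6.9 (df = 6).
d_chi2, d_df, p = chi2_difference_test(6.9, 6, 3.1, 4)
if p >= .05:
    decision = "fit did not significantly deteriorate: keep the simpler model"
else:
    decision = "fit significantly deteriorated: the dropped paths are needed"
print(f"delta chi2({d_df}) = {d_chi2:.2f}, p = {p:.3f}: {decision}")
```

Model building runs the same test in the other direction: the added paths are kept only if Δχ² is significant.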
to achieve SEM, once we have a viable CFA measurement model, you
re-specify the model as a path model
This is reflected in the fact that what would have been
DOUBLE-HEADED ARROWS (factor covariances) are specified as PATHS. A measurement model (CFA) is turned into a full SEM by changing double-headed arrows into direct, single-headed arrow paths.
So we have taken the principles of path analysis and CFA and combined them to get a
structural equation model (SEM).
You can still test alternative models using the same methods as earlier, and
deletion/adding of paths can be theoretically or empirically driven
Theoretical approach INCLUDES
• model trimming/building guided by theoretical, a priori considerations
e.g. ‘I hypothesise that ethnicity & family background have no direct effect on grades (effects are likely to be indirect ones), and therefore adding them as direct paths will not result in a significantly improved model’
Empirical approach includes?
• Paths are added to or deleted from the model purely on the basis of
statistical criteria
• In model building, Modification Indices (MI; the expected improvement in the χ² value if a path were added)
for all paths are examined to see which ones would significantly improve the model
• can capitalise on chance correlations
• this type of SEM is more exploratory (you cannot claim you are
‘confirming’ theory)
• the credibility of the model improves if the model structure is replicated in another sample
name some extensions to SEM
Categorical variable = multiple-group SEM (testing the SEM across a categorical variable such as gender)
Hierarchical data = Multi-level SEM for data with hierarchical structure
Repeated measures = latent growth modelling
Categorical Latent variables = Mixture modelling
name the assumptions
The assumptions largely follow from those for correlation/regression analyses (see the appropriate lecture).
Linearity: dependent (endogenous) variables should be linearly related to independent variables. SEM programmes can handle continuous and categorical variables, but check the coding of categorical variables and make sure the programme knows which codes are being used
Normality: residuals should be normally distributed and homoscedastic
Identification
models cannot be under-identified
Adequate sample size
Kline recommends at least 10 times as many cases as parameters (paths) – ideally 20 times
5 times as many cases is often insufficient
Proper model specification: specification error occurs when common causal variables are left out of the model
Disturbances uncorrelated with exogenous variables: same as MR, where errors are uncorrelated with the independent variables
No multicollinearity
Exogenous variables are reliably measured
what is the Comparative Fit Index in SEM?
The comparative fit index (CFI) analyzes the model fit
how does CFI work?
by examining the discrepancy between the data and the hypothesized model
what does CFI also account and adjust for?
for the sample-size issues inherent in the chi-squared test of model fit and in the normed fit index
what is the mnemonic for CFI/GFI
god fucks in 1 huge bentley
what does god fucks in 1 huge bentley stand for?
CFI values range from 0 to 1, with larger values indicating better fit. Previously, a CFI value of .90 or larger was considered to indicate acceptable model fit. However, more recent work indicates that a value of .95 or larger is needed to ensure that misspecified models are not deemed acceptable (Hu & Bentler, 1999).
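The usual CFI formula compares the hypothesised model's misfit (χ² minus df) with that of a baseline model in which all observed variables are uncorrelated. A Python sketch with hypothetical χ² values; the .95 comparison is the commonly cited Hu & Bentler (1999) cut-off.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index: 1 - (model misfit / baseline misfit), bounded to [0, 1]."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0:
        return 1.0  # neither model shows misfit beyond its df
    return 1.0 - d_model / d_baseline


# Hypothetical values: hypothesised model chi2 = 48.0 on 20 df;
# baseline (independence) model for 6 indicators: chi2 = 900.0 on 15 df.
value = cfi(48.0, 20, 900.0, 15)
print(f"CFI = {value:.3f}", "(meets .95)" if value >= .95 else "(below .95)")
```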
RMSEA mnemonic ?
run d-m-c cued the brown note on the decks in 1993
what does ‘run d-m-c cued the brown note on the decks in 1993’ mean?
Browne and Cudeck (1993) suggested RMSEA fit:
• RMSEA < .05: good fit
• RMSEA < .08: reasonable fit
• RMSEA > .10: poor fit
What 3 advantages are there to estimating this model as an SEM model as opposed to running separate regressions on scale totals?
1) You get an overall test of model fit that can disconfirm whether your model fits the data.
2) You also get indices of approximate fit. Parameters are better estimated in one go, where possible, than in multiple steps, because unnecessary multi-step estimation introduces bias.
3) It is possible to estimate the unreliability of the composite measures and its impact on the regression coefficients; this is a key advantage of SEM.
what is a composite variable?
Sometimes we want to bundle different variables together into a single causal effect, and perhaps evaluate the relative contribution of each of those variables within a model.
In SEM, this is known as the creation of a Composite Variable. This composite is still an unmeasured quantity – like a latent variable – but with no error variance, and with “indicators” actually driving the variable, rather than having the unmeasured variable causing the expression of its indicators.
What more information do you require (beyond the overall model; the trick was to search the information given) to assess the adequacy (or otherwise) of the model?
Sample size, standard errors of the estimates, and whether the estimates are standardised or not.
Describe what we mean by ‘exogenous’ and ‘endogenous’ variables in path models.
Exogenous variables are specified to have no causal predictor in the model, although they can co-vary with other exogenous variables. Endogenous variables are predicted by exogenous variables and other endogenous variables included in the model, as well as unspecified variables (via an error term).
when calculating the total effect of one variable on another in a path model, one must calculate…
1) the indirect pathways
2) but also any route where a mediator relates to another variable that also leads to the DV: a three-way multiplication of the constituent paths
3) Add up all the pathways (including the direct path)
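For the three steps above, a Python sketch that lists every forward route from a cause to an outcome, multiplies the path coefficients along each route (including the three-way X → M1 → M2 → Y product) and adds them up. The coefficients are hypothetical, and this covers causal (directed) routes only; decomposing a correlation with the full tracing rules would also trace back through common causes and double-headed arrows.

```python
# Hypothetical standardised path coefficients, keyed by (cause, effect).
paths = {
    ("X", "M1"): 0.40,
    ("X", "M2"): 0.20,
    ("M1", "M2"): 0.30,
    ("M1", "Y"): 0.25,
    ("M2", "Y"): 0.50,
    ("X", "Y"): 0.10,  # direct path
}


def directed_routes(paths, start, end, route=None):
    """Yield every route from start to end that follows single-headed arrows forwards.

    Moving only forwards means a route never enters and exits a variable on an
    arrowhead, and the membership check stops it entering a variable twice.
    """
    if route is None:
        route = [start]
    if start == end:
        yield route
        return
    for cause, effect in paths:
        if cause == start and effect not in route:
            yield from directed_routes(paths, effect, end, route + [effect])


def total_effect(paths, start, end):
    """Sum, over all routes, of the product of the path coefficients along each route."""
    total = 0.0
    for route in directed_routes(paths, start, end):
        product = 1.0
        for cause, effect in zip(route, route[1:]):
            product *= paths[(cause, effect)]
        total += product
    return total


for route in directed_routes(paths, "X", "Y"):
    print(" -> ".join(route))
print(f"total effect of X on Y = {total_effect(paths, 'X', 'Y'):.2f}")
```

Here the direct effect is 0.10 and the three indirect routes contribute 0.10, 0.10 and 0.06, so the total effect is their sum.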