Confirmatory Factor Analysis Flashcards
Describe how the confirmatory factor model differs from the exploratory factor model
The exploratory factor model:
$y_{ij} = \nu_j + \lambda_{j1}\eta_{1i} + \lambda_{j2}\eta_{2i} + \dots + \lambda_{jm}\eta_{mi} + \varepsilon_{ij}$
It is characteristic that all subtests/items load on all factors, since we don't know which factors should load on which items. You hope that some items load clearly on some factors, but the model itself doesn't impose any structure.
The confirmatory factor model:
$y_{ij} = \nu_j + \lambda_{j1}\eta_{1i} + \lambda_{j2}\eta_{2i} + \dots + \lambda_{jm}\eta_{mi} + \varepsilon_{ij}$
with certain factor loadings (e.g., $\lambda_{j1}$) fixed to 0 according to theory/expectation.
The intercept is commonly omitted in both. In confirmatory factor analysis, you explicitly state that items x load on factor 1, items y load on factor 2, etc. This difference is visualised in docs, and sketched below.
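As a sketch for six items and two factors (matching the example used later in these cards), the difference shows up in the loading matrix:
$\boldsymbol{\Lambda}_{\text{EFA}} = \begin{bmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \\ \lambda_{31} & \lambda_{32} \\ \lambda_{41} & \lambda_{42} \\ \lambda_{51} & \lambda_{52} \\ \lambda_{61} & \lambda_{62} \end{bmatrix} \qquad \boldsymbol{\Lambda}_{\text{CFA}} = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$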
How does confirmatory factor analysis differ from exploratory factor analysis in terms of notation?
The confirmatory factor model can be written in the same notation as the exploratory factor model:
$y_{ij} = \nu_j + \lambda_{j1}\eta_{1i} + \lambda_{j2}\eta_{2i} + \dots + \lambda_{jm}\eta_{mi} + \varepsilon_{ij}$
However, matrix notation is often used, as in the book. This is a little more compact, as you don't have to specify item i etc.; this information is contained in the matrices:
$\mathbf{y}_i = \boldsymbol{\nu} + \boldsymbol{\Lambda}_y \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i$ or $\mathbf{y}_i = \boldsymbol{\Lambda}_y \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i$ (if fit on the covariance matrix), with
$\mathbf{y}_i$ = data vector, $\boldsymbol{\nu}$ = intercepts, $\boldsymbol{\Lambda}_y$ = factor loadings, $\boldsymbol{\eta}_i$ = factor scores, $\boldsymbol{\varepsilon}_i$ = error residuals
See docs for how this looks in terms of matrices
Why, again, do you mostly omit the intercept in practice?
The intercept does not contain any information about the factor structure; it is only interesting if you want to apply a factor model to continuous data in the sense of fitting an IRT model (then it is an item attractiveness parameter).
There is an image of the factor loading matrix for the confirmatory factor model in docs. Explain what information it gives/ how the information is structured.
The rows (n) correspond to the items/variables and the columns (m) correspond to the factors. E.g., $\boldsymbol{\Lambda}_y[2,3]$ is the factor loading of the second item on the third factor.
Describe what the data vector, factor loading matrix and factor scores would look like in a confirmatory factor model in which the first three items are proposed to load on the first factor and the next three load on the second factor
$\mathbf{y}_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \\ y_{i4} \\ y_{i5} \\ y_{i6} \end{bmatrix}, \quad \boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}, \quad \boldsymbol{\eta}_i = \begin{bmatrix} \eta_{1i} \\ \eta_{2i} \end{bmatrix}$
From this model you can derive something with a different notation; describe this and give the notation
Because a lot of people focus on fitting the model to a covariance matrix, and this matrix contains all the relevant information, as shown with the following formula:
$\boldsymbol{\Sigma}_y = \boldsymbol{\Lambda}_y \boldsymbol{\Psi} \boldsymbol{\Lambda}_y' + \boldsymbol{\Theta}_\varepsilon$
The original formula contained $\mathbf{y}_i$, indicating the raw data, which doesn't exist when working with a covariance matrix. This formula is therefore derived from the original formula to describe what information is taken from the covariance matrix rather than the raw data.
$\boldsymbol{\Sigma}_y$ = model-predicted covariance matrix, $\boldsymbol{\Psi}$ = factor covariance matrix, $\boldsymbol{\Lambda}_y$ = factor loading matrix, $\boldsymbol{\Theta}_\varepsilon$ = residual covariance matrix
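As a minimal R sketch with made-up numbers (the loadings and variances below are assumptions for illustration, not estimates from any real data), you can compute the model-implied covariance matrix directly:
# Model-implied covariance matrix: Sigma_y = Lambda %*% Psi %*% t(Lambda) + Theta
Lambda <- matrix(c(0.8, 0,
                   0.7, 0,
                   0.6, 0,
                   0,   0.9,
                   0,   0.7,
                   0,   0.5), nrow = 6, byrow = TRUE)  # CFA loading pattern with fixed zeros
Psi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)   # factor variances and covariance
Theta <- diag(0.4, 6)                  # residual variances on the diagonal, zero covariances
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta
Sigma[1, 2]   # lambda_11 * lambda_21 * var(F1) = 0.8 * 0.7 * 1.0 = 0.56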
Describe what $\boldsymbol{\Sigma}_y$ looks like / how it is structured
$\boldsymbol{\Sigma}_y$ is a symmetric matrix with the variance of each item, $\sigma^2_{y1}, \dots, \sigma^2_{y6}$, down the diagonal, and the covariance of each pair of items mirrored on either side of the diagonal, e.g. $\sigma_{y3y2}$ (covariance of items 3 and 2).
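As a sketch for three items (the six-item version extends the same pattern):
$\boldsymbol{\Sigma}_y = \begin{bmatrix} \sigma^2_{y1} & \sigma_{y1y2} & \sigma_{y1y3} \\ \sigma_{y2y1} & \sigma^2_{y2} & \sigma_{y2y3} \\ \sigma_{y3y1} & \sigma_{y3y2} & \sigma^2_{y3} \end{bmatrix}$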
Describe the structure of $\boldsymbol{\Psi}$
The factor covariance matrix:
$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix} = \begin{bmatrix} \mathrm{var}(F_1) & \mathrm{cov}(F_1, F_2) \\ \mathrm{cov}(F_1, F_2) & \mathrm{var}(F_2) \end{bmatrix}$
Describe the structure of $\boldsymbol{\Theta}_\varepsilon$
The residual covariance matrix has the variance of the errors of each item down the diagonal and 0s elsewhere, similar in shape to an identity matrix. There shouldn't be covariance between the residuals; as in IRT, this is a model assumption.
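For the six-item example this is simply a diagonal matrix:
$\boldsymbol{\Theta}_\varepsilon = \mathrm{diag}(\theta_{\varepsilon_1}, \theta_{\varepsilon_2}, \theta_{\varepsilon_3}, \theta_{\varepsilon_4}, \theta_{\varepsilon_5}, \theta_{\varepsilon_6})$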
Describe the structure of $\boldsymbol{\Sigma}_y$ in terms of the model parameters
Down the diagonal you have the model-implied variances of each item:
$\lambda_{11}^2\sigma^2_{\eta_1} + \theta_{\varepsilon_1}$, $\lambda_{21}^2\sigma^2_{\eta_1} + \theta_{\varepsilon_2}$, $\lambda_{31}^2\sigma^2_{\eta_1} + \theta_{\varepsilon_3}$, $\lambda_{42}^2\sigma^2_{\eta_2} + \theta_{\varepsilon_4}$, $\lambda_{52}^2\sigma^2_{\eta_2} + \theta_{\varepsilon_5}$, $\lambda_{62}^2\sigma^2_{\eta_2} + \theta_{\varepsilon_6}$
where the first is the loading of item 1 (on factor 1) squared, times the factor variance, plus the residual variance of item 1.
Off the diagonal are the model-implied covariances, where
$\lambda_{11}\lambda_{21}\sigma^2_{\eta_1}$ gives the covariance of items 1 and 2: the product of their factor loadings times the variance of factor 1, since they load on the same factor, and
$\lambda_{21}\lambda_{62}\sigma_{\eta_1\eta_2}$ gives the covariance of items 2 and 6: the product of their factor loadings times the covariance of factor 1 and factor 2, since they load on different factors.
A better visualisation of this is given in docs
During CFA, what are two things you don't necessarily want from your analysis / try to avoid?
Cross-loading: where an item that was meant to load on one factor also loads on another factor (it shares variability with the items of another factor that it does not share with the items of its own factor)
Residual covariance/correlation: where there is covariance between the error residuals of items
If you have residual covariance/correlation, what do you hope for?
That there is some explanation, e.g. the two imagined situations both take place in a supermarket, so there is shared item-specific error.
What changes in your factor loading matrix if you have a cross-loading? For example if, in your earlier model of 6 variables loading on 2 factors, the fourth variable also loads on the first factor
The factor loading matrix goes from
$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
to
$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
since item 4 now also loads on factor 1.
What happens to your $\boldsymbol{\Sigma}_y$ if cross-loadings are introduced?
Each entry describing the covariance of the fourth item with another item (its loading on the second factor times the other item's loading, times the relevant factor variance or factor covariance) gains an extra term: you also add the fourth item's loading on the first factor times the other item's loading, times the relevant factor variance or factor covariance.
For the variance of the fourth item, besides the term through the second factor you add the term through the first factor, plus twice the product of the two loadings times the factor covariance:
$\sigma^2_{y4} = \lambda_{41}^2\sigma^2_{\eta_1} + \lambda_{42}^2\sigma^2_{\eta_2} + 2\lambda_{41}\lambda_{42}\sigma_{\eta_1\eta_2} + \theta_{\varepsilon_4}$
This is better visualised in docs
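A quick R check of the item-4 variance formula, with hypothetical numbers (all values below are assumptions for illustration only):
l41 <- 0.4; l42 <- 0.9                     # loadings of item 4 on factors 1 and 2
psi11 <- 1.0; psi22 <- 1.0; psi12 <- 0.3   # factor variances and factor covariance
theta4 <- 0.4                              # residual variance of item 4
# var(y4) = l41^2 * psi11 + l42^2 * psi22 + 2 * l41 * l42 * psi12 + theta4
l41^2 * psi11 + l42^2 * psi22 + 2 * l41 * l42 * psi12 + theta4   # 1.586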
How do residual covariances affect your CFA matrices? E.g., for an error covariance between items 2 and 5
In the residual covariance matrix $\boldsymbol{\Theta}_\varepsilon$, everything looks the same, except that in place of the zeros at the intersections of items 2 and 5 there is the error covariance $\theta_{\varepsilon_2\varepsilon_5}$.
In the predicted covariance matrix, it simply adds this error covariance to the model-implied covariance between items 2 and 5.
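Written out for this example (item 2 loads on factor 1, item 5 on factor 2), the model-implied covariance of items 2 and 5 becomes
$\sigma_{y2y5} = \lambda_{21}\lambda_{52}\,\sigma_{\eta_1\eta_2} + \theta_{\varepsilon_2\varepsilon_5}$
i.e., the usual cross-factor term plus the error covariance.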
What two identification issues exist within CFA?
1. Scaling the latent variable
2. Statistical identification
What does scaling the latent variable consist of in CFA?
As in the one-factor model, except that you now have multiple factors: the scale of the latent variables (factors) is identified by fixing the mean of each factor to 0 and either:
• Option 1: fix one factor loading to 1 for each factor
• Option 2: fix the factor variance to 1 for each factor
This really doesn't make a difference to the conclusions drawn; see the lavaan sketch below.
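As a lavaan sketch of the two options, using the model12 and RMT_cov objects defined later in these cards (fit_marker and fit_std are hypothetical names):
library(lavaan)
# Option 1 (lavaan default): first loading per factor fixed to 1
fit_marker <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)
# Option 2: factor variances fixed to 1, all loadings estimated freely
fit_std <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907, std.lv = TRUE)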
What does the first option (fix one factor loading to 1 for each factor) for scaling the latent variable in CFA mean for the matrices?
It has implications for the factor loading matrix, which goes from
$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
to
$\boldsymbol{\Lambda}_y = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
In lavaan this is the default: the loading of the first indicator of each factor is automatically fixed to 1.
What does the second option (fix the factor variance to 1 for each factor) for scaling the latent variable in CFA mean for the matrices?
The factor covariance matrix goes from
$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix} = \begin{bmatrix} \mathrm{var}(F_1) & \mathrm{cov}(F_1, F_2) \\ \mathrm{cov}(F_1, F_2) & \mathrm{var}(F_2) \end{bmatrix}$
to
$\boldsymbol{\Psi} = \begin{bmatrix} 1 & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & 1 \end{bmatrix}$
What is an advantage of changing the factor variances to 1?
$\boldsymbol{\Psi} = \begin{bmatrix} 1 & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & 1 \end{bmatrix}$
This is nice because it turns the covariance between the factors into a correlation, so in the output you can immediately see the correlation between the factors.
What is involved in the statistical identification of CFA?
• The number of parameters should not exceed the number of observed (co)variances
• In CFA this can happen if you have too few observed variables for your model
What is a good metric to test if this statistical identification is satisfied?
A model should always have degrees of freedom larger than or equal to 0 ($df = 0$ is possible, but of limited usefulness)
How do you calculate the degrees of freedom for CFA?
Same as EFA:
$df = m - k$
$m$: number of independent pieces of observed information
$k$: number of parameters
If the CFA is conducted on a covariance matrix: $m = n(n+1)/2$, with $n$ the number of observed variables
• E.g., $n = 4 \Rightarrow m = 4 \cdot 5/2 = 10$
• E.g., $n = 7 \Rightarrow m = 7 \cdot 8/2 = 28$
Different from EFA: for $k$ (the number of parameters) there is no straightforward formula as in EFA; you have to think about how many parameters are in the model. For this you really have to understand the model (see the next card for an example)
Calculate m and k for a model with two factors in which items 1:4 load on the first factor and items 4:6 load on the second factor. The first factor loading for each factor is set to 1 for identification
Independent pieces of information:
$m = 6 \cdot 7/2 = 21$
Number of parameters (there is no formula!):
$k$ = 6 (residual variances) + 5 (loadings) + 2 (factor variances) + 1 (factor covariance) = 14
To make sense of this, think of the matrices used for this model:
$\mathbf{y}_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \\ y_{i4} \\ y_{i5} \\ y_{i6} \end{bmatrix}, \quad \boldsymbol{\Lambda}_y = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}, \quad \boldsymbol{\eta}_i = \begin{bmatrix} \eta_{1i} \\ \eta_{2i} \end{bmatrix}$
Same number of residual variances (errors) as items = 6
7 factor loadings − 2 fixed loadings = 5 free parameters
Factor covariance matrix
$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix}$
Two factor variances along the diagonal
1 factor covariance between the factors (the matrix is mirrored)
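As an R sketch of the bookkeeping for this example (variable names are for illustration):
n_items <- 6
m  <- n_items * (n_items + 1) / 2  # 21 independent (co)variances
k  <- 6 + 5 + 2 + 1                # residual variances + free loadings + factor variances + factor covariance
df <- m - k                        # 21 - 14 = 7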
How can residual covariances affect statistical identification?
Each correlation/ covariance between errors is an additional parameter
E.g., a model with three observed variables, 1 factor and no residual covariances:
Just identified!
• m = 3·4/2 = 6, k = 6, df = 0
A model with three observed variables, 1 factor and a residual covariance between 2 items:
Not identified
• m = 3·4/2 = 6, k = 7, df = −1
In CFA, it's all about model fit, according to the strange man talking on my laptop. Why does he make this claim?
Because you're interested in testing a hypothesised factor structure, so hypothesis testing is the central theme
What indices do you use to analyse model fit?
There's a whole bunch of them, since model fit is the central theme; however, the one most used and reported is the chi-square ($\chi^2$) goodness-of-fit index
How do you calculate the $\chi^2$ goodness of fit? (again)
$\chi^2 = -2 \cdot F(ML)$ with $df = m - k$
where $F(ML)$ is the value of the fit function that is maximized in maximum likelihood estimation
If it is significant, your model doesn't fit
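In lavaan you can pull these values from a fitted object; a one-line sketch assuming the fit2 object defined later in these cards:
fitmeasures(fit2, c("chisq", "df", "pvalue"))   # chi-square test of exact fit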
Give four more goodness of fit measures and what they do
- Standardized Root Mean Square Residual (SRMR)
- Standardized difference between $S$ (observed covariance matrix) and $\Sigma_y$ (model-predicted covariance matrix)
- Root Mean Square Error of Approximation (RMSEA)
- SRMR with a correction for the number of parameters
- Comparative Fit Index (CFI)
- Compares the model to a baseline model without correlations between variables
- Normed between 0 and 1
- Tucker–Lewis Index (TLI)
- Similar to CFI but non-normed (can be larger than 1 or smaller than 0)
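These are all available from lavaan's fitmeasures(); a sketch assuming the fit2 object defined later in these cards:
fitmeasures(fit2, c("srmr", "rmsea", "cfi", "tli"))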
Give four comparative fit measures, what they do and what they require
• Likelihood ratio test (requires nested models):
$\chi^2 = -2\,(\log L(\text{constrained}) - \log L(\text{unconstrained}))$ with $df = k_{\text{unconstrained}} - k_{\text{constrained}}$
or equivalently (since these are also $\chi^2$ statistics, you can just subtract them):
$\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{unconstrained}}$ with $df = df_{\text{constrained}} - df_{\text{unconstrained}}$
If significant, your unconstrained model should be preferred
- Akaike Information Criterion (AIC)
- $AIC = \chi^2 + 2 \times k$
- Compare to a competing model, no nesting necessary
- Bayesian Information Criterion (BIC)
- $BIC = \chi^2 + \log(N) \times k$
- Compare to a competing model, no nesting necessary
- Corrected Akaike Information Criterion (CAIC)
- $CAIC = \chi^2 + [1 + \log(N)] \times k$
- Compare to a competing model, no nesting necessary
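In lavaan, a sketch assuming the nested fit1 (one-factor) and fit2 (two-factor) objects defined later in these cards:
anova(fit1, fit2)      # likelihood ratio test for nested lavaan models; also reports AIC and BIC
AIC(fit1); AIC(fit2)   # lower is better
BIC(fit1); BIC(fit2)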
How do the AIC, BIC and CAIC work, and which is the most 'strict' among them?
A lower value indicates a better model, and they penalize the number of parameters. The BIC and CAIC are stricter than the AIC, meaning they punish more for more complex models.
So far we've talked about absolute model fit (does the model fit, y/n) and comparative model fit (which model fits better?). What type of model fit index is left?
Local model fit: looking at a model (typically one that doesn't fit great) and analysing where in the model you need to change something to improve fit, e.g., introducing a cross-loading or a residual covariance
What are the indices used to assess local model fit called? For which parameters are they available?
Modification indices; available for all parameters that are fixed
• E.g., cross-loadings, residual covariances
What do these modification indices indicate?
Indicate how much the chi-square fit statistic will improve (decrease) if that parameter is freed
What type of statistics are these and what implications does this have? (2)
Strictly, these are $\chi^2(1)$ statistics
• i.e., a value larger than 3.84 is significant
• Testing many of them results in serious chance capitalisation and overfitting
What is the recommended cut-off?
Some people recommend a cut-off at 10.00 (if a modification index is above 10, then you could free that parameter)
• But there is still a danger of chance capitalization / overfitting; therefore, use very carefully. In lavaan you can apply this cut-off directly, as sketched below.
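A one-line sketch assuming the fit1 object defined later in these cards:
modindices(fit1, minimum.value = 10)   # only show modification indices above the cut-off of 10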
When should you make modifications to your model?
Only free a parameter if its modification index is extremely high compared to the others
β’ Ideally, there is an explanation for the misfit
β’ E.g., for IQ test: residual covariance between Block Design and Object Assembly
Say you wanted to see whether worry and rumination questionnaires were better suited to a one-factor or a two-factor model, with W1, W2, W3 and W4 measuring worry and R1, R2, R3 and R4 measuring rumination
Write R code which would help with this
library(lavaan)

# Two-factor model: separate worry and rumination factors
model12 <- '
  Worry =~ W1 + W2 + W3 + W4
  Rummi =~ R1 + R2 + R3 + R4
'
fit2 <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)

# One-factor model: all items load on a single factor
model11 <- '
  RNT =~ W1 + W2 + W3 + W4 + R1 + R2 + R3 + R4
'
fit1 <- cfa(model11, sample.cov = RMT_cov, sample.nobs = 3907)
How do you get the fit measures for your model? How about the modification indices?
fitmeasures(fit2)
modindices(fit1)
Say you look at your modification indices and notice:
Worry =~ R3 65.717
In your output. What does this mean and what can you do?
This means that there is a cross-loading: a rumination item loads onto the worry factor. In this case we can just add this item to the worry factor (if justified):
model12 <- '
  Worry =~ W1 + W2 + W3 + W4 + R3
  Rummi =~ R1 + R2 + R3 + R4
'
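You would then refit and compare against the original two-factor model; fit2b is a hypothetical name for the refitted object:
fit2b <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)
anova(fit2, fit2b)   # the original model is nested in the cross-loading model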
Say you look at your modification indices and notice:
R1 ~~ R3 624.146
In your output. What does this mean and what can you do?
This means that there is residual covariance between R1 and R3. You can add it to your model as follows:
model12 <- '
  Worry =~ W1 + W2 + W3 + W4 + R3
  Rummi =~ R1 + R2 + R3 + R4
  R1 ~~ R3
'