Confirmatory Factor Analysis Flashcards
Describe how the confirmatory factor model differs from the exploratory factor model
The exploratory factor model:
y_ij = ν_i + λ_1i·η_1j + λ_2i·η_2j + ⋯ + λ_mi·η_mj + ε_ij
Characteristically, all subtests/items load on all factors, since we don't know in advance which items should load on which factors. You hope that some items load clearly on some factors, but the model itself does not impose any structure.
The confirmatory factor model:
y_ij = ν_i + λ_1i·η_1j + λ_2i·η_2j + ⋯ + λ_mi·η_mj + ε_ij
with certain factor loadings (e.g. λ_1i) fixed to 0 according to theory/expectation
The intercept is commonly omitted in both. In confirmatory factor analysis, you explicitly state that items x load on factor 1, items y load on factor 2, etc. This difference is visualised in docs.
How does confirmatory factor analysis differ from exploratory factor analysis in terms of notation?
The confirmatory factor model can be written in the same notation as an exploratory factor model:
y_ij = ν_i + λ_1i·η_1j + λ_2i·η_2j + ⋯ + λ_mi·η_mj + ε_ij
However, matrix notation is often used, as in the book. This is more compact, as you don't have to write a separate equation for each item i; that information is contained in the matrices:
y_j = ν + Λ η_j + ε_j, or y_j = Λ η_j + ε_j (if fit on the covariance matrix), with y_j = vector of item scores for person j, ν = intercepts, Λ = factor loadings, η_j = factor scores, ε_j = error residuals
See docs for how this looks in terms of matrices
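As a sketch of this matrix form, a hypothetical two-factor, six-item model can be written out with NumPy; the loading values, residual scale, and variable names below are made-up illustrations, not taken from the source:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical confirmatory pattern: items 1-3 load on factor 1,
# items 4-6 on factor 2; the fixed zeros encode the restrictions.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.7],
                   [0.0, 0.5]])
nu = np.zeros(6)                      # intercepts (commonly omitted)
eta_j = rng.standard_normal(2)        # factor scores for one person j
eps_j = 0.3 * rng.standard_normal(6)  # error residuals for person j

y_j = nu + Lambda @ eta_j + eps_j     # y_j = nu + Lambda eta_j + eps_j
```

Actually fitting such a model would be done with a dedicated package (e.g. lavaan in R); the snippet above only illustrates the notation.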
Why, again, do you mostly omit the intercept in practice?
The intercept does not contain any information about the factor structure; it is only interesting if you want to apply a factor model to continuous data in the sense of fitting an IRT model (then it is an item attractiveness parameter).
There is an image of the factor loading matrix for the confirmatory factor model in docs. Explain what information it gives/ how the information is structured.
The rows (n) correspond to the items/variables and the columns (m) correspond to the factors. E.g. Λ[2,3] is the factor loading of the second item on the third factor.
Describe how the data matrix, factor loading matrix and factor scores would look like in a confirmatory factor model in which the first three items are proposed to load on the first factor and the next three load on the second factor
y_j =    Λ =          η_j =
y_j1     λ11    0     η_1j
y_j2     λ21    0     η_2j
y_j3     λ31    0
y_j4     0    λ42
y_j5     0    λ52
y_j6     0    λ62
From this model you can derive an expression with a different notation; describe this and give the notation
Because the model is often fit to a covariance matrix rather than raw data, and that matrix contains all the relevant information, the model can be expressed with the following formula:
Σ = Λ Ψ Λ′ + Θ
The original formula contained y_j, indicating the raw data, which doesn't exist when working with a covariance matrix. This formula is therefore derived from the original one to describe what information is taken from the covariance matrix rather than the raw data.
Σ = model-predicted covariance matrix, Ψ = factor covariance matrix, Λ = factor loading matrix, Θ = residual covariance matrix
Describe what Σ looks like/ how it is structured
Σ is a matrix with the variance of each item (σ²_y1 … σ²_y6) down the diagonal, and the covariances between items mirrored on either side of the diagonal, e.g. σ_y3y2 (covariance of items 3 and 2).
Describe the structure of Ψ
Factor covariance matrix:
Ψ =
σ²_η1     σ_η1η2
σ_η1η2    σ²_η2
i.e.
var(factor 1)   cov(F1, F2)
cov(F1, F2)     var(factor 2)
Describe the structure of Θ
The residual covariance matrix has the variance of the errors of each item down the diagonal and 0s elsewhere, similar in shape to an identity matrix. There should be no covariance between the residuals; as in IRT, this is an assumption.
Describe the structure of Σ in terms of the model parameters
Down the diagonal you have the variances of each item:
λ11² σ²_η1 + θ_ε1
λ21² σ²_η1 + θ_ε2
λ31² σ²_η1 + θ_ε3
λ42² σ²_η2 + θ_ε4
λ52² σ²_η2 + θ_ε5
λ62² σ²_η2 + θ_ε6
where the first entry is the squared loading of item 1 (on factor 1) times the variance of factor 1, plus the residual variance of item 1.
Off the diagonal are the model-implied covariances, where
λ11·λ21·σ²_η1 gives the covariance of items 1 and 2: the product of their loadings times the variance of factor 1, since they load on the same factor, and
λ21·λ62·σ_η1η2 gives the covariance of items 2 and 6: the product of their loadings times the covariance of factors 1 and 2, since they load on different factors.
A better visualisation of this is given in docs
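These entries can also be checked numerically from Σ = Λ Ψ Λ′ + Θ; every parameter value in this sketch is invented purely for illustration:

```python
import numpy as np

# Hypothetical values: items 1-3 load on factor 1, items 4-6 on factor 2.
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.9], [0.0, 0.7], [0.0, 0.5]])
Psi = np.array([[1.0, 0.3],
                [0.3, 1.0]])      # factor variances and covariance
Theta = np.diag([0.4] * 6)        # residual variances, zero covariances

Sigma = Lambda @ Psi @ Lambda.T + Theta  # model-implied covariance matrix

# Diagonal: lambda_11^2 * var(eta1) + residual variance of item 1
assert np.isclose(Sigma[0, 0], 0.8**2 * 1.0 + 0.4)
# Same factor: cov(y1, y2) = lambda_11 * lambda_21 * var(eta1)
assert np.isclose(Sigma[0, 1], 0.8 * 0.7 * 1.0)
# Different factors: cov(y2, y6) = lambda_21 * lambda_62 * cov(eta1, eta2)
assert np.isclose(Sigma[1, 5], 0.7 * 0.5 * 0.3)
```

The assertions mirror the diagonal and off-diagonal formulas above, one per case (same-factor pair, different-factor pair).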
During CFA, what are two things you donβt necessarily want from your analysis/ try to avoid?
Cross-loading: where an item that was meant to load on one factor also loads on another factor (it shares variability with the items of another factor that it does not share with the items of its own factor)
Residual covariance/ correlation: Where there is covariance between error residuals on items
If you have residual covariance/ correlation, what do you hope for?
That there is some explanation, e.g the two imagined situations both take place in a supermarket so there is shared item specific error.
What changes in your factor loading matrix if you have a cross-loading? For example if, in your earlier model of 6 variables loading on 2 factors, the fourth variable also loads on the first factor
The factor loading matrix goes from:
Λ =
λ11    0
λ21    0
λ31    0
0    λ42
0    λ52
0    λ62
to:
Λ =
λ11    0
λ21    0
λ31    0
λ41  λ42
0    λ52
0    λ62
since item 4 now also loads on factor 1.
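The effect of freeing this one loading can be sketched with the same invented parameter values as before (all numbers are illustrative assumptions):

```python
import numpy as np

Psi = np.array([[1.0, 0.3], [0.3, 1.0]])   # factor covariance matrix
Theta = np.diag([0.4] * 6)                 # residual covariance matrix

Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.9], [0.0, 0.7], [0.0, 0.5]])
Lambda_cross = Lambda.copy()
Lambda_cross[3, 0] = 0.4   # item 4 now also loads on factor 1

Sigma = Lambda_cross @ Psi @ Lambda_cross.T + Theta

# The variance of item 4 gains lambda_41^2 * var(eta1)
# and 2 * lambda_41 * lambda_42 * cov(eta1, eta2):
expected_var4 = 0.9**2 * 1.0 + 0.4**2 * 1.0 + 2 * 0.4 * 0.9 * 0.3 + 0.4
assert np.isclose(Sigma[3, 3], expected_var4)
```

Comparing Sigma with and without the cross-loading shows that every entry in row/column 4 changes, not just the variance.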
What happens to Σ if cross-loadings are introduced?
For each entry describing
the covariance of item 4 with another item via the second factor (the loading of item 4 on factor 2 × the other item's loading × the variance of factor 2, or the covariance of the factors),
you also have to add
the covariance of item 4 with that item via the first factor (the loading of item 4 on factor 1 × the other item's loading × the variance of factor 1, or the covariance of the factors).
To the variance of item 4 you have to add the squared loading of item 4 on factor 1 times the variance of factor 1, plus 2 × (the loading of item 4 on factor 1 × its loading on factor 2 × the covariance of the two factors).
This is better visualised in docs
How do residual covariances affect your CFA matrices? E.g. for an error correlation between items 2 and 5
In the residual covariance matrix, the structure stays the same except that, in place of the zeros at the intersection of items 2 and 5, there is the error covariance θ_ε2ε5.
In the model-predicted covariance matrix, this simply adds the error covariance to the covariance between items 2 and 5.
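This can be sketched with the same invented parameter values as in the earlier examples (the value 0.15 for the error covariance is an arbitrary assumption):

```python
import numpy as np

Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.9], [0.0, 0.7], [0.0, 0.5]])
Psi = np.array([[1.0, 0.3], [0.3, 1.0]])

Theta = np.diag([0.4] * 6)
Theta[1, 4] = Theta[4, 1] = 0.15   # residual covariance between items 2 and 5

Sigma = Lambda @ Psi @ Lambda.T + Theta

# cov(y2, y5) = lambda_21 * lambda_52 * cov(eta1, eta2) + theta_25
assert np.isclose(Sigma[1, 4], 0.7 * 0.7 * 0.3 + 0.15)
```

Only the (2, 5) entry of Σ changes; the rest of the model-implied matrix is untouched by a single residual covariance.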
What two identification issues exist within CFA?
1. Scaling the latent variable
2. Statistical identification