Confirmatory Factor Analysis Flashcards
Describe how the confirmatory factor model differs from the exploratory factor model
The exploratory factor model:
$y_{ij} = \nu_j + \lambda_{j1}\eta_{1i} + \lambda_{j2}\eta_{2i} + \dots + \lambda_{jm}\eta_{mi} + \varepsilon_{ij}$
It is characteristic that all subtests/items load on all factors, since we don't know which factors should load on which items. You hope that some items load clearly on some factors, but the model itself doesn't impose any structure.
The confirmatory factor model:
$y_{ij} = \nu_j + \lambda_{j1}\eta_{1i} + \lambda_{j2}\eta_{2i} + \dots + \lambda_{jm}\eta_{mi} + \varepsilon_{ij}$
with certain factor loadings (e.g., $\lambda_{j1}$) fixed to 0 according to theory/expectation.
The intercept is commonly omitted in both. In confirmatory factor analysis, you explicitly state that items x load on factor 1, items y load on factor 2, etc. This difference is visualised in docs, and sketched below.
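As a sketch for six items and two factors (matching the example used later in these cards), the difference shows up in the loading matrix:
$\boldsymbol{\Lambda}_{\text{EFA}} = \begin{bmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \\ \lambda_{31} & \lambda_{32} \\ \lambda_{41} & \lambda_{42} \\ \lambda_{51} & \lambda_{52} \\ \lambda_{61} & \lambda_{62} \end{bmatrix} \qquad \boldsymbol{\Lambda}_{\text{CFA}} = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$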
How does confirmatory factor analysis differ from exploratory factor analysis in terms of notation?
The confirmatory factor model can be written in the same notation as the exploratory factor model:
$y_{ij} = \nu_j + \lambda_{j1}\eta_{1i} + \lambda_{j2}\eta_{2i} + \dots + \lambda_{jm}\eta_{mi} + \varepsilon_{ij}$
However, matrix notation is often used, as in the book. This is a little more compact, as you don't have to specify item i etc.; this information is contained in the matrices:
$\mathbf{y}_i = \boldsymbol{\nu} + \boldsymbol{\Lambda}_y \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i$ or $\mathbf{y}_i = \boldsymbol{\Lambda}_y \boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i$ (if fit on the covariance matrix), with
$\mathbf{y}_i$ = data vector, $\boldsymbol{\nu}$ = intercepts, $\boldsymbol{\Lambda}_y$ = factor loadings, $\boldsymbol{\eta}_i$ = factor scores, $\boldsymbol{\varepsilon}_i$ = error residuals
See docs for how this looks in terms of matrices
Why, again, do you mostly omit the intercept in practice?
The intercept does not contain any information about the factor structure; it is only interesting if you want to apply a factor model to continuous data in the sense of fitting an IRT model (then it is an item attractiveness parameter).
There is an image of the factor loading matrix for the confirmatory factor model in docs. Explain what information it gives/ how the information is structured.
The rows (n) correspond to the items/variables and the columns (m) correspond to the factors. E.g., $\boldsymbol{\Lambda}_y[2,3]$ is the factor loading of the second item on the third factor.
Describe what the data vector, factor loading matrix and factor scores would look like in a confirmatory factor model in which the first three items are proposed to load on the first factor and the next three load on the second factor
$\mathbf{y}_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \\ y_{i4} \\ y_{i5} \\ y_{i6} \end{bmatrix}, \quad \boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}, \quad \boldsymbol{\eta}_i = \begin{bmatrix} \eta_{1i} \\ \eta_{2i} \end{bmatrix}$
From this model you can derive something with a different notation; describe this and give the notation
Because a lot of people focus on fitting the model to a covariance matrix, and this matrix contains all the relevant information, as shown with the following formula:
$\boldsymbol{\Sigma}_y = \boldsymbol{\Lambda}_y \boldsymbol{\Psi} \boldsymbol{\Lambda}_y' + \boldsymbol{\Theta}_\varepsilon$
The original formula contained $\mathbf{y}_i$, indicating the raw data, which doesn't exist when working with a covariance matrix. This formula is therefore derived from the original formula to describe what information is taken from the covariance matrix rather than the raw data.
$\boldsymbol{\Sigma}_y$ = model-predicted covariance matrix, $\boldsymbol{\Psi}$ = factor covariance matrix, $\boldsymbol{\Lambda}_y$ = factor loading matrix, $\boldsymbol{\Theta}_\varepsilon$ = residual covariance matrix
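As a minimal R sketch with made-up numbers (the loadings and variances below are assumptions for illustration, not estimates from any real data), you can compute the model-implied covariance matrix directly:
# Model-implied covariance matrix: Sigma_y = Lambda %*% Psi %*% t(Lambda) + Theta
Lambda <- matrix(c(0.8, 0,
                   0.7, 0,
                   0.6, 0,
                   0,   0.9,
                   0,   0.7,
                   0,   0.5), nrow = 6, byrow = TRUE)  # CFA loading pattern with fixed zeros
Psi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)   # factor variances and covariance
Theta <- diag(0.4, 6)                  # residual variances on the diagonal, zero covariances
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta
Sigma[1, 2]   # lambda_11 * lambda_21 * var(F1) = 0.8 * 0.7 * 1.0 = 0.56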
Describe what $\boldsymbol{\Sigma}_y$ looks like / how it is structured
$\boldsymbol{\Sigma}_y$ is a symmetric matrix with the variance of each item, $\sigma^2_{y1}, \dots, \sigma^2_{y6}$, down the diagonal, and the covariance of each pair of items mirrored on either side of the diagonal, e.g. $\sigma_{y3y2}$ (covariance of items 3 and 2).
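As a sketch for three items (the six-item version extends the same pattern):
$\boldsymbol{\Sigma}_y = \begin{bmatrix} \sigma^2_{y1} & \sigma_{y1y2} & \sigma_{y1y3} \\ \sigma_{y2y1} & \sigma^2_{y2} & \sigma_{y2y3} \\ \sigma_{y3y1} & \sigma_{y3y2} & \sigma^2_{y3} \end{bmatrix}$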
Describe the structure of $\boldsymbol{\Psi}$
The factor covariance matrix:
$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix} = \begin{bmatrix} \mathrm{var}(F_1) & \mathrm{cov}(F_1, F_2) \\ \mathrm{cov}(F_1, F_2) & \mathrm{var}(F_2) \end{bmatrix}$
Describe the structure of $\boldsymbol{\Theta}_\varepsilon$
The residual covariance matrix has the variance of the errors of each item down the diagonal and 0s elsewhere, similar in shape to an identity matrix. There shouldn't be covariance between the residuals; as in IRT, this is a model assumption.
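For the six-item example this is simply a diagonal matrix:
$\boldsymbol{\Theta}_\varepsilon = \mathrm{diag}(\theta_{\varepsilon_1}, \theta_{\varepsilon_2}, \theta_{\varepsilon_3}, \theta_{\varepsilon_4}, \theta_{\varepsilon_5}, \theta_{\varepsilon_6})$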
Describe the structure of $\boldsymbol{\Sigma}_y$ in terms of the model parameters
Down the diagonal you have the model-implied variances of each item:
$\lambda_{11}^2\sigma^2_{\eta_1} + \theta_{\varepsilon_1}$, $\lambda_{21}^2\sigma^2_{\eta_1} + \theta_{\varepsilon_2}$, $\lambda_{31}^2\sigma^2_{\eta_1} + \theta_{\varepsilon_3}$, $\lambda_{42}^2\sigma^2_{\eta_2} + \theta_{\varepsilon_4}$, $\lambda_{52}^2\sigma^2_{\eta_2} + \theta_{\varepsilon_5}$, $\lambda_{62}^2\sigma^2_{\eta_2} + \theta_{\varepsilon_6}$
where the first is the loading of item 1 (on factor 1) squared, times the factor variance, plus the residual variance of item 1.
Off the diagonal are the model-implied covariances, where
$\lambda_{11}\lambda_{21}\sigma^2_{\eta_1}$ gives the covariance of items 1 and 2: the product of their factor loadings times the variance of factor 1, since they load on the same factor, and
$\lambda_{21}\lambda_{62}\sigma_{\eta_1\eta_2}$ gives the covariance of items 2 and 6: the product of their factor loadings times the covariance of factor 1 and factor 2, since they load on different factors.
A better visualisation of this is given in docs
During CFA, what are two things you don't necessarily want from your analysis / try to avoid?
Cross-loading: where an item that was meant to load on one factor also loads on another factor (it shares variability with the items of another factor that it does not share with the items of its own factor)
Residual covariance/correlation: where there is covariance between the error residuals of items
If you have residual covariance/correlation, what do you hope for?
That there is some explanation, e.g. the two imagined situations both take place in a supermarket, so there is shared item-specific error.
What changes in your factor loading matrix if you have a cross-loading? For example if, in your earlier model of 6 variables loading on 2 factors, the fourth variable also loads on the first factor
The factor loading matrix goes from
$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
to
$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
since item 4 now also loads on factor 1.
What happens to your $\boldsymbol{\Sigma}_y$ if cross-loadings are introduced?
Each entry describing the covariance of the fourth item with another item (its loading on the second factor times the other item's loading, times the relevant factor variance or factor covariance) gains an extra term: you also add the fourth item's loading on the first factor times the other item's loading, times the relevant factor variance or factor covariance.
For the variance of the fourth item, besides the term through the second factor you add the term through the first factor, plus twice the product of the two loadings times the factor covariance:
$\sigma^2_{y4} = \lambda_{41}^2\sigma^2_{\eta_1} + \lambda_{42}^2\sigma^2_{\eta_2} + 2\lambda_{41}\lambda_{42}\sigma_{\eta_1\eta_2} + \theta_{\varepsilon_4}$
This is better visualised in docs
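A quick R check of the item-4 variance formula, with hypothetical numbers (all values below are assumptions for illustration only):
l41 <- 0.4; l42 <- 0.9                     # loadings of item 4 on factors 1 and 2
psi11 <- 1.0; psi22 <- 1.0; psi12 <- 0.3   # factor variances and factor covariance
theta4 <- 0.4                              # residual variance of item 4
# var(y4) = l41^2 * psi11 + l42^2 * psi22 + 2 * l41 * l42 * psi12 + theta4
l41^2 * psi11 + l42^2 * psi22 + 2 * l41 * l42 * psi12 + theta4   # 1.586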
How do residual covariances affect your CFA matrices? E.g., for an error covariance between items 2 and 5
In the residual covariance matrix $\boldsymbol{\Theta}_\varepsilon$, everything looks the same, except that in place of the zeros at the intersections of items 2 and 5 there is the error covariance $\theta_{\varepsilon_2\varepsilon_5}$.
In the predicted covariance matrix, it simply adds this error covariance to the model-implied covariance between items 2 and 5.
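Written out for this example (item 2 loads on factor 1, item 5 on factor 2), the model-implied covariance of items 2 and 5 becomes
$\sigma_{y2y5} = \lambda_{21}\lambda_{52}\,\sigma_{\eta_1\eta_2} + \theta_{\varepsilon_2\varepsilon_5}$
i.e., the usual cross-factor term plus the error covariance.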
What two identification issues exist within CFA?
1. Scaling the latent variable
2. Statistical identification
What does scaling the latent variable consist of in CFA?
As in the one-factor model, except that you now have multiple factors: the scale of the latent variables (factors) is identified by fixing the mean of each factor to 0 and either:
• Option 1: fix one factor loading to 1 for each factor
• Option 2: fix the factor variance to 1 for each factor
This really doesn't make a difference to the conclusions drawn; see the lavaan sketch below.
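As a lavaan sketch of the two options, using the model12 and RMT_cov objects defined later in these cards (fit_marker and fit_std are hypothetical names):
library(lavaan)
# Option 1 (lavaan default): first loading per factor fixed to 1
fit_marker <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)
# Option 2: factor variances fixed to 1, all loadings estimated freely
fit_std <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907, std.lv = TRUE)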
What does the first option (fix one factor loading to 1 for each factor) for scaling the latent variable in CFA mean for the matrices?
It has implications for the factor loading matrix, which goes from
$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
to
$\boldsymbol{\Lambda}_y = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$
In lavaan this is the default: the loading of the first indicator of each factor is automatically fixed to 1.
What does the second option (fix the factor variance to 1 for each factor) for scaling the latent variable in CFA mean for the matrices?
The factor covariance matrix goes from
$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix} = \begin{bmatrix} \mathrm{var}(F_1) & \mathrm{cov}(F_1, F_2) \\ \mathrm{cov}(F_1, F_2) & \mathrm{var}(F_2) \end{bmatrix}$
to
$\boldsymbol{\Psi} = \begin{bmatrix} 1 & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & 1 \end{bmatrix}$
What is an advantage of changing the factor variances to 1?
$\boldsymbol{\Psi} = \begin{bmatrix} 1 & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & 1 \end{bmatrix}$
This is nice because it turns the covariance between the factors into a correlation, so in the output you can immediately see the correlation between the factors.
What is involved in the statistical identification of CFA?
• The number of parameters should not exceed the number of observed (co)variances
• In CFA this can happen if you have too few observed variables for your model
What is a good metric to test if this statistical identification is satisfied?
A model should always have degrees of freedom larger than or equal to 0 ($df = 0$ is possible, but of limited usefulness)
How do you calculate the degrees of freedom for CFA?
Same as EFA:
$df = m - k$
$m$: number of independent pieces of observed information
$k$: number of parameters
If the CFA is conducted on a covariance matrix: $m = n(n+1)/2$, with $n$ the number of observed variables
• E.g., $n = 4 \Rightarrow m = 4 \cdot 5/2 = 10$
• E.g., $n = 7 \Rightarrow m = 7 \cdot 8/2 = 28$
Different from EFA: for $k$ (the number of parameters) there is no straightforward formula as in EFA; you have to think about how many parameters are in the model. For this you really have to understand the model (see the next card for an example)
Calculate m and k for a model with two factors in which items 1:4 load on the first factor and items 4:6 load on the second factor. The first factor loading for each factor is set to 1 for identification
Independent pieces of information:
$m = 6 \cdot 7/2 = 21$
Number of parameters (there is no formula!):
$k$ = 6 (residual variances) + 5 (loadings) + 2 (factor variances) + 1 (factor covariance) = 14
To make sense of this, think of the matrices used for this model:
$\mathbf{y}_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \\ y_{i4} \\ y_{i5} \\ y_{i6} \end{bmatrix}, \quad \boldsymbol{\Lambda}_y = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}, \quad \boldsymbol{\eta}_i = \begin{bmatrix} \eta_{1i} \\ \eta_{2i} \end{bmatrix}$
Same number of residual variances (errors) as items = 6
7 factor loadings − 2 fixed loadings = 5 free parameters
Factor covariance matrix
$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix}$
Two factor variances along the diagonal
1 factor covariance between the factors (the matrix is mirrored)
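As an R sketch of the bookkeeping for this example (variable names are for illustration):
n_items <- 6
m  <- n_items * (n_items + 1) / 2  # 21 independent (co)variances
k  <- 6 + 5 + 2 + 1                # residual variances + free loadings + factor variances + factor covariance
df <- m - k                        # 21 - 14 = 7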
How can residual covariances affect statistical identification?
Each correlation/ covariance between errors is an additional parameter
E.g., a model with three observed variables, 1 factor and no residual covariances:
Just identified!
• m = 3·4/2 = 6, k = 6, df = 0
A model with three observed variables, 1 factor and a residual covariance between 2 items:
Not identified
• m = 3·4/2 = 6, k = 7, df = −1
In CFA, it's all about model fit, according to the strange man talking on my laptop. Why does he make this claim?
Because you're interested in testing a hypothesised factor structure, so hypothesis testing is the central theme
What indices do you use to analyse model fit?
There's a whole bunch of them, since model fit is the central theme; however, the one most used and reported is the chi-square ($\chi^2$) goodness-of-fit index
How do you calculate the $\chi^2$ goodness of fit? (again)
$\chi^2 = -2 \cdot F(ML)$ with $df = m - k$
where $F(ML)$ is the value of the fit function that is maximized in maximum likelihood estimation
If it is significant, your model doesn't fit
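In lavaan you can pull these values from a fitted object; a one-line sketch assuming the fit2 object defined later in these cards:
fitmeasures(fit2, c("chisq", "df", "pvalue"))   # chi-square test of exact fit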
Give four more goodness of fit measures and what they do
- Standardized Root Mean Square Residual (SRMR)
- Standardized difference between $S$ (observed covariance matrix) and $\Sigma_y$ (model-predicted covariance matrix)
- Root Mean Square Error of Approximation (RMSEA)
- SRMR with a correction for the number of parameters
- Comparative Fit Index (CFI)
- Compares the model to a baseline model without correlations between variables
- Normed between 0 and 1
- Tucker–Lewis Index (TLI)
- Similar to CFI but non-normed (can be larger than 1 or smaller than 0)
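These are all available from lavaan's fitmeasures(); a sketch assuming the fit2 object defined later in these cards:
fitmeasures(fit2, c("srmr", "rmsea", "cfi", "tli"))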
Give four comparative fit measures, what they do and what they require
• Likelihood ratio test (requires nested models):
$\chi^2 = -2\,(\log L(\text{constrained}) - \log L(\text{unconstrained}))$ with $df = k_{\text{unconstrained}} - k_{\text{constrained}}$
or equivalently (since these are also $\chi^2$ statistics, you can just subtract them):
$\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{unconstrained}}$ with $df = df_{\text{constrained}} - df_{\text{unconstrained}}$
If significant, your unconstrained model should be preferred
- Akaike Information Criterion (AIC)
- $AIC = \chi^2 + 2 \times k$
- Compare to a competing model, no nesting necessary
- Bayesian Information Criterion (BIC)
- $BIC = \chi^2 + \log(N) \times k$
- Compare to a competing model, no nesting necessary
- Corrected Akaike Information Criterion (CAIC)
- $CAIC = \chi^2 + [1 + \log(N)] \times k$
- Compare to a competing model, no nesting necessary
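In lavaan, a sketch assuming the nested fit1 (one-factor) and fit2 (two-factor) objects defined later in these cards:
anova(fit1, fit2)      # likelihood ratio test for nested lavaan models; also reports AIC and BIC
AIC(fit1); AIC(fit2)   # lower is better
BIC(fit1); BIC(fit2)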
How do the AIC, BIC and CAIC work, and which is the most 'strict' among them?
A lower value indicates a better model, and they penalize the number of parameters. The BIC and CAIC are stricter than the AIC, meaning they punish more for more complex models.
So far we've talked about absolute model fit (does the model fit, y/n) and comparative model fit (which model fits better?). What type of model fit index is left?
Local model fit: looking at a model (typically one that doesn't fit great) and analysing where in the model you need to change something to improve fit, e.g., introducing a cross-loading or a residual covariance
What are the indices used to assess local model fit called? For which parameters are they available?
Modification indices; available for all parameters that are fixed
• E.g., cross-loadings, residual covariances
What do these modification indices indicate?
Indicate how much the chi-square fit statistic will improve (decrease) if that parameter is freed
What type of statistics are these and what implications does this have? (2)
Strictly, these are $\chi^2(1)$ statistics
• i.e., a value larger than 3.84 is significant
• Testing many of them results in serious chance capitalisation and overfitting
What is the recommended cut-off?
Some people recommend a cut-off at 10.00 (if a modification index is above 10, then you could free that parameter)
• But there is still a danger of chance capitalization / overfitting; therefore, use very carefully. In lavaan you can apply this cut-off directly, as sketched below.
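A one-line sketch assuming the fit1 object defined later in these cards:
modindices(fit1, minimum.value = 10)   # only show modification indices above the cut-off of 10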
When should you make modifications to your model?
Only free a parameter if its modification index is extremely high compared to the others
β’ Ideally, there is an explanation for the misfit
β’ E.g., for IQ test: residual covariance between Block Design and Object Assembly
Say you wanted to see whether worry and rumination questionnaires were better suited to a one-factor or a two-factor model, with W1, W2, W3 and W4 measuring worry and R1, R2, R3 and R4 measuring rumination
Write R code which would help with this
library(lavaan)

# Two-factor model: separate worry and rumination factors
model12 <- '
  Worry =~ W1 + W2 + W3 + W4
  Rummi =~ R1 + R2 + R3 + R4
'
fit2 <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)

# One-factor model: all items load on a single factor
model11 <- '
  RNT =~ W1 + W2 + W3 + W4 + R1 + R2 + R3 + R4
'
fit1 <- cfa(model11, sample.cov = RMT_cov, sample.nobs = 3907)
How do you get the fit measures for your model? How about the modification indices?
fitmeasures(fit2)
modindices(fit1)
Say you look at your modification indices and notice:
Worry =~ R3 65.717
In your output. What does this mean and what can you do?
This means that there is a cross-loading: a rumination item loads onto the worry factor. In this case we can just add this item to the worry factor (if justified):
model12 <- '
  Worry =~ W1 + W2 + W3 + W4 + R3
  Rummi =~ R1 + R2 + R3 + R4
'
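You would then refit and compare against the original two-factor model; fit2b is a hypothetical name for the refitted object:
fit2b <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)
anova(fit2, fit2b)   # the original model is nested in the cross-loading model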
Say you look at your modification indices and notice:
R1 ~~ R3 624.146
In your output. What does this mean and what can you do?
This means that there is residual covariance between R1 and R3. You can add it to your model as follows:
model12 <- '
  Worry =~ W1 + W2 + W3 + W4 + R3
  Rummi =~ R1 + R2 + R3 + R4
  R1 ~~ R3
'