Confirmatory Factor Analysis Flashcards

1
Q

Describe how the confirmatory factor model differs from the exploratory factor model

A

The exploratory factor model:

$$Y_{pi} = \tau_i + \lambda_{1i}\eta_{1p} + \lambda_{2i}\eta_{2p} + \cdots + \lambda_{mi}\eta_{mp} + \epsilon_{pi}$$

It is characteristic that all subtests/items load on all factors, since we don't know in advance which items should load on which factors. You hope that some items load clearly on some factors, but the model itself doesn't impose any structure.

The confirmatory factor model has the same form:

$$Y_{pi} = \tau_i + \lambda_{1i}\eta_{1p} + \lambda_{2i}\eta_{2p} + \cdots + \lambda_{mi}\eta_{mp} + \epsilon_{pi}$$

but with certain factor loadings (e.g. $\lambda_{1i}$) fixed to 0 according to theory/expectation.

The intercept is commonly omitted in both. In confirmatory factor analysis, you explicitly state that items x load on factor 1, items y load on factor 2, etc. This difference is visualised in docs.
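As a rough sketch of this difference in R (assuming a hypothetical data frame dat with items y1–y6; EFA via stats::factanal, CFA via lavaan):

# EFA: every item loads on every factor; no structure is imposed
efa_fit <- factanal(dat, factors = 2)

# CFA: loadings fixed to 0 according to theory (items 1-3 on F1, items 4-6 on F2)
library(lavaan)
cfa_model <- '
F1 =~ y1 + y2 + y3
F2 =~ y4 + y5 + y6
'
cfa_fit <- cfa(cfa_model, data = dat)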

2
Q

How does confirmatory factor analysis differ from exploratory factor analysis in terms of notation?

A

The confirmatory factor model can be written in the same notation as an exploratory factor model:

$$Y_{pi} = \tau_i + \lambda_{1i}\eta_{1p} + \lambda_{2i}\eta_{2p} + \cdots + \lambda_{mi}\eta_{mp} + \epsilon_{pi}$$

However, matrix notation is often used, as in the book. This is a little more compact, as you don't have to specify item i etc.; that information is contained in the matrices:

$$\boldsymbol{y}_p = \boldsymbol{\upsilon} + \boldsymbol{\Lambda}_y \boldsymbol{\eta}_p + \boldsymbol{\epsilon}_p \quad\text{or}\quad \boldsymbol{y}_p = \boldsymbol{\Lambda}_y \boldsymbol{\eta}_p + \boldsymbol{\epsilon}_p \ \text{(if fit on the covariance matrix)}$$

with
$\boldsymbol{y}_p$ = vector of observed scores for person p
$\boldsymbol{\upsilon}$ = intercepts
$\boldsymbol{\Lambda}_y$ = factor loadings
$\boldsymbol{\eta}_p$ = factor scores
$\boldsymbol{\epsilon}_p$ = error residuals

See docs for how this looks in terms of matrices.

3
Q

Why, again, do you mostly omit the intercept in practice?

A

The intercept does not contain any information about the factor structure; it is only interesting if you want to apply a factor model to continuous data in the sense of fitting an IRT model (there it is an item attractiveness parameter).

4
Q

There is an image of the factor loading matrix for the confirmatory factor model in docs. Explain what information it gives/ how the information is structured.

A

The rows correspond to the items/variables (n) and the columns correspond to the factors (m). E.g. $\boldsymbol{\Lambda}_y[2,3]$ is the factor loading of the second item on the third factor.

5
Q

Describe what the data vector, factor loading matrix and factor scores would look like in a confirmatory factor model in which the first three items are proposed to load on the first factor and the next three on the second factor

A
π’šπ‘ =         𝚲𝐲 =        πœΌπ‘ =
𝑦𝑝1.          πœ†11  0       πœ‚1𝑝
𝑦𝑝2          πœ†21 0       πœ‚2𝑝
𝑦𝑝3          πœ†31 0
𝑦𝑝4          0  πœ†42
𝑦𝑝5          0  πœ†52
𝑦𝑝6          0  πœ†62
6
Q

From this model you can derive something with a different notation; describe it and give the notation

A

Because many people focus on fitting the model to a covariance matrix, which contains all this information, the model can be rewritten as:

$$\boldsymbol{\Sigma}_y = \boldsymbol{\Lambda}_y \boldsymbol{\Psi} \boldsymbol{\Lambda}_y' + \boldsymbol{\Theta}_y$$

The original formula contained $\boldsymbol{y}_p$, indicating the raw data, which doesn't exist when you only have the covariance matrix. This formula is therefore derived from the original to describe what information is taken from the covariance matrix rather than the raw data.

$\boldsymbol{\Sigma}_y$ = model-predicted covariance matrix
$\boldsymbol{\Psi}$ = factor covariance matrix
$\boldsymbol{\Lambda}_y$ = factor loading matrix
$\boldsymbol{\Theta}_y$ = residual covariance matrix
7
Q

Describe what $\boldsymbol{\Sigma}_y$ looks like / how it is structured

A

πšΊπ’š is a matrix with the variance of each item 𝜎^2 𝑦1…6 down the diagonal, and covariance of each item mirrored on the left and right of the diagonal: πœŽπ‘¦3𝑦2 (covariance of item 3 and 2).

8
Q

Describe the structure of $\boldsymbol{\Psi}$

A

Factor covariance matrix:

$$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix} = \begin{bmatrix} \text{var}(F_1) & \text{cov}(F_1, F_2) \\ \text{cov}(F_1, F_2) & \text{var}(F_2) \end{bmatrix}$$

9
Q

Describe the structure of $\boldsymbol{\Theta}_y$

A

The residual covariance matrix has the variance of the errors of each item down the diagonal and 0s elsewhere (a diagonal matrix). There shouldn't be covariance between the residuals; as in IRT, uncorrelated residuals are an assumption of the model.
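For the six-item example, this diagonal structure looks like:

$$\boldsymbol{\Theta}_y = \begin{bmatrix} \sigma^2_{\epsilon_1} & 0 & \cdots & 0 \\ 0 & \sigma^2_{\epsilon_2} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2_{\epsilon_6} \end{bmatrix}$$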

10
Q

Describe the structure of $\boldsymbol{\Sigma}_y$ in terms of the model parameters

A
Down the diagonal you have the model-implied variances of each item:

$$\lambda_{11}^2\sigma^2_{\eta_1} + \sigma^2_{\epsilon_1},\quad \lambda_{21}^2\sigma^2_{\eta_1} + \sigma^2_{\epsilon_2},\quad \lambda_{31}^2\sigma^2_{\eta_1} + \sigma^2_{\epsilon_3},\quad \lambda_{42}^2\sigma^2_{\eta_2} + \sigma^2_{\epsilon_4},\quad \lambda_{52}^2\sigma^2_{\eta_2} + \sigma^2_{\epsilon_5},\quad \lambda_{62}^2\sigma^2_{\eta_2} + \sigma^2_{\epsilon_6}$$

where the first is the loading of item 1 (on factor 1) squared, times the factor variance, plus the residual variance of item 1.

Off the diagonal are the model-implied covariances, where

$\lambda_{11}\lambda_{21}\sigma^2_{\eta_1}$ gives the covariance of items 1 and 2: the product of their loadings times the variance of factor 1, since they load on the same factor, and

$\lambda_{21}\lambda_{62}\sigma_{\eta_1\eta_2}$ gives the covariance of items 2 and 6: the product of their loadings times the covariance of factor 1 with factor 2, since they load on different factors.

A better visualisation of this is given in docs
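As a sketch of the lower triangle (the matrix is symmetric):

$$\boldsymbol{\Sigma}_y = \begin{bmatrix}
\lambda_{11}^2\sigma^2_{\eta_1}+\sigma^2_{\epsilon_1} & & & \\
\lambda_{11}\lambda_{21}\sigma^2_{\eta_1} & \lambda_{21}^2\sigma^2_{\eta_1}+\sigma^2_{\epsilon_2} & & \\
\vdots & & \ddots & \\
\lambda_{11}\lambda_{62}\sigma_{\eta_1\eta_2} & \lambda_{21}\lambda_{62}\sigma_{\eta_1\eta_2} & \cdots & \lambda_{62}^2\sigma^2_{\eta_2}+\sigma^2_{\epsilon_6}
\end{bmatrix}$$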

11
Q

During CFA, what are two things you don't necessarily want from your analysis / try to avoid?

A

Cross-loading: where an item that was meant to load on one factor also loads on another factor (it shares variability with the items of another factor that it does not share with the items of its own factor).

Residual covariance/correlation: where there is covariance between the error residuals of items.

12
Q

If you have residual covariance/ correlation, what do you hope for?

A

That there is some explanation, e.g. the two imagined situations both take place in a supermarket, so there is shared item-specific error.

13
Q

What changes in your factor loading matrix if you have a cross-loading? For example if, in your earlier model of 6 variables loading on 2 factors, the fourth variable also loads on the first factor

A
The factor loading matrix goes from:

$$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}
\quad\text{to}\quad
\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$$

Since item four now also loads on factor 1
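In lavaan syntax (hypothetical items y1–y6), the cross-loading simply appears as item 4 listed under both factors:

model_cross <- '
F1 =~ y1 + y2 + y3 + y4   # y4 added here: the cross-loading
F2 =~ y4 + y5 + y6
'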

14
Q

What happens to your $\boldsymbol{\Sigma}_y$ if cross-loadings are introduced?

A

For each entry describing the covariance of the fourth item with another item, which was

(loading of item 4 on factor 2) × (loading of the other item) × (the variance of factor 2 or the covariance of the factors, depending on which factor the other item loads on),

you now also have to add

(loading of item 4 on factor 1) × (loading of the other item) × (the variance of factor 1 or the covariance of the factors).

For the variance of the fourth item, to the part due to factor 2 you have to add the part due to factor 1, plus 2 × (the product of item 4's two loadings × the covariance of the two factors).
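Written out for the same six-item model, this gives, e.g.:

$$\text{var}(y_4) = \lambda_{41}^2\sigma^2_{\eta_1} + \lambda_{42}^2\sigma^2_{\eta_2} + 2\lambda_{41}\lambda_{42}\sigma_{\eta_1\eta_2} + \sigma^2_{\epsilon_4}$$

$$\text{cov}(y_4, y_1) = \lambda_{41}\lambda_{11}\sigma^2_{\eta_1} + \lambda_{42}\lambda_{11}\sigma_{\eta_1\eta_2}$$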

This is better visualised in docs

15
Q

How do residual covariances affect your CFA matrices? E.g. for an error correlation between items 2 and 5

A

In the residual covariance matrix, the matrix looks the same except that, in place of the zeros at the intersections of items 2 and 5, there is the error covariance $\sigma_{\epsilon_2\epsilon_5}$.

In the predicted covariance matrix, it simply adds this error covariance to the model-implied covariance between items 2 and 5.
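For the six-item example:

$$\boldsymbol{\Theta}_y = \begin{bmatrix}
\sigma^2_{\epsilon_1} & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma^2_{\epsilon_2} & 0 & 0 & \sigma_{\epsilon_2\epsilon_5} & 0 \\
0 & 0 & \sigma^2_{\epsilon_3} & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^2_{\epsilon_4} & 0 & 0 \\
0 & \sigma_{\epsilon_2\epsilon_5} & 0 & 0 & \sigma^2_{\epsilon_5} & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2_{\epsilon_6}
\end{bmatrix}$$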

16
Q

What two identification issues exist within CFA?

A
1. Scaling the latent variable
2. Statistical identification

17
Q

What does scaling the latent variable consist of in CFA?

A

As in the one-factor model, except you now have multiple factors: the scale of the latent variables (factors) is identified by fixing the mean of each factor to 0 and either:
• Option 1: fix one factor loading to 1 for each factor
• Option 2: fix the factor variance to 1 for each factor

This really doesn't make a difference to the conclusions drawn.

18
Q

What does the first option (fix one factor loading to 1 for each factor) for scaling the latent variable in CFA mean for the matrices?

A
This has implications for the factor loading matrix, which goes from:

$$\boldsymbol{\Lambda}_y = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}
\quad\text{to}\quad
\boldsymbol{\Lambda}_y = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix}$$

(the first loading of each factor, $\lambda_{11}$ and $\lambda_{42}$, is fixed to 1)

In lavaan, this is the default: the loading of the first indicator of each factor is automatically fixed to 1.
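As a sketch (hypothetical model syntax model and data dat), the two scaling options in lavaan:

fit_marker <- cfa(model, data = dat)                 # option 1: first loading fixed to 1 (default)
fit_stdlv  <- cfa(model, data = dat, std.lv = TRUE)  # option 2: factor variances fixed to 1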

19
Q

What does the second option (fix the factor variance to 1 for each factor) for scaling the latent variable in CFA mean for the matrices?

A

The factor covariance matrix goes from:

$$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix} = \begin{bmatrix} \text{var}(F_1) & \text{cov}(F_1, F_2) \\ \text{cov}(F_1, F_2) & \text{var}(F_2) \end{bmatrix}$$

to:

$$\boldsymbol{\Psi} = \begin{bmatrix} 1 & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & 1 \end{bmatrix}$$

20
Q

What is an advantage of changing the factor variances to 1?

A

$$\boldsymbol{\Psi} = \begin{bmatrix} 1 & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & 1 \end{bmatrix}$$

This is nice because it turns the covariance between the factors into a correlation, so in the output you can immediately see the correlation between the factors.
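As a sketch, for a fitted lavaan object fit (hypothetical name), the latent correlations can also be extracted directly:

lavInspect(fit, "cor.lv")   # model-implied correlations between the factors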

21
Q

What is involved in the statistical identification of CFA?

A

• The number of parameters should not exceed the number of observed (co)variances
  • In CFA this can happen if you have too few observed variables for your model

22
Q

What is a good metric to test if this statistical identification is satisfied?

A

A model should always have degrees of freedom larger than or equal to 0 ($df = 0$ is possible, but of limited usefulness).

23
Q

How do you calculate the degrees of freedom for CFA?

A

Same as EFA:
$$df = m - k$$
$m$: number of independent pieces of observed information
$k$: number of parameters

If the CFA is conducted on a covariance matrix:
• $m = p(p+1)/2$, where $p$ is the number of observed variables
• E.g., $p = 4 \rightarrow m = 4 \cdot 5/2 = 10$
• E.g., $p = 7 \rightarrow m = 7 \cdot 8/2 = 28$

Different from EFA: for $k$ (the number of parameters) there is no straightforward formula as in EFA; you have to think about how many parameters are in the model. For this you really have to understand the model (see the next card for an example).
24
Q

Calculate m and k for a model with two factors in which items 1:4 load on the first factor and items 4:6 load on the second factor. The first factor loading for each factor is set to 1 for identification

A

Independent pieces of information:
$$m = 6 \cdot 7/2 = 21$$

Number of parameters (there is no formula!):
$$k = 6\ \text{(residual variances)} + 5\ \text{(loadings)} + 2\ \text{(factor variances)} + 1\ \text{(factor covariance)} = 14$$

To make sense of this, think of the matrices used for this model:

$$\boldsymbol{y}_p = \begin{bmatrix} y_{p1} \\ y_{p2} \\ y_{p3} \\ y_{p4} \\ y_{p5} \\ y_{p6} \end{bmatrix} \qquad
\boldsymbol{\Lambda}_y = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{bmatrix} \qquad
\boldsymbol{\eta}_p = \begin{bmatrix} \eta_{1p} \\ \eta_{2p} \end{bmatrix}$$

Same number of residual variances (errors) as items = 6
7 factor loadings − 2 fixed = 5 free parameters

Factor covariance matrix:
$$\boldsymbol{\Psi} = \begin{bmatrix} \sigma^2_{\eta_1} & \sigma_{\eta_1\eta_2} \\ \sigma_{\eta_1\eta_2} & \sigma^2_{\eta_2} \end{bmatrix}$$

Two factor variances along the diagonal
1 factor covariance between the factors (the matrix is mirrored)

25
Q

How can residual variances affect statistical identification?

A

Each correlation/covariance between errors is an additional parameter.

E.g. a model with three items, 1 factor and no residual covariances:
Just identified!
• $m = 3 \cdot 4/2 = 6$, $k = 6$, $df = 0$

A model with three items, 1 factor and a residual covariance between 2 items:
Not identified!
• $m = 3 \cdot 4/2 = 6$, $k = 7$, $df = -1$

26
Q

In CFA, it's all about model fit, according to the strange man talking on my laptop. Why does he make this claim?

A

Because you're interested in testing a hypothesised factor structure, so hypothesis testing is the central theme.

27
Q

What indices do you use to analyse model fit?

A

There's a whole bunch of them, since model fit is the central theme; however, the one most used and reported is the chi-square ($\chi^2$) goodness-of-fit index.

28
Q

How do you calculate the $\chi^2$ goodness of fit? (again)

A

πœ’2 = βˆ’2 βˆ— 𝐹(𝑀𝐿) with 𝑑𝑓 = 𝑀 βˆ’ π‘˜

where 𝐹(𝑀𝐿) is the value of the fit function that is maximized in Maximum Likelihood and also fuck my life depending on the context

If it is significant, your model doesn’t fit

29
Q

Give four more goodness of fit measures and what they do

A
• Standardized Root Mean Squared Residual (SRMR)
  • Standardized difference between $S$ (observed covariance matrix) and $\boldsymbol{\Sigma}_y$ (predicted covariance matrix)
• Root Mean Square Error of Approximation (RMSEA)
  • SRMR with a correction for the number of parameters
• Comparative Fit Index (CFI)
  • Compares the model to a baseline model without correlations between the variables
  • Normed between 0 and 1
• Tucker–Lewis Index (TLI)
  • Similar to CFI but non-normed (can be larger than 1 or smaller than 0)
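As a sketch, these can all be requested by name from a fitted lavaan object fit (hypothetical name):

fitMeasures(fit, c("chisq", "df", "pvalue", "srmr", "rmsea", "cfi", "tli"))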
30
Q

Give four comparative fit measures, what they do and what they require

A

• Likelihood ratio test (requires nested models):

$$\chi^2 = -2\,(\log L(\text{constrained}) - \log L(\text{unconstrained})) \quad\text{with}\quad df = k_{\text{unconstrained}} - k_{\text{constrained}}$$

or equivalently (since it is also a $\chi^2$ statistic, you can just subtract the $\chi^2$ values from each other):

$$\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{unconstrained}} \quad\text{with}\quad df = df_{\text{constrained}} - df_{\text{unconstrained}}$$

If significant, your unconstrained model should be preferred.

• Akaike Information Criterion (AIC)
  • $AIC = \chi^2 + 2 \times k$
  • Compare to a competing model, no nesting necessary
• Bayesian Information Criterion (BIC)
  • $BIC = \chi^2 + \log(N) \times k$
  • Compare to a competing model, no nesting necessary
• Corrected Akaike Information Criterion (CAIC)
  • $CAIC = \chi^2 + [1 + \log(N)] \times k$
  • Compare to a competing model, no nesting necessary
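As a sketch, for nested lavaan models (e.g. the one- and two-factor fits fit1 and fit2 from the cards further below), anova() carries out the likelihood ratio test and also reports AIC and BIC:

anova(fit1, fit2)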
31
Q

How do the AIC, BIC and CAIC work and which is the most 'strict' among them?

A

A lower value indicates a better model, and they penalise the number of parameters. The BIC and CAIC are stricter than the AIC, meaning they penalise more complex models more heavily.

32
Q

So far we've talked about absolute model fit (does the model fit, y/n) and comparative model fit (which model fits better?). What type of model fit index is left?

A

Local model fit: looking at a model (typically one that does not fit great) and analysing where in the model you need to change something to improve fit, e.g. introducing a cross-loading or a residual covariance.

33
Q

What are the indices used to assess local model fit called? For which parameters are they available?

A

Modification indices; they are available for all parameters that are fixed
• E.g., cross-loadings, residual covariances

34
Q

What do these modification indices indicate?

A

Indicate how much the chi-square fit statistic will improve (decrease) if that parameter is freed

35
Q

What type of statistics are these and what implications does this have? (2)

A

Strictly, these are $\chi^2(1)$ statistics
• i.e., a value larger than 3.84 is significant
• Using them this way results in serious chance capitalisation and overfitting

36
Q

What is the recommended cut-off?

A

Some people recommend a cut-off at 10.00 (if a modification index is above 10, you may free that parameter)
• But there is still a danger of chance capitalisation / overfitting; therefore, use very carefully
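As a sketch, this cut-off can be applied directly when requesting modification indices from a fitted lavaan object:

modindices(fit1, minimum.value = 10)   # only show modification indices above 10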

37
Q

When should you make modifications to your model?

A

Only free a parameter if its modification index is extremely high compared to the others
• Ideally, there is an explanation for the misfit
• E.g., for an IQ test: residual covariance between Block Design and Object Assembly

38
Q

Say you wanted to see if worry and rumination questionnaires were better suited to a one-factor or a two-factor model, with W1, W2, W3 and W4 measuring worry and R1, R2, R3 and R4 measuring rumination

Write R code which would help with this

A

library(lavaan)

# note: lavaan defines latent variables with =~, not =
model12 <- '
Worry =~ W1 + W2 + W3 + W4
Rummi =~ R1 + R2 + R3 + R4
'
fit2 <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)

model11 <- '
RNT =~ W1 + W2 + W3 + W4 + R1 + R2 + R3 + R4
'
fit1 <- cfa(model11, sample.cov = RMT_cov, sample.nobs = 3907)

39
Q

How do you get the fit measures for your model? How about the modification indices?

A

fitmeasures(fit2)

modindices(fit1)

40
Q

Say you look at your modification indices and notice:

Worry =~ R3 65.717

In your output. What does this mean and what can you do?

A

This means that there is a cross-loading of a rumination item onto the worry factor. In this case we can just add this item to the worry factor (if justified):

model12 <- '
Worry =~ W1 + W2 + W3 + W4 + R3
Rummi =~ R1 + R2 + R3 + R4
'

41
Q

Say you look at your modification indices and notice:

R1 =~ R3 624.146

In your output. What does this mean and what can you do?

A

This means that there is residual covariance between R1 and R3. You can add it to your model as follows:

model12 <- '
Worry =~ W1 + W2 + W3 + W4 + R3
Rummi =~ R1 + R2 + R3 + R4
R1 ~~ R3
'
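As a sketch, you could then refit and check whether freeing the residual covariance actually improves fit (fit2 is the earlier two-factor fit; fit2b is a hypothetical name for the refit):

fit2b <- cfa(model12, sample.cov = RMT_cov, sample.nobs = 3907)
anova(fit2, fit2b)   # chi-square difference test against the model without R1 ~~ R3
fitmeasures(fit2b)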