Exploratory Factor Analyses Flashcards
When do you carry out a factor analysis?
When you have a continuous latent variable and continuous observed data
How did we know that the observed data was categorical in IRT?
Answers were yes/no, Likert scale, etc.
How do you know if the observed data is continuous?
In psychology we rarely have truly continuous data; an example is reaction time. As a rule, if an item has more than five scale points and forms a normal distribution, you can consider it a continuous item and perform factor analysis on it.
When is factor analysis often applied?
Sum scores on subtests (e.g. dimensions of intelligence)
What is the exciting thing about factor analysis as compared to item response theory, according to Dylan?
The nice thing about IRT is that you really analyse the individual items. FA is more flexible; continuous data is easier to model and the mathematics and formulas are simpler.
Why did IRT form an S shaped curve?
Because you're modelling the probability of an outcome (a correct score), since you have categorical data
Does one-factor FA have an S-shaped curve? Explain
No, it's a linear function. The latent trait is on the x-axis and the expected item score on the y-axis; since we're no longer estimating a probability and are working with continuous data, the scores can go higher than one and we can use a linear model. For this reason it is sometimes known as the linear factor model.
Explain the linear factor analysis equation
E(X_pi | η_p) = ν_i + λ_i·η_p
or X_pi = ν_i + λ_i·η_p + ε_pi
with residual variance VAR(X_pi | η_p)
ν_i is commonly referred to as item attractiveness; it roughly translates to the IRT item difficulty/easiness parameter and is the intercept of the line.
e.g., "I think about suicide" has a low attractiveness
e.g., "I am satisfied with my life" has a higher attractiveness
λ_i is the item discrimination, the same as in IRT, and it forms the slope of the function. We model the expected value of the continuous item, and if the model is good then the observations should lie around that line.
the variance is just how much the observed points vary around the modelled line
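As a rough illustration (not from the lecture), a minimal R sketch that simulates data from this linear factor model, using made-up intercepts and loadings:
# simulate X_pi = ν_i + λ_i·η_p + ε_pi for three hypothetical items
set.seed(1)
n <- 1000                          # persons
nu <- c(2, 3, 2.5)                 # hypothetical item intercepts (attractiveness)
lambda <- c(0.8, 1.2, 1.0)         # hypothetical factor loadings (discrimination)
eta <- rnorm(n, mean = 0, sd = 1)  # common factor scores
X <- sapply(1:3, function(i) nu[i] + lambda[i] * eta + rnorm(n, sd = 0.5))
round(colMeans(X), 2)              # item means end up close to the intercepts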
How can the notation for this model change? Explain
E(X_pi | η_p) = ν_i + λ_i·η_p is written as X_pi = ν_i + λ_i·η_p + ε_pi in the factor analysis literature, where ν_i is an intercept, λ_i is a factor loading, η_p is the common factor, and ε_pi is the residual.
They mean the same things but are just written differently in IRT literature compared to FA literature.
Conceptually, what is the goal of factor analysis?
It's a statistical approach to extract the common variance from the items and separate it from the variable-specific effects
How do the parameters translate to the variance measured?
X_pi = ν_i + λ_i·η_p + ε_pi
The common factor variance (σ²_η) is the variance caused by the latent trait
The factor loadings (λ_i) tune how much of each item's variability is common factor variability - how well each item measures the latent trait
The intercepts (ν_i), in a single-group application, are simply the item means
The residual variances (σ²_εi) tell you how much of the variance is unique to the item
What does this model imply about the data? How can this be used in calculations?
The model implies a certain structure in the data in terms of the variance. You can calculate how much variability the model predicts for item 1: the factor loading squared times the factor variance plus the residual variance. You can compare this to the observed variance of the item.
It also implies some covariance between the items, since they measure the same thing: if you score high on one item you are expected to score higher on a second item. You can calculate the expected covariance from the model.
How do you calculate the expected covariance? What should you look for in this?
The expected covariance between two items is the first factor loading multiplied by the second factor loading multiplied by the factor variance (λ_1·λ_2·σ²_η). You should check whether this is close to the observed covariance to assess whether the model is a good model for the data.
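A small numeric sketch of these checks, with made-up parameter values:
lambda <- c(0.8, 1.2)   # hypothetical loadings of items 1 and 2
var_eta <- 1            # factor variance
res_var <- c(0.5, 0.7)  # residual variances
lambda[1]^2 * var_eta + res_var[1]  # model-implied variance of item 1
lambda[1] * lambda[2] * var_eta     # model-implied covariance of items 1 and 2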
How is the proportion of the variance calculated?
ρ²_i = λ_i²·σ²_η / (λ_i²·σ²_η + σ²_εi)
i.e. the variance explained by the factor (the factor variance times the squared loading) divided by the total variance of the item = the proportion of variance explained by the factor
The variance not explained by the factor (the uniqueness) is calculated by:
1 - ρ²_i = σ²_εi / (λ_i²·σ²_η + σ²_εi)
However, most of the time you can just read these from the output in RStudio
What is the equivalent of Cronbach's alpha from CTT here and how is it calculated?
The reliability of the sum score: the ratio between the variability due to the latent trait and the total variability
reliability = σ²(T) / σ²(X) = σ²(E(X | η)) / σ²(X)
= …
= σ²_η·(λ_1 + ⋯ + λ_n)² / [σ²_η·(λ_1 + ⋯ + λ_n)² + (σ²_ε1 + ⋯ + σ²_εn)]
i.e. the factor variance times the squared sum of the loadings, divided by that same quantity plus the sum of the residual variances
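A small sketch of this calculation in R, using hypothetical loadings and residual variances:
lambda <- c(0.8, 1.2, 1.0)    # factor loadings
var_eta <- 1                  # factor variance
res_var <- c(0.5, 0.7, 0.6)   # residual variances
var_eta * sum(lambda)^2 / (var_eta * sum(lambda)^2 + sum(res_var))  # reliability of the sum score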
How does identification change compared to IRT and why?
It is very similar to IRT: we need to identify the model because the latent variable doesn't have a scale or a unit, so we have to create one. In factor analysis, however, we play around with this more. In IRT we fixed the mean to 0 and the standard deviation to 1 because the R packages don't allow you to change it much and it's not interesting when it's one-dimensional. People like to change the identification to get a different scale for the parameters; this will not change the conclusions or the p-value since the proportions between the parameters don't change.
Give an example of how you can have two different options for identification in factor analysis
Option 1:
• μ_η = 0
• σ²_η = 1
Option 2:
• μ_η = 0
β’ Fix one factor loading to 1
Picking an arbitrary factor loading and fixing it to 1 is like saying that the scale of the latent variable is the same as the scale of that item
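In lavaan the two options look roughly like this (the item names Y1-Y3 and the data frame E are placeholders):
library(lavaan)
model <- 'eta =~ Y1 + Y2 + Y3'
fit1 <- cfa(model, data = E, std.lv = TRUE)  # option 1: fix the factor variance to 1
fit2 <- cfa(model, data = E)                 # option 2 (lavaan default): fix the first loading to 1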
What is the most dominant approach to parameter estimation in factor analysis?
Maximum likelihood
What does using MLE in this instance assume about the data?
Normally distributed
What can you do as opposed to fitting the model on the raw data with MLE for a factor analysis? Why might you want to do this?
You have the option to analyse only the observed covariance matrix, which is very useful for factor analysis since a covariance matrix already contains all the information about the structure of your data. From a covariance matrix you can already fit a one-factor model, since you can obtain your factor loadings and residual variances.
What is a downside to using the covariance matrix in estimating the MLE for a factor analysis?
There are no intercepts in the model, because for the intercepts you really need the means of the data, which are not contained in a covariance matrix.
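A sketch of this in lavaan, which can fit directly on a covariance matrix (the model string, the matrix S and the sample size are placeholders):
S <- cov(E)                                                # observed covariance matrix
fit_cov <- cfa(model, sample.cov = S, sample.nobs = 1000)  # no raw data needed
# note: no meanstructure is possible here, since a covariance matrix contains no means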
What two alternative methods for parameter estimation exist?
Weighted least squares and Bayesian estimation (also popular, but not discussed in this course)
Given data with 1000 subjects answering 10 questions on a 5 point likert scale, how would you write up code to carry out a factor analysis? Explain the code
head(E)
library(lavaan)
model = 'eta =~ Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7 + Y8 + Y9 + Y10'
fit = cfa(model = model, data = E, meanstructure = TRUE, std.lv = TRUE)
where eta is the common factor and can be given any name akin to a variable, =~ indicates that it is "measured by…", cfa runs a confirmatory factor analysis, the data is called E, meanstructure = TRUE means that you want to estimate the intercepts, and std.lv = TRUE means that you want to standardise the latent variable with μ_η = 0 and σ²_η = 1
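To actually see the output described in the next cards, you would typically run something like:
summary(fit, standardized = TRUE, fit.measures = TRUE)  # loadings, intercepts, (residual) variances, fit indices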
There is a screenshot of CFA output in the docs. Describe the information that is presented under the factor name
The factor loadings are given under whatever you called the factor (e.g. eta). The estimates are the factor loading estimates (how well an item measures the latent trait); negative items are contra-indicative, in that the higher you score on the item, the lower you score on the latent trait. If you divide the estimate by the standard error you get the z value; both are shown. If an estimate is non-significant then the item does not measure the latent trait. Std.all gives the standardised factor loadings; they are the correlations between the item/variable and the factor. If you square this value you get the shared variance between the item and the common factor.
There is a screenshot of CFA output in the docs. Describe the information that is presented under intercepts
For the intercepts, the estimates are simply the means of the variables (the mean response to an item). The std. error, z value and p-value are similar to before and not interesting. Std.all gives the standardised intercepts, i.e. the mean if the variable is standardised (we won't use this much).
There is a screenshot of CFA output in the docs. Describe the information that is presented under variances
Under variances, the estimates are the estimated residual variances (how much the items deviate from the model). The variance of the latent trait (eta) is set to one here if you specified std.lv = TRUE. The std. error, z value and p-value are also uninteresting here, especially since you cannot even use them to test whether the residual variances are larger than 0, due to a boundary constraint. Std.all gives the standardised residual variances, which are interesting: they are the variance unexplained by the factor, 1 - ρ²_i, i.e. the "uniqueness"
Contrast the two main types of factor analysis
Exploratory factor analysis is used when the factor structure (number of factors, loadings) is unknown
• The number of factors, m, is systematically varied by the researcher
β’ All items load on all factors
Confirmatory factor analysis is used when the number of factors, m, is derived from theory/expectations
β’ The loadings are derived from theory/expectations
Give the model for an exploratory factor analysis
X_pi = λ_i1·η_1p + λ_i2·η_2p + ⋯ + λ_im·η_mp + ε_pi
up to m factors, where η_1p is the first factor for person p and η_2p is the second factor for person p.
λ_i1 is the loading of item i on that first factor.
λ_i2 is the loading of item i on factor 2, η_2p, etc.
ε_pi is the residual
Where is the intercept in an exploratory factor analysis model?
There is no intercept; the key is to reveal the factor structure. Everything you need for this factor structure is already in a covariance matrix, so the means/intercepts are not needed.
What are the population parameters of an exploratory factor analysis?
σ²_η1 is the variance of factor 1, η_1p
σ²_η2 is the variance of factor 2, η_2p
etc.
σ_η1η2 is the covariance between factors 1 and 2
etc.
What are the two most popular methods of parameter estimation in factor analysis?
Principal factoring
Maximum likelihood
What does principal factoring consist of?
- Based on the Eigenvalues of the principal factors.
- Same tools as in principal component analysis (see IRT lecture)
- Kaiser criterion
- Scree plot
- Parallel analysis (see the sketch after this list)
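A sketch of how these tools are usually obtained with the psych package (the data frame E is a placeholder):
library(psych)
fa.parallel(E, fm = 'ml', fa = 'fa')  # scree plot plus parallel analysis for factors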
Give two advantages of principal factoring
No distributional assumptions
No improper solutions (e.g., negative variances)
Give a disadvantage of principal factoring
No explicit falsifiable model
- You can calculate all your parameters yet still not know whether your factors are fitting well or not
What does maximum likelihood consist of in factor analysis?
As discussed in IRT lecture, but with a normal distribution for the data
Give two advantages and two disadvantages of using maximum likelihood as parameter estimation for EFA
+ Explicit model based
+ Model falsification
- Sometimes improper solutions (e.g., negative variances etc)
- Multivariate normal distribution assumption for the data
What is meant by saying you can obtain improper solutions using MLE to estimate the parameters?
e.g.: sometimes with MLE you estimate your parameters and get a negative residual or common factor variance, which shouldn't be possible
How can you falsify your model with MLE? (2)
You can falsify your model by means of fit measures. There are many, but two of them are:
- χ² goodness-of-fit measure
- Root Mean Squared Error of Approximation
What is involved in the χ² goodness-of-fit measure?
It just gives a significance test with the following hypotheses:
β’ H0: Model fits
β’ HA: Model does not fit
So in this case you want your test to be insignificant
What is involved in the Root Mean Squared Error of Approximation?
You take the χ² of the model, subtract the model's degrees of freedom, divide by df × (N − 1) (where N is the number of participants), and take the square root:
RMSEA = sqrt(χ² − df) / sqrt(df * (N − 1))
If χ² < df, then
RMSEA = 0
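A minimal numeric sketch of the formula (the χ², df and N values are made up):
chisq <- 54.3; df <- 35; N <- 477
if (chisq < df) 0 else sqrt(chisq - df) / sqrt(df * (N - 1))  # RMSEA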
How do you interpret the result of a RMSEA?
You can readily interpret the number produced with the following:
• < 0.05: good fit
• 0.05 - 0.08: reasonable fit
• > 0.08: poor fit
What two identification issues are associated with factor analysis?
Scaling the latent variable: Similar as in the one factor model and in IRT, the scale of the latent variables (factors) needs to be identified
Statistical identification: The number of parameters should not exceed the number of observed (co)variances
What is involved in scaling the latent variable?
For EFA this is relatively complex; it is carried out but we don't need to know exactly how. The most important thing to know is that this constraint results in m² restrictions (the number of restrictions you impose on the model so that the latent variables have a scale/unit).
Why wasnβt statistical identification an issue with the earlier models (IRT)?
Before, we were only considering simpler, unidimensional models, which hardly have a problem with statistical identification. But now, since we're going to build bigger, more complex models, at some point your model becomes too big for the data, e.g. a model with 10 items and 1,000 parameters fitted on a dataset of 50 people would be trying to extract more from your data than you put into it.
What is involved in statistical identification for EFA?
β’ The number of parameters should not exceed the number of observed (co)variances
β’ In EFA this can happen if the number of factors in the model is too large
- The observed (co)variances contain all the information about the factor structure; with too many factors the factor structure would be too complex for the information we provide
How can you investigate whether a model is identified according to statistical identification in regards to EFA?
A model should always have degrees of freedom larger than or equal to 0
df = M − P
M: number of independent pieces of observed information
P: number of parameters
In this formula for df:
df = M − P
where
M: number of independent pieces of observed information
What does M mean in terms of an EFA?
The number of observed covariances and variances (since that contains all the information about the factor structure.)
How do you calculate M for the df of an EFA if the EFA is conducted on a covariance matrix?
M = p * (p + 1)/2
where p = number of observed variables
E.g., p = 4 → M = 4 * 5/2 = 10
This makes sense if you count the covariances and variances of a covariance matrix
In this formula for df: df = M − P, where M: number of independent pieces of observed information, P: number of parameters
How do you get the number of parameters?
P = p*m + m*(m+1)/2 + p − m^2 (formula given in the exam)
p is the number of variables
m is the number of factors
E.g., p = 6 and m = 2 → 6*2 + (2*3)/2 + 6 − 2^2 = 17
p*m is the number of loadings (relations between each factor and each variable/item) = 12
m*(m+1)/2 is the number of factor variances and covariances = 3
p residual variances = 6
m^2 constraints from scaling the latent variables = 4
What is the role of m^2 in this formula?
P = p*m + m*(m+1)/2 + p − m^2
This is what results from scaling the latent variables: these parameters are no longer free parameters. They are fixed in order to identify the scale of the model, and so are subtracted from the total number of parameters.
These formulas are for models fit on ________
Strictly, these formulas are for models fit on a covariance matrix, but you can also use them for correlation matrices (for the df it does not matter)
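A small sketch of the df calculation as an R function (the function name is my own, not from the course):
efa_df <- function(p, m) {
  M <- p * (p + 1) / 2                    # independent observed (co)variances
  P <- p * m + m * (m + 1) / 2 + p - m^2  # loadings + factor (co)variances + residual variances - constraints
  M - P                                   # degrees of freedom
}
efa_df(p = 6, m = 2)  # example from the card above: 21 - 17 = 4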
Raw factor loadings in EFA are hard to interpret. Why is this and what solution is there to this?
Raw factor loadings in EFA are hard to interpret because every variable has a loading on every factor, so it can be unclear which factor a variable really loads on. The solution is rotation towards "simple structure", in which each variable loads clearly on one factor and close to zero on the others (often summarised with + and − signs to indicate whether a variable loads on a factor or not).
What is rotation?
A transformation of the raw factor loadings to enhance simple structure
What allows us to carry out these rotations on the factor loadings?
The unit of the factors is arbitrary; we chose it ourselves! You can therefore transform the results without affecting your model statistically. We can change the scale a little bit after we fit the model just to see which rotation gives the best interpretation.
Name two types of rotation and a key feature of both
Orthogonal rotation:
β’ The factors remain uncorrelated
Oblique rotation:
β’ The factors are correlated after rotation
Name the main R functions associated with each type of rotation
Orthogonal rotation:
- varimax in R
Oblique rotation:
- promax in R
In psychology, what type of rotation is typically used and why?
In psychology we can mostly assume factors are correlated, therefore oblique rotation (promax) is typically used. It's hard to argue for using uncorrelated factors in psychology.
Write a line of code to carry out a factor analysis on the correlation matrix βosbourne_corβ. Describe what the arguments do
library(psych)
fa(r = osbourne_cor, nfactors = 1, n.obs = 477, fm = 'ml', rotate = 'none')
nfactors = 1 fits a one-factor model, fm = 'ml' requests maximum likelihood estimation, and rotate = 'none' requests no rotation (never a good idea)
fa(r = osbourne_cor, nfactors = 2, n.obs = 477, fm = 'ml', rotate = 'varimax')
Fits a two-factor model with orthogonal rotation in R; this doesn't really make sense since it assumes the factors are uncorrelated
You can then tweak nfactors and observe the RMSEA to see which number of factors fits best (see the sketch after these examples)
fa(r = osbourne_cor, nfactors = 2, n.obs = 477, fm = 'ml', rotate = 'promax')
Fits a two-factor model with oblique rotation in R; this makes the most sense
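The nfactors/RMSEA comparison mentioned above could look roughly like this (assuming the fa output stores the RMSEA estimate in its $RMSEA element, as in recent psych versions):
for (m in 1:3) {
  f <- fa(r = osbourne_cor, nfactors = m, n.obs = 477, fm = 'ml', rotate = 'promax')
  cat(m, 'factor(s): RMSEA =', round(f$RMSEA[1], 3), '\n')
}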