Factor Analysis Flashcards
What is the difference between principal component analysis (PCA) and factor analysis?
PCA treats the problem as data compression: it uses the data to find a succinct way of describing the data.
Factor analysis asks, conceptually, what small number of underlying processes could have given rise to the data.
In factor analysis, how do DVs relate to a factor?
Through the correlation between the scores on a DV and the factor scores, known as the DV's loading on the factor.
Loadings are reported in the ‘component matrix’ of the PASW output.
The larger the (absolute) correlation between a factor and a DV, the more closely the two are related.
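A minimal sketch of what "loading = correlation between DV and component scores" means, using NumPy on simulated data. The data, the mixing weights, and all variable names are assumptions made up for illustration; the point is only that correlating each DV with the component scores gives the same numbers as the usual shortcut, eigenvector times the square root of the eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 cases, 4 DVs driven by two latent processes plus noise.
latent = rng.normal(size=(200, 2))
mixing = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.8], [0.0, 0.9]])
X = latent @ mixing.T + 0.4 * rng.normal(size=(200, 4))

# Standardise each DV, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores: each case's position on each component.
scores = Z @ eigvecs

# Loading = correlation between a DV's scores and a component's scores.
n_vars = Z.shape[1]
loadings_by_corr = np.empty((n_vars, n_vars))
for i in range(n_vars):
    for j in range(n_vars):
        loadings_by_corr[i, j] = np.corrcoef(Z[:, i], scores[:, j])[0, 1]

# The same loadings from the shortcut: eigenvector * sqrt(eigenvalue).
loadings_by_formula = eigvecs * np.sqrt(eigvals)

print(np.round(loadings_by_corr[:, :2], 2))
```

The rows are DVs and the columns are components, which is the same layout as a component matrix.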
How do you perform factor analysis?
Look at the scree plot (eigenvalue plotted against component number).
Count the number of eigenvalues that are larger than one (e.g., 2).
Check whether a solution with that many factors (here, 2) makes sense.
Rotate the components so that the loadings are as different from each other as possible (e.g., varimax rotation, which rotates so that, for a given factor, items load either high or low).
For each item, pick the factor with the higher loading, then label the factors.
Add up the scores from the DVs linked to each latent variable, ignoring DVs that do not contribute much.
Finally, look for relationships between the latent variables.
What are the pros and cons of PCA?
Pros: it is a good way of describing data with a small number of variables, and it operates in a principled way: maximise the variability explained while keeping the components orthogonal.
You know how much of the information in the data set is captured by a ‘solution’.
Cons:
It is not so easy to know the best ‘solution’.
How many components to use is a trade-off between simplicity (fewer components) and information captured (more components).
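A small sketch of that trade-off, using made-up eigenvalues for four standardised DVs (the numbers are hypothetical; for a correlation matrix the eigenvalues sum to the number of DVs). Each extra component captures more variance, but with diminishing returns.

```python
from itertools import accumulate

# Hypothetical eigenvalues, largest first; they sum to the number of DVs (4).
eigenvalues = [1.91, 1.71, 0.19, 0.19]
n_dvs = len(eigenvalues)

# Cumulative share of the total variance captured by the first k components.
cumulative = [total / n_dvs for total in accumulate(eigenvalues)]

for k, share in enumerate(cumulative, start=1):
    print(f"{k} component(s): {share:.1%} of the variance")
```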
What are the assumptions of PCA?
Multivariate normality
Linear (straight-line) relationships between the measures
How do you interpret PCA?
First, how much of the variability of the original data have we captured?
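Overall: the sum of the eigenvalues included in the ‘solution’ (when working from the correlation matrix, dividing this sum by the number of DVs gives the proportion of variance captured).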
Second, we want to know about the individual DVs:
how much of the information (variance) of a particular DV is being used by the components in the ‘solution’?
This is called the ‘communality’.
If the value is low, the variable might not be that useful.
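A minimal sketch of both checks in NumPy. The five-DV correlation matrix is invented for illustration, and keeping `k = 2` components is an assumed choice. The communality of a DV is taken here as the sum of its squared loadings on the retained components.

```python
import numpy as np

# Hypothetical correlation matrix: DVs 1-4 form two pairs, DV 5 is only
# weakly related to the rest (illustrative values only).
R = np.array([
    [1.0, 0.8, 0.1, 0.0, 0.1],
    [0.8, 1.0, 0.2, 0.1, 0.1],
    [0.1, 0.2, 1.0, 0.8, 0.1],
    [0.0, 0.1, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                      # assumed size of the 'solution'
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Overall: proportion of total variance captured by the k components.
proportion_captured = eigvals[:k].sum() / R.shape[0]

# Per DV: communality = sum of squared loadings on the retained components.
communalities = (loadings**2).sum(axis=1)

print("proportion of variance captured:", round(float(proportion_captured), 3))
print("communalities:", np.round(communalities, 2))
```

In this made-up example the fifth DV should come out with a much lower communality than the others, which is the pattern that flags a variable as not very useful to the solution.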