Factor Analysis Flashcards
What is factor analysis
a broad term for a family of techniques that investigate clusters of variables - these techniques determine whether a larger number of variables can be reduced to a smaller number of variables (factors) by grouping together variables that are highly intercorrelated, while leaving out uncorrelated variables
3 types of factor analysis
- Principal Components Analysis (PCA)
- Exploratory Factor Analysis (EFA)
- Confirmatory Factor Analysis (CFA)
Differences between EFA and CFA
EFA:
- no pre-defined number of factors
- no pre-determined variable/factor relationships
- more common
- typically done via factor analysis (FA)
CFA:
- pre-defined number of factors
- pre-determined variable/factor relationships
- less common
- typically done via structural equation modelling (SEM)
What are the features of EFA
- used to determine appropriate scale items/questions by identifying items that co-vary and load onto the same construct, thereby comprising a factor
- sometimes used to determine discriminant validity, but not very robust in that regard
observed correlation matrix
correlation matrix produced by the observed variables
reproduced correlation matrix
correlation matrix produced from factors
residual correlation matrix
difference between the observed and reproduced correlation matrices. In a good FA, the correlations in the residual matrix are small, indicating a close fit between the observed and reproduced matrices
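A minimal numpy sketch of the three matrices, using a made-up loading matrix and made-up observed correlations (all values are illustrative, and the factors are assumed orthogonal):

```python
import numpy as np

# Hypothetical loading matrix: 4 observed variables, 2 orthogonal factors
# (rows are variables, columns are factors)
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])

# Made-up observed correlation matrix for the same 4 variables
observed = np.array([[1.00, 0.60, 0.15, 0.25],
                     [0.60, 1.00, 0.23, 0.28],
                     [0.15, 0.23, 1.00, 0.55],
                     [0.25, 0.28, 0.55, 1.00]])

# Reproduced correlation matrix implied by the factor model: R_hat = L L'
# (its diagonal holds the communalities rather than 1s)
reproduced = L @ L.T

# Residual correlation matrix: small off-diagonal values indicate that
# the factor solution closely reproduces the observed correlations
residual = observed - reproduced
print(np.round(residual, 3))
```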
factor rotation
process by which the solution is made more interpretable without changing its underlying mathematical properties
orthogonal rotation
all factors are uncorrelated with each other
loading matrix
matrix of correlations between observed variables and factors. The size of each loading represents the strength of the relationship between an observed variable and a factor
oblique rotation
factors themselves are correlated
How can item scores be combined into factor scores
- Weighted average - rarely used, as it is too simplistic
- Regression method - more sophisticated, but limits are imposed on the ways scores can relate to each other
- Bartlett method - overcomes the limitations of the regression method by producing unbiased scores
- Anderson-Rubin method - a modification of the Bartlett method that produces uncorrelated and standardised factor scores; recommended by Tabachnick & Fidell if uncorrelated scores are required
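A numpy sketch of the regression and Bartlett score formulas, assuming standardised data Z, a loading matrix L, an observed correlation matrix R, and a diagonal uniqueness matrix Psi (all values here are placeholders):

```python
import numpy as np

# Placeholder inputs: Z is standardised data (n x p), L is a loading matrix
# (p x m); R and Psi are derived from them. All values are illustrative.
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 4))
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
R = np.corrcoef(Z, rowvar=False)            # observed correlation matrix
Psi = np.diag(1 - (L ** 2).sum(axis=1))     # uniqueness = 1 - communality

# Regression (Thurstone) method: F = Z R^-1 L
F_regression = Z @ np.linalg.solve(R, L)

# Bartlett method: F = Z Psi^-1 L (L' Psi^-1 L)^-1  (unbiased scores)
Psi_inv = np.linalg.inv(Psi)
F_bartlett = Z @ Psi_inv @ L @ np.linalg.inv(L.T @ Psi_inv @ L)

print(F_regression.shape, F_bartlett.shape)  # (100, 2) each
```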
When to use factor analysis
- To understand the structure of a set of variables - e.g. personality, mood, anxiety, culture, grief, well-being, and intelligence
- To develop a questionnaire to measure a variable - to ensure that items themselves are in fact measuring what they say they are measuring
- To reduce a data set to a more manageable size while retaining its essential qualities - reduces a large number of variables to a smaller number of factors which are then used in further analysis. This means that instead of using a larger number of potentially related variables in a regression, you can use a smaller number of more targeted variables, thus improving the strength of the analysis
When to use PCA or EFA
- PCA is the most commonly implemented for scale development - it is the default in SPSS
- Costello and Osborne point out that PCA is in fact a data reduction technique and is not designed to extract latent factors from a particular dataset
- if you simply want to summarise a number of items, use PCA. PCA gives each item the same latent weight, whereas EFA allows items to predict the latent variable to different degrees, which makes EFA much more robust for identifying factors
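A short scikit-learn sketch contrasting the two approaches on placeholder data (random values, purely illustrative; sklearn's FactorAnalysis stands in for EFA here):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Placeholder data: 200 respondents, 6 items (random, illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))

# PCA: a data reduction technique - components summarise total variance
pca = PCA(n_components=2).fit(X)
print(pca.components_)

# EFA (here via sklearn's FactorAnalysis): models only the shared (common)
# variance, leaving item-specific variance in separate noise terms
fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_)
```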
Problems with PCA and EFA
- there is no external criterion, such as group membership, against which to test the solution
- after extraction, there is an infinite number of rotations available - all accounting for the same amount of variance in the original data, but with the factors defined slightly differently
- FA is frequently used in an attempt to “save” poorly conceived research
What key decisions need to be made when conducting FA
Five key decisions according to Fabrigar et al.:
- study design, particularly what variables are to be measured. Researchers need to consider the nature and number of common factors they wish to examine and ensure that such factors are represented in multiple measurements
- determining whether EFA is appropriate
- choice of model fitting procedure, specifically which factor extraction procedure is to be undertaken
- number of factors - balancing parsimony with the ability of the model to account for the correlations between variables (plausibility of the model)
- rotation method - specifically whether the researcher will allow for correlations between factors
What cut-off point does Field recommend when screening the correlation matrix
.3 - items that do not correlate at least .3 with any other item should be considered for removal
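A hypothetical pandas screen along these lines, flagging items whose largest correlation with any other item falls below .3 (the data frame and item names are placeholders):

```python
import numpy as np
import pandas as pd

# Placeholder item data; in practice `df` would hold your questionnaire items
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 5)),
                  columns=[f"item{i}" for i in range(1, 6)])

corr = df.corr().abs()
np.fill_diagonal(corr.values, np.nan)   # ignore each item's self-correlation
highest = corr.max()                    # each item's largest correlation
print(highest[highest < 0.3])           # items falling below the .3 cut-off
```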
Additional tests for FA
Kaiser-Meyer-Olkin (KMO) test - measures the degree of common variance between items
- a score close to 0 indicates little common variance, and therefore that EFA is not appropriate
- a score closer to 1 suggests a strong degree of common variance → Hair et al recommend a score of .8 to .9 as excellent
Bartlett's test of sphericity - tests the correlations between items
- needs to be significant in order to proceed with EFA; significance means a large degree of overlap amongst items
- assessed via a chi-square statistic rather than Pearson's correlation
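Both checks are available in the third-party Python package factor_analyzer; a minimal sketch on placeholder data (random values, so real item data would behave differently):

```python
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Placeholder item data (random, illustrative only)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 6)))

chi_square, p_value = calculate_bartlett_sphericity(df)  # want p < .05
kmo_per_item, kmo_total = calculate_kmo(df)              # want a value near 1
print(f"Bartlett p = {p_value:.3f}, overall KMO = {kmo_total:.2f}")
```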
What is the proportion of common variance called
communality
What is the purpose of extraction and how is it done
After factors have been discovered, decisions have to be made about how many and which factors to keep; this is called extraction. One method of extraction uses EIGENVALUES (an eigenvalue represents the proportion of variance accounted for by a factor)
→ the higher the eigenvalue, the greater the proportion of explained variance
What is an alternative way to determine which factors to keep in analysis
Kaiser (1960) recommended retaining all factors with eigenvalues greater than one, because an eigenvalue of one represents a substantial amount of explained variance. Others have suggested this rule is too rigid and can over-extract; some instead recommend retaining all factors with eigenvalues greater than 0.6 or 0.7.
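A numpy sketch of eigenvalue extraction and the Kaiser criterion, using a made-up correlation matrix:

```python
import numpy as np

# Made-up correlation matrix for 4 variables
R = np.array([[1.00, 0.60, 0.15, 0.25],
              [0.60, 1.00, 0.23, 0.28],
              [0.15, 0.23, 1.00, 0.55],
              [0.25, 0.28, 0.55, 1.00]])

eigenvalues = np.linalg.eigvalsh(R)[::-1]     # sorted largest-first
explained = eigenvalues / eigenvalues.sum()   # proportion of variance per factor

# Kaiser criterion: keep factors whose eigenvalue exceeds 1
print(eigenvalues, (eigenvalues > 1).sum())
```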
What is Monte Carlo parallel analysis
- the most robust method of determining the number of factors, but very under-utilised as it requires syntax (e.g. in SPSS)
- compares observed eigenvalues taken from the correlation matrix with eigenvalues extracted from simulations of a number of parallel datasets
- determines the expected eigenvalues by averaging the randomly generated eigenvalues for each factor
- any eigenvalue in the original dataset that exceeds those randomly generated is considered significant
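A basic numpy sketch of the procedure as described above (averaging the random eigenvalues; the function name and defaults are illustrative):

```python
import numpy as np

def parallel_analysis(data, n_sims=1000, seed=0):
    """Basic Monte Carlo parallel analysis: compare the observed eigenvalues
    with the average eigenvalues of random (uncorrelated) data of the same
    shape, and count the factors worth retaining."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        sim = rng.standard_normal((n, p))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    expected = random_eigs.mean(axis=0)   # averaged random eigenvalues

    # Retain any factor whose observed eigenvalue exceeds the random average
    return int((observed > expected).sum())
```

Calling `parallel_analysis(X)` on an (n × p) data array returns the suggested number of factors to retain.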
What is the purpose of rotation
to maximise high correlations between factors and variables and to minimise low correlations. Rotation makes it easier to accurately discriminate between factors, thus improving the interpretability and scientific utility of your model. 2 types:
→ Orthogonal (unrelated, perpendicular) rotation vs oblique rotation
What is orthogonal rotation
easier to interpret, describe, and report results; more suitable if factors are almost independent - Costello and Osborne suggest it is counter-intuitive because two factors are rarely uncorrelated (since both relate to the same underlying construct)
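A sketch of both rotation types using the third-party factor_analyzer package on placeholder data (random values, illustrative only):

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Placeholder item data (random, illustrative only)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 6)))

# Orthogonal rotation: factors are kept uncorrelated
fa_varimax = FactorAnalyzer(n_factors=2, rotation="varimax").fit(df)
print(fa_varimax.loadings_)

# Oblique rotation: factors are allowed to correlate
fa_oblimin = FactorAnalyzer(n_factors=2, rotation="oblimin").fit(df)
print(fa_oblimin.loadings_)
```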