3 introduction to exploratory factor analysis Flashcards
What is an exploratory factor analysis?
statistical technique used to uncover the underlying structure of a relatively large set of variables.
identify the number of potential latent factors that can explain the patterns of correlations among the observed variables
What are four general issues in best-practice EFA?
component vs. factor extraction
number of factors to retain for rotation
orthogonal vs. oblique rotation
adequate sample size
What is the best extraction method for an EFA?
PCA is the default extraction method in many statistical packages (including SPSS)
factor analysis is preferable
PCA - data reduction method, no regard to latent variables
researchers rarely collect and analyse data without an a priori idea about how the variables are related
FA - the shared variance of a variable is partitioned from its unique and error variance to reveal the underlying factor structure
SPSS - six Factor Extraction Methods
(unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring)
if the data are relatively normally distributed, maximum likelihood is a good choice
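The PCA vs. FA distinction above can be sketched numerically. This is a minimal illustration, assuming NumPy and a hypothetical population correlation matrix (six items, each loading .70 on one common factor; these values are made up for the example). The principal axis factoring loop is a simplified hand-rolled version, not the SPSS implementation.

```python
import numpy as np

# Hypothetical population: six items, each loading 0.7 on one common factor.
true_loadings = np.full(6, 0.7)
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)  # correlation matrix implied by the one-factor model


def first_loadings(matrix):
    """Loadings on the first dimension: eigenvector scaled by sqrt(eigenvalue)."""
    values, vectors = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return np.abs(vectors[:, -1]) * np.sqrt(values[-1])


# PCA: analyse the full correlation matrix (1s on the diagonal),
# so unique and error variance are mixed into the component.
pca_loadings = first_loadings(R)

# Principal axis factoring: replace the diagonal with communality
# estimates (shared variance only) and iterate until they settle.
communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # squared multiple correlations
for _ in range(50):
    reduced = R.copy()
    np.fill_diagonal(reduced, communalities)
    paf_loadings = first_loadings(reduced)
    communalities = paf_loadings ** 2

print("PCA loadings:", pca_loadings.round(3))
print("PAF loadings:", paf_loadings.round(3))
```

In this constructed case the factor-analytic loadings recover the .70 used to build the matrix, while the PCA loadings come out higher (about .76), because PCA does not separate shared from unique variance.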
How many factors should be retained for rotation?
retain all factors with eigenvalues greater than 1.0 (Kaiser criterion, the default in most software)
→ among the least accurate methods for selecting the number of factors to retain
scree test
(examining the graph of the eigenvalues and looking for natural bend or break point in the data where the curve flattens out)
number of datapoints above the break is usually the number of factors to retain
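Both retention rules start from the eigenvalues of the item correlation matrix. The sketch below assumes NumPy and simulated data (1000 subjects, 8 items, two uncorrelated factors; all values are hypothetical) and shows how the eigenvalues, the Kaiser count, and the numbers behind a scree plot might be obtained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 subjects, 8 items, two uncorrelated factors
# with four items loading 0.7 on each (assumed values for illustration).
loadings = np.zeros((8, 2))
loadings[:4, 0] = 0.7
loadings[4:, 1] = 0.7
factors = rng.standard_normal((1000, 2))
unique_sd = np.sqrt(1.0 - (loadings ** 2).sum(axis=1))
items = factors @ loadings.T + rng.standard_normal((1000, 8)) * unique_sd

# Eigenvalues of the item correlation matrix, largest first.
R = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser criterion: count eigenvalues greater than 1.0.
kaiser_count = int((eigenvalues > 1.0).sum())

# Values one would plot for a scree test (factor number vs. eigenvalue).
for number, value in enumerate(eigenvalues, start=1):
    print(number, round(float(value), 2))
print("Kaiser criterion retains:", kaiser_count)
```

With clean simulated data like this the two rules tend to agree; the criticism of the eigenvalue-greater-than-1 rule concerns messier real data, which this sketch does not reproduce.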
What rotation technique should be used in an EFA?
goal of rotation: simplify and clarify the data structure
cannot improve the basic aspects of the analysis (e.g. the amount of variance extracted from the items)
varimax rotation, quartimax and equamax are orthogonal methods
direct oblimin, quartimin, and promax are oblique (allow for factor correlation)
oblique is generally more appropriate, as some correlation among factors is usually expected in the social sciences
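SPSS - after orthogonal rotation, interpret the rotated factor matrix; after oblique rotation, interpret the pattern matrix (loadings) and the factor correlation matrix
manipulating the delta (direct oblimin) or kappa (promax) value changes how much the factors are allowed to correlate
→ appears to introduce unnecessary complexity, though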
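To make the idea of rotation concrete, here is a minimal sketch of an orthogonal varimax rotation, assuming NumPy. It uses the common SVD-based iteration; the function name `varimax` and the example loadings are illustrative choices, not SPSS output. The example also checks the point made above that rotation cannot change the basic aspects of the solution: item communalities are identical before and after.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of an (items x factors) loading matrix."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the gradient of the varimax criterion.
        target = loadings.T @ (
            rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / n_items
        )
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt  # orthogonal matrix closest to the target
        if s.sum() < criterion * (1 + tol):
            break  # no meaningful improvement any more
        criterion = s.sum()
    return loadings @ rotation, rotation


# Hypothetical simple structure, deliberately mixed by a 30-degree rotation
# to mimic an unrotated solution with blurred loadings.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
angle = np.deg2rad(30)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(rotated.round(2))

# Communalities (row sums of squared loadings) are unchanged by rotation.
print(np.allclose((unrotated ** 2).sum(axis=1), (rotated ** 2).sum(axis=1)))
```

An oblique method such as direct oblimin or promax would additionally estimate a factor correlation matrix; that is not attempted in this sketch.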
What is the best sample size for an EFA?
published research often uses relatively small samples (measured by subject-to-item ratio)
no strict rules
partly determined by nature of data
the stronger the data, the smaller the sample can be for an accurate analysis. “Strong data” in factor analysis means uniformly high communalities without cross loadings, plus several variables loading strongly on each factor.
What are good results of an EFA?
item communalities are considered high if they are all .8 or greater
unlikely to occur in real data (more commonly .40 to .70)
.32 is a rule-of-thumb minimum loading for an item, approx. 10% overlapping variance with the other items in that factor
crossloading = item that loads at .32 or higher on two or more factors
a factor with fewer than three items is generally weak and unstable
5 or more strongly loading items (.50 or better) are desired
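These rules of thumb can be applied mechanically to a loading matrix. The sketch below uses plain Python; the function `check_solution`, the item names, and the example loadings are all hypothetical. Counting a crossloading item towards each factor it loads on is a simplifying assumption of this sketch.

```python
MIN_LOADING = 0.32     # rule-of-thumb minimum loading
STRONG_LOADING = 0.50  # "strongly loading" item


def check_solution(loadings):
    """loadings maps item name -> list of loadings, one per factor."""
    n_factors = len(next(iter(loadings.values())))
    salient_counts = [0] * n_factors
    strong_counts = [0] * n_factors
    crossloading, no_salient_loading = [], []
    for item, row in loadings.items():
        salient = [f for f, value in enumerate(row) if abs(value) >= MIN_LOADING]
        if len(salient) >= 2:
            crossloading.append(item)
        if not salient:
            no_salient_loading.append(item)
        for f in salient:
            salient_counts[f] += 1
            if abs(row[f]) >= STRONG_LOADING:
                strong_counts[f] += 1
    # A factor with fewer than three items is treated as weak/unstable.
    weak_factors = [f for f, count in enumerate(salient_counts) if count < 3]
    return {
        "crossloading": crossloading,
        "no_salient_loading": no_salient_loading,
        "strong_counts": strong_counts,
        "weak_factors": weak_factors,
    }


# Hypothetical pattern matrix: seven items, two factors.
example = {
    "q1": [0.72, 0.05], "q2": [0.65, 0.10], "q3": [0.58, -0.02],
    "q4": [0.45, 0.40], "q5": [0.08, 0.70], "q6": [0.03, 0.28],
    "q7": [0.20, 0.25],
}
result = check_solution(example)
print(result)

# A loading of .32 squared is the roughly 10% overlapping variance.
print(round(MIN_LOADING ** 2, 4))
```

In this made-up example q4 is flagged as crossloading, q6 and q7 fail to load anywhere, and the second factor is flagged as weak because only two items reach .32 on it.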
What did the study examining best practice EFA find?
- Extraction Method: PCA resulted in significantly higher total variance accounted for, and item loadings were higher (PCA doesn't partition unique variance from shared variance)
factor analysis produced an average variance accounted for of 59.8% and principal component analysis produced 69.6% (an overestimation)
- Rotation Method: the factors in this dataset were orthogonal (uncorrelated)
oblique rotation is recommended nonetheless, since an oblique rotation can also reproduce an orthogonal solution
- Sample Size: larger samples tended to produce solutions that were more accurate
20% of samples at a 2:1 subject-to-item ratio produced correct solutions
70% at 20:1
number of misclassified items was significantly affected by sample size
on average, almost two of thirteen items were misclassified in the smallest samples, versus just over one item in every two analyses in the largest samples
⇒ the norm: PCA with varimax rotation and the Kaiser criterion
⇒ optimal: maximum likelihood extraction, oblique rotation, and scree plot plus multiple test runs for information on how many meaningful factors might be in the dataset
⇒ use large samples: even at 20:1 there are error rates above the .05 level
What does the chi-square test say in an EFA?
- As with CFA, the chi-square test in EFA effectively compares the model you have defined against the ‘ideal’ model for the data. You therefore want the chi-square test to be non-significant (p > .05), as this allows you to infer that your model is not significantly different from the ideal model of your data.
- Note: as the amount of data increases, chi-square becomes increasingly likely to be significant. Any interpretation of the model must therefore be based on more than the chi-square result, especially when the dataset is large.
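The sample-size sensitivity noted above follows from the usual form of the test statistic, (N - 1) times the minimised maximum likelihood discrepancy. The sketch below holds a hypothetical discrepancy of 0.05 constant and varies N; it uses only the standard library. The p-value helper relies on a closed form that is valid for even degrees of freedom only, and software often applies a small-sample correction that is ignored here.

```python
import math


def model_df(n_items, n_factors):
    """Degrees of freedom of an EFA model with n_factors common factors."""
    return ((n_items - n_factors) ** 2 - (n_items + n_factors)) // 2


def chi2_p_value(statistic, df):
    """Upper-tail chi-square probability (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = statistic / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


df = model_df(n_items=7, n_factors=2)  # 7 items, 2 factors
misfit = 0.05  # hypothetical minimised discrepancy, held constant

for n in (100, 200, 500, 1000):
    statistic = (n - 1) * misfit
    p_value = chi2_p_value(statistic, df)
    print(n, round(statistic, 2), round(p_value, 4))
```

The same amount of misfit is non-significant in the smaller samples and significant in the larger ones, which is why chi-square should not be the only basis for judging the model.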
What is the role of eigenvalues in an EFA?
factors with eigenvalues greater than 1
-> explain a substantial amount of the variance present in the data (more than a single standardized variable contributes)
How should the scree plot technique be used?
The scree plot should be used in conjunction with the eigenvalues to determine how many factors to retain. The number of factors to retain is based on the number of factors that are above the ‘break’ (sometimes referred to as the ‘elbow’) in the scree plot. Do not include the factor at the break; only count those before the break/elbow.
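Locating the break is normally done by eye. As a rough numerical stand-in, one heuristic is to take the point where the curve bends most sharply (the largest second difference, sometimes called the acceleration factor) and count only the factors before it. The function name and the eigenvalues below are hypothetical, and the heuristic is an assumption of this sketch, not part of the card.

```python
def factors_before_elbow(eigenvalues):
    """Count the factors that sit before the sharpest bend of the scree curve.

    eigenvalues must be sorted from largest to smallest.
    """
    if len(eigenvalues) < 3:
        raise ValueError("need at least three eigenvalues to find a bend")
    # Second difference at each interior point: large where the curve flattens.
    bend = {
        i: eigenvalues[i + 1] - 2 * eigenvalues[i] + eigenvalues[i - 1]
        for i in range(1, len(eigenvalues) - 1)
    }
    elbow = max(bend, key=bend.get)  # 0-based position of the break
    # The factor at the break is not counted, only those before it.
    return elbow


# Hypothetical eigenvalues from an eight-item analysis.
scree = [3.1, 2.2, 0.6, 0.5, 0.45, 0.4, 0.4, 0.35]
print(factors_before_elbow(scree))
```

Here the break falls at the third eigenvalue, so two factors would be retained. Real scree plots are often less clear-cut, so a result like this is best treated as a starting point for the multiple test runs recommended earlier.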
What can be said about factor loadings?
- As with CFA, the strength of the relationship between the observed and the latent factor variables is expressed in terms of a factor loading regression coefficient.
- An observed variable is said to ‘load sufficiently’ onto a factor, that is, it has a strong enough relationship to be considered a true part of the factor, if the factor loading is more extreme than +/- 0.4.
What is the common factor model by Thurstone (1947)?
factors influence observed variables, accounting for their variation and covariation
factor analysis = fitting a model to the bivariate associations (covariances or correlations) among the observed variables
linear regression model
Y = Λη + ε
Y = observed variables (items)
Λ = P x M matrix of factor loadings (P observed variables, M common factors)
η = common factors
ε = unique factors
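The model can be simulated to see how common factors account for the covariation among items. This sketch assumes NumPy, standardized and uncorrelated common factors, and unique factors that are uncorrelated with everything else; under those assumptions the model implies a covariance matrix of ΛΛ' plus a diagonal matrix of unique variances. The loading values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loading matrix: P = 6 items, M = 2 common factors.
Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
unique_var = 1.0 - (Lambda ** 2).sum(axis=1)  # variance of each unique factor

n = 100_000
eta = rng.standard_normal((n, 2))                        # common factors
eps = rng.standard_normal((n, 6)) * np.sqrt(unique_var)  # unique factors
Y = eta @ Lambda.T + eps                                 # one row per subject

# Covariance matrix implied by the model vs. the one observed in the sample.
implied = Lambda @ Lambda.T + np.diag(unique_var)
sample = np.cov(Y, rowvar=False)

print(implied.round(2))
print("largest discrepancy:", float(np.abs(sample - implied).max()))
```

Items that share a factor correlate (for example .8 x .7 = .56 for the first two items), while items on different factors do not, which is the pattern a factor analysis later works backwards from.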
What is the primary computational challenge in an FA?
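estimating the regression coefficients (factor loadings)
the predictor variables are latent/unobserved
however, the covariances among the Y variables can be analysed
-> reverse-engineering
the interrelatedness of the observed variables leads to inferences about the underlying factor structure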
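The simplest case of this reverse-engineering is a single factor measured by three standardized items. If each correlation equals the product of the two loadings involved, the loadings can be solved for directly from the three correlations. The function name and the correlations below are hypothetical, and the sketch assumes the one-factor model holds exactly with positive loadings.

```python
import math


def one_factor_loadings(r12, r13, r23):
    """Recover three loadings from three correlations under a one-factor model.

    Assumes r_ij = loading_i * loading_j and positive loadings.
    """
    loading_1 = math.sqrt(r12 * r13 / r23)
    loading_2 = math.sqrt(r12 * r23 / r13)
    loading_3 = math.sqrt(r13 * r23 / r12)
    return loading_1, loading_2, loading_3


# Correlations generated by hypothetical loadings of .8, .7 and .6:
# .8 * .7 = .56, .8 * .6 = .48, .7 * .6 = .42
print(one_factor_loadings(0.56, 0.48, 0.42))
```

The factor itself is never observed, yet its loadings fall out of the observed correlations. With more items and more factors there is no such closed form, which is why iterative extraction methods are used.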
What is the statistical distinction between EFA and CFA?
EFA - free (unrestricted)
all elements of the factor loading matrix are freely estimated
M (the number of common factors) is estimated from the data
no a priori hypothesis about M
CFA - restricted
some elements of the factor loading matrix are fixed to equal zero
no crossloadings
independent clusters solution
rotation does not apply
strong a priori hypothesis about M
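The free vs. restricted distinction can be pictured as two patterns over the same P x M loading matrix. The sketch below uses plain Python with a hypothetical six-item, two-factor example; the item-to-factor assignment is an invented a priori hypothesis, and details such as fixing one loading per factor for scaling are left out.

```python
n_items, n_factors = 6, 2

# EFA: every loading is freely estimated (True = free).
efa_pattern = [[True] * n_factors for _ in range(n_items)]

# CFA with independent clusters: each item loads on one hypothesised
# factor only; every other loading is fixed to zero (False = fixed).
assigned_factor = [0, 0, 0, 1, 1, 1]  # hypothetical a priori assignment
cfa_pattern = [
    [factor == assigned_factor[item] for factor in range(n_factors)]
    for item in range(n_items)
]


def free_loadings(pattern):
    """Number of loadings that are estimated rather than fixed."""
    return sum(sum(row) for row in pattern)


def show(pattern):
    """Print the pattern with 'free' for estimated and '0' for fixed loadings."""
    for row in pattern:
        print("  ".join("free" if is_free else "0   " for is_free in row))


print("EFA:", free_loadings(efa_pattern), "free loadings")
show(efa_pattern)
print("CFA:", free_loadings(cfa_pattern), "free loadings")
show(cfa_pattern)
```

In the CFA pattern every item has exactly one free loading, so there are no crossloadings to estimate and nothing left for a rotation to resolve.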