3 Introduction to Exploratory Factor Analysis Flashcards
What is an exploratory factor analysis?
a statistical technique used to uncover the underlying structure of a relatively large set of variables
identifies the number of potential latent factors that can explain the patterns of correlations among the observed variables
What are four general issues regarding best practice in EFA?
component vs. factor extraction
number of factors to retain for rotation
orthogonal vs. oblique rotation
adequate sample size
What is the best extraction method for an EFA?
PCA is the default extraction method in many statistical packages
factor analysis is preferable
PCA is a data reduction method with no regard to latent variables
researchers rarely collect and analyse data without an a priori idea about how the variables are related
FA: the shared variance of a variable is partitioned from its unique and error variance to reveal the underlying factor structure
SPSS offers six factor extraction methods
(unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring)
if the data are relatively normally distributed, maximum likelihood is a good choice
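A minimal sketch of the PCA-vs-FA contrast, using scikit-learn (an assumption; the deck itself only names SPSS) and placeholder data in place of real item scores:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))              # placeholder; substitute real item scores

pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)  # maximum likelihood common factor model

# PCA models total variance; FA models only the shared (common) variance
print(pca.explained_variance_ratio_.sum())  # proportion of total variance captured
print(fa.components_.T)                     # ML factor loadings (items x factors)
print(fa.noise_variance_)                   # per-item unique variance (no PCA analogue)
```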
How many factors should be retained for rotation?
retain all factors with eigenvalues greater than 1.0 (Kaiser criterion)
→ the least accurate method for selecting the number of factors to retain
scree test
(examining the graph of the eigenvalues and looking for natural bend or break point in the data where the curve flattens out)
the number of data points above the break is usually the number of factors to retain
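A NumPy sketch of the Kaiser criterion count; R stands in for the items' correlation matrix (placeholder data here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                 # placeholder item scores
R = np.corrcoef(X, rowvar=False)              # items' correlation matrix

eigvals = np.linalg.eigvalsh(R)[::-1]         # eigenvalues, largest first
n_kaiser = int((eigvals > 1.0).sum())         # Kaiser criterion count (often overfactors)
print(eigvals, n_kaiser)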
What rotation technique should be used in an EFA?
rotation simplifies and clarifies the data structure
it cannot improve the basic aspects of the analysis
varimax rotation, quartimax and equamax are orthogonal methods
direct oblimin, quartimin, and promax are oblique (allow for factor correlation)
oblique is generally more appropriate as a certain amount of intercorrelation is always expected in social sciences
SPSS output: rotated factor matrix, pattern matrix, factor correlation matrix
manipulating the delta (direct oblimin) or kappa (promax) values changes how much factor correlation is allowed
but this appears to introduce unnecessary complexity
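A hedged sketch of an oblique solution, assuming the Python factor_analyzer package (not named in the deck) and simulated items driven by two correlated factors:

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
eta = rng.multivariate_normal([0, 0], [[1, .4], [.4, 1]], size=500)  # correlated factors
Lam = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
X = eta @ Lam.T + rng.normal(size=(500, 6)) * .5                     # simulated items

fa = FactorAnalyzer(n_factors=2, rotation="oblimin", method="ml")
fa.fit(X)
print(fa.loadings_)   # pattern matrix: unique item-factor relationships
print(fa.phi_)        # factor correlation matrix (available for oblique rotations)
```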
What is the best sample size for an EFA?
research often uses relatively small samples (measured by the subject-to-item ratio)
no strict rules
partly determined by nature of data
the stronger the data, the smaller the sample can be for an accurate analysis. “Strong data” in factor analysis means uniformly high communalities without cross loadings, plus several variables loading strongly on each factor.
What are good results of an EFA?
item communalities are considered high if they are all .8 or greater
unlikely to occur with real data (values of .40-.70 are more common)
.32 is a reasonable minimum loading for an item (approximately 10% overlapping variance with the other items in that factor)
cross-loading = an item that loads at .32 or higher on two or more factors
a factor with fewer than three items is generally weak and unstable
5 or more strongly loading items (.50 or better) are desired
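These screening rules are easy to apply to a loading matrix; a small NumPy sketch with illustrative values (assuming an orthogonal solution, where communalities are row sums of squared loadings):

```python
import numpy as np

# L: loading matrix (items x factors), illustrative values only
L = np.array([[.75, .10], [.68, .05], [.55, .35],
              [.12, .80], [.08, .62], [.40, .45]])

h2 = (L ** 2).sum(axis=1)           # communalities (orthogonal case;
                                    # oblique: diag(L @ Phi @ L.T) instead)
weak = np.where(h2 < .40)[0]        # items below the typical .4-.7 range
cross = np.where((np.abs(L) >= .32).sum(axis=1) > 1)[0]  # cross-loading items
print(weak, cross)
```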
What did the study examining best practice EFA find?
- Extraction method: PCA resulted in significantly higher total variance explained and higher item loadings (it does not partition unique variance from shared variance)
factor analysis produced an average explained variance of 59.8%, principal component analysis 69.6% → overestimation
- Rotation method: oblique rotation is recommended over orthogonal rotation
- Sample size: larger samples tended to produce solutions that were more accurate
20% of analyses with a 2:1 subject-to-item ratio produced correct solutions, versus 70% at 20:1
the number of misclassified items was significantly affected by sample size
on average, almost two of thirteen items were misclassified in the smallest samples, and over one misclassified item in every two analyses in the largest samples
⇒ the norm: PCA with varimax rotation and the Kaiser criterion
⇒ optimal: maximum likelihood extraction, oblique rotation, and the scree plot, plus multiple test runs for information on how meaningful the factors might be in the dataset
⇒ even with large samples (20:1), error rates remain above the .05 level
What does the chi-square test say in an EFA?
- As with CFA, the chi-square test of EFA effectively compares the model you have defined against the 'ideal' model for the data. As a result, you want the chi-square test to be non-significant (p > .05), as this allows us to infer that our predefined model is not significantly different from the ideal model of our data.
- Note: as the amount of data increases, chi-square is increasingly likely to be significant; therefore, any interpretation of the model must be based on more than the chi-square result, especially when the dataset is large.
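One common form of the statistic, sketched in Python; S and Sigma are the sample and model-implied covariance matrices, and the exact formula your software uses may differ slightly:

```python
import numpy as np
from scipy.stats import chi2

def efa_ml_chisq(S, Sigma, n, m):
    """Likelihood-ratio chi-square for a maximum likelihood EFA with m factors,
    comparing the model-implied covariance Sigma with the sample covariance S."""
    p = S.shape[0]
    F = (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
         + np.trace(S @ np.linalg.inv(Sigma)) - p)   # ML discrepancy function
    stat = (n - 1) * F
    df = ((p - m) ** 2 - (p + m)) / 2                # EFA degrees of freedom
    return stat, df, chi2.sf(stat, df)               # want p > .05 (non-significant)
```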
What is the role of eigenvalues in an EFA?
eigenvalues greater than 1
→ such factors explain a significant amount of the variance present in the data
How should the scree plot technique be used?
The scree plot should be used in conjunction with the eigenvalues to determine how many factors to retain. The number of factors to retain is based on the number of factors that are above the ‘break’ (sometimes referred to as the ‘elbow’) in the scree plot. Do not include the factor at the break; only count those before the break/elbow.
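A minimal matplotlib sketch of a scree plot with placeholder data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
R = np.corrcoef(rng.normal(size=(300, 8)), rowvar=False)  # placeholder data
eigvals = np.linalg.eigvalsh(R)[::-1]                     # largest first

plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
plt.axhline(1.0, linestyle="--")                          # Kaiser reference line
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot: count only the factors before the elbow")
plt.show()
```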
What can be said about factor loadings?
- As with CFA, the strength of the relationship between the observed and the latent factor variables is expressed in terms of a factor loading regression coefficient.
- An observed variable is said to ‘load sufficiently’ onto a factor, that is, it has a strong enough relationship to be considered a true part of the factor, if the factor loading is more extreme than +/- 0.4.
What is the common factor model by Thurstone (1947)?
factors influence observed variables, accounting for their variation and covariation
factor analysis = fitting models to the bivariate associations among observed variables
linear regression model:
Y = Λη + ε
Y = observed variables (items)
Λ = P × M matrix of factor loadings; η = common factors
ε = unique factors
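A small NumPy simulation of this model, with hypothetical loading values, makes the partition into common and unique variance concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
P, M, N = 6, 2, 1000
Lam = np.array([[.8, 0], [.7, 0], [.6, 0],
                [0, .8], [0, .7], [0, .6]])    # P x M loading matrix (hypothetical)
eta = rng.normal(size=(N, M))                  # common factors
eps = rng.normal(size=(N, P)) * np.sqrt(1 - (Lam ** 2).sum(axis=1))  # unique factors
Y = eta @ Lam.T + eps                          # observed item scores: Y = Lam @ eta + eps
```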
What is the primary computational challenge in an FA?
the regression coefficients (factor loadings) must be estimated
although the predictor variables (factors) are latent/unobserved
however, the covariances among the observed Y variables can be analysed
→ reverse-engineering
the interrelatedness of the observed variables allows inference about the underlying factor structure
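One classic way to do this reverse-engineering is iterated principal axis factoring; a minimal NumPy sketch (not the only estimator, and real software adds safeguards):

```python
import numpy as np

def principal_axis(R, m, iters=50):
    """Recover an m-factor loading matrix from a p x p correlation matrix R."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))    # initial communalities (SMCs)
    for _ in range(iters):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)              # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        top = np.argsort(vals)[::-1][:m]      # m largest eigenvalues
        L = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
        h2 = (L ** 2).sum(axis=1)             # updated communalities
    return L
```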
What is the statistical distinction between EFA and CFA?
EFA - unrestricted (free)
the factor loading matrix is freely estimated
the number of factors M is estimated from the data
no a priori hypothesis about M
CFA - restricted
many elements of the factor loading matrix are fixed to equal zero
no cross-loadings
independent clusters solution
rotation is not needed
strong a priori hypothesis about M
What is rotational indeterminacy?
there is an infinite number of factor patterns (rotations) that fit the observed data equally well
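A two-factor NumPy demonstration: any orthogonal rotation T leaves the model-implied covariance, and hence the fit, unchanged:

```python
import numpy as np

L = np.array([[.8, .1], [.7, .2], [.1, .9], [.2, .8]])  # hypothetical loadings
theta = 0.5
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # any orthogonal T works

# (L @ T)(L @ T)' = L T T' L' = L L': identical implied covariance
print(np.allclose(L @ L.T, (L @ T) @ (L @ T).T))         # True
```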
What is row parsimony and column parsimony?
row parsimony (the extent to which each variable loads strongly on one factor and near-zero on other factors)
column parsimony (the extent to which each factor has large loadings for some variables and small loadings for others)
How are the factors analysed?
FA frequently involves analysis of individual items from a single test or questionnaire rather than analysis of total scores from multiple tests or subscales
items - binary or ordered categorical variables
Just as a nonlinear procedure such as logistic regression is preferable to ordinary linear regression for a categorical outcome, a categorical variable methodology is often preferable for factor analyses of items.
The potential consequences of factor analysing items as continuous variables include incorrect conclusions about the number of factors (or model fit more generally), biased factor loadings and interfactor correlations, and biased parameter standard errors, leading to incorrect significance tests and confidence intervals.
What are polychoric correlations?
used to estimate the correlation between the continuous, normally distributed variables assumed to underlie observed ordinal items
the estimated correlation coefficients can then serve as input for factor analysis
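A minimal two-step estimator sketch in Python with SciPy (thresholds from marginal proportions, then maximum likelihood over rho); production software uses faster, more robust routines:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

def _bvn_cdf(a, b, rho):
    """P(X <= a, Y <= b) for a standard bivariate normal; infinities clipped."""
    a, b = np.clip(a, -10, 10), np.clip(b, -10, 10)
    return multivariate_normal([0, 0], [[1, rho], [rho, 1]]).cdf([a, b])

def polychoric(x, y):
    """Two-step polychoric correlation for two ordinal variables (NumPy arrays)."""
    def cuts(v):
        cats = np.sort(np.unique(v))
        cum = np.cumsum([np.mean(v == c) for c in cats])[:-1]
        return np.concatenate(([-np.inf], norm.ppf(cum), [np.inf])), cats

    tx, cx = cuts(x)
    ty, cy = cuts(y)
    table = np.array([[np.sum((x == a) & (y == b)) for b in cy] for a in cx])

    def negloglik(rho):
        ll = 0.0
        for i in range(len(cx)):
            for j in range(len(cy)):
                # probability of the (i, j) cell as a bivariate normal rectangle
                p = (_bvn_cdf(tx[i + 1], ty[j + 1], rho) - _bvn_cdf(tx[i], ty[j + 1], rho)
                     - _bvn_cdf(tx[i + 1], ty[j], rho) + _bvn_cdf(tx[i], ty[j], rho))
                ll += table[i, j] * np.log(max(p, 1e-12))
        return -ll

    return minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded").x
```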
How should FA be applied to psychological research?
psychometric development and validation of individual scales
long history of contributing to theories of intelligence and structures of personality
When should FA be used?
if validity evidence for new scales is needed
whether the internal structure (dimensionality) of the item set is consistent with expectations regarding the constructs that the scale is intended to measure
EFA is preferable at the beginning because there might be unanticipated but meaningful factors influencing items (cross-loadings)
Why is FA not necessarily providing evidence for construct validity?
use of factor analysis is often presented as providing evidence for:
construct validity
substantive = theory of the construct; developing a pool of test items which addresses all theoretical aspects
structural = the items' empirical relations are consistent with the theoretical aspects
external = related to other constructs? measures, outcomes, …
factor analysis sits squarely in the structural phase of construct validity
does not give sufficient information for removing an item
removing an item might be harmful to content validity
influenced by sampling error
Should I do an FA in my study?
scale validation is an ongoing process; all empirical studies using a given scale contribute evidence for the validity of the use of that scale
criterion validity is provided if a new study reports an association between the scale and an important outcome variable
differences across populations (the meaning of scores may differ)
re-evaluate the properties of a scale
→ measurement invariance?
How can you determine if you should use a CFA or an EFA?
depends on the particularities of the research scenario
rooted in common factor model
Brown (2015)
CFA requires a strong empirical or conceptual foundation to guide the specification and evaluation of a factor model. Accordingly, CFA is typically used in later phases of scale development or construct validation after the underlying structure has been tentatively established by prior empirical analyses using EFA, as well as on theoretical grounds. (p. 41)
EFA:
Determine the nature of and the number of latent variables that account for observed variation and covariation between a set of observed indicators, without an a priori model
Latent factors are interpreted as the causes of observed values
Requires a structured covariance matrix
How would one proceed if no CFA model adequately fits the data?
researchers may conclude that the hypotheses or expectations behind the model were not supported, acknowledging the nature of science.
→ move to exploratory mode
→ revise the CFA
(problematic; needs a strong rationale)
For instance, if a one-factor model is rejected due to positively and negatively worded items, researchers may consider freeing error covariances among the negatively worded items for improved fit.
→ stick to traditional EFA
What are the main points when replicating an EFA with a CFA?
do not use the same dataset
new decisions need to be made about method
CFA might not fit adequately
return to EFA
What are common mistakes with EFA?
- mistaking PCA for FA
- using the Kaiser criterion
- not using oblique rotation
- neglecting the categorical nature of items
- naming fallacy
- accepting the defaults in popular software
- (these defaults can be overcome with alternative software)
What are common mistakes with CFA?
- misunderstanding model fit
- limitations of fit statistic norms
- multiple well-fitting models
- overreliance on good fit
- problems with model revisions
- exploratory nature of post hoc model modification
How would an EFA be conducted in JASP?
- no need to name variables
- move all variables into the Variables box
- Eigenvalues: above 0
- Rotation: oblique → oblimin
- Extraction method: principal axis factoring
- Output options: scree plot; chi-square test → should be non-significant
- scree plot → where is the elbow?
- remove cross-loading items or the items of a removed factor
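The same steps could be mirrored in Python, assuming the factor_analyzer package; minres extraction stands in here for principal axis factoring (the two are close relatives), and the placeholder data would be replaced by the questionnaire items:

```python
import numpy as np
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))        # placeholder; load the questionnaire data here

fa = FactorAnalyzer(n_factors=2, rotation="oblimin", method="minres")
fa.fit(X)

ev, common_ev = fa.get_eigenvalues()  # original and common-factor eigenvalues
plt.plot(range(1, len(ev) + 1), ev, marker="o")
plt.title("Scree plot: where is the elbow?")
plt.show()

print(fa.loadings_)                   # inspect for cross-loadings before removing items
```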