Factor Analysis Flashcards
What is factor analysis
a broad term for a family of techniques that investigate clusters of variables. It determines whether a large number of variables can be reduced to a smaller number of variables (factors) by grouping together variables that are highly intercorrelated, while leaving out uncorrelated variables
3 types of factor analysis
- Principal Components Analysis
- Exploratory Factor Analysis
- Confirmatory Factor Analysis
Differences between EFA and CFA
EFA:
- no pre-defined number of factors
- no pre-determined variable/factor relationships
- more common
- typically done via factor analysis (FA)
CFA
- pre-defined number of factors
- pre-determined variable/factor relationships
- less common
what are the features of EFA
- used to determine an appropriate scale/set of questions by identifying items that co-vary and load onto the construct, and which therefore comprise a factor
- sometimes used to determine discriminant validity, but not very robust in that regard
observed correlation matrix
correlation matrix produced by the observed variables
reproduced correlation matrix
correlation matrix produced from factors
residual correlation matrix
difference between the observed and reproduced correlation matrices. In a good FA, the correlations in the residual matrix are small, indicating a close fit between the observed and reproduced matrices
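The three matrices above can be sketched in numpy. The loading matrix here is hypothetical, and the "observed" matrix is simulated as the reproduced matrix plus small noise so the example is self-contained:

```python
import numpy as np

# Hypothetical 2-factor loading matrix for four observed variables.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])

# Reproduced correlation matrix: produced from the factors.
reproduced = loadings @ loadings.T
np.fill_diagonal(reproduced, 1.0)  # a variable correlates perfectly with itself

# Simulated observed correlation matrix (reproduced + small symmetric noise).
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.02, size=reproduced.shape)
noise = (noise + noise.T) / 2
np.fill_diagonal(noise, 0.0)
observed = reproduced + noise

# Residual correlation matrix: small off-diagonal values indicate good fit.
residual = observed - reproduced
max_abs_residual = np.abs(residual).max()
```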
factor rotation
process by which the solution is made more interpretable without changing its underlying mathematical properties
orthogonal rotation
all factors are uncorrelated with each other
loading matrix
matrix of correlations between observed variables and factors. size of the loadings represent the relationships between each observed variable and each factor
oblique rotation
factors themselves are correlated
How can factor scores be combined
- Weighted average - rarely used as too simplistic
- Regression methods - more sophisticated, but limits are imposed on the way scores can relate to each other
- Bartlett method - overcomes limitations of regression method by producing unbiased scores
- Anderson-Rubin method - modification of Bartlett method that produces uncorrelated and standardised factor scores (recommended by Tabachnick & Fidell) if uncorrelated scores are required
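The regression method above can be sketched in numpy. Both the data and the one-factor loading matrix below are hypothetical, made up purely to show the computation (standardised scores multiplied by the inverse correlation matrix times the loadings):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical responses of 100 people to four items.
data = rng.normal(size=(100, 4))

# Standardise each item (z-scores).
z = (data - data.mean(axis=0)) / data.std(axis=0)

R = np.corrcoef(z, rowvar=False)                    # item correlation matrix
loadings = np.array([[0.8], [0.7], [0.6], [0.5]])   # hypothetical 1-factor loadings

# Regression method: factor scores F = Z R^{-1} Lambda
weights = np.linalg.solve(R, loadings)
scores = z @ weights
```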
When to use factor analysis
- To understand the structure of a set of variables - e.g. personality, mood, anxiety, culture, grief, well-being, and intelligence
- To develop a questionnaire to measure a variable - to ensure that items themselves are in fact measuring what they say they are measuring
- To reduce a data set to a more manageable size while retaining its essential qualities - reduces a large number of variables to a smaller number of factors which are then used in further analysis. This means that instead of using a larger number of potentially related variables in a regression, you can use a smaller number of more targeted variables, thus improving the strength of the analysis
When to use PCA or EFA
- PCA most commonly implemented in terms of scale development - default in SPSS
- Costello and Osborne indicate that it is in fact a data reduction technique and not conducive to extracting factors from a particular dataset
- if you want to summarise a number of items, use PCA - PCA gives each item the same latent weight, so EFA is much more robust in that regard (items don't predict latent variables to the same degree)
Problems with PCA and EFA
- there is no external criterion, such as group membership, against which to test the solution
- after extraction, there is an infinite number of rotations available - all accounting for the same amount of variance in the original data, but with factors defined slightly differently
- FA is frequently used in an attempt to “save” poorly conceived research
When not to do FA
Five key decisions according to Fabrigar et al:
- study design, particularly what variables are to be measured. Researchers need to consider the nature and number of common factors they wish to examine and ensure that such factors are represented in multiple measurements
- determining whether EFA is appropriate
- choice of model fitting procedure, specifically which factor extraction procedure is to be undertaken
- number of factors - balancing parsimony with the model's ability to account for the correlations between the measured variables (plausibility of the model)
- rotation method - specifically whether the researcher will allow for correlations between factors
what does Field recommend as a cut-off point for multicollinearity
.3
Additional tests for FA
Kaiser-Meyer-Olkin (KMO) test - covariance between items
- score close to 0 equals little covariance and therefore EFA not appropriate
- score closer to 1 suggests strong degree of common variance → Hair et al recommend that a score of .8 to .9 is excellent
Bartlett’s test - correlation between items
- needs to be significant in order to proceed with EFA; significance means a large degree of overlap amongst items
- measured using chi-square rather than Pearson's correlation
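Both tests can be sketched directly from a correlation matrix. The implementations below follow the standard formulas (KMO from anti-image partial correlations; Bartlett's chi-square from the determinant of the correlation matrix); the correlation matrix and sample size are hypothetical:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (0 to 1)."""
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)       # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal mask
    ss_r = (R[off] ** 2).sum()
    ss_p = (partial[off] ** 2).sum()
    return ss_r / (ss_r + ss_p)

def bartlett_sphericity(R, n):
    """Bartlett's test: chi-square statistic that R is an identity matrix."""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df

# Hypothetical correlation matrix for three items sharing common variance,
# from a hypothetical sample of 200 people.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
kmo_value = kmo(R)
chi2, df = bartlett_sphericity(R, n=200)
```

A large chi-square relative to its degrees of freedom is significant, so EFA can proceed; a KMO closer to 1 indicates strong common variance.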
what is the proportion of common variance called
communality
what is the purpose of extraction and how is it done
After factors have been discovered, decisions have to be made about how many and which factors to keep; this is called extraction. One method of extracting is with EIGENVALUES (an eigenvalue represents the proportion of variance accounted for by a factor)
→ the higher the eigenvalue, the greater the proportion of explained variance
What is an alternative way to determine which factors to keep in analysis
Kaiser (1960) recommended retaining all factors with eigenvalues greater than one, because an eigenvalue of one represents a substantial amount of explained variance. Others have suggested this criterion is too rigid and instead recommend retaining all factors with eigenvalues greater than 0.6 or 0.7.
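Eigenvalue extraction and the Kaiser criterion can be sketched in numpy. The correlation matrix below is hypothetical, built so that the four items form two clusters:

```python
import numpy as np

# Hypothetical correlation matrix: items 1-2 cluster, items 3-4 cluster.
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]        # sorted largest first
prop_variance = eigenvalues / eigenvalues.sum()  # proportion explained per factor

# Kaiser criterion: retain factors with eigenvalue > 1.
n_retained = int((eigenvalues > 1).sum())
```

With this matrix, two eigenvalues exceed one, matching the two item clusters.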
What is the monte carlo parallel analysis
- most robust method of determining the number of factors, but very under-utilised as it requires syntax
- compares observed eigenvalues taken from correlation matrix with eigenvalues extracted from the simulation of a number of parallel datasets
- determines the expected eigenvalues by averaging the resulting randomly generated eigenvalues for each factor
- any eigenvalue in the original dataset that exceeds those randomly generated is considered significant
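The steps above can be sketched in numpy. The dataset is simulated (one hypothetical common factor driving four items), and the function compares observed eigenvalues against the average eigenvalues of randomly generated parallel datasets:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Compare observed eigenvalues with the mean eigenvalues of
    random datasets of the same shape; retain factors that exceed them."""
    n, p = data.shape
    rng = np.random.default_rng(seed)
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        sim = rng.normal(size=(n, p))  # parallel random dataset
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = random_eigs.mean(axis=0)  # expected eigenvalues under randomness
    n_factors = int((observed > threshold).sum())
    return n_factors, observed, threshold

# Hypothetical data: one common factor driving four items.
rng = np.random.default_rng(42)
factor = rng.normal(size=(300, 1))
items = factor @ np.full((1, 4), 0.8) + rng.normal(scale=0.6, size=(300, 4))
n_factors, observed, threshold = parallel_analysis(items)
```

Only the first observed eigenvalue exceeds the random average here, so one factor is retained.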
What is the purpose of rotation
to maximise high correlations between factors and variables and to minimise low correlations. Rotation makes it easier to accurately discriminate between factors, thus improving the interpretability and scientific utility of your model. 2 types:
→ Orthogonal (unrelated, perpendicular) rotation vs oblique rotation
what is orthogonal rotation
easier to interpret, describe, and report results; more suitable if factors are almost independent - Costello and Osborne suggest it is counter-intuitive because two factors are rarely uncorrelated (since both predict the same construct)
oblique rotation
more suitable if factors are correlated.
different types of oblique rotation
→ direct quartimin: direct oblique rotation of the data - recommended by Howard unless you have a solid reason for specifying the degree of relationship between factors
→ direct oblimin: same as quartimin but includes an extra parameter that allows you to control the degree of obliqueness
→ promax: uses orthogonal techniques as far as mathematically possible before rotating obliquely
Field (2018) recommends that, at a first factor analysis, you start with the orthogonal technique of varimax rotation (varimax meaning variance-maximising) to simplify the interpretation of factors.
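Varimax rotation can be sketched in numpy using the standard SVD-based algorithm; the unrotated loading matrix below is hypothetical. Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)   # rotation matrix, kept orthogonal throughout
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Standard varimax update via SVD of the gradient.
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        )
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

# Hypothetical unrotated 2-factor loadings with mixed structure.
loadings = np.array([
    [0.7, 0.3],
    [0.6, 0.4],
    [0.4, -0.6],
    [0.3, -0.7],
])
rotated = varimax(loadings)
```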
inter-rater reliability
decisions from different raters are compared to each other to see how consistent the raters' decisions are
test-retest reliability
same survey given to same group of people at different times
parallel forms
when same group of people complete two similar versions of the survey. each version of the survey is trying to measure the same thing. then, the results from the two versions are compared in order to determine the consistency of the results between the similar versions of the survey
internal consistency
when different survey items that are trying to measure same construct are compared to see how they produce similar results. 2 types:
→ typically measured through alpha, but research now suggests using omega (which uses the regression weights and error of items to calculate internal consistency)
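Cronbach's alpha can be sketched from its variance formula in numpy; the item responses below are simulated from one hypothetical underlying trait:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: four items all driven by one underlying construct.
rng = np.random.default_rng(7)
trait = rng.normal(size=(200, 1))
items = trait + rng.normal(scale=0.5, size=(200, 4))
alpha = cronbach_alpha(items)
```

Because the simulated items share most of their variance, alpha comes out high.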
average inter-item correlation
when group of people complete a survey and then all items on the survey that they are measuring the same construct are compared with each other. then the items are compared overall to create an average of those comparisons
split half correlation
when group of people complete survey then all items on survey that are measuring same construct are split in half to form two sets of items. then the two sets of items are compared to each other to see if they all consistently measure the construct
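A split-half check can be sketched in numpy: split the items into two halves, correlate the half-scores, and (a common addition) apply the Spearman-Brown correction to estimate full-length reliability. The six-item dataset is hypothetical:

```python
import numpy as np

# Hypothetical six-item scale measuring one construct.
rng = np.random.default_rng(3)
trait = rng.normal(size=(250, 1))
items = trait + rng.normal(scale=0.7, size=(250, 6))

# Split into two halves (odd vs even items) and sum each half.
half_a = items[:, ::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)
r = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction: full-length reliability from the half-test correlation.
split_half_reliability = 2 * r / (1 + r)
```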
face validity
when researchers simply look at the items on the survey and give their opinion on whether the items appear to accurately measure what they are trying to measure
content validity
how well a survey covers the range of meanings included within a concept that is being measured
construct validity
when we are able to generalise our construct of interest (how truthfully we are labelling our construct)
2 types of construct validity
- convergent: demonstrated through having strong (positive or negative) correlations with very similar or dissimilar variables
- divergent: indicated through no associations with variables presumed to be completely unrelated to the construct
criterion validity
when results from survey accurately relate to some kind of external variable
- predictive: where scale is able to predict future performance
- concurrent: where scale strongly correlates with other scales purporting to measure the same construct