Factor Analysis Flashcards
Overview
- What is factor analysis?
- Factor loadings, eigenvalues and communalities
- Extracting/ choosing factors
- Interpreting factor loadings
- Rotation of factors
- Sequence of operations to conduct a factor analysis
- Reporting a factor analysis
Learning objectives
- Understand and be able to explain… the main aim of factor analysis.
- Understand and be able to explain… factor loadings, eigenvalues and communalities.
- Understand and be able to explain… the different criteria for extracting factors.
- Understand and be able to explain… when and why to rotate factors.
- Be able to… conduct, report and interpret a factor analysis.
What is factor analysis?
The overall aim of factor analysis is to analyse patterns of correlations between variables (items) in order to reduce these variables to a smaller set of underlying constructs called “factors” or “components”
The factors are informative in their own right, but also provide a new set of scores which might be employed in another multivariate analysis such as multiple regression
Can distinguish between exploratory and confirmatory factor analysis
Items and factors
We might have a series of items that people respond to, and we’re saying that variation is caused by some underlying construct/ latent variable
There can be several factors influencing variation in the different item responses
Instead of analysing all 14 items we condense them down to 2 factors (each representing a psychological construct) - this makes any further analysis simpler
Different types of factor analysis
When doing exploratory factor analysis, ie. exploring the data to see how many underlying factors there are, there are several kinds of analysis.
We are looking at exploratory factor analysis, and within that the method of Principal Components Analysis (PCA)
How do we analyse patterns of correlations?
Considering a scale with 14 items… this results in 91 correlations to examine (a 30-item scale would give 435)
If several of the correlations are >.3 then this suggests that there are a smaller number of underlying factors than 14 (or 30) different constructs
So…
- first step is to look at correlations
- looking for patterns of correlations between the items
- when you have lots of items, this cannot be done manually
- if several of the correlations between items are above .3 then this is taken to suggest that there are a smaller number of underlying factors/ constructs than the 14 represented in the items
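As a sketch of this first step, the 91 pairwise correlations for a 14-item scale can be computed and scanned programmatically. The data below are simulated, and the two-factor structure driving them is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 200 respondents on a 14-item scale in which two latent
# factors each drive seven of the items (illustrative data only).
n, n_items = 200, 14
f1, f2 = rng.normal(size=(2, n))
items = np.empty((n, n_items))
items[:, :7] = f1[:, None] + rng.normal(scale=0.8, size=(n, 7))
items[:, 7:] = f2[:, None] + rng.normal(scale=0.8, size=(n, 7))

r = np.corrcoef(items, rowvar=False)       # 14 x 14 correlation matrix
upper = r[np.triu_indices(n_items, k=1)]   # the 91 unique pairwise correlations
print(len(upper))                          # 14 * 13 / 2 = 91
print(int(np.sum(np.abs(upper) > 0.3)))    # how many exceed the .3 rule of thumb
```

Many correlations above .3 here would point to fewer than 14 underlying constructs, which is exactly the pattern the bullets above describe.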
Exploratory factor analysis (EFA)
Typically used to identify a smaller number of underlying factors (components) when analysing a large number of items within a scale
For example, with a 14 item scale different factors for depression and anxiety might emerge
Ideas underlying the PCA (Principal Components Analysis) form of EFA (exploratory factor analysis)
- Components are linear combinations of variables
A “component” is a linear combination of the variables - capturing the pattern of “as one variable changes, so does the other”
The aim is to construct a linear combination (V) of each participant’s scores on the variables (items) with the coefficients (a1 etc) chosen so as to maximise the proportion of total variance accounted for by this factor (component).
For example, for 3 variables/item scores (Y1, Y2, Y3)…
A component score (V), for each participant is obtained from the sum of all his/her scores, where each score is multiplied by a different coefficient (all participants’ scores are multiplied by the same set of coefficients):
V = a1 Y1 + a2 Y2 + a3 Y3
So...
PCA is trying to create an equation (including scores and coefficients) to explain the max variance accounted for by the factor
Higher coefficients mean the underlying factor has more effect on that item; a higher correlation means the item relates more strongly to the other items
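A minimal numeric sketch of that equation, with hypothetical item scores and coefficients (in a real PCA the coefficients would be chosen to maximise the variance of V across participants):

```python
import numpy as np

# One participant's scores on three items (hypothetical values).
y = np.array([4.0, 2.0, 5.0])

# Hypothetical coefficients a1, a2, a3.
a = np.array([0.6, 0.5, 0.62])

v = a @ y  # V = a1*Y1 + a2*Y2 + a3*Y3
print(v)   # 0.6*4 + 0.5*2 + 0.62*5 = 6.5
```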
Ideas underlying the PCA (Principal Components Analysis) form of EFA (exploratory factor analysis)
- More than one component is possible
More than one component is possible. The first component employs coefficients to account for the maximum amount of the total variance:
V1 = a11 Y1 + a21 Y2 + a31 Y3
A second component has the same aim, but is constrained to be uncorrelated with the first. There is a second set of coefficients, and so on:
V2 = a12 Y1 + a22 Y2 + a32 Y3
For each component, the coefficients are chosen to account for the maximum amount of variance remaining.
In theory, the total number of components = number of items.
The more all the original variables are correlated together, the more the total variance will be accounted for by the first component.
Rewording:
- One component will leave some variance in scores unexplained so we create another component (with another set of coefficients) that seeks to explain what’s left of the variance in the item scores
- you can create components up to the number of items, then all variance is explained, but we want as few as possible
- when the original variables (items) are highly correlated together, then more variance is accounted for by the first component
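The bullets above can be sketched with NumPy: components come from the eigenvectors of the item correlation matrix, the number of possible components equals the number of items, and successive component scores are uncorrelated. The three-item data are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 3))   # three "items"
scores[:, 1] += scores[:, 0]         # make two of them correlate

r = np.corrcoef(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(r)   # PCA on the correlation matrix
order = np.argsort(eigvals)[::-1]      # first component = largest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Standardise the items, then form the component scores V
# (one column of coefficients per component; 3 items -> 3 components).
z = (scores - scores.mean(0)) / scores.std(0)
V = z @ eigvecs

# Successive components are constrained to be uncorrelated:
# the off-diagonal correlations are (numerically) zero.
print(np.round(np.corrcoef(V, rowvar=False), 6))
```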
Ideas underlying the PCA (Principal Components Analysis) form of EFA (exploratory factor analysis)
Loadings, Eigenvalues and Communalities
- Factor loading
- each loading is the correlation between a variable (item) and a factor
- tells you how much each item correlates with a factor/component
- Eigenvalue
- ∑L² = the sum of squared loadings within a factor, down the whole set of variables
- tells you the amount of variance accounted for by a factor
- Communality
- ∑L² = the sum of squared loadings within a variable, across all the factors
- ie. the squared correlations between one item and each factor, added up
SEE DIAGRAM PAGE 3
Factor loadings
Each loading is the correlation between a variable (item) and a factor
- tells you how much each item correlates with a factor/component
Note: loading² = proportion of variance in a given variable accounted for by a factor (e.g. .32² ≈ .10, or 10%)
Absolute loadings > .30 (or 0.32) are called ‘salient’ and interpreted
Absolute loadings < .30 are dismissed, & sometimes written as zero
We don’t ask, “Is the correlation (i.e. the loading) significant?” However, loadings of .70 and .55 are deemed ‘excellent’ and ‘good’, respectively, accounting for 50% and 30% of the variable’s variance.
So… the more an item and a factor are correlated, the bigger the loading.
L² gives the proportion of the variance in a given variable/ item accounted for by a factor
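As a sketch of where loadings come from in PCA: the loadings on the first component are the eigenvector of the correlation matrix scaled by the square root of its eigenvalue, and each loading equals the correlation between that item and the component score. The data below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.normal(size=200)                        # one latent factor
items = f[:, None] + rng.normal(size=(200, 3))  # three noisy items

r = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(r)
i = int(np.argmax(eigvals))                     # first component

# Loadings on the first component: eigenvector * sqrt(eigenvalue).
loadings = eigvecs[:, i] * np.sqrt(eigvals[i])

# Each loading equals the correlation between an item and the component score.
z = (items - items.mean(0)) / items.std(0)
V = z @ eigvecs[:, i]
for j in range(3):
    print(round(loadings[j], 3), round(np.corrcoef(items[:, j], V)[0, 1], 3))

# And loading**2 is the proportion of that item's variance the component explains.
print(np.round(loadings**2, 2))
```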
Eigenvalues
Each eigenvalue = the sum of the squared loadings within a factor/ component down the whole set of variables/ items
Each eigenvalue is the amount of variance in the set of variables/ items accounted for by a particular factor
Each eigenvalue = the variance of the V (linear combination) values for that factor
Eigenvalues range from 0 to the total number of items
An eigenvalue > 1.00 suggests that this factor should be selected - i.e., it is a “principal component”
The percentage of the total variance accounted for by one (or more) factors is given by:
P = (sum of selected eigenvalues / number of items) × 100
∑L² = eigenvalue: the sum of squared loadings within a factor, across all the items. It’s the amount of variance in the whole set of items accounted for by that factor. Ranges from 0 to the number of items; we want it to be bigger than 1, so that we end up with fewer components than items (a smaller number of underlying factors)
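These properties can be checked on simulated data: the eigenvalues of the item correlation matrix sum to the number of items, and the formula above gives the percentage of total variance for whichever eigenvalues are selected (the four-item data and the > 1 cut-off are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=200)
# Four items: three load strongly on one factor, one is mostly noise.
items = np.column_stack([f + rng.normal(scale=s, size=200)
                         for s in (0.5, 0.6, 0.7, 2.0)])

r = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(r))[::-1]   # largest first

print(np.round(eigvals, 2))
print(round(float(eigvals.sum()), 2))      # eigenvalues sum to the number of items: 4.0

selected = eigvals[eigvals > 1.0]          # keep eigenvalues > 1
P = selected.sum() / items.shape[1] * 100  # % of total variance accounted for
print(round(float(P), 1))
```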
Communalities
Each communality = sum of the squared loadings within a variable, across the selected factors
Each communality is the proportion of variance in an observed variable accounted for by the selected factors
If as many factors are selected as there are variables, each communality will = 1.00 in PCA
A communality < .30 suggests that the variable is unreliable and should be removed (as the factors account for less than 30% of its variance)
Table of communalities:
- Based on how many factors have been extracted, so it’s saying that the extracted factors (e.g. 2) account for a given amount of variance in an item, e.g. .60/ 60%
- If all items are above .3, the 2 factors are doing a decent job of explaining the variance in each individual item
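A minimal sketch with a hypothetical loading matrix (4 items, 2 extracted factors); each communality is the row-wise sum of squared loadings:

```python
import numpy as np

# Hypothetical loading matrix: 4 items (rows) x 2 extracted factors (columns).
L = np.array([[0.70, 0.10],
              [0.65, 0.05],
              [0.15, 0.80],
              [0.20, 0.40]])

# Communality = sum of squared loadings within each item, across the factors.
communalities = (L ** 2).sum(axis=1)
print(np.round(communalities, 2))   # item 4 works out to 0.20

# Items with communality < .30 are candidates for removal.
print(communalities < 0.30)         # only the last item falls below .30
```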
How many factors or components should be “extracted”?
We use the eigenvalues (amount of variance in the set of variables/ items accounted for by a particular factor).
Can then use:
- Kaiser’s criterion
- Cattell’s scree test
Deciding how many factors to extract
Kaiser’s Criterion
SPSS applies this automatically: it lists the factors in order of how much variance each accounts for, and anything with an eigenvalue over 1 is extracted
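Kaiser’s criterion boils down to counting eigenvalues greater than 1 (the eigenvalues below are made up for illustration):

```python
import numpy as np

# Hypothetical eigenvalues, listed largest first as SPSS would order them.
eigenvalues = np.array([4.2, 2.1, 0.9, 0.4, 0.2])

n_extract = int(np.sum(eigenvalues > 1.0))  # Kaiser: extract eigenvalues > 1
print(n_extract)                            # -> 2
```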