Factor Analysis Part 1 (wk 1) Flashcards
what is factor analysis?
- a statistical technique for identifying latent constructs underlying a group of variables
- used to categorise variables into groups based on common variance
- based on the correlational overlap between variables
when is factor analysis used?
- FA can be applied outside psychology but main use is for scale development and psychometric measure validation & design
- when we talk about variables –> ITEMS; when we talk about constructs –> TRAITS that we’re trying to assess
how do you know which items assess which subscales?
through factor analysis!
AIM of FA: “to identify 1 or more underlying factors that are not directly observable but have a causative impact on item responses”
what different FA extraction techniques are there?
1) Maximum likelihood
2) Principal axis factoring (or principal factor analysis)
what does SPSS stand for?
Statistical Package for the Social Sciences
What is “r”?
- a free programming language/environment that statisticians use - it essentially runs all your statistics for you
- In statistics, the correlation coefficient r measures the strength and direction of a linear relationship between two variables on a scatterplot. The value of r is always between +1 and –1.
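For example, r can be computed directly; a minimal numpy sketch using made-up scores on two hypothetical items:

```python
import numpy as np

# Two hypothetical variables (e.g. scores on two questionnaire items)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # 0.8
```

A positive r near +1 means the two items rise and fall together; a value near –1 means they move in opposite directions.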
how do you run a factor analysis on SPSS?
1) Analyze > Dimension Reduction > Factor
[shift the variables (items) you want to include in the FA into the Variables box & ignore the Selection Variable box]
2) DESCRIPTIVES WINDOW
[tick determinant, KMO & Bartlett’s test of sphericity, and reproduced]
3) EXTRACTION WINDOW
[choose your extraction method - (a) maximum likelihood is the most common EFA technique (b) principal axis factoring was the 1st form of FA developed - perfectly fine too!
- Maximum Likelihood is best as a starting point, but see if principal axis factoring gives you a neater solution
- also “Maximum iterations for convergence” = the number of attempts SPSS will make before giving up on your data set (“nope, we can’t get any factors out of this data, try another”)]
4) ROTATION WINDOW
[choose your rotation technique
- varimax (orthogonal) most commonly used
- promax (oblique) most useful
- direct oblimin (an older version of promax)
- you want to try either varimax or promax as your starting point!
- set the convergence limit too (number of attempts before giving up)]
5) SCORES WINDOW
[though this sounds useful, nothing needs to be ticked within this box!]
6) OPTIONS WINDOW
[you can decide how to treat missing values
- select “exclude cases listwise” (if someone is missing one of the variables = they’re out)
- select “suppress small coefficients (absolute value below .30)” = useful for visualising simple structure, but turn it off for your final table]
7) HIT OK & GO THROUGH WITH FA OUTPUT
how many communalities are there?
two
1 - initial comunalities
2 - extraction communalities
define each of the communalities.
1) INITIAL COMMUNALITIES
[= the proportion of variance in each item accounted for by the rest of the items
- represents the overlap between one item & the other items (as a collective)]
2) EXTRACTION COMMUNALITIES
[= are the proportion of variance in each item accounted for by the retained factors generated by the factor solution]
what’s the good rule to go by for extraction communalities?
“small values indicate variables that do not fit the factor solution; higher values are desirable (the higher, the better)”
- however, if higher than .95, although accurate, it could also be an error or an indication that redundancy is present (2 items are the same)
- better to have higher communalities than lower ones (don’t panic about the occasional .95, rather about below .3!)
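Both kinds of communality can be illustrated in numpy; this is a hedged sketch using simulated data and a simple eigenvalue-based 2-factor extraction, not SPSS’s exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 respondents x 6 items built from 2 latent factors
f = rng.normal(size=(200, 2))
true_load = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
data = f @ true_load.T + 0.5 * rng.normal(size=(200, 6))

R = np.corrcoef(data, rowvar=False)

# Initial communalities: the squared multiple correlation of each item
# with all the other items, computed from the inverse correlation matrix
initial = 1 - 1 / np.diag(np.linalg.inv(R))

# Extraction communalities: the sum of squared loadings of each item on
# the retained factors (here, loadings from the 2 largest eigenvalues)
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
L = vecs[:, order[:2]] * np.sqrt(vals[order[:2]])
extraction = (L ** 2).sum(axis=1)

print(np.round(initial, 2))
print(np.round(extraction, 2))
```

Each communality is a proportion of variance, so every value falls between 0 and 1; items built from strong loadings come out well above the .3 danger zone.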
what are eigenvalues?
- “eigen” is German for “own” = own value
- an eigenvalue for a given factor measures the variance in all of the items that is accounted for by that factor
- think of the eigenvalues as providing the “numeric key” that enables the presence of factors to be decoded
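Concretely, eigenvalues come from the item correlation matrix; a small numpy sketch with a hypothetical 3-item matrix:

```python
import numpy as np

# A hypothetical 3-item correlation matrix
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# Each eigenvalue measures the variance across all items accounted for
# by one factor; the eigenvalues always sum to the number of items
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
print(np.round(eigenvalues, 3))
print(round(eigenvalues.sum(), 3))  # 3.0
```

The largest eigenvalue here is well above 1 because the three items correlate substantially, i.e. they share a common factor.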
how many factors should be retained?
- FA technically generates as many eigenvalues as there are items; however, we only use the largest ones (bigger than 1) as an indicator of the no. of factors to retain
- you want factor solutions to be PARSIMONIOUS (simple & straightforward)
- factor solutions should make sense from a theoretical perspective
- they also should be useful! (e.g. if one factor is parsimonious but can’t break down to a more specific level, get rid of it!)
List the factor retention guides
(1) Kaiser’s criterion (1960)
- the number of eigenvalues higher than 1 is reflected in the number of factors that should be retained
(2) Cattell’s scree plot (1978)
- when eigenvalues are graphed, the point where the slope of the “scree” levels off marks the number of factors that should be retained
- easier to work with exact numbers (Kaiser’s criterion) vs Cattell’s scree plot (as it is a visual method)
- never present the scree plot in a report for the reader!
FOR GREATER COMPREHENSIVENESS:
(3) Velicer’s minimum average partial (MAP) test (1976)
(4) Horn’s parallel analysis (1965)
- these won’t be covered in the course, but they’re easy to teach yourself on SPSS
- remember that all of these statistics are only guides for interpretation (so there’s no concrete right/wrong answer)
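As a taster, Kaiser’s criterion and Horn’s parallel analysis can both be sketched in a few lines of numpy (simulated data; the sample sizes and loadings below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data set: 300 respondents x 8 items driven by 2 factors
f = rng.normal(size=(300, 2))
load = np.array([[.7, 0], [.7, 0], [.6, 0], [.6, 0],
                 [0, .7], [0, .7], [0, .6], [0, .6]])
data = f @ load.T + 0.6 * rng.normal(size=(300, 8))

obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

# Kaiser's criterion: retain factors with eigenvalues above 1
kaiser_k = int((obs > 1).sum())

# Horn's parallel analysis: compare observed eigenvalues against the
# mean eigenvalues of many random (uncorrelated) data sets of equal size
sims = []
for _ in range(200):
    noise = rng.normal(size=data.shape)
    sims.append(np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1])
random_mean = np.mean(sims, axis=0)
# retain factors only while the observed eigenvalue beats the random one
parallel_k = int((obs > random_mean).argmin())

print(kaiser_k, parallel_k)
```

With two planted factors, both guides agree on retaining 2 - but on messier real data they can disagree, which is why they are guides rather than rules.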
what is percentage of variance explained?
- the eigenvalues are an initial indication of this
- after EXTRACTION WINDOW, you base this on the extraction sums of squared loadings (SSL)
- after ROTATION WINDOW, you base it on the rotation SSL but only if you’re doing an orthogonal solution
- in an oblique solution, you won’t be able to figure it out, because the factors correlate and so their variance overlaps
- as explained previously, the higher the eigenvalues, the better –> they drive your proportion of variance explained
does percentage of variance matter?
- whilst variance explained is useful, the amount you want to see depends on what you’re looking for
[you may be happy with a factor explaining approx. 10% of variance, given high error variance or statistical noise]
- the more variance explained, the better
- usually, the 1st factor eats up most of the variance; eigenvalues are listed from highest to lowest
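The percentage of variance per factor is just each eigenvalue divided by the number of items; a quick sketch with hypothetical eigenvalues:

```python
import numpy as np

# Hypothetical eigenvalues from an FA of 10 items (they sum to 10)
eigenvalues = np.array([3.2, 2.1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3])
n_items = len(eigenvalues)

# Each factor's % of variance = its eigenvalue / number of items
pct = 100 * eigenvalues / n_items
print(np.round(pct, 1))            # 1st factor eats up the most variance
print(round(pct[:2].sum(), 1))     # variance explained by a 2-factor solution
```

Here a 2-factor solution explains 53% of the variance, with the first factor alone contributing 32%.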
describe the steps in factor rotation
(1) EXTRACTION
- we apply FA to the data set and extract factors
(2) ROTATION
- think of it as a statistical transformation; it lets us look at the factors in a more interpretable way, by shifting them in a conceptual space, separating them as much as possible, enabling us to visualise them easily
Orthogonal vs oblique rotations - what are the differences between them?
ORTHOGONAL ROTATIONS
- e.g. varimax, quartimax, equamax
- maximise difference between factors, hence facilitate independent, uncorrelated factors
OBLIQUE ROTATIONS
- e.g. direct oblimin, promax
- more complex rotations that facilitate correlated factors
- start with promax
- see if varimax gives a neater solution (more parsimonious)
- if both are neat, go with varimax, because of its more parsimonious maths
What’s all this about conceptual space?
- think of it like a cartesian plane
- the rotation is a bit like a statistical transformation
- it rotates the axes to the point of best fit (what’s happening underneath the hood when SPSS is doing all the math)
what’s the disclaimer* on rotation?
- rotations can’t force correlation or lack of correlation where this isn’t present in the data
- so if you try an unsuitable rotation, you will get gibberish (messy solution)
what table shows the factor solution?
orthogonal rotations > “rotated factor matrix”
oblique rotations > “pattern matrix”
REMEMBER:
- we want items loading high on one factor, but not the others (so reasonably unique to one factor)
- the higher the factor loadings, the better!
what is cross-loading?
cross-loading occurs when items load on multiple factors
- we want factor loadings to be as high as possible
- factor loadings represent the proportion of variance in an item that is explained by the underlying factor
- values below .3 regarded as poor
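Cross-loading is easy to spot programmatically; a sketch with a hypothetical rotated factor matrix, mimicking SPSS’s “suppress small coefficients” option:

```python
import numpy as np

# A hypothetical rotated factor matrix: 5 items x 2 factors
loadings = np.array([[.72, .10],
                     [.65, .05],
                     [.08, .70],
                     [.12, .61],
                     [.45, .52]])  # <- this item cross-loads

# Suppress small coefficients (absolute value below .30), as in the
# Options window, to make the structure easier to see
visible = np.where(np.abs(loadings) >= .30, loadings, 0.0)

# An item cross-loads when it loads >= .30 on more than one factor
cross_loading_items = np.where((np.abs(loadings) >= .30).sum(axis=1) > 1)[0]
print(visible)
print(cross_loading_items)  # item 4 breaks simple structure
```

Items 0–3 each load on exactly one factor (simple structure); item 4 loads above .3 on both factors and would need rewording or removal.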
what’s the term given to the “absence of cross-loading”?
Simple structure