Factor Analysis Part 1 (wk 1) Flashcards
what is factor analysis?
- a statistical technique for identifying latent constructs underlying a group of variables
- used to categorise variables into groups based on common variance
- based on the correlational overlap between variables
when is factor analysis used?
- FA can be applied outside psychology, but its main use is in scale development and psychometric measure validation & design
- when we talk about variables, we mean ITEMS; when we talk about constructs, we mean the TRAITS that we're trying to assess
how do you know which items assess which subscales?
through factor analysis!
AIM of FA: "to identify 1 or more underlying factors that are not directly observable but have a causative impact on item responses"
what different FA extraction techniques are there?
1) Maximum likelihood
2) Principal axis factoring (or principal factor analysis) - both are sketched in R below
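(Both extraction methods are also available in R via the psych package. A minimal sketch, assuming `items` is a hypothetical data frame of questionnaire responses and 2 factors is an arbitrary choice:)

```r
library(psych)

# Maximum likelihood extraction (fm = "ml")
ml_fit <- fa(items, nfactors = 2, fm = "ml", rotate = "none")

# Principal axis factoring (fm = "pa")
pa_fit <- fa(items, nfactors = 2, fm = "pa", rotate = "none")

# Compare the loadings the two methods produce
print(ml_fit$loadings)
print(pa_fit$loadings)
```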
what does SPSS stand for?
Statistical Package for the Social Sciences
What is “r”?
- a free statistical software environment that statisticians use - essentially, you write code to run all your statistics
- In statistics, the correlation coefficient r measures the strength and direction of a linear relationship between two variables on a scatterplot. The value of r is always between +1 and –1.
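(A minimal illustration in R - the variables x and y here are made up:)

```r
# Two made-up variables with a roughly linear relationship
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

cor(x, y)   # Pearson's r: always between -1 and +1, close to +1 here
plot(x, y)  # the scatterplot the definition refers to
```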
how do you run a factor analysis on SPSS?
1) Analyze > Dimension Reduction > Factor
[shift the variables (the items you want to include in the FA) into the Variables box & ignore the Selection Variable box]
2) DESCRIPTIVES WINDOW
[tick Determinant, KMO and Bartlett's test of sphericity, and Reproduced]
3) EXTRACTION WINDOW
[choose your extraction method - (a) maximum likelihood is the most common EFA technique; (b) principal axis factoring was the first form of FA developed - perfectly fine too!
- maximum likelihood is best as a starting point, but see if principal axis factoring gives you a neater solution
- "Maximum iterations for convergence" means the number of shots SPSS will give before giving up on your data set ("nope, we can't get any factors out of this data, try another")]
4) ROTATION WINDOW
[choose your rotation technique
- varimax (orthogonal) most commonly used
- promax (oblique) most useful
- direct oblimin (an older oblique alternative to promax)
- you want to try either varimax or promax as your starting point!
- set the maximum iterations for convergence too (the number of attempts before giving up)]
5) SCORES WINDOW
[though this sounds useful, nothing needs to be ticked within this box!]
6) OPTIONS WINDOW
[you can decide how to treat missing values
- select "Exclude cases listwise" (if someone is missing one of the variables, they're out)
- select "Suppress small coefficients" (absolute value below .30) = useful for visualising simple structure, but turn it off for your final table]
7) HIT OK & GO THROUGH THE FA OUTPUT (an R sketch of this whole pipeline follows below)
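(For anyone working in R instead of SPSS, here is a hedged sketch of the same pipeline using the psych package; `items` is a hypothetical data frame of questionnaire responses and the choice of 2 factors is illustrative only:)

```r
library(psych)

# Step 2 equivalent: determinant, KMO, and Bartlett's test of sphericity
R <- cor(items, use = "complete.obs")   # listwise deletion, as in step 6
det(R)                                  # determinant
KMO(R)                                  # Kaiser-Meyer-Olkin sampling adequacy
cortest.bartlett(R, n = nrow(items))    # Bartlett's test of sphericity

# Steps 3-4 equivalent: extraction (maximum likelihood) + rotation (promax),
# with a cap on iterations before giving up on the data set
fit <- fa(items, nfactors = 2, fm = "ml", rotate = "promax", max.iter = 25)

# Step 7 equivalent: inspect the output; cut = .30 suppresses small loadings
# to help visualise simple structure - drop it for your final table
print(fit, cut = .30)
```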
how many communalities are there?
two
1 - initial communalities
2 - extraction communalities
define each of the communalities.
1) INITIAL COMMUNALITIES
[= the proportion of variance in each item accounted for by the rest of the items
- represents the overlap between one item & the other items (as a collective)]
2) EXTRACTION COMMUNALITIES
[= the proportion of variance in each item accounted for by the retained factors generated by the factor solution]
what's the good rule to go by for extraction communalities?
"small values indicate variables that do not fit the factor solution; higher values are desirable (the higher, the better)"
- however, if a value is higher than .95, although it may be accurate, it could also be an error or an indication of redundancy (2 items that are essentially the same)
- better off having higher communalities than lower ones (don't panic about the occasional .95 - worry about values below .30!) - see the R sketch below
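(Both kinds of communality can be inspected in R with the psych package - a sketch, again assuming a hypothetical `items` data frame:)

```r
library(psych)

# Initial communalities: each item's squared multiple correlation
# with all of the other items
smc(cor(items))

# Extraction communalities: variance in each item accounted for
# by the retained factors
fit <- fa(items, nfactors = 2, fm = "ml")
fit$communality   # worry about values below .30; be wary of values above .95
```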
what are eigenvalues?
- "eigen" is German for "own", so an eigenvalue is a factor's "own value"
- an eigenvalue for a given factor measures the variance in all of the items that is accounted for by that factor
- think of the eigenvalues as providing the "numeric key" that allows the presence of factors to be decoded (a short R sketch follows below)
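(A minimal sketch of pulling the eigenvalues out of a correlation matrix in R, assuming a hypothetical `items` data frame:)

```r
# One eigenvalue per item, each measuring the variance in all of the
# items accounted for by a candidate factor
ev <- eigen(cor(items))$values
ev        # reported from largest to smallest
sum(ev)   # always equals the number of items
```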
how many factors should be retained?
- FA generates as many eigenvalues as there are items; however, we use only the largest ones (conventionally those greater than 1) as an indicator of the no. of factors to retain
- you want factor solutions to be PARSIMONIOUS (simple & straightforward)
- factor solutions should make sense from a theoretical perspective
- they also should be useful! (e.g. if a factor is parsimonious but can't be broken down to a more specific level, get rid of it!)
List the factor retention guides
(1) Kaiser’s criterion (1960)
- the number of eigenvalues higher than 1 reflects the number of factors that should be retained
(2) Cattell’s scree plot (1978)
- when eigenvalues are graphed, the point where the slope of the “scree” levels off marks the number of factors that should be retained
- exact numbers (as in Kaiser's criterion) are easier to work with than Cattell's scree plot, which relies on visual judgement
- never present the scree plot in a report for the reader!
FOR GREATER COMPREHENSIVENESS:
(3) Velicer’s minimum average partial (MAP) test (1976)
(4) Horn’s parallel analysis (1965)
- these won't be covered in the course, but they're easy to teach yourself in SPSS (all four guides are sketched in R after this list)
- remember that all of these statistics are only guides for interpretation - there's no concrete right/wrong answer.
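(All four retention guides can be run in R with the psych package - a sketch, assuming a hypothetical `items` data frame:)

```r
library(psych)

ev <- eigen(cor(items))$values

# (1) Kaiser's criterion: count the eigenvalues greater than 1
sum(ev > 1)

# (2) Cattell's scree plot: look for where the slope levels off
scree(items)

# (3) Velicer's MAP test: reported as part of nfactors()
nfactors(items)

# (4) Horn's parallel analysis: compare against eigenvalues from random data
fa.parallel(items, fm = "ml")
```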
what is percentage of variance explained?
- the eigenvalues are an initial indication of this
- after EXTRACTION, you base this on the extraction sums of squared loadings (SSL)
- after ROTATION, you base it on the rotation SSL, but only if you're using an orthogonal solution
- in an oblique solution, the factors are allowed to correlate, so their variance overlaps and you can't cleanly apportion it between them
- as explained previously, the higher the eigenvalues, the better - they drive your proportion of variance explained (see the R sketch below)
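(As a sketch, the initial percentage of variance explained can be computed directly from the eigenvalues - hypothetical `items` again:)

```r
ev <- eigen(cor(items))$values

# Each eigenvalue divided by the number of items gives the proportion
# of total variance that factor accounts for
pct <- ev / length(ev) * 100
round(pct, 1)          # the 1st factor usually eats up the most
round(cumsum(pct), 1)  # cumulative % of variance explained
```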
does percentage of variance matter?
- whilst variance explained is useful, the amount you want to see depends on what you’re looking for
[in psychology you can be happy with a factor explaining approx. 10% of variance, given the high error variance or statistical noise in the data]
- the more variance explained, the better
- usually, the 1st factor eats up most of the variance; eigenvalues are reported from highest to lowest