Lecture 4 Flashcards
What is the key difference between principal components and factor analysis?
- PCA: finds optimal linear transformations
- FA: assumes latent factors that are not directly observed
- there is no model in PCA, but there is a model (can test fit) in EFA
- PCA is simply a weighted sum of variables
How does PCA work?
- graphically, finds new axes for your data
- new components are chosen one by one, to maximise variance not yet accounted for
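One standard way to carry out these steps is an eigen-decomposition of the correlation matrix. The sketch below is a minimal NumPy illustration, not SPSS's exact routine. The function name `pca` and the toy data are hypothetical. Each eigenvector is one of the new axes, and its eigenvalue is the variance that component accounts for. Sorting eigenvalues from largest to smallest gives the same one-by-one order as repeatedly maximising the variance not yet accounted for.

```python
import numpy as np

def pca(data):
    """PCA on the correlation matrix (hypothetical helper).

    Returns eigenvalues (variance accounted for by each component, largest
    first), eigenvectors (the new axes, one per column) and loadings.
    """
    r = np.corrcoef(data, rowvar=False)            # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(r)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort: most variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))  # variable-component correlations
    return eigvals, eigvecs, loadings

# Toy data (hypothetical): three items driven by one source, plus one unrelated item.
rng = np.random.default_rng(0)
source = rng.normal(size=(500, 1))
data = np.hstack([source * w + rng.normal(scale=0.5, size=(500, 1)) for w in (0.8, 0.7, 0.6)]
                 + [rng.normal(size=(500, 1))])
eigvals, eigvecs, loadings = pca(data)
print(np.round(eigvals / eigvals.sum(), 2))  # share of total variance per component
```

With standardised variables the eigenvalues always sum to the number of variables, so each eigenvalue divided by that total is the proportion of variance its component explains.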
How many components can you make with N variables?
N components.
BUT if you keep fewer than N components, the solution is no longer unique: there is freedom in the final solution
- also, if you keep fewer than N, you can rotate the components to get a simpler solution
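A small NumPy sketch of both points, using toy random data (all variable names are hypothetical). Keeping all N components reproduces the correlation matrix exactly. Keeping fewer leaves the solution free to be rotated: any orthogonal rotation of the kept loadings gives different loadings but exactly the same reproduced correlations, which is the freedom that rotation exploits.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # 5 correlated variables
r = np.corrcoef(data, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(r)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))

# All N = 5 components: the loadings reproduce the correlation matrix exactly.
full_ok = np.allclose(loadings @ loadings.T, r)

# Keep m = 2 components, then rotate them by an arbitrary orthogonal matrix.
m = 2
kept = loadings[:, :m]
theta = 0.6  # any angle works: this is the "freedom" in the solution
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
rotated = kept @ rot

# Different loadings, but the same reproduced correlations (and communalities).
same_fit = np.allclose(kept @ kept.T, rotated @ rotated.T)
print(full_ok, same_fit, np.allclose(kept, rotated))  # True True False
```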
Why is PCA simple?
- components are uncorrelated with each other, even if the original variables are correlated
- first component explains the most variance > thus you know which components are the most important
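The sketch below (NumPy, simulated data, standardised variables assumed) checks both properties. The component scores are uncorrelated even though the variables are not. Their variances, which equal the eigenvalues, decrease from the first component onwards.

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.normal(size=(400, 1))
data = np.hstack([base + rng.normal(scale=s, size=(400, 1)) for s in (0.3, 0.5, 0.8)])
print(np.round(np.corrcoef(data, rowvar=False), 2))  # original variables: correlated

z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardise
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]
scores = z @ eigvecs  # component scores

score_corr = np.corrcoef(scores, rowvar=False)
score_var = scores.var(axis=0, ddof=1)
print(np.round(score_corr, 2))  # identity matrix: components are uncorrelated
print(np.round(score_var, 2))   # decreasing: component 1 explains the most
```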
How do you determine how many components/factors to extract?
- SPSS default: keep components with eigenvalues > 1, called the Kaiser-Guttman rule (DO NOT USE)
- use a scree plot (look for where it turns, i.e. the elbow)
- use parallel analysis
- use Velicer's MAP (minimum average partial) test
Explain the parallel test
- uses random data (with same dimensions as your dataset) as a baseline
- if eigenvalue is higher than random (noise) data, then it must be signal
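- retain components where the “raw data” eigenvalue is higher than the “pcntile” eigenvalue
- the “pcntile” is the 95th percentile of the eigenvalues from the random datasets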
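A minimal NumPy sketch of the procedure. The names `parallel_analysis` and `n_sims` and the simulated data are hypothetical. It assumes the random datasets are drawn from a normal distribution; some implementations permute the raw data instead. The `raw` and `pcntile` arrays mirror the two columns of output, and components are retained until the first raw eigenvalue that fails to beat its random counterpart.

```python
import numpy as np

def eigenvalues(x):
    """Eigenvalues of the correlation matrix of x, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Compare observed eigenvalues with the given percentile of eigenvalues
    from random normal data with the same number of cases and variables."""
    n, p = data.shape
    rng = np.random.default_rng(seed)
    raw = eigenvalues(data)                                      # "raw data" column
    sims = np.array([eigenvalues(rng.normal(size=(n, p))) for _ in range(n_sims)])
    pcntile = np.percentile(sims, percentile, axis=0)            # "pcntile" column
    n_retain = 0
    for raw_ev, rand_ev in zip(raw, pcntile):
        if raw_ev <= rand_ev:                                    # no better than noise: stop
            break
        n_retain += 1
    return raw, pcntile, n_retain

# Hypothetical test data: two factors, five items each.
rng = np.random.default_rng(3)
factors = rng.normal(size=(500, 2))
pattern = np.kron(np.eye(2), np.full((5, 1), 0.7))              # 10 x 2 simple structure
data = factors @ pattern.T + rng.normal(scale=0.7, size=(500, 10))
raw, pcntile, n_retain = parallel_analysis(data)
print(n_retain)
```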
Explain the MAP test
- computes the average squared partial correlation (SPC) after partialling out 0, 1, 2, … components, and finds the MINIMUM
- as more components are extracted and partialled out of the correlation matrix, the SPCs approach 0
- but then at some point ‘noise’ components get partialled out, and the SPCs increase again
- therefore, want the minimum
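A compact NumPy sketch of Velicer's MAP as described above. The function name `map_test` and the simulated data are hypothetical. This version uses the original squared-partial-correlation criterion and partials out the first m principal components for m = 0 to p - 2; the m with the smallest average is the number of components to keep.

```python
import numpy as np

def map_test(data):
    """Average squared off-diagonal partial correlation after partialling out
    the first m principal components, for m = 0, 1, ..., p - 2."""
    r = np.corrcoef(data, rowvar=False)
    p = r.shape[0]
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = []
    for m in range(p - 1):
        a = loadings[:, :m]
        partial_cov = r - a @ a.T                    # partial out m components
        d = np.sqrt(np.diag(partial_cov))
        partial_corr = partial_cov / np.outer(d, d)  # rescale to correlations
        avg_sq.append(np.mean(partial_corr[off_diag] ** 2))
    avg_sq = np.array(avg_sq)
    return avg_sq, int(np.argmin(avg_sq))            # argmin = no. of components

# Hypothetical test data: two factors, five items each.
rng = np.random.default_rng(3)
factors = rng.normal(size=(500, 2))
pattern = np.kron(np.eye(2), np.full((5, 1), 0.7))
data = factors @ pattern.T + rng.normal(scale=0.7, size=(500, 10))
avg_sq, n_components = map_test(data)
print(np.round(avg_sq, 3), n_components)
```

The printed averages fall as real components are removed and rise again once noise components start being partialled out, which is the U-shape the minimum rule relies on.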
What does a -ve or high component/factor loading mean?
- negative: a high score on that item goes with a LOW score on the component/factor
- a negative loading is similar to reverse scoring
- high (positive): a higher score on that item goes with a higher score on the factor/component
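A quick simulation of what the sign means (NumPy; the item names and effect sizes are hypothetical). With a single standardised factor, an item's standardised loading equals its correlation with the factor, so the sign of that correlation is the sign of the loading. Negating an item is reverse scoring up to a constant shift (e.g. 6 - x on a 1-5 scale), and a shift does not change correlations.

```python
import numpy as np

rng = np.random.default_rng(4)
factor = rng.normal(size=1000)
item_pos = 0.7 * factor + rng.normal(scale=0.7, size=1000)
item_neg = -0.7 * factor + rng.normal(scale=0.7, size=1000)  # "reverse-keyed" item

def loading(item, factor):
    """Single standardised factor: loading = item-factor correlation."""
    return np.corrcoef(item, factor)[0, 1]

print(round(loading(item_pos, factor), 2))   # positive
print(round(loading(item_neg, factor), 2))   # negative
print(round(loading(-item_neg, factor), 2))  # reverse scoring flips the sign back
```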
Why rotate components?
- simpler structure
- easier to interpret
What are the 2 types of rotation?
- orthogonal: components/factors remain uncorrelated
- oblique: components/factors are allowed to correlate
What are the specific SPSS rotations?
- orthogonal: varimax, equamax, quartimax
- oblique: direct oblimin, promax
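To show what an orthogonal rotation actually does, here is a sketch of the widely used SVD-based orthomax algorithm in NumPy (`gamma=1` gives varimax, `gamma=0` quartimax). It is an illustration, not SPSS's implementation. In particular, SPSS applies Kaiser normalisation by default and this sketch does not. The function name and the loadings are hypothetical, and oblique rotations are not covered.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthomax rotation without Kaiser normalisation.
    Returns the rotated loadings and the orthogonal rotation matrix."""
    p, k = loadings.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        lam = loadings @ rot
        target = loadings.T @ (lam ** 3 - (gamma / p) * lam @ np.diag(np.sum(lam ** 2, axis=0)))
        u, s, vt = np.linalg.svd(target)
        rot = u @ vt                                     # nearest orthogonal matrix
        crit_old, crit = crit, np.sum(s)
        if crit_old != 0 and crit / crit_old < 1 + tol:  # no meaningful improvement
            break
    return loadings @ rot, rot

# Hypothetical simple structure, deliberately rotated 30 degrees to "hide" it.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
t = np.deg2rad(30)
mixed = simple @ np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
rotated, rot = varimax(mixed)
print(np.round(rotated, 2))  # each item loads on one factor again (up to sign/order)
```

Because the rotation matrix is orthogonal, each item's communality (row sum of squared loadings) is unchanged. Only the way the variance is split across factors changes.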
What do you interpret after rotation?
- oblique: pattern matrix + factor correlation matrix (structure matrix = pattern matrix × factor correlation matrix)
- orthogonal: rotated component/factor matrix
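The structure-matrix relationship is a single matrix product. The pattern and factor correlation values below are hypothetical. The structure matrix holds item-factor correlations, while the pattern matrix holds each factor's unique contribution. With uncorrelated factors (an identity correlation matrix) the two coincide, which is why orthogonal solutions need only one rotated matrix.

```python
import numpy as np

pattern = np.array([[0.8, 0.0],
                    [0.7, 0.1],
                    [0.0, 0.6],
                    [0.1, 0.7]])   # hypothetical pattern matrix (items x factors)
phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])      # hypothetical factor correlation matrix

structure = pattern @ phi          # item-factor correlations
print(np.round(structure, 2))      # item 1 -> [0.8, 0.32]: it correlates with factor 2 via phi
```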
What does EFA assume?
that there are some underlying latent factors that cannot be directly observed > EFA searches for these
What is u_i? What is k?
- u_i: the specific factor for item i (noise/error)
- k: the common factor
What are the assumptions of EFA?
- common factors standardised (variance = 1)
- common factors uncorrelated with each other
- specific factors uncorrelated with each other
- common factors uncorrelated with specific factors
- multivariate normality
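Putting u_i and k together with these assumptions gives the model's variance and covariance structure. The sketch below assumes a single common factor with a loading written λ_i (the symbol λ_i is our notation). With several common factors, each contributes its own loading term.

```latex
\begin{aligned}
x_i &= \lambda_i k + u_i \\
\operatorname{Var}(x_i) &= \lambda_i^2 \operatorname{Var}(k) + \operatorname{Var}(u_i) = \lambda_i^2 + \operatorname{Var}(u_i) \\
\operatorname{Cov}(x_i, x_j) &= \lambda_i \lambda_j \operatorname{Var}(k) = \lambda_i \lambda_j \qquad (i \neq j)
\end{aligned}
```

The variance line uses Var(k) = 1 and k uncorrelated with u_i. The covariance line additionally uses the specific factors being uncorrelated with each other. For standardised items, λ_i² is the communality of item i.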
What is the underlying rationale of EFA?
- partial correlations (PC)
- correlation b/w item 1 and item 2, WHEN HOLDING CONSTANT a latent variable is…
- if PC is 0, then correlation b/w the items is fully explained by the factor > want it as close to 0 as possible
- aim: find a latent variable that accounts for the observed correlations (i.e. makes the PCs as close to 0 as possible)
- if the factors can reproduce (mimic) the observed correlation/covariance matrix, then we have found the latent factors
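A simulation of this rationale (NumPy; all names and effect sizes are hypothetical). In real data k is unobservable; here it is simulated so we can hold it constant directly. The helper `partial_corr` regresses k out of both items and correlates what is left. Two items that share only k are clearly correlated, but their partial correlation given k is close to 0.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing z out of both (holding z constant)."""
    zc = np.column_stack([np.ones_like(z), z])
    rx = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]
    ry = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(5)
n = 2000
k = rng.normal(size=n)                            # latent common factor
item1 = 0.8 * k + rng.normal(scale=0.6, size=n)   # noise term plays the role of u_1
item2 = 0.7 * k + rng.normal(scale=0.7, size=n)   # u_2, independent of u_1 and k

print(round(np.corrcoef(item1, item2)[0, 1], 2))  # sizeable observed correlation
print(round(partial_corr(item1, item2, k), 2))    # close to 0 once k is held constant
```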
What is the communality?
- the proportion of an item's variance due to the common factors (for orthogonal factors, the sum of the item's squared loadings)
- want HIGH communalities
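For an orthogonal solution, each item's communality is the row sum of its squared loadings. The rest of the variance (1 - h²) is left to the specific factor. The loadings below are hypothetical.

```python
import numpy as np

# Hypothetical loadings from an orthogonal two-factor solution (items x factors).
loadings = np.array([[0.80, 0.10],
                     [0.70, 0.20],
                     [0.15, 0.65],
                     [0.30, 0.30]])

communality = (loadings ** 2).sum(axis=1)  # h^2: variance explained by common factors
uniqueness = 1 - communality               # what is left for the specific factor u_i
print(np.round(communality, 3), np.round(uniqueness, 3))
```

Here item 4 has h² = 0.09 + 0.09 = 0.18, so most of its variance is specific. That is the kind of low communality you would want to flag.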
What are the rules/guidelines about sample size for EFA? What is the problem with small sample size?
- 150+
- absolute sample size + communalities are more important
- the variables:sample size ratio is NOT important
- if loadings are high, then you can have a lower sample size
- less generalisable if too small
What are the 3 things you want for EFA?
- high communalities (>.8 ideal, but reality is .4-.7) > can drop items with low communality (but be careful)
- few cross-loadings (items loading >.32 on more than one factor)
- more than 3 strongly loaded items per factor
^^^ need a larger sample if these are not met
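A small checker for these three guidelines (NumPy; the function name and loadings are hypothetical). The cutoffs are assumptions and can be changed: communality below .4, cross-loading at .32 or above on more than one factor, and a "strong" loading of .5 or above, with a factor flagged when it has 3 or fewer strong items. Communalities are computed as row sums of squared loadings, which assumes orthogonal factors.

```python
import numpy as np

def efa_checks(loadings, min_h2=0.4, cross=0.32, strong=0.5, more_than=3):
    """Flag items/factors against rule-of-thumb cutoffs (assumed values)."""
    abs_l = np.abs(loadings)
    h2 = (loadings ** 2).sum(axis=1)                        # communalities
    n_strong = (abs_l >= strong).sum(axis=0)                # strong items per factor
    return {
        "low_communality": np.flatnonzero(h2 < min_h2).tolist(),
        "cross_loading": np.flatnonzero((abs_l >= cross).sum(axis=1) > 1).tolist(),
        "too_few_items": np.flatnonzero(n_strong <= more_than).tolist(),
    }

loadings = np.array([[0.75, 0.05], [0.70, 0.10], [0.65, 0.40], [0.68, 0.00],
                     [0.72, 0.08], [0.10, 0.70], [0.05, 0.65], [0.20, 0.30]])
print(efa_checks(loadings))
```

In this example, item index 2 cross-loads, item index 7 has a low communality, and factor index 1 has only two strong items. By the guideline above, any of these would call for a larger sample.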
What is the issue with the "high communalities" guideline? How do you fix this?
- you only know them after you find the factor loadings
- so… use prior diagnostics!!
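One diagnostic that needs no loadings is each item's squared multiple correlation (SMC) with all the other items, a common initial estimate of its communality. SPSS reports SMCs as the "Initial" communalities for principal axis factoring. Whether this is the specific "prior diagnostic" meant in the lecture is an assumption. The NumPy helper `smc` and the data are hypothetical.

```python
import numpy as np

def smc(data):
    """Squared multiple correlation of each item with all other items:
    SMC_i = 1 - 1 / (R^-1)_ii, computable before any factoring."""
    r = np.corrcoef(data, rowvar=False)
    return 1 - 1 / np.diag(np.linalg.inv(r))

rng = np.random.default_rng(6)
f = rng.normal(size=(400, 1))
data = f @ np.array([[0.8, 0.7, 0.6, 0.2]]) + rng.normal(scale=0.6, size=(400, 4))
print(np.round(smc(data), 2))  # a low value for item 4 warns of a low communality
```

The formula gives the same R² you would get by regressing each item on all the others, so it can be checked against an ordinary regression.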