Week 4 Flashcards
What does PCA stand for?
Principal Component Analysis (PCA)
What essentially is Principal Component Analysis (PCA)?
Combining the observed variables into weighted sums (linear combinations) that capture as much of the variance as possible
What essentially is Factor Analysis?
A technique for explaining the relationships among observed variables through their relationships to hypothetical underlying factors, aka LATENT VARIABLES
Principal Component Analysis (PCA) uses correlation as the underlying association among variables, but Factor Analysis does not, True/False?
FALSE
They both use correlations as the underlying measure of association
How would you characterise the difference between Principal Component Analysis (PCA) and Factor Analysis?
PCA looks for ‘optimal linear transformations’, i.e. weighted sums of the observed variables chosen to capture as much variance as possible (so, smushing variables together).
Factor Analysis makes a theoretical assumption that there is an underlying latent variable that explains the observed variables
What determines where you draw the line for the FIRST PRINCIPAL COMPONENT?
The line that minimises the perpendicular distances from the data points to the line (equivalently, the direction along which the data vary the most). The direction of that line is given by the first EIGENVECTOR of the correlation (or covariance) matrix.
What even is the SECOND PRINCIPAL COMPONENT?
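It is not the original data. It is the direction that captures the most of the remaining variance while being at right angles to (uncorrelated with) the first principal component.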
What is the name given to the VARIANCE of the FIRST PRINCIPAL COMPONENT?
The first EIGENVALUE
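A minimal sketch to check this numerically, assuming PCA is done on the correlation matrix of standardised variables. The data and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations of 3 variables driven by one shared source
latent = rng.normal(size=(500, 1))
X = latent @ np.array([[0.8, 0.6, 0.4]]) + rng.normal(scale=0.5, size=(500, 3))

# Standardise so that the covariance matrix of Z is the correlation matrix of X
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# eigh returns eigenvalues in ascending order, so re-sort with the largest first
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# The first principal component is a weighted sum of the standardised variables
pc1_scores = Z @ eigenvectors[:, 0]
pc1_variance = pc1_scores.var(ddof=1)
first_eigenvalue = eigenvalues[0]

print(pc1_variance, first_eigenvalue)  # the two numbers should agree
```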
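What do the SQUARED weights of the first principal component always add up to?
1 (under the usual normalisation the weight vector has unit length; the raw weights themselves need not add up to 1)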
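A quick check, assuming the usual convention that the weights are an eigenvector scaled to unit length: it is the squared weights that add up to 1, not the raw weights. The correlation matrix below is hypothetical.

```python
import numpy as np

# Hypothetical correlation matrix for three variables
R = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])

eigenvalues, eigenvectors = np.linalg.eigh(R)

# Weights of the first principal component = eigenvector with the largest eigenvalue
weights = eigenvectors[:, np.argmax(eigenvalues)]

sum_of_weights = weights.sum()
sum_of_squared_weights = (weights ** 2).sum()

print(sum_of_weights)          # generally not 1
print(sum_of_squared_weights)  # 1, up to floating point error
```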
What is one of the rules for determining the SECOND PRINCIPAL COMPONENT?
It must have zero correlation with the FIRST principal component
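The zero-correlation rule can be checked on made-up data. This is only a sketch, again assuming PCA on the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 4 variables, with some correlation built in
X = rng.normal(size=(300, 4))
X[:, 1] += 0.7 * X[:, 0]
X[:, 3] += 0.5 * X[:, 2]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first

# One column of scores per component
scores = Z @ eigenvectors[:, order]

pc1_pc2_correlation = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
print(pc1_pc2_correlation)  # zero, up to floating point error
```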
What’s the limit on the number of principal components you can have?
It is based on the number of variables you have.
Number of variables = max principal components
Is principal component analysis a MATHEMATICAL or a STATISTICAL technique? And why?
It’s mathematical
It’s based on matrix algebra
It doesn’t model error terms
According to Kaiser’s rule, when should you cut off the number of factors/components?
When the Eigenvalue drops below 1
Aka Kaiser-Guttman
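To illustrate how the rule is applied (not as a recommendation), here is a sketch that counts the eigenvalues above 1 for a hypothetical correlation matrix with two clusters of variables.

```python
import numpy as np

# Hypothetical correlation matrix: variables 1-3 hang together, as do variables 4-5
R = np.array([
    [1.00, 0.70, 0.60, 0.10, 0.10],
    [0.70, 1.00, 0.65, 0.10, 0.10],
    [0.60, 0.65, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.50],
    [0.10, 0.10, 0.10, 0.50, 1.00],
])

# Eigenvalues sorted from largest to smallest
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser's rule: keep the components whose eigenvalue is greater than 1
n_retained = int((eigenvalues > 1).sum())
print(eigenvalues.round(2), n_retained)
```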
Do we like the Kaiser-Guttman rule?
NO!
It tends to choose a number of components equal to about a third of the number of variables.
Schmitt says it is “the most inaccurate” of all methods
What are the four methods for determining how many components/factors to use?
- Kaiser-Guttman (don’t use)
- Scree plot
- Parallel Test, aka parallel analysis (the one that uses random data)
- MAP test (the Minimum Average Partial correlation test; Velicer, 1976)
What do you use the Parallel Test for?
Deciding how many factors/components to use.
When doing the Parallel Test, do you use the mean of the random data or the 95th percentile?
The 95th percentile is the usual recommendation. The mean is the older and more lenient option.
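A rough sketch of the idea, assuming the random data are normal with the same number of rows and columns as the real data, and that you keep the leading components whose eigenvalue beats the chosen percentile of the random eigenvalues. The function name `parallel_analysis` and its parameters are made up for this example, not taken from any package.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Hypothetical helper: compare observed eigenvalues with random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = sorted_eigenvalues(X)

    # Eigenvalues from many random data sets of the same size
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        simulated[i] = sorted_eigenvalues(rng.normal(size=(n, p)))

    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold

    # Count leading components only: stop at the first one that fails
    n_retained = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retained, observed, threshold

# Made-up data with two underlying factors and six observed variables
rng = np.random.default_rng(42)
factors = rng.normal(size=(400, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
X = factors @ loadings.T + rng.normal(scale=0.6, size=(400, 6))

n_retained, observed, threshold = parallel_analysis(X)
print(n_retained)
```

Setting `percentile=50` gives something close to the mean-based version, which is more lenient.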
When examining your component loadings - I think it’s in a FACTOR LOADING MATRIX - it is common practice to remove any loadings that are less than 0.5, True/False?
FALSE
But close. You can hide any loadings with an absolute value of less than 0.3.
This is called SUPPRESSING THE LOADINGS
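A small sketch of suppression on a made-up loading matrix, assuming the 0.3 cut-off applies to the absolute value of each loading. The small loadings are blanked out for display rather than deleted from the solution.

```python
import numpy as np

# Hypothetical loading matrix: 4 variables (rows) by 2 components (columns)
loadings = np.array([
    [0.72, 0.10],
    [0.65, -0.25],
    [0.28, 0.58],
    [-0.05, 0.81],
])

# Replace loadings with absolute value below 0.3 by NaN so they drop out of the table
suppressed = np.where(np.abs(loadings) < 0.3, np.nan, loadings)
print(suppressed)
```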
In factor analysis, after extracting your factors/components, you are going to want to do some ROTATION. What are the two types of rotation you can do?
Orthogonal and Oblique
What assumption underpins ORTHOGONAL rotation?
The factors are UNcorrelated
What assumption underpins OBLIQUE rotation?
The factors ARE correlated
In psychology, are you more likely to do ORTHOGONAL or OBLIQUE rotation?
OBLIQUE
In other words, the one that allows the factors to correlate
When you do oblique rotation, you will get two matrices of loadings. What are they called?
The factor PATTERN matrix, and
the factor STRUCTURE matrix
What does the factor PATTERN matrix contain?
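The regression coefficients of each observed variable on the factors, i.e. the unique contribution of each factor to that variable, controlling for the other factors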
What does the factor STRUCTURE matrix contain?
The correlations between each observed variable and each factor
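The two matrices are linked: the structure matrix equals the pattern matrix multiplied by the factor correlation matrix. A sketch with made-up numbers (the names `pattern` and `phi` are only labels for this example):

```python
import numpy as np

# Hypothetical pattern matrix: 4 variables (rows) by 2 factors (columns)
pattern = np.array([
    [0.7, 0.0],
    [0.6, 0.1],
    [0.1, 0.7],
    [0.0, 0.8],
])

# Hypothetical factor correlation matrix from an oblique rotation (factors correlate 0.4)
phi = np.array([
    [1.0, 0.4],
    [0.4, 1.0],
])

# Structure matrix: correlations between variables and factors
structure = pattern @ phi
print(structure)

# With uncorrelated factors (orthogonal rotation) the two matrices coincide
structure_orthogonal = pattern @ np.eye(2)
```

Note how the first variable has a pattern loading of 0 on the second factor but still correlates with it, purely because the two factors are correlated.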