Week 12 Flashcards
What is principal component analysis? What kind of data is it used on?
dimension reduction – i.e., we transform the original X’s and work with M transformed variables, where M < p.
unsupervised learning method that is used to summarize a large set of correlated variables.
What are loadings?
the linear weights (phi_jm) applied to each original variable X_j when forming principal component m
What can PCA be used for?
inputs for supervised learning methods or for data visualization.
extracting variables corresponding to the directions along which the data vary the most; the loadings phi_jm are chosen to maximize each component's variance
What is the primary purpose of PCA?
extract variables corresponding to the directions along which the data vary the most.
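The phrase "directions along which the data vary the most" can be written as an optimization problem. A sketch for the first component only, assuming each column of X has been centered to mean zero, with x_{ij} the value of variable j for observation i and \phi_{j1} the loadings of the first component:

```latex
\max_{\phi_{11},\dots,\phi_{p1}} \; \frac{1}{n}\sum_{i=1}^{n}\Bigl(\sum_{j=1}^{p}\phi_{j1}\,x_{ij}\Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p}\phi_{j1}^{2}=1
```

The constraint is what stops the variance from being made arbitrarily large just by inflating the loadings.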
How many components can we decompose X into?
min(p, n-1) uncorrelated principal components
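A small numerical check of the min(p, n-1) limit, as a sketch only. The data are random and the sizes n = 4, p = 6 are made up for illustration; the count is taken from the non-zero singular values of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 6                      # hypothetical sizes with n - 1 < p
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)          # centering removes one degree of freedom

# Singular values that are numerically non-zero correspond to components
# with non-zero variance.
s = np.linalg.svd(Xc, compute_uv=False)
n_components = int(np.sum(s > 1e-10))
print(n_components, min(p, n - 1))
```

With more observations than variables (n - 1 >= p), the same check would typically return p.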
How are the principal components obtained?
as linear combinations of all the X’s, with linear weights called loadings
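A minimal sketch of "linear combination of all the X's", assuming NumPy and a made-up toy data set. The loadings are taken from the SVD of the centered data, which is one common way to compute them, not necessarily the one used in the course.

```python
import numpy as np

# Hypothetical toy data: n = 6 observations, p = 3 correlated variables.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(6, 1)) for _ in range(3)])

# Center each column so every variable has mean zero.
Xc = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the loading vectors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T            # column m holds the loadings phi_1m, ..., phi_pm

# Each component score is a linear combination of all the X's.
scores = Xc @ loadings     # column m holds the scores z_1m, ..., z_nm

# The score of observation 0 on component 1, written out as a weighted sum.
z_01 = sum(loadings[j, 0] * Xc[0, j] for j in range(X.shape[1]))
```

Here `z_01` matches `scores[0, 0]`: the matrix product is just the weighted sum applied to every observation and component at once.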
The _____________ and __________ are unique up to ______
principal component
loadings vector
sign
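"Unique up to sign" can be checked numerically. The sketch below assumes NumPy and random illustrative data: flipping the sign of a loading vector flips the scores, but the variance explained and the fitted part of X stay the same.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))      # hypothetical data
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi1 = Vt[0]                     # first loading vector
z1 = Xc @ phi1                   # first principal component (scores)

# Flip the sign of the loading vector: the scores flip sign as well.
phi1_flipped = -phi1
z1_flipped = Xc @ phi1_flipped

# The variance explained and the rank-one piece of X are unchanged.
same_variance = np.isclose(z1.var(ddof=1), z1_flipped.var(ddof=1))
same_fit = np.allclose(np.outer(z1, phi1), np.outer(z1_flipped, phi1_flipped))
```

This is why two software packages can report loadings with opposite signs for the same data without either being wrong.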
What are principal components ordered by?
share of the total variance of X explained. This share equals each component’s variance divided by the sum of the variances of all PCs.
If you have fewer X variables than observations, …
you will obtain all p components
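The share described above is simple arithmetic. A sketch with made-up component variances (the numbers are purely illustrative):

```python
# Hypothetical variances of four principal components, largest first.
pc_variances = [2.4, 1.0, 0.4, 0.2]

total = sum(pc_variances)
pve = [v / total for v in pc_variances]   # share of total variance per component

# Running total, useful for questions like "how many PCs explain 80%?"
cumulative = []
running = 0.0
for share in pve:
    running += share
    cumulative.append(running)
```

The shares always sum to 1, and because the components are ordered by variance, the shares are non-increasing.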
How can you tell whether two principal components are independent?
their loading vectors are PERPENDICULAR (orthogonal)
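A sketch of what perpendicular means in practice, assuming NumPy and random illustrative data: the dot product of two loading vectors is zero, and the corresponding component scores are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical correlated data: random values mixed by a random matrix.
X = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi1, phi2 = Vt[0], Vt[1]            # first two loading vectors
z1, z2 = Xc @ phi1, Xc @ phi2        # first two components (scores)

dot_loadings = float(phi1 @ phi2)               # ~0: perpendicular loading vectors
corr_scores = float(np.corrcoef(z1, z2)[0, 1])  # ~0: uncorrelated components
```

Both values come out as zero up to floating-point rounding.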
Are principal components scale-variant? Why?
Yes. Scaling up one variable by a constant would blow up its variance and change its loading
you have to standardize the data unless all variables are measured in the same units
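The effect of units can be seen in a short experiment. This is a sketch assuming NumPy and two made-up correlated variables; `first_loading` and `standardize` are helper names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
common = rng.normal(size=(50, 1))
X = common + 0.5 * rng.normal(size=(50, 2))   # two hypothetical correlated variables

def first_loading(data):
    """Return the first loading vector of column-centered data."""
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]

def standardize(data):
    """Rescale every column to mean 0 and standard deviation 1."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Rescale the first variable, e.g. metres -> millimetres.
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0

raw = first_loading(X)                 # both variables carry weight
blown_up = first_loading(X_rescaled)   # the rescaled variable dominates

# After standardizing, the choice of units no longer matters.
std_a = first_loading(standardize(X))
std_b = first_loading(standardize(X_rescaled))
```

After the change of units, the first loading vector points almost entirely along the rescaled variable; after standardizing, both versions of the data give the same loadings (up to sign).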
When does PCA work best? What kind of data must it be used on?
highly correlated data (correlation coefficient r above 0.5)
continuous variables (it cannot include categorical variables; for those, CATPCA is needed instead)
What is a component score?
the value of the transformed variable z_m for a given observation
What are the axes in a PCA biplot?
left and bottom: component scores (z_m)
top and right: loadings
What is a scree plot? How do you find the number of components?
a plot of the proportion of total variance of X explained by each subsequent component
Look for an “elbow” in the plot, where the contribution to variance drops sharply and then flattens. Keep retaining components until the elbow appears in the plot.
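The elbow is usually judged by eye, but the idea can be sketched in code. The proportions below are made up, and the "largest bend" rule is a crude assumption for illustration, not a standard criterion; it also assumes that the components before the elbow are the ones retained.

```python
# Hypothetical proportions of variance explained, one per component.
pve = [0.55, 0.30, 0.05, 0.04, 0.03, 0.03]

# Text version of a scree plot: one bar per component.
for m, share in enumerate(pve, start=1):
    print(f"PC{m}: {'#' * round(share * 100)} {share:.2f}")

# Crude elbow heuristic (an assumption, not a standard rule): the elbow is
# the component where the curve bends most sharply, i.e. where the second
# difference of the PVE sequence is largest.
drops = [pve[i + 1] - pve[i] for i in range(len(pve) - 1)]
bends = [drops[i + 1] - drops[i] for i in range(len(drops) - 1)]
elbow_component = bends.index(max(bends)) + 2   # 1-based component number
n_retained = elbow_component - 1                # components before the elbow
```

For these numbers the curve flattens at the third component, so the heuristic retains the first two. On real data the elbow is often less clear-cut and still needs a visual check.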