5.1 Principal Components Analysis Flashcards
How does principal components analysis (PCA) transform the original variables in a dataset?
PCA transforms original variables into new uncorrelated variables called principal components that are linear combinations of the original variables.
Explain the mathematical constraint on the loadings for each principal component (PC). Why is this constraint important?
The loadings for each PC are constrained so that their squared values sum to 1 (i.e., each loading vector has unit length). This constraint is important because it ensures that each principal component represents a normalized direction in the feature space; without it, a component's variance could be made arbitrarily large simply by scaling up the loadings.
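This unit-length constraint can be checked numerically. A minimal sketch in Python/NumPy (the cards elsewhere use R; the toy data and variable names here are purely illustrative):

```python
import numpy as np

# Toy data: 100 observations of 3 variables (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                 # center each variable
cov = np.cov(Xc, rowvar=False)          # 3x3 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are PC loadings
# Each loading vector is normalized: its squared entries sum to 1
squared_sums = (eigvecs ** 2).sum(axis=0)
```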
Describe the process of constructing principal components. How does the first PC differ from subsequent PCs?
Principal components are constructed sequentially. The first PC is the direction that captures the maximum variance in the data. Each subsequent PC captures as much of the remaining variance as possible, subject to being orthogonal to (and hence uncorrelated with) all previously computed PCs.
How does eigen decomposition of the covariance matrix produce the principal components?
Eigen decomposition of the covariance matrix produces eigenvectors, which contain the principal component loadings, and eigenvalues, which represent the variance explained by each PC.
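This correspondence between eigenvalues and PC variances can be verified directly: the sample variance of each score column equals the matching eigenvalue. A Python/NumPy sketch with illustrative toy data:

```python
import numpy as np

# Correlated toy data: 200 observations of 2 variables (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort PCs by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                    # project observations onto the PCs
# The variance of each score column matches the corresponding eigenvalue
score_vars = scores.var(axis=0, ddof=1)
```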
Describe the process of calculating principal component scores. What do these scores represent geometrically?
PC scores are calculated by multiplying the centered data matrix by the loadings matrix. Geometrically, the scores represent the projections of observations onto the principal component axes.
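The geometric reading can be made concrete: because the loading vectors form an orthonormal basis, the scores are simply the coordinates of each observation in a rotated axis system, and rotating back recovers the centered data exactly. A Python/NumPy sketch (toy data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 3))                       # 40 observations, 3 variables
Xc = X - X.mean(axis=0)                            # center the data
_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs          # coordinates of each point on the PC axes
reconstructed = scores @ eigvecs.T   # rotating back recovers the centered data
```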
When implementing PCA in R, what does the function prcomp() output? What preprocessing options does it offer?
The prcomp() function in R outputs the standard deviations of the principal components (sdev), the rotation or loadings matrix (rotation), and the scores (x). It offers options to center and scale the data before analysis; by default it centers but does not scale.
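For readers working outside R, the same computation can be sketched in Python/NumPy. The function name `pca` and its interface below are assumptions for illustration, not prcomp() itself, but it returns the analogous pieces (PC standard deviations, rotation/loadings matrix, and scores) with the same center/scale preprocessing options:

```python
import numpy as np

def pca(X, center=True, scale=False):
    """Minimal analogue of R's prcomp(): returns sdev, rotation, scores."""
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)
    if scale:
        X = X / X.std(axis=0, ddof=1)
    # SVD of the preprocessed data yields loadings and scores directly
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    sdev = s / np.sqrt(X.shape[0] - 1)  # standard deviations of the PCs
    rotation = Vt.T                     # columns are the PC loadings
    scores = X @ rotation
    return sdev, rotation, scores

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 4))         # illustrative toy data
sdev, rotation, scores = pca(data, center=True, scale=True)
```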
What information does a biplot provide in PCA? How does it help visualize both observations and variables?
A biplot in PCA provides a visualization of both observations and variables in the same plot, showing the scores and loadings for two principal components. Typically, we are most interested in the first two principal components, as they explain the most variance.
How does PCA preserve the total variance of a dataset? Explain the mathematical relationship involved.
PCA is an orthogonal rotation of the (centered) data, so it preserves total variance: the sum of the variances of all PCs, which equals the sum of the eigenvalues of the covariance matrix, is the same as the sum of the variances of the original variables, which equals the trace of that matrix.
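This equality of the eigenvalue sum and the covariance trace is easy to confirm numerically (Python/NumPy sketch; the toy data with deliberately unequal variances is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# 5 variables with deliberately unequal variances (illustrative only)
X = rng.normal(size=(80, 5)) * np.array([1.0, 2.0, 0.5, 3.0, 1.5])
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)
total_original = np.trace(cov)   # sum of the original variables' variances
total_pc = eigvals.sum()         # sum of the PC variances (eigenvalues)
```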
How can you determine the number of principal components to retain in your analysis?
Methods for determining the number of principal components to retain include examining the scree plot for an "elbow," considering the cumulative proportion of variance explained, and applying the Kaiser rule (retaining PCs with eigenvalues greater than 1, which is meaningful when PCA is performed on the correlation matrix, since each standardized variable then contributes a variance of 1).
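The variance-explained and Kaiser criteria reduce to a few lines of arithmetic. A Python/NumPy sketch using hypothetical eigenvalues (the values and the 90% cutoff are illustrative assumptions):

```python
import numpy as np

# Hypothetical eigenvalues from a correlation-matrix PCA, sorted descending
eigvals = np.array([3.2, 1.4, 0.9, 0.3, 0.2])
prop = eigvals / eigvals.sum()            # proportion of variance per PC
cum = np.cumsum(prop)                     # cumulative proportion explained
k_90 = int(np.argmax(cum >= 0.90)) + 1    # smallest k explaining >= 90%
k_kaiser = int((eigvals > 1).sum())       # Kaiser rule: eigenvalues > 1
```

With these eigenvalues the two criteria disagree (three components for 90% of variance, two under the Kaiser rule), which is common in practice and why judgment is still required.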
What is the difference between feature transformation and feature extraction in the context of PCA?
In PCA, feature transformation refers to creating new features as linear combinations of the original variables, while feature extraction involves selecting a subset of these new features for further analysis.
The principal components extracted from the data capture the most significant aspects of the original variables. This extraction process allows us to focus on the components that account for the largest amount of variance in the data, simplifying the dataset while preserving its essential features.
It is important to note that PCA does not perform variable selection. All original variables contribute to the construction of each principal component, although their contributions may vary in magnitude.
How does PCA address the issue of collinearity in a dataset?
PCA addresses collinearity in a dataset by creating new uncorrelated variables from the original correlated variables.
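The decorrelation is exact in-sample: even when two variables are nearly collinear, their PC scores have zero sample correlation. A Python/NumPy sketch with an illustrative collinear pair:

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)   # x2 nearly collinear with x1
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs
corr_original = np.corrcoef(X, rowvar=False)[0, 1]      # near 1
corr_scores = np.corrcoef(scores, rowvar=False)[0, 1]   # essentially 0
```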
Explain the concept of non-distinct principal components. What is the most number of distinct principal components that can be produced from a dataset?
Non-distinct principal components are components with zero variance, so they carry no information about the data. For a dataset with p variables and n observations, at most min(n - 1, p) distinct principal components can be produced; centering the data uses up one degree of freedom, which is why n - 1 rather than n appears.
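The min(n - 1, p) bound can be demonstrated with a dataset that has fewer observations than variables (Python/NumPy sketch; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(3, 5))          # n = 3 observations, p = 5 variables
Xc = X - X.mean(axis=0)              # centering drops the rank to at most n - 1
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
# Only min(n - 1, p) = 2 eigenvalues are nonzero; the rest are numerically zero
n_distinct = int((eigvals > 1e-10).sum())
```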
Why is it generally recommended to scale variables before performing PCA? What alternative approach can be used?
Scaling is recommended in PCA to prevent variables with larger variances (often an artifact of their measurement units) from dominating the analysis. An alternative is to measure all variables in a common unit, so that their variances are directly comparable without standardization.
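The effect of scaling is easy to see by comparing PCA on the covariance matrix versus the correlation matrix (equivalent to standardizing first). A Python/NumPy sketch where one illustrative variable is recorded on a much larger scale:

```python
import numpy as np

rng = np.random.default_rng(5)
# Two independent variables on very different scales (e.g. km vs. mm)
X = np.column_stack([rng.normal(size=100), 1000 * rng.normal(size=100)])
cov_eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
corr_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
share_unscaled = cov_eig[0] / cov_eig.sum()  # PC1 dominated by the big variable
share_scaled = corr_eig[0] / corr_eig.sum()  # far more balanced after scaling
```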
How can PCA be used as a tool for identifying latent variables in a dataset?
PCA can help identify latent variables by revealing underlying patterns or structures in the data that may correspond to unobserved factors influencing the observed variables.