PCA MLM Flashcards
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a technique used for dimensionality reduction, which is particularly useful in high-dimensional data analysis, visualization, noise filtering, and more.
- Introduction
PCA is a statistical procedure that uses orthogonal transformations to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components.
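The claim that an orthogonal transformation turns correlated variables into uncorrelated ones can be checked in a few lines of matrix algebra. The notation here is an assumption of this sketch: $X$ is an $n \times p$ data matrix (one row per sample), $\bar{x}$ its vector of column means, and $\mathbf{1}$ a column of ones. Because the centered scores have zero mean, $\frac{1}{n-1} Z^\top Z$ is their covariance matrix. It comes out diagonal, so the components are uncorrelated.

```latex
X_c = X - \mathbf{1}\bar{x}^\top, \qquad C = \frac{1}{n-1} X_c^\top X_c

C = W \Lambda W^\top, \qquad W^\top W = I, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p), \quad \lambda_1 \ge \dots \ge \lambda_p \ge 0

Z = X_c W

\frac{1}{n-1} Z^\top Z = W^\top \left( \frac{1}{n-1} X_c^\top X_c \right) W
  = W^\top C W = (W^\top W) \Lambda (W^\top W) = \Lambda
```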
- Dimensionality Reduction
PCA is a popular method for reducing the dimensionality of data while retaining as much of its variance as possible. It transforms the original variables into a new set of variables, the principal components. These are mutually uncorrelated and ordered so that the first captures the most variance, the second captures the most of what remains, and so on. Keeping only the first k components gives a k-dimensional representation of the data.
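The reduction step can be sketched for the simplest case, 2-D points reduced to one dimension, in plain Python. The sketch relies on the closed-form eigendecomposition of a symmetric 2x2 covariance matrix. The function name `pca_reduce_2d_to_1d` and the sample points are hypothetical, chosen only for illustration. The helper centers the points, projects them onto the first principal component, maps the 1-D scores back into 2-D, and reports the fraction of variance kept. It assumes the points are not all identical, since otherwise the total variance is zero.

```python
import math

def pca_reduce_2d_to_1d(points):
    """Project 2-D points onto their first principal component and map back.

    Hypothetical helper. Returns (scores, reconstructed_points, fraction_of_variance_retained).
    Assumes the points are not all identical (total variance must be non-zero).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance entries (divisor n - 1)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Unit eigenvector for lam1, i.e. the first principal direction (its sign is arbitrary)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # 1-D scores: coordinates of the centered points along the first component
    scores = [(x - mx) * ux + (y - my) * uy for x, y in points]
    # Reconstruction: map each score back into the original 2-D space
    recon = [(mx + s * ux, my + s * uy) for s in scores]
    return scores, recon, lam1 / (lam1 + lam2)

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0), (2.3, 2.7)]
scores, recon, retained = pca_reduce_2d_to_1d(points)
print(f"variance retained by 1 component: {retained:.3f}")
```

When the points lie exactly on a line, the second eigenvalue is zero. In that case one component retains all the variance and the reconstruction is exact.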
- Eigenvalues and Eigenvectors
The principal component directions are the eigenvectors of the data’s covariance matrix, and the variance captured by each principal component equals the associated eigenvalue. Sorting the eigenvectors by decreasing eigenvalue puts the components in order of importance.
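The eigenvalue/variance relationship can be checked numerically without any linear-algebra library. This sketch uses power iteration, which finds only the leading eigenvector, as a simple stand-in for a full eigendecomposition. The helper names `covariance_matrix` and `leading_eigenpair` are hypothetical. The sketch assumes the largest eigenvalue is strictly larger than the next one and that the all-ones start vector is not orthogonal to the leading eigenvector. It also uses the fact that the trace of the covariance matrix equals the sum of its eigenvalues (the total variance) to compute the fraction of variance explained.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) and the centered rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def leading_eigenpair(cov, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("zero covariance: the data has no variance")
        v = [x / norm for x in w]
    cv = [sum(a * b for a, b in zip(row, v)) for row in cov]
    eigenvalue = sum(a * b for a, b in zip(v, cv))  # Rayleigh quotient v^T C v
    return eigenvalue, v

rows = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
cov, centered = covariance_matrix(rows)
lam1, v1 = leading_eigenpair(cov)
# Variance of the first principal component scores should equal lam1
scores = [sum(a * b for a, b in zip(c, v1)) for c in centered]
score_var = sum(s * s for s in scores) / (len(rows) - 1)
# Trace of C = sum of all eigenvalues = total variance
explained_ratio = lam1 / sum(cov[i][i] for i in range(len(cov)))
print(lam1, score_var, explained_ratio)
```

For this toy data the covariance matrix is diagonal. The first component is therefore the x-axis, with eigenvalue 8/3, and it explains 80% of the total variance.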
- Data Preprocessing
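PCA requires the data to be mean-centered. Standardizing each variable (mean of 0 and standard deviation of 1) beforehand is also recommended whenever the variables are measured on different scales, because PCA is sensitive to scale. A variable expressed in large units has a large variance and will dominate the leading components. Standardizing is equivalent to running PCA on the correlation matrix instead of the covariance matrix.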
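A minimal sketch of the standardization step in plain Python follows. It assumes the data is a list of rows with one numeric column per variable. `standardize` is a hypothetical helper, not a library function, and it uses the sample standard deviation (`statistics.stdev`, divisor n − 1). The height/weight rows are made-up illustrative values. In the raw rows, the weight column (grams) has a variance many orders of magnitude larger than the height column (metres), so an unscaled PCA would align the first component almost entirely with weight.

```python
import statistics

def standardize(rows):
    """Z-score each column: subtract its mean, divide by its sample standard deviation.

    Hypothetical helper. Raises ValueError for a constant column, whose standard deviation is zero.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    if any(s == 0 for s in stdevs):
        raise ValueError("a constant column cannot be standardized")
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]

# Hypothetical data: height in metres and weight in grams (very different scales)
rows = [(1.60, 55000.0), (1.75, 72000.0), (1.82, 80000.0), (1.68, 61000.0)]
z = standardize(rows)
```

After this step every column has mean 0 and standard deviation 1, so each variable enters the covariance matrix on an equal footing.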
- Applications
PCA is commonly used in exploratory data analysis to visualize high-dimensional data, in machine learning to reduce the dimensionality of the feature space, in noise reduction, and in many other areas of data science and signal processing.
- Strengths
PCA is a simple, non-parametric method for extracting relevant information from large, noisy data sets. It can reduce such a data set to fewer dimensions and reveal the hidden, simpler structure that often underlies it.
- Limitations
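PCA constrains the principal components to be orthogonal linear combinations of the original variables. It therefore cannot recover structure that is nonlinear, or structure whose important directions are not orthogonal to one another. PCA is also heavily affected by outliers: it is built on variances, and a single extreme point contributes a large squared deviation. Furthermore, PCA considers only the variance of the data and assumes that high-variance directions are the most informative, which is not always true. If the data have no variance at all (i.e., all points are identical), PCA gives no useful result.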
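Outlier sensitivity is easy to demonstrate in two dimensions. This sketch computes the angle of the first principal axis from the closed-form principal-axis angle of the 2x2 sample covariance matrix. The function name `leading_axis_angle` and the point sets are hypothetical, chosen for illustration. Adding one extreme point to five points spread along the x-axis is enough to swing the first component to the y-axis.

```python
import math

def leading_axis_angle(points):
    """Angle in degrees, in [-90, 90], of the first principal axis of 2-D points.

    Hypothetical helper using the closed-form principal-axis angle of the
    2x2 sample covariance matrix.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))

clean = [(-2, 0.1), (-1, -0.1), (0, 0.1), (1, -0.1), (2, 0.1)]
with_outlier = clean + [(0, 10.0)]
print(leading_axis_angle(clean))         # close to 0: PC1 runs along the x-axis
print(leading_axis_angle(with_outlier))  # close to +/-90: one point flips PC1 to the y-axis
```

Because the angle is derived from squared deviations, the single point at y = 10 outweighs the spread of the other five points combined.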
- Interpretability
One drawback of PCA is that the resulting principal components are harder to interpret than the original variables. A component does not correspond to any single variable in the original data set, but to a linear combination of all of them.
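A small worked case shows why components resist interpretation. The setup is an assumption of this example: two standardized variables $z_1, z_2$ with correlation $0 < r < 1$, whose correlation matrix is $R$. Its eigenvectors give components that mix both variables equally. Neither component is "variable 1" or "variable 2"; the first is their sum and the second their difference.

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad 0 < r < 1

R \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1 + r) \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
R \begin{pmatrix} 1 \\ -1 \end{pmatrix} = (1 - r) \begin{pmatrix} 1 \\ -1 \end{pmatrix}

\mathrm{PC}_1 = \tfrac{1}{\sqrt{2}} (z_1 + z_2), \qquad \operatorname{Var}(\mathrm{PC}_1) = 1 + r

\mathrm{PC}_2 = \tfrac{1}{\sqrt{2}} (z_1 - z_2), \qquad \operatorname{Var}(\mathrm{PC}_2) = 1 - r
```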