Data Science using Python and R - 12 Flashcards
What is high dimensionality in data science?
High dimensionality refers to a data set with a large number of predictors, for example, 100 predictors describe a 100-dimensional space.
What is multicollinearity?
Multicollinearity occurs when there is substantial correlation among predictor variables, leading to unstable regression models.
What is double-counting in the context of regression models?
Double-counting occurs when highly correlated predictors carry overlapping information, so the model overemphasizes the aspect of the data they share.
What is the curse of dimensionality?
As dimensionality increases, the volume of the predictor space grows exponentially, making the high-dimensional space sparse.
What does the principle of parsimony state?
The principle of parsimony suggests that models should be simple and interpretable, keeping the number of predictors manageable.
What is overfitting in regression models?
Overfitting occurs when too many predictors are included in the model, hindering its ability to generalize to new data.
What is the risk of missing the bigger picture in data analysis?
Focusing solely on individual predictors may overlook the fundamental relationships among them, which can be grouped into components.
What are the three main objectives of dimension reduction methods?
- Reduce the number of predictor items
- Ensure that these predictor items are uncorrelated
- Provide a framework for interpretability of the results
What does multicollinearity lead to in regression analysis?
Multicollinearity leads to instability in the solution space, causing unreliable regression coefficients.
What happens to regression coefficients when predictors are correlated?
The coefficients can vary widely across different samples, making them unreliable for interpretation.
How can variance inflation factors (VIFs) indicate multicollinearity?
A large VIF indicates that a predictor is highly correlated with other predictors, with VIF ≥ 5 indicating moderate and VIF ≥ 10 indicating severe multicollinearity.
What is the formula for calculating VIF?
VIF_i = 1 / (1 - R_i^2), where R_i^2 is the R-squared value obtained by regressing predictor i on the other predictors.
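As a quick illustration of this formula, the sketch below computes VIFs with statsmodels on a made-up predictor matrix (all variable names are placeholders, not from the text):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up predictors: x2 is deliberately built from x1 to induce correlation
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.3, size=200),
    "x3": rng.normal(size=200),
})

# variance_inflation_factor expects a design matrix that includes the constant
design = sm.add_constant(X)
vifs = {col: variance_inflation_factor(design.values, i)
        for i, col in enumerate(design.columns) if col != "const"}
print(vifs)  # x1 and x2 share variance, so their VIFs exceed 5
```

Each value is 1 / (1 - R_i^2) from regressing that predictor on the others, matching the formula above.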
What does a VIF of 6.85 indicate?
A VIF of 6.85 falls between the thresholds of 5 and 10, indicating moderate (though not yet severe) multicollinearity for that predictor.
What is principal components analysis (PCA)?
PCA seeks to account for the correlation structure of a set of predictor variables using a smaller set of uncorrelated linear combinations, called components.
What is the significance of the first principal component?
The first principal component accounts for the greatest variability among the predictors.
True or False: PCA considers the target variable during analysis.
False. PCA acts solely on the predictor variables and ignores the target variable.
What should be done to predictors before applying PCA?
The predictors should be either standardized or normalized.
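For instance, z-score standardization with scikit-learn (a toy two-predictor matrix, not data from the text):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[50.0, 3.0], [60.0, 5.0], [70.0, 4.0]])  # toy predictors

# After scaling, every column has mean 0 and standard deviation 1
X_train_z = StandardScaler().fit_transform(X_train)
print(X_train_z.mean(axis=0).round(6), X_train_z.std(axis=0).round(6))
```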
Fill in the blank: The total variability produced by the complete set of m predictors can often be mostly accounted for by a smaller set of k < m __________.
[components]
What is PCA?
Principal Component Analysis (PCA) is a technique for dimension reduction.
What does PCA act on?
PCA acts solely on the predictor variables and ignores the target variable.
What is the characteristic of the first principal component?
The first principal component accounts for greater variability among the predictors than any other component.
How does the second principal component relate to the first?
The second principal component accounts for the second-most variability and is uncorrelated with the first.
What is the purpose of varimax rotation in PCA?
Varimax rotation helps in the interpretability of the components.
What is the cumulative variance explained by the first two components in the example?
The first two components account for about 52.2% of the variance.
What does the eigenvalue criterion suggest for extracting components?
Only components with eigenvalues greater than one should be retained.
What is the eigenvalue of 1.0 indicative of?
An eigenvalue of 1.0 indicates that the component explains about ‘one predictor’s worth’ of variability.
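One way to apply this criterion, sketched with NumPy on stand-in data: compute the eigenvalues of the predictors' correlation matrix and count how many exceed 1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                        # stand-in predictors
X[:, 1] = X[:, 0] + rng.normal(scale=0.4, size=300)  # induce correlation

# Eigenvalues of the correlation matrix, sorted largest first
eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
k = int((eigvals > 1.0).sum())  # keep components worth more than one predictor
print(eigvals.round(3), "-> retain k =", k)
```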
What is the proportion of variance explained criterion?
The criterion specifies the proportion of total variability that the principal components should account for.
How many components should be extracted based on the consensus in the example?
k = 4 components should be extracted.
What does the principal component matrix indicate after rotation?
The rotated component matrix provides clearer interpretations of the principal components.
What should be done to validate PCA results?
The PCA results should be validated using the test data set.
What is indicated by VIFs equal to 1 in the regression of Sales per Visit on principal components?
VIFs of 1 for every component indicate that the multicollinearity has been eliminated, because the principal components are mutually uncorrelated.
What is the purpose of standardizing predictor variables before PCA?
Standardizing ensures that all predictors contribute equally to the analysis.
What command is used to perform PCA in Python?
The PCA() class from the sklearn.decomposition module is used.
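A minimal sketch of that call, assuming a numeric training matrix (names are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_train = rng.normal(size=(100, 5))            # stand-in training predictors

X_z = StandardScaler().fit_transform(X_train)  # standardize first (see above)
pca = PCA(n_components=4)
scores = pca.fit_transform(X_z)                # one column of scores per component
print(pca.explained_variance_ratio_)           # variance proportion per component
```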
What is the significance of the cutoff in PCA loadings in R?
The cutoff suppresses small PCA weights to enhance interpretability.
What is the first step in performing PCA using R?
Import the required data sets and separate the predictor variables.
True or False: PCA can increase multicollinearity in the data.
False. The components PCA produces are mutually uncorrelated, so PCA removes rather than increases multicollinearity.
What does the command ‘scale()’ do in R?
It standardizes the predictor variables.
Fill in the blank: The first principal component is a combination of _______.
Different Items Purchased and Purchase Visits
What is the minimum number of predictors needed for eigenvalue criterion to be valid?
At least 20 predictors. The eigenvalue criterion is most reliable with roughly 20 to 50 predictors; with fewer than 20, it tends to extract too few components.
What is the purpose of the varimax rotation in PCA?
To simplify the interpretation of the components by maximizing the variance of the squared loadings, driving each loading toward 0 or ±1.
Varimax rotation is also used in factor analysis to achieve a simpler, more interpretable structure.
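scikit-learn's PCA has no built-in rotation, so as a rough sketch, the classic varimax iteration can be applied to a loadings matrix (shape: predictors × components) with plain NumPy; libraries such as factor_analyzer offer the same rotation ready-made.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a (predictors x components) loading matrix toward simple structure."""
    p, k = loadings.shape
    rotation = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion: variance of the squared loadings
        grad = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ grad)
        rotation = u @ vt
        if s.sum() < total * (1 + tol):  # converged: criterion stopped improving
            break
        total = s.sum()
    return loadings @ rotation
```

Applied to pca.components_.T, this pushes each loading toward 0 or ±1 so that every rotated component is dominated by a few predictors.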
What command is used to obtain the correlation of the components in the training data set?
cor(), typically wrapped in round() for readability, e.g. round(cor(scores), 4), where scores is whatever object holds the component scores.
This computes the correlation matrix of the PCA scores and rounds it for display.
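The Python analogue, assuming scores holds the component scores, is np.corrcoef; for genuine PCA scores the result is an identity matrix up to rounding:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))                      # stand-in predictors
scores = PCA(n_components=3).fit_transform(X)      # component scores

# rowvar=False treats each column (component) as one variable
print(np.corrcoef(scores, rowvar=False).round(4))  # ~ identity matrix
```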
What does the VIF stand for in regression analysis?
Variance Inflation Factor
VIF measures how much the variance of an estimated regression coefficient is inflated by correlation among the predictors.
True or False: Multicollinearity adversely affects the ability of the sample regression equation to predict the response variable.
False
Multicollinearity does not significantly impact the predictive ability of the regression model.
What should be strictly limited when using a multicollinear model?
Use of the model should be strictly limited to estimation and prediction of the target variable.
Interpretation of such a model is inappropriate, because the individual coefficients are unstable and can be nonsensical.
What is the first step in running a regression model after obtaining PCA scores?
Save each component as its own variable
This allows for the use of PCA components as predictors in the regression model.
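A sketch of that workflow with statsmodels and scikit-learn (X and y are stand-ins; y plays the role of Sales per Visit):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))                      # stand-in predictors
y = X @ rng.normal(size=5) + rng.normal(size=200)  # stand-in target

scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))
comps = pd.DataFrame(scores, columns=["PC1", "PC2", "PC3"])  # one variable each

model = sm.OLS(y, sm.add_constant(comps)).fit()    # regress target on components
design = sm.add_constant(comps)
print([round(variance_inflation_factor(design.values, i), 2)
       for i in range(1, design.shape[1])])        # all 1.0: no multicollinearity
```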
Fill in the blank: PCA replaces the original set of m predictors with _______.
principal components
Principal components are linear combinations of the original variables that capture the maximum variance.
What does the eigenvalue criterion help determine in PCA?
The number of components to retain based on eigenvalues greater than 1
Components with eigenvalues greater than 1 explain more variance than a single original variable.
What is the proportion of variance explained criterion in PCA?
It helps determine the number of components to retain by explaining a specified percentage of total variance.
Common thresholds are 70%, 80%, or 90% of total variance.
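As a sketch, with scikit-learn the cumulative proportions fall out of explained_variance_ratio_ (85% is an arbitrary threshold chosen for the example):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 8))                # stand-in predictors

pca = PCA().fit(StandardScaler().fit_transform(X))
cum_var = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum_var, 0.85)) + 1  # smallest k reaching the threshold
print(cum_var.round(3), "-> retain k =", k)
```

scikit-learn can also do this in one step: PCA(n_components=0.85) keeps just enough components to reach the stated proportion.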
What does a high VIF indicate in regression analysis?
That multicollinearity is a problem
A VIF value above 5 or 10 is often considered indicative of multicollinearity.
What is the relationship between the first and other principal components?
The other principal components are uncorrelated with the first and with one another; each successive component accounts for less of the remaining variability.
This mutual uncorrelatedness is what allows the components to replace correlated predictors without multicollinearity.
How is multicollinearity defined in regression analysis?
The presence of high correlations among predictor variables
Multicollinearity can inflate the variance of coefficient estimates, making them unstable.
What does PCA stand for?
Principal Component Analysis
PCA is a technique used for dimensionality reduction while preserving as much variance as possible.
Which command is used to run a linear regression model in R?
lm()
The lm() function is used to fit linear models in R.
What is the significance of the output from the vif() command?
It shows the variance inflation factors for the regression model.
VIF output helps identify potential multicollinearity issues among predictors.
What should be done to predictors before running PCA?
Standardize or normalize them
Standardization ensures that each predictor contributes equally to the analysis.
What does the correlation matrix show in PCA analysis?
The relationships between predictor variables
High correlations among variables indicate redundancy and potential multicollinearity.
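For example, with pandas on made-up variables (column names are hypothetical, loosely echoing the chapter's purchase-behavior predictors):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
visits = rng.poisson(10, size=100).astype(float)
df = pd.DataFrame({
    "purchase_visits": visits,
    "items_purchased": visits * 2 + rng.normal(size=100),  # deliberately related
    "days_since_purchase": rng.normal(30, 5, size=100),
})

print(df.corr().round(2))  # large off-diagonal values flag redundant predictors
```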
What is the target variable in the analysis of the red wine dataset?
Wine quality
This dataset typically includes various chemical properties of red wine as predictors.
How can the number of components to retain be determined?
By combining recommendations from the eigenvalue criterion and the proportion of variance explained criterion.
This approach ensures a comprehensive evaluation of component significance.
What does the term ‘high dimensionality’ refer to in data science?
Having a large number of features or predictors in a dataset
High dimensionality can lead to overfitting and increased computational cost.
What is the main purpose of dimension reduction methods?
To reduce the number of predictors while retaining essential information
This helps improve model performance and interpretability.
What does a plot of eigenvalues help determine?
How many components to retain based on their explained variance
The eigenvalue plot visually represents the variance accounted for by each component.
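A minimal matplotlib sketch of such a scree plot on stand-in data, with the eigenvalue-criterion cutoff drawn at 1:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 6))                      # stand-in predictors

pca = PCA().fit(StandardScaler().fit_transform(X))
ks = np.arange(1, len(pca.explained_variance_) + 1)
plt.plot(ks, pca.explained_variance_, marker="o")  # component variances (eigenvalues)
plt.axhline(1.0, linestyle="--")                   # eigenvalue criterion cutoff
plt.xlabel("Component")
plt.ylabel("Eigenvalue")
plt.show()
```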
What is the correlation matrix used for in the context of predictors?
To identify highly correlated variables
Understanding correlations helps in diagnosing multicollinearity issues.