How to Use Feature Importance Flashcards
WHAT ARE THE 3 MAIN TYPES OF MORE ADVANCED FEATURE IMPORTANCE? P209
1-Feature importance from model coefficients (a crude method)
2-Feature importance from decision trees
3-Feature importance from permutation testing
IN WHICH MODELS CAN WE USE COEFFICIENTS AS FEATURE IMPORTANCE? P210
Linear machine learning algorithms fit a model where the prediction is the weighted sum of the input values. Examples include linear regression, logistic regression, and extensions that add regularization, such as ridge regression, LASSO, and the elastic net.
WHAT ATTRIBUTE DO WE USE TO GET COEFFICIENT FROM MODELS THAT HAVE IT? (REGRESSION AND CLASSIFICATION) WHAT ASSUMPTION DOES IT MAKE? P210
coef_ (for regression); coef_[0] (for binary classification, where coef_ has shape (1, n_features))
It assumes that the input variables are on the same scale, or were scaled before the model was fit.
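A minimal sketch of reading coefficients as importance scores for a regression model. The synthetic dataset, its sizes, and the use of StandardScaler are illustrative assumptions, not taken from the source.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): 10 features, 5 of them informative
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)

# Put every input on the same scale so coefficient magnitudes are comparable
X_scaled = StandardScaler().fit_transform(X)

model = LinearRegression().fit(X_scaled, y)

# For regression, coef_ is a 1-D array with one coefficient per feature
importance = model.coef_
for i, score in enumerate(importance):
    print(f"Feature {i}: {score:.5f}")
```

A larger absolute coefficient suggests a stronger influence on the prediction, but only under the same-scale assumption above.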
WHAT DO POSITIVE/NEGATIVE COEFFICIENT SCORES INDICATE IN A CLASSIFICATION PROBLEM? P212
Negative: indicates a feature that predicts class 0. Positive: indicates a feature that predicts class 1.
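The sign interpretation can be sketched with logistic regression on a binary problem. The dataset parameters and max_iter value are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (assumed for illustration)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X, y)

# For binary classification, coef_ has shape (1, n_features); take the first row
clf_importance = model.coef_[0]
for i, score in enumerate(clf_importance):
    direction = "class 1" if score > 0 else "class 0"
    print(f"Feature {i}: {score:.5f} (pushes toward {direction})")
```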
HOW DOES DECISION TREE IMPORTANCE SCORE THE FEATURES? P213
Based on the reduction in the criterion used to select split points, like Gini or entropy.
WHICH MODELS CAN USE DECISION TREE FEATURE IMPORTANCE? P213
Classification and regression trees (CART) and ensembles of decision trees (random forest, stochastic gradient boosting, extra trees)
WHICH ATTRIBUTE OF CART ALGORITHMS IS USED FOR FEATURE IMPORTANCE SCORING? P213
feature_importances_
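A short sketch of reading feature_importances_ from a single CART model and from a random forest. The dataset and hyperparameters are assumed for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumed for illustration)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)

# A single CART model and an ensemble of trees expose the same attribute after fit
tree = DecisionTreeClassifier(random_state=1).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

tree_importance = tree.feature_importances_
forest_importance = forest.feature_importances_

for i, (t, f) in enumerate(zip(tree_importance, forest_importance)):
    print(f"Feature {i}: tree={t:.5f} forest={f:.5f}")
```

In scikit-learn these scores are normalized, so each array sums to 1.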
WHAT IS PERMUTATION FEATURE IMPORTANCE? P220
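Permutation feature importance is a technique for calculating relative importance scores that is independent of the model used. A feature's score is the drop in model performance when that feature's values are randomly shuffled.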
HOW DO WE IMPORT PERMUTATION FEATURE IMPORTANCE? P221
from sklearn.inspection import permutation_importance
WHAT DO WE HAVE TO DO WITH THE MODEL BEFORE USING PERMUTATION FEATURE IMPORTANCE? P221
Fit it to the dataset
WHAT ARE THE PARAMETERS OF PERMUTATION FEATURE IMPORTANCE? P221
permutation_importance(model, X, y, scoring='neg_mean_squared_error')
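A minimal sketch of the full call on a model that has no built-in importance attribute. The choice of KNeighborsRegressor, the dataset, and the n_repeats and random_state values are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data (assumed for illustration)
X, y = make_regression(n_samples=500, n_features=10, n_informative=5, random_state=1)

# The model must be fit before its permutation importance can be computed
model = KNeighborsRegressor().fit(X, y)

# Shuffle each feature n_repeats times and record the change in the score
results = permutation_importance(model, X, y, scoring='neg_mean_squared_error',
                                 n_repeats=10, random_state=1)

# importances_mean holds the average score drop per feature
perm_importance = results.importances_mean
for i, score in enumerate(perm_importance):
    print(f"Feature {i}: {score:.5f}")
```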
WHICH CLASS DO WE USE FOR A MODEL BASED FEATURE SELECTION? CODE P225
fs = SelectFromModel(RandomForestClassifier(n_estimators=200), max_features=5)
WHAT DO WE NEED TO DO BEFORE CALCULATING IMPORTANCE SCORES USING A MODEL BASED FEATURE SELECTION (SelectFromModel class)? P225
Fit the feature selection method on the training dataset (fit() followed by transform(), or fit_transform() in one step).
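A sketch of model-based feature selection end to end. The train/test split, dataset, and random_state values are assumed for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic binary classification data and split (assumed for illustration)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

# Select features using importance scores from a random forest
fs = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=1),
                     max_features=5)

# Fit the selector on the training data only, then apply it to both sets
X_train_fs = fs.fit_transform(X_train, y_train)
X_test_fs = fs.transform(X_test)

print(X_train_fs.shape, X_test_fs.shape)
```

With the default threshold, max_features acts as an upper limit, so fewer than five columns may be kept.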