How to Select Features for Numerical Output Flashcards
WHAT IS SCIKIT-LEARN’S IMPLEMENTATION OF THE CORRELATION STATISTIC? P175
Linear correlation scores typically fall between -1 and 1, with 0 representing no relationship. For feature selection, the scores are made positive: the larger the value, the stronger the relationship and the more likely the feature should be selected for modeling. The linear correlation is therefore converted into a correlation statistic with only positive values. Scikit-learn implements this statistic in the f_regression() function.
IS THE SCORE GIVEN BY SCIKIT-LEARN’S IMPLEMENTATION OF CORRELATION IN THE SAME RANGE AS REGULAR CORRELATION? P175
No. It is a non-negative number (an F-statistic) that is not limited to the range -1 to 1; the higher the score, the stronger the relationship.
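To make the difference in range concrete, here is a minimal sketch that compares the Pearson correlation of each feature with the scores returned by f_regression(). The synthetic dataset and the variable names are illustrative assumptions, and the closed-form conversion shown (F = r^2 / (1 - r^2) * (n - 2)) assumes the function's default settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression

# Small synthetic regression problem (sizes chosen only for illustration).
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=1)

# f_regression returns one F-statistic and one p-value per feature.
f_scores, p_values = f_regression(X, y)

# Plain Pearson correlation of each feature with the target, in [-1, 1].
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# With default settings, the F-statistic is a monotone function of r squared,
# so the sign of the correlation is lost and the score is unbounded above.
f_from_r = r ** 2 / (1 - r ** 2) * (len(y) - 2)

print("correlation range:", r.min(), r.max())
print("F-score range:", f_scores.min(), f_scores.max())
```

Because the score depends only on the squared correlation, a strongly negative correlation scores as high as an equally strong positive one.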
HOW CAN WE USE MUTUAL INFORMATION FOR NUMERIC REGRESSION PROBLEMS IN SELECTKBEST? P178
Pass mutual_info_regression as the score_func argument of SelectKBest.
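A minimal sketch of this usage, assuming a small synthetic dataset and an arbitrary k=5 (both are illustrative choices, not values from the book):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Illustrative dataset: 20 numeric features, 5 of them informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=1)

# Score every feature with mutual information and keep the 5 best.
fs = SelectKBest(score_func=mutual_info_regression, k=5)
X_selected = fs.fit_transform(X, y)

# scores_ holds one mutual information estimate per original feature;
# get_support(indices=True) gives the column indices that were kept.
kept = fs.get_support(indices=True)
print(X_selected.shape, kept)
```

The mutual information estimates can differ slightly from run to run, because mutual_info_regression adds a small amount of random noise to continuous features unless its random_state is fixed.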
HOW CAN WE IMPLEMENT A PIPELINE AND HAVE ACCESS TO ITS OBJECTS AND THEN HAVE ACCESS TO THESE OBJECTS’ PARAMETERS? P 186 (WITH CODE)
We can access a pipeline’s objects (its steps) by the names we give them.
We can access the parameters of those objects by joining the step name and the parameter name with a double underscore (__), e.g. sel__k to tune the k parameter of a SelectKBest step named sel.
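Both kinds of access can be sketched as follows. The step names sel and lr, the f_regression scorer, and the values of k are assumptions made for illustration.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Each step is a (name, object) pair; the names are our own choice.
pipeline = Pipeline(steps=[
    ("sel", SelectKBest(score_func=f_regression, k=10)),
    ("lr", LinearRegression()),
])

# Access a step object by its name.
selector = pipeline.named_steps["sel"]
same_selector = pipeline["sel"]  # indexing by name returns the same object

# Access a step's parameter with <step name>__<parameter name>.
pipeline.set_params(sel__k=5)
current_k = pipeline.get_params()["sel__k"]
print(selector.k, current_k)
```

The same sel__k key is what a parameter grid uses when the pipeline is passed to a grid search.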
WRITE CODE THAT CREATES A PIPELINE WITH SELECTKBEST AND A MODEL, THEN CREATES A GRID SEARCH OVER THE K PARAMETER OF SELECTKBEST. P186
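from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
model = LinearRegression()
fs = SelectKBest(score_func=mutual_info_regression)
pipeline = Pipeline(steps=[('sel', fs), ('lr', model)])
# try every k from (n_features - 20) up to n_features; X is the feature matrix
grid = {'sel__k': list(range(X.shape[1] - 20, X.shape[1] + 1))}
search = GridSearchCV(pipeline, grid, scoring='neg_mean_absolute_error', n_jobs=-1, cv=5)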
HOW CAN WE MAKE A REGRESSION DATASET IN PYTHON? P186
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)
WHICH ATTRIBUTES DO WE USE TO GET THE BEST SCORE AND THE BEST PARAMETERS FROM A GRIDSEARCHCV CLASS? P187
After fitting: results = search.fit(X, y)
Best score: results.best_score_
Best parameters: results.best_params_
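Putting the cards above together, here is a minimal end-to-end sketch. It assumes a much smaller dataset, a 3-fold split, and the default n_jobs so that it runs quickly; these values are illustrative and are not the ones used in the book.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small synthetic dataset so the search finishes quickly.
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=0.1, random_state=1)

pipeline = Pipeline(steps=[
    ("sel", SelectKBest(score_func=mutual_info_regression)),
    ("lr", LinearRegression()),
])

# Candidate values of k for the step named "sel".
k_values = list(range(5, X.shape[1] + 1))
search = GridSearchCV(pipeline, {"sel__k": k_values},
                      scoring="neg_mean_absolute_error", cv=3)

# fit() returns the fitted search object itself.
results = search.fit(X, y)

# The scorer is the negated mean absolute error, so flip the sign to read it.
best_mae = -results.best_score_
best_k = results.best_params_["sel__k"]
print("Best MAE:", best_mae)
print("Best k:", best_k)
```

Because the scoring is negated, best_score_ is zero or negative, and the value closest to zero is the best.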