How to Select Features for Numerical Output Flashcards

Question 1

Q

WHAT IS SCIKIT-LEARN'S IMPLEMENTATION OF THE CORRELATION STATISTIC? P175

Answer

A

Linear correlation scores typically range from -1 to 1, with 0 indicating no relationship. For feature selection, the score is converted into a correlation statistic that takes only positive values: the larger the value, the stronger the relationship and the more likely the feature should be selected for modeling.
scikit-learn implements this statistic as the f_regression() function.
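As a minimal sketch of how f_regression() is typically used for feature selection, the snippet below passes it to SelectKBest. The dataset sizes and k=3 are illustrative assumptions, not values from the source.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data; sizes are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=1)

# Score every feature against the target and keep the 3 highest-scoring ones
fs = SelectKBest(score_func=f_regression, k=3)
X_selected = fs.fit_transform(X, y)

print(fs.scores_)        # one score per input feature
print(X_selected.shape)  # (200, 3)
```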

Question 2

Q

IS THE SCORE GIVEN BY SCIKIT-LEARN'S IMPLEMENTATION OF CORRELATION IN THE SAME RANGE AS REGULAR CORRELATION? P175

Answer

A

No. It is a non-negative F-statistic with no upper bound, and a higher score indicates a stronger linear relationship.
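The difference in range can be checked directly. The sketch below compares the Pearson correlation of each feature with the score returned by f_regression(). The conversion formula in the comments reflects my understanding of the library's default behavior (centered data) and should be treated as an assumption. The dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression

# Synthetic data; sizes are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       noise=0.1, random_state=1)

# Pearson correlation of each feature with the target: values in [-1, 1]
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# f_regression returns an F-statistic and a p-value per feature
f_scores, p_values = f_regression(X, y)

# Assumed conversion with default centering: F = r^2 / (1 - r^2) * (n - 2)
expected = r**2 / (1 - r**2) * (len(y) - 2)

print(r)         # signed values between -1 and 1
print(f_scores)  # non-negative, unbounded above
print(np.allclose(f_scores, expected))
```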

Question 3

Q

HOW CAN WE USE MUTUAL INFORMATION FOR NUMERIC REGRESSION PROBLEMS IN SELECTKBEST? P178

Answer

A

Pass the mutual_info_regression function as the score_func argument of SelectKBest.
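A short sketch of this, assuming a synthetic dataset and an arbitrary k=3:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Synthetic data; sizes are illustrative assumptions
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=1)

# Rank features by estimated mutual information with the numeric target
fs_mi = SelectKBest(score_func=mutual_info_regression, k=3)
X_mi = fs_mi.fit_transform(X, y)

print(fs_mi.scores_)  # one mutual information estimate per feature
print(X_mi.shape)     # (200, 3)
```

The mutual information estimate involves some randomness, so the scores can vary slightly between runs.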

Question 4

Q

HOW CAN WE BUILD A PIPELINE, ACCESS ITS OBJECTS, AND ACCESS THOSE OBJECTS' PARAMETERS? P186 (WITH CODE)

Answer

A

We can access a pipeline's objects by the names we give them when defining its steps.
We can access the parameters of those objects with a double underscore (dunder, __) between the step name and the parameter name, e.g. sel__k to tune the k parameter of a SelectKBest step named sel.
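A minimal sketch of both kinds of access. The step names 'sel' and 'lr' follow the naming used on these cards; f_regression is chosen here only for illustration.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline(steps=[('sel', SelectKBest(score_func=f_regression)),
                           ('lr', LinearRegression())])

# Access a step object by the name it was given
selector = pipeline.named_steps['sel']  # pipeline['sel'] also works

# Read and set a step's parameter with the <step>__<parameter> syntax
print(pipeline.get_params()['sel__k'])
pipeline.set_params(sel__k=5)
print(selector.k)
```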

Question 5

Q

WRITE CODE THAT CREATES A PIPELINE WITH SELECTKBEST AND A MODEL, THEN CREATES A GRID SEARCH OVER THE K PARAMETER OF SELECTKBEST. P186

Answer

A

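from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
model = LinearRegression()
fs = SelectKBest(score_func=mutual_info_regression)
pipeline = Pipeline(steps=[('sel', fs), ('lr', model)])
# k ranges from n_features - 20 to n_features; X and y go to search.fit(X, y), not the constructor
grid = {'sel__k': list(range(X.shape[1] - 20, X.shape[1] + 1))}
search = GridSearchCV(pipeline, grid, scoring='neg_mean_absolute_error', n_jobs=-1, cv=5)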

Question 6

Q

HOW CAN WE MAKE A REGRESSION DATASET IN PYTHON? P186

Answer

A

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)

Question 7

Q

WHICH ATTRIBUTES DO WE USE TO GET THE BEST SCORE AND THE BEST PARAMETERS FROM A FITTED GRIDSEARCHCV OBJECT? P187

Answer

A

After fitting: results = search.fit(X, y)
Best score: results.best_score_
Best parameters: results.best_params_
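Putting the pieces together, a small end-to-end sketch might look like the following. To keep it fast, it assumes a smaller dataset than the one on the earlier card, uses f_regression as the score function, and searches a coarse grid of k values. All of these are illustrative choices, not the source's settings.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small synthetic dataset (assumed sizes, chosen for speed)
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=1)

pipeline = Pipeline(steps=[('sel', SelectKBest(score_func=f_regression)),
                           ('lr', LinearRegression())])

# Coarse grid over k: 1, 6, 11, 16
grid = {'sel__k': list(range(1, X.shape[1] + 1, 5))}
search = GridSearchCV(pipeline, grid, scoring='neg_mean_absolute_error', cv=5)

# fit() returns the fitted search object itself
results = search.fit(X, y)

print(results.best_score_)   # negated MAE, so values closer to 0 are better
print(results.best_params_)  # dict holding the best 'sel__k'
```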