How to Select Features for Numerical Output Flashcards

1
Q

WHAT IS SCIKIT-LEARN'S IMPLEMENTATION OF THE CORRELATION STATISTIC? P175

A
Linear correlation scores typically range between -1 and 1, with 0 representing no relationship. For feature selection, scores are made positive: the larger the positive value, the stronger the relationship, and the more likely the feature should be selected for modeling. As such, linear correlation can be converted into a correlation statistic with only positive values.
The f_regression() function
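A minimal sketch (the dataset and k value are illustrative, not from the book) of using f_regression as the scoring function inside SelectKBest:

```python
# Select the k features with the highest correlation-based (F-statistic) scores.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=1)
fs = SelectKBest(score_func=f_regression, k=3)
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)  # (100, 3)
```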
2
Q

IS THE SCORE GIVEN BY SCIKIT-LEARN'S CORRELATION IMPLEMENTATION IN THE SAME RANGE AS REGULAR CORRELATION? P175

A

No. It is a positive number, and the higher the score, the better.

3
Q

HOW CAN WE USE MUTUAL INFORMATION FOR NUMERIC REGRESSION PROBLEMS IN SELECTKBEST? P178

A

The mutual_info_regression() function, passed as the score_func argument of SelectKBest.
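A minimal sketch (dataset and k value are illustrative assumptions) of mutual information feature selection for a numeric target:

```python
# Select the k features with the highest mutual information with the target.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=1)
fs = SelectKBest(score_func=mutual_info_regression, k=5)
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)  # (100, 5)
```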

4
Q

HOW CAN WE IMPLEMENT A PIPELINE, ACCESS ITS OBJECTS, AND THEN ACCESS THOSE OBJECTS' PARAMETERS? P186 (WITH CODE)

A

We can access a pipeline's steps using the names we give them.
We can access a step's parameters using a double underscore (dunder, __), e.g. sel__k to address the k parameter of a SelectKBest step named 'sel'.
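A minimal sketch (step names 'sel' and 'lr' and the dataset are illustrative assumptions) showing the dunder naming convention in practice:

```python
# Address a step's parameter through the pipeline as <step_name>__<param_name>.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=1)
pipeline = Pipeline(steps=[
    ('sel', SelectKBest(score_func=mutual_info_regression)),
    ('lr', LinearRegression()),
])
pipeline.set_params(sel__k=5)  # set k on the 'sel' step via its dunder name
pipeline.fit(X, y)
print(pipeline.named_steps['sel'].k)  # 5
```

The same dunder names are what a grid search uses to tune step parameters.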

5
Q

WRITE IN CODE HOW WE CAN CREATE A PIPELINE WITH KBEST AND A MODEL, THEN CREATE A GRID SEARCH OVER PARAMETER K OF THE KBEST. P186

A

from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

model = LinearRegression()
fs = SelectKBest(score_func=mutual_info_regression)
pipeline = Pipeline(steps=[('sel', fs), ('lr', model)])
grid = {'sel__k': [i for i in range(X.shape[1] - 20, X.shape[1] + 1)]}
search = GridSearchCV(pipeline, grid, cv=5, scoring='neg_mean_absolute_error', n_jobs=-1)

6
Q

HOW CAN WE MAKE A REGRESSION DATASET IN PYTHON? P186

A

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, noise=0.1, random_state=1)

7
Q

WHICH ATTRIBUTES DO WE USE TO GET THE BEST SCORE AND THE BEST PARAMETERS FROM A GRIDSEARCHCV CLASS? P187

A

After fitting: results = search.fit(X, y)
Best score: results.best_score_
Best parameters: results.best_params_
