Supervised Learning Flashcards

1
Q

What is supervised learning

A

Learns from labeled data (predictor variables paired with known target values) to predict the target variable for new data

2
Q

What is unsupervised learning

A

Uncovers hidden patterns using unlabeled data

3
Q

What is reinforcement learning

A

A software agent interacts with an environment and learns to optimize its behavior through a system of rewards and punishments

4
Q

type()

A

Returns the type (class) of an object

Ex: (numpy.ndarray)

5
Q

.shape

A

Tells the shape of the array or dataset

Ex: (150, 4)

6
Q

.target_names

A

Shows the names of the target classes (the possible values of the dependent variable)

Ex: array(['setosa', 'versicolor', 'virginica'], …)
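The three inspection cards above can be tried together. This is a minimal sketch that assumes the examples refer to scikit-learn's bundled iris dataset (the deck never names it); `load_iris` and the variable names are illustrative choices.

```python
from sklearn.datasets import load_iris

iris = load_iris()                 # Bunch object holding the data and its metadata

data_type = type(iris.data)        # class of the feature array
data_shape = iris.data.shape       # (rows, columns) = (samples, features)
class_names = iris.target_names    # names of the target classes

print(data_type)
print(data_shape)
print(class_names)
```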

7
Q

What is k-Nearest Neighbors and how does it work

A

It predicts the label of a data point by looking at the 'k' closest labeled data points

It takes a majority vote of those neighbors' labels
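To make the majority vote concrete, here is a from-scratch sketch in plain Python. It is not how scikit-learn implements k-NN; the function name `knn_predict` and the toy points are hypothetical, and Euclidean distance is an assumed choice.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's distance to the query with that point's label
    distances = [(dist(p, query), label) for p, label in zip(train_points, train_labels)]
    # Keep the k closest labeled points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small, well-separated clusters
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # a
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # b
```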

8
Q

.fit()

A

Trains the model on the training data; this step is called "fitting"

9
Q

.predict()

A

Predicts the labels of new data based on what it learned from the .fit() method
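A short sketch of the fit-then-predict pattern, assuming the iris dataset and a `KNeighborsClassifier` with `n_neighbors=6` (both are illustrative choices, not taken from the cards). The two rows in `X_new` are made-up measurements.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X, y)                      # learn from the labeled data

# New, unlabeled observations must have the same 4 feature columns as X
X_new = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
y_pred = knn.predict(X_new)        # one predicted class index per row

print(y_pred)
print(iris.target_names[y_pred])   # map class indices back to names
```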

10
Q

What is train_test_split() and what is the function structure

A

Divides the data into a training set and a test set so the model can be evaluated on data it was not trained on, giving an unbiased estimate of its performance

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3 …)
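A runnable sketch of the split, assuming the iris data and a k-NN classifier. `random_state=42` and `stratify=y` are assumed extras (reproducible shuffling and equal class proportions in both sets); the card itself only shows `test_size=0.3`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)              # fit on the training set only

print(X_train.shape, X_test.shape)
print(knn.score(X_test, y_test))       # accuracy on the held-out test set
```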

11
Q

How does a larger 'k' affect model complexity and smoothness?

A

A less complex model with a smoother decision boundary; a very large 'k' can lead to underfitting

12
Q

How does a smaller 'k' affect model complexity and smoothness?

A

A more complex model with a less smooth decision boundary; can lead to overfitting
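One way to see the effect of 'k' is to compare training and test accuracy for several values. This sketch assumes the iris data and an arbitrary list of k values; a large gap between the two scores at small 'k' suggests overfitting, and low scores on both at large 'k' suggest underfitting.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

train_scores, test_scores = {}, {}
for k in [1, 5, 15, 50]:               # illustrative values of k
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_scores[k] = knn.score(X_train, y_train)   # accuracy on seen data
    test_scores[k] = knn.score(X_test, y_test)      # accuracy on unseen data
    print(k, train_scores[k], test_scores[k])
```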

13
Q

How to read .csv files

A

pd.read_csv()

14
Q

How to drop an entire column in a dataframe named df

A

df.drop('column_name', axis=1)

15
Q

How to get values of a column ‘col’ of a dataframe called df

A

df['col'].values

16
Q

Imports

A

import numpy as np

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import train_test_split

from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import roc_curve

from sklearn.model_selection import GridSearchCV

17
Q

How to get the accuracy of a trained model on the test set

A

.score(X_test, y_test)

18
Q

What is cross-validation and how to use it

A

Cross-validation splits the data into k folds. Each fold is held out once as the test set while the model is trained on the remaining folds, which gives k scores instead of one

cross_val_score(regressor, X, y, cv=3)

The regressor can be any chosen estimator, such as LinearRegression; cv=3 means 3 folds. For regressors the default score is R squared, not accuracy
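A small runnable sketch of 3-fold cross-validation. The diabetes dataset bundled with scikit-learn is an assumed stand-in for whatever data the card had in mind.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

regressor = LinearRegression()
# One score per fold; for a regressor the default metric is R squared
cv_scores = cross_val_score(regressor, X, y, cv=3)

print(cv_scores)
print(cv_scores.mean())   # average score across the folds
```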

19
Q

Time how long it takes for programs to run

A

%timeit before any statement or function call (an IPython/Jupyter magic command)

20
Q

What is Lasso regression and how to use it

A

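Can be used to select important features of a dataset because it shrinks the coefficients of unimportant features to 0

from sklearn.linear_model import Lasso
# normalize=True was removed in scikit-learn 1.2; scale the features beforehand instead
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
lasso.score(X_test, y_test)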

21
Q

What is Ridge regression and how to use it

A

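Linear regression whose loss function adds a penalty of alpha times the sum of squared coefficients. alpha is the hyperparameter that needs to be tuned. A low alpha gives a more complex model and can lead to overfitting, while a very high alpha can lead to underfitting

from sklearn.linear_model import Ridge
# normalize=True was removed in scikit-learn 1.2; scale the features beforehand instead
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge.score(X_test, y_test)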
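To see how alpha changes the fit, the sketch below sweeps a few values on the bundled diabetes dataset. The dataset, the alpha values, and the use of a `StandardScaler` pipeline (in place of the `normalize` argument, which recent scikit-learn versions no longer accept) are all assumptions for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scores = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:   # illustrative values
    # Scale features first so the penalty treats every coefficient equally
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    scores[alpha] = model.score(X_test, y_test)   # R squared on the test set
    print(alpha, scores[alpha])
```

A very large alpha shrinks all coefficients toward zero, so its test score should drop noticeably compared with moderate values.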

22
Q

How to use Lasso for feature selection

A

lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_
print(lasso_coef)

23
Q

How to get the coefficient attribute

A

.coef_

24
Q

What does logistic regression do and how does it work

A

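Despite its name it is used for classification. It outputs the probability 'p' that a data point belongs to class '1'

With the default threshold of 0.5:
If 'p' > 0.5, the data point is labeled '1'
If 'p' < 0.5, the data point is labeled '0'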
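The thresholding rule can be checked against `.predict()`. This sketch assumes the bundled breast cancer dataset and a scaling pipeline; neither appears in the cards.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

logreg = make_pipeline(StandardScaler(), LogisticRegression())
logreg.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of class 1
y_pred_prob = logreg.predict_proba(X_test)[:, 1]

# Apply the 0.5 threshold by hand and compare with the built-in prediction
y_pred_manual = (y_pred_prob > 0.5).astype(int)
y_pred = logreg.predict(X_test)
print((y_pred_manual == y_pred).all())
```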

25
Q

Accuracy formula

A

(tp + tn) / (tp + tn + fp + fn)

26
Q

Precision formula

A

tp / (tp + fp)

27
Q

Recall formula

A

tp / (tp + fn)
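The accuracy, precision and recall formulas can be written as small functions. The parentheses matter: the sums must be evaluated before the division. The counts below are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Share of predicted positives that were actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)

# Hypothetical counts from a binary classifier
tp, tn, fp, fn = 40, 45, 5, 10

print(accuracy(tp, tn, fp, fn))   # 85 / 100
print(precision(tp, fp))          # 40 / 45
print(recall(tp, fn))             # 40 / 50
```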

28
Q

What is a confusion matrix and how to use it

A

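Matrix of actual classes (rows) against predicted classes (columns). With the positive class listed first it is structured in the following way:

[[tp fn]
[fp tn]]

scikit-learn sorts the labels, so for 0/1 labels its default output is [[tn fp] [fn tp]]; pass labels=[1, 0] to get the layout above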

print(confusion_matrix(y_test, y_pred))
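Because the cell order is easy to mix up, this sketch checks it on a tiny hand-made example (the label lists are hypothetical). By default scikit-learn orders rows and columns by sorted label, so class 0 comes first; passing `labels=[1, 0]` puts the positive class first.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions: 3 tp, 1 fn, 2 fp, 4 tn
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat  = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Default: rows = actual, columns = predicted, labels sorted as [0, 1]
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()
print(cm)                      # [[tn fp], [fn tp]]

# Positive class first: [[tp fn], [fp tn]]
cm_pos_first = confusion_matrix(y_true, y_hat, labels=[1, 0])
print(cm_pos_first)
```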

29
Q

What is a classification report and how to use it

A

Shows precision, recall, F1 score and support for each class, comparing the predictions to the true labels of the test set

print(classification_report(y_test, y_pred))

30
Q

Plotting ROC Curves

A

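Use roc_curve(y_test, y_pred_prob) to get the false positive rates, true positive rates and thresholds, then plot the true positive rate against the false positive rate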
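A minimal sketch of one common way to plot a ROC curve, using `roc_curve` from the Imports card together with matplotlib. The breast cancer dataset, the scaling pipeline, and the output file name `roc_curve.png` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

logreg = make_pipeline(StandardScaler(), LogisticRegression())
logreg.fit(X_train, y_train)

# roc_curve needs scores or probabilities for the positive class, not hard labels
y_pred_prob = logreg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], "k--", label="chance")   # diagonal = random guessing
ax.plot(fpr, tpr, label="logistic regression")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend()
fig.savefig("roc_curve.png")
```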

31
Q

What is GridSearchCV and how to use it and how to return results

A

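Tries every combination of hyperparameter values in a grid, scoring each one with cross-validation, to determine the best hyperparameters to use for the model

from sklearn.neighbors import KNeighborsClassifier
param_grid = {'n_neighbors': np.arange(1, 50)}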
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)

knn_cv.best_params_
knn_cv.best_score_

32
Q

How to convert categorical features to numerical features for Scikit-learn and pandas

A

Use dummy variables for pandas: pd.get_dummies(df)

Use one-hot-encoder for Scikit-learn: OneHotEncoder()
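Both encoding approaches side by side, on a hypothetical dataframe with one categorical column (`origin`) and one numeric column (`mpg`). The column names and values are made up.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical and one numeric column
df = pd.DataFrame({
    "origin": ["US", "Europe", "Asia", "US"],
    "mpg": [18.0, 30.5, 33.0, 21.0],
})

# pandas: one indicator column per category, numeric columns left untouched
df_dummies = pd.get_dummies(df)
print(df_dummies.columns.tolist())

# scikit-learn: learn the categories with fit, then encode with transform
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[["origin"]]).toarray()
print(encoder.categories_)
print(encoded)
```

`pd.get_dummies` is convenient for quick exploration, while `OneHotEncoder` remembers the categories it saw during `fit`, which helps when the same encoding must be applied to new data.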

33
Q

How to mark missing data that was recorded as 0

Dataframe: df
Columns to mark missing values in: insulin, triceps and bmi

A

df['insulin'] = df['insulin'].replace(0, np.nan)
df['triceps'] = df['triceps'].replace(0, np.nan)
df['bmi'] = df['bmi'].replace(0, np.nan)

Each column had 768 non-null entries but now has 394, 541 and 757, because the zeros are now counted as missing (NaN)

34
Q

df.dropna()

A

Drops all rows containing at least one missing value (NaN) in the entire dataframe; this can lose too much data
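A small sketch of marking zeros as missing and then dropping incomplete rows. The five-row dataframe is hypothetical; only the column names come from the card above.

```python
import numpy as np
import pandas as pd

# Hypothetical data where 0 really means "not measured"
df = pd.DataFrame({
    "insulin": [0, 94, 168, 0, 88],
    "triceps": [35, 0, 23, 32, 29],
    "bmi": [33.6, 26.6, 23.3, 0.0, 43.1],
})

# Mark the zeros as missing values
cols = ["insulin", "triceps", "bmi"]
df[cols] = df[cols].replace(0, np.nan)
print(df.count())            # non-null entries per column

# Drop every row that has at least one NaN
df_complete = df.dropna()
print(df_complete.shape)     # only 2 of the 5 rows survive
```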