Supervised Learning Flashcards by Aman Nurani

What is supervised learning

Predicts a target variable from predictor variables (features) using labeled data

What is unsupervised learning

Uncovers hidden patterns using unlabeled data

What is reinforcement learning

A software agent interacts with an environment and uses a system of rewards and punishments to optimize its behavior

type()

Tells the type of data

Ex: (numpy.ndarray)

.shape

Tells the shape of the array or dataset

Ex: (150, 4)

.target_names

Shows the names of the target classes (the possible values of the dependent variable) in a scikit-learn dataset

Ex: array(['setosa', 'versicolor', 'virginica'], …)
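The three cards above (type(), .shape and .target_names) can be tried together. This is a minimal sketch that assumes the examples refer to scikit-learn's bundled iris dataset, which the (150, 4) shape and the class names suggest but the cards do not state.

```python
from sklearn.datasets import load_iris

# Assumption: the cards describe the iris dataset shipped with scikit-learn.
iris = load_iris()
X = iris.data

print(type(X))             # the class of the feature array
print(X.shape)             # (rows, columns) = (samples, features)
print(iris.target_names)   # names of the target classes
```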

What is k-Nearest Neighbors and how does it work

It predicts the label of a data point by looking at the 'k' closest labeled data points

It takes a majority vote among those neighbors

.fit()

Trains a model on the training data; this step is called "fitting"

.predict()

Predicts the labels of new data based on what it learned from the .fit() method
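A short sketch of .fit() followed by .predict() with a k-Nearest Neighbors classifier. The iris dataset, the value k = 6 and the use of the first three rows as "new" data are assumptions made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=6)  # k = 6 is an arbitrary choice
knn.fit(X, y)                              # fitting: learn from the labeled data

X_new = X[:3]                  # stand-in for new, unlabeled observations
y_pred = knn.predict(X_new)    # majority vote among the 6 nearest neighbors
print(y_pred)
```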

What is train_test_split() and what is the function structure

Divides the data into a training set and a test set so the model can be evaluated on data it has not seen during fitting

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3 …)
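A runnable version of the split above, as a sketch. The iris data, random_state=42 and stratify=y are assumptions added here; the card itself only shows test_size=0.3.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.3 holds out 30% of the rows for testing.
# random_state makes the split repeatable; stratify keeps class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```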

How does larger 'k' affect model complexity and smoothness

Less complex model and smoother

How does smaller 'k' affect model complexity and smoothness

More complex model and less smooth; can lead to overfitting
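One way to see the effect of 'k' is to compare accuracy on the training data itself: a very small k tends to fit the training points closely, while a large k smooths the decision boundary. The dataset (iris) and the two k values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

train_acc = {}
for k in (1, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    train_acc[k] = knn.score(X, y)   # accuracy on the data the model was fit on

print(train_acc)
```

High training accuracy at k = 1 is a sign of a complex model, not of a good one; performance on held-out data is what reveals overfitting.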

How to read .csv files

pd.read_csv()

How to drop an entire column in a dataframe named df

df.drop('column_name', axis=1)

How to get values of a column ‘col’ of a dataframe called df

df['col'].values
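The three pandas cards above can be combined in one small sketch. The CSV text and the column names (age, height, label) are hypothetical; an in-memory string stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally a file path is passed to pd.read_csv().
csv_text = "age,height,label\n30,170,a\n40,180,b\n"
df = pd.read_csv(io.StringIO(csv_text))

X = df.drop('label', axis=1).values   # all columns except 'label', as an array
y = df['label'].values                # values of the single column 'label'
print(X.shape, y)
```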

Imports

import numpy as np

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import train_test_split

from sklearn.metrics import classification_report

from sklearn.metrics import confusion_matrix

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import roc_curve

from sklearn.model_selection import GridSearchCV

How to get the accuracy of a trained model on the test set

.score(X_test, y_test)
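A sketch of .score() in context: fit on the training set, then score on the held-out test set. The classifier, dataset and random_state are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=6).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)   # fraction of correct test predictions
print(accuracy)
```

For classifiers .score() returns accuracy; for regressors such as LinearRegression it returns R^2.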

What is cross-validation and how to use it

Cross-validation splits the data into several folds. Each fold is used once as the test set while the remaining folds are used for training, so you get one score per fold

cross_val_score(regressor, X, y, cv=3)

regressor can be any estimator, such as LinearRegression; cv=3 means there are 3 folds
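A minimal sketch of cross_val_score with 3 folds. The synthetic data from make_regression is an assumption standing in for a real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption, used only for illustration).
X, y = make_regression(n_samples=90, n_features=3, noise=5.0, random_state=0)

regressor = LinearRegression()
cv_scores = cross_val_score(regressor, X, y, cv=3)   # one R^2 score per fold

print(cv_scores)
print(cv_scores.mean())
```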

Time how long it takes for programs to run

Put %timeit before a statement or function call (an IPython/Jupyter magic command)
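%timeit only works inside IPython or Jupyter. In a plain Python script, the standard-library timeit module offers a similar measurement; this sketch times an arbitrary statement chosen for illustration.

```python
import timeit

# Run the statement 1000 times and return the total elapsed time in seconds.
elapsed = timeit.timeit("sum(range(1000))", number=1000)
print(f"{elapsed:.6f} seconds for 1000 runs")
```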

What is Lasso regression and how to use it

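Linear regression with an L1 penalty; can be used to select the important features of a dataset by shrinking the coefficients of unimportant features to 0

from sklearn.linear_model import Lasso
# normalize=True was removed in scikit-learn 1.2; scale features beforehand if needed
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
lasso.score(X_test, y_test)  # R^2 on the test set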

What is Ridge regression and how to use it

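Linear regression whose loss function adds a penalty on the squared coefficients, scaled by alpha, the hyperparameter that needs to be tuned. A low alpha gives a more complex model and can lead to overfitting, while a very high alpha can lead to underfitting

from sklearn.linear_model import Ridge
# normalize=True was removed in scikit-learn 1.2; scale features beforehand if needed
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge.score(X_test, y_test)  # R^2 on the test set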
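To illustrate how alpha controls Ridge, this sketch compares the size of the fitted coefficients for a small and a large alpha. The synthetic data and the two alpha values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

coef_norm = {}
for alpha in (0.1, 1000.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    coef_norm[alpha] = np.linalg.norm(ridge.coef_)  # overall coefficient size

print(coef_norm)  # the larger alpha shrinks the coefficients more
```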

How to use Lasso for feature selection

lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_
print(lasso_coef)
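A self-contained sketch of the same idea. The data is synthetic and built so that only the first of three features influences the target (an assumption for illustration), which lets the zeroed coefficients be seen directly.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Only feature 0 drives y; features 1 and 2 are pure noise.
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_
print(lasso_coef)

selected = np.flatnonzero(lasso_coef)   # indices of features with nonzero weight
print(selected)
```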

How to get the coefficient attribute

.coef_

What does logistic regression do and how does it work

It outputs probabilities for a binary classification problem

If 'p' > 0.5, the data point is labeled '1'
If 'p' < 0.5, the data point is labeled '0'
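The thresholding rule above can be checked against scikit-learn's own predictions. This is a sketch on synthetic data (an assumption); predict_proba returns one column per class, and column 1 holds the probability of label 1.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

logreg = LogisticRegression().fit(X, y)
p = logreg.predict_proba(X)[:, 1]    # probability that each point has label 1
labels = (p > 0.5).astype(int)       # apply the 0.5 threshold by hand

print(labels[:10])
print(logreg.predict(X)[:10])        # the built-in prediction for comparison
```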

Accuracy formula

(tp + tn) / (tp + tn + fp + fn)

Precision formula

tp / (tp + fp)

Recall formula

tp / (tp + fn)
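A worked example of the accuracy, precision and recall formulas. The four counts are made-up numbers used only to show the arithmetic.

```python
# Hypothetical counts: true positives, true negatives,
# false positives, false negatives.
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 85 / 100
precision = tp / (tp + fp)                   # 40 / 45
recall = tp / (tp + fn)                      # 40 / 50

print(accuracy, precision, recall)
```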

What is a confusion matrix and how to use it

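A table of prediction counts broken down by actual and predicted class. For binary labels 0 and 1, scikit-learn returns it as [[tn fp] [fn tp]]: rows are actual classes and columns are predicted classes

print(confusion_matrix(y_test, y_pred))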

What is a classification report and how to use it

Shows precision, recall, f1 score and support for the predictions compared to the test set

print(classification_report(y_test, y_pred))
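A small sketch of both functions on hand-written labels. The two label lists are hypothetical; for labels 0 and 1, scikit-learn orders rows and columns by label, so .ravel() yields tn, fp, fn, tp.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels.
y_test = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]

cm = confusion_matrix(y_test, y_pred)   # rows: actual, columns: predicted
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)

print(classification_report(y_test, y_pred))
```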

Plotting ROC Curves

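Use roc_curve(y_test, y_pred_prob) to get the false positive rates, true positive rates and thresholds, then plot the true positive rate against the false positive rate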
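One possible way to get the points of a ROC curve uses roc_curve, which appears in the Imports card. This is a sketch on synthetic data with a logistic regression model, both assumptions; the plotting step is left as a comment because it needs matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

logreg = LogisticRegression().fit(X_train, y_train)
y_pred_prob = logreg.predict_proba(X_test)[:, 1]   # probability of label 1

# One (fpr, tpr) point per decision threshold.
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
print(fpr[:5], tpr[:5])

# To draw the curve with matplotlib: plt.plot(fpr, tpr)
```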

What is GridSearchCV and how to use it and how to return results

Tries every combination of hyperparameter values in a grid, scoring each with cross-validation, to find the best hyperparameters for the model

param_grid = {'n_neighbors': np.arange(1, 50)}
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)
knn_cv.best_params_
knn_cv.best_score_
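A runnable sketch of the grid search with its imports. The iris dataset and the smaller range of 1 to 19 neighbors are assumptions chosen to keep the example quick.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {'n_neighbors': np.arange(1, 20)}   # candidate values for k
knn_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
knn_cv.fit(X, y)           # fits and cross-validates every candidate

print(knn_cv.best_params_)   # the best value of n_neighbors found
print(knn_cv.best_score_)    # its mean cross-validated accuracy
```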

How to convert categorical features to numerical features for Scikit-learn and pandas

Use dummy variables for pandas: pd.get_dummies(df)

Use a one-hot encoder for Scikit-learn: OneHotEncoder()
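Both approaches side by side, as a sketch. The dataframe and its columns (color, size) are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({'color': ['red', 'blue', 'red'], 'size': [1, 2, 3]})

# pandas: categorical columns become one 0/1 column per category.
dummies = pd.get_dummies(df)
print(dummies)

# scikit-learn: fit the encoder, then transform; the result is sparse by default.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[['color']]).toarray()
print(encoder.categories_)
print(encoded)
```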

How to drop missing data (dataframe: df; columns to drop missing values in: insulin, triceps and bmi)

df['insulin'] = df['insulin'].replace(0, np.nan)
df['triceps'] = df['triceps'].replace(0, np.nan)
df['bmi'] = df['bmi'].replace(0, np.nan)

Each column had 768 entries but now has 394, 541 and 757 non-missing values because the zeros were replaced with NaN

df.dropna()

Drops every row that contains a missing value (NaN) in any column; too much data is lost
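The two steps can be seen on a tiny example. The four rows below are made-up values, not the 768-row dataset the card refers to; only the column names come from the card.

```python
import numpy as np
import pandas as pd

# Hypothetical rows where 0 stands for "not measured".
df = pd.DataFrame({
    'insulin': [0, 94, 168, 0],
    'triceps': [35, 0, 23, 0],
    'bmi': [33.6, 26.6, 23.3, 0.0],
})

cols = ['insulin', 'triceps', 'bmi']
df[cols] = df[cols].replace(0, np.nan)   # mark zeros as missing

print(df.count())        # non-missing values per column

df_clean = df.dropna()   # keep only rows with no missing values
print(len(df), len(df_clean))
```

Here only one of four rows survives dropna(), which shows why dropping rows can cost too much data; imputing the missing values is a common alternative.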

Supervised Learning Flashcards

(34 cards)