Supervised Learning Flashcards
What is supervised learning
Predicts a target variable from predictor variables (features), using a model trained on labeled data
What is unsupervised learning
Uncovers hidden patterns using unlabeled data
What is reinforcement learning
A software agent interacts with an environment and learns to optimize its behavior through a system of rewards and punishments
type()
Tells the type of data
Ex: (numpy.ndarray)
.shape
Tells the shape of the array or dataset
Ex: (150, 4)
.target_names
Shows the names of the target classes (the labels of the dependent variable)
Ex: array(['setosa', 'versicolor', 'virginica'], …)
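The three inspection cards above can be tried together. This is a minimal sketch that assumes the examples refer to the iris dataset bundled with scikit-learn (the shape and class names match it); the variable names are only for illustration.

```python
from sklearn.datasets import load_iris

# Assumption: the flashcard examples come from scikit-learn's bundled iris data.
iris = load_iris()

data_type = type(iris.data)      # class of the feature array
data_shape = iris.data.shape     # (number of rows, number of columns)
class_names = iris.target_names  # names of the target classes

print(data_type)
print(data_shape)
print(class_names)
```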
What is k-Nearest Neighbors and how does it work
It predicts the label of a data point by looking at the ‘k’ closest labeled data points
It takes a majority vote among their labels
.fit()
Trains the model on the training data; this step is called “fitting”
.predict()
Predicts the labels of new data based on what it learned from the .fit() method
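A minimal sketch of .fit() and .predict() with k-Nearest Neighbors, assuming the iris dataset and n_neighbors=6 (both chosen here for illustration, not taken from the cards). The rows in X_new are hypothetical new measurements.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Fit: store the labeled training points so neighbors can be looked up later.
knn = KNeighborsClassifier(n_neighbors=6)  # k=6 is an arbitrary choice
knn.fit(X, y)

# Predict: each new row gets the majority label of its 6 nearest neighbors.
X_new = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]  # hypothetical samples
y_pred = knn.predict(X_new)
print(iris.target_names[y_pred])
```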
What is train_test_split() and what is the function structure
Divides the data into training and test sets so the model can be evaluated on data it has not seen during fitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3 …)
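A runnable sketch of the call above, assuming the iris dataset. The random_state and stratify arguments are additions for reproducibility and balanced classes; they are not necessarily the arguments elided on the card.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing.
# random_state and stratify are illustrative assumptions, not from the card.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```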
How does larger ‘k’ affect model complexity and smoothness
Less complex model with a smoother decision boundary; very large ‘k’ can lead to underfitting
How does smaller ‘k’ affect model complexity and smoothness
More complex model with a less smooth decision boundary; can lead to overfitting
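One way to see the effect of ‘k’ is to compare training and test accuracy across several values. This is a rough sketch assuming the iris dataset and an arbitrary list of k values; a large gap between the two accuracies hints at overfitting.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

train_acc, test_acc = {}, {}
for k in (1, 5, 15, 50):  # illustrative values of k
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc[k] = knn.score(X_train, y_train)  # accuracy on seen data
    test_acc[k] = knn.score(X_test, y_test)     # accuracy on unseen data
    print(f"k={k:2d}  train={train_acc[k]:.3f}  test={test_acc[k]:.3f}")
```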
How to read .csv files
pd.read_csv()
How to drop an entire column in a dataframe named df
df.drop('column_name', axis=1)
How to get values of a column ‘col’ of a dataframe called df
df['col'].values
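The three pandas cards above, sketched on a tiny in-memory CSV so the example runs without a file on disk. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; pd.read_csv also accepts a file path.
csv_text = "col,other,target\n1,10,a\n2,20,b\n3,30,a\n"
df = pd.read_csv(io.StringIO(csv_text))

# drop returns a new DataFrame; df itself keeps the 'target' column.
X = df.drop('target', axis=1)

# .values gives the column as a NumPy array.
col_values = df['col'].values

print(X.columns.tolist())
print(col_values)
```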
Imports
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve
How to get the accuracy of a trained model on the test data
.score(X_test, y_test)
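For a classifier, .score() is the fraction of test labels predicted correctly. The sketch below checks that against accuracy_score, assuming the iris dataset and a k-Nearest Neighbors model chosen for illustration. For regressors, .score() returns R squared instead.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=6).fit(X_train, y_train)

# Both lines compute the same quantity: share of correct test predictions.
test_accuracy = knn.score(X_test, y_test)
manual_accuracy = accuracy_score(y_test, knn.predict(X_test))

print(test_accuracy, manual_accuracy)
```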
What is cross-validation and how to use it
Cross-validation splits the data into k folds; each fold is held out once as the test set while the model is trained on the remaining folds, which gives k scores instead of one
cross_val_score(regressor, X, y, cv=3)
The regressor can be any estimator, such as LinearRegression; cv=3 means 3 folds, so 3 scores are returned
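A minimal sketch of the call above, assuming scikit-learn's bundled diabetes dataset as the regression data (the card does not name a dataset). For a regressor, each score is the R squared on the held-out fold.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Assumption: diabetes data stands in for the unspecified X and y.
X, y = load_diabetes(return_X_y=True)

regressor = LinearRegression()
cv_scores = cross_val_score(regressor, X, y, cv=3)  # one score per fold

print(cv_scores)
print(cv_scores.mean())
```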
How to time how long code takes to run
Put %timeit before any statement or function call (an IPython/Jupyter magic command)
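%timeit only works inside IPython or Jupyter. Outside of them, the standard-library timeit module does a similar job; the sketch below times a hypothetical helper function written for this example.

```python
import timeit


def sum_of_squares(n=1000):
    """Hypothetical workload used only to have something to time."""
    return sum(i * i for i in range(n))


# Run the function 1000 times and report the average time per call.
runs = 1000
elapsed = timeit.timeit(sum_of_squares, number=runs)
print(f"{elapsed / runs:.2e} seconds per call")
```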
What is Lasso regression and how to use it
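Can be used to select the important features of a dataset, because it shrinks the coefficients of unimportant features to 0
lasso = Lasso(alpha=0.1)  # normalize=True was removed in scikit-learn 1.2; standardize features first (e.g. StandardScaler)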
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
lasso.score(X_test, y_test)
What is Ridge regression and how to use it
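It adds a penalty of alpha times the sum of squared coefficients to the loss function being minimized; alpha is the hyperparameter that needs to be tuned. Very low alpha gives a more complex model and can lead to overfitting, while very high alpha can lead to underfitting
ridge = Ridge(alpha=0.1)  # normalize=True was removed in scikit-learn 1.2; standardize features first (e.g. StandardScaler)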
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge.score(X_test, y_test)
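To see how alpha controls Ridge, the sketch below fits several alpha values and prints the test score and the size of the coefficient vector. It assumes the bundled diabetes dataset and a StandardScaler pipeline for feature scaling; the alpha values are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

coef_norms = {}
for alpha in (0.01, 1.0, 100.0, 10000.0):  # illustrative values
    # Scale the features, then fit Ridge with the given penalty strength.
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    coef_norms[alpha] = np.linalg.norm(model.named_steps["ridge"].coef_)
    r2 = model.score(X_test, y_test)
    print(f"alpha={alpha:<8} R^2={r2:.3f}  |coef|={coef_norms[alpha]:.1f}")
```

Larger alpha shrinks the coefficients toward 0, which is what makes the model simpler.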
How to use Lasso for feature selection
lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_
print(lasso_coef)
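A runnable version of the card above, assuming the bundled diabetes dataset so that each coefficient can be paired with a feature name. Features whose coefficient is exactly 0 are the ones Lasso has dropped.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

# Assumption: diabetes data stands in for the unspecified X and y.
data = load_diabetes()
X, y = data.data, data.target

lasso = Lasso(alpha=0.1)
lasso_coef = lasso.fit(X, y).coef_

# Pair each coefficient with its feature name.
for name, coef in zip(data.feature_names, lasso_coef):
    print(f"{name:>4}: {coef:8.2f}")

# Keep only the features with a nonzero coefficient.
selected = [n for n, c in zip(data.feature_names, lasso_coef) if c != 0]
print(selected)
```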
How to get the coefficient attribute
.coef_
What does logistic regression do and how does it work
It outputs the probability ‘p’ that a data point belongs to class ‘1’, then applies a threshold (0.5 by default)
If ‘p’ > 0.5, data labeled ‘1’
If ‘p’ < 0.5, data labeled ‘0’
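The thresholding rule can be checked by hand. This sketch assumes the bundled breast cancer dataset (a binary problem) and a StandardScaler pipeline; it compares labels built from the 0.5 threshold with what .predict() returns.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

logreg = make_pipeline(StandardScaler(), LogisticRegression())
logreg.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of class 1.
p = logreg.predict_proba(X_test)[:, 1]

# Apply the 0.5 threshold manually and compare with .predict().
labels_from_threshold = (p > 0.5).astype(int)
y_pred = logreg.predict(X_test)
print(np.array_equal(labels_from_threshold, y_pred))
```

Changing the threshold trades false positives against false negatives, which is what roc_curve traces out.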