ML Algorithm Implementations/Data manipulation Flashcards

Understand and internalize the scikit-learn implementations of major ML algorithms, along with the data manipulation needed for preprocessing.

1
Q

What is the process of implementing Linear Regression with sklearn?

A
  1. Given the dataset, identify the independent (x) and dependent (y) variables.
  2. Do EDA and data preprocessing.
  3. If x is a single 1D feature, reshape it into the 2D shape (n_samples, n_features) that sklearn expects; reshaping y is optional, since a 1D target also works:
    a. x = x.reshape(-1,1)
    b. y = y.reshape(-1,1)
  4. Split your train and test sets (test_size must be passed by keyword, for example 0.2):

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

  5. Instantiate and train the model:

model = LinearRegression()

model.fit(x_train, y_train)

  6. Get your predictions and loss values to evaluate your model:

y_pred = model.predict(x_test)

mae = mean_absolute_error(y_true=y_test, y_pred=y_pred)
mse = mean_squared_error(y_true=y_test, y_pred=y_pred)
rmse = np.sqrt(mse)  # squared=False is deprecated in newer scikit-learn releases
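
The steps above can be sketched end to end as follows. This is a minimal illustration, not a reference implementation: the synthetic data (y = 3x + 2 plus noise), the 0.2 test fraction, and the random seeds are all assumptions chosen so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3 * x + 2 + rng.normal(0, 1, size=200)

# scikit-learn expects a 2D feature array of shape (n_samples, n_features).
x = x.reshape(-1, 1)

# Hold out 20% of the rows for evaluation.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(x_train, y_train)

y_pred = model.predict(x_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE computed from MSE, independent of sklearn version

print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}")
```

Because the data is generated from a known line, the fitted slope and intercept should land close to 3 and 2, which is a quick sanity check on the whole pipeline.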

2
Q

Which libraries need to be imported for most ML algorithms?

A

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler

3
Q

What are the import lines for the regression algorithms?

A

from sklearn.linear_model import LinearRegression, LogisticRegression

(Despite its name, LogisticRegression is a classification model.)

4
Q

Where do you import PolynomialFeatures?

A

from sklearn.preprocessing import PolynomialFeatures

5
Q

What are the import lines for KMeans?

A

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import adjusted_rand_score
import numpy as np

6
Q

What is the process of implementing Logistic Regression with sklearn?

A

  1. Identify the target and independent variables: x, y = data.data, data.target
  2. Split your train and test data: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
  3. Scale your independent variables, fitting the scaler on the train data only and reusing it on the test data:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
  4. Instantiate and train your model:
model = LogisticRegression()
model.fit(x_train, y_train)
  5. Evaluate the model (score returns mean accuracy): model.score(x_test, y_test)
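
A runnable sketch of this process is shown below. The choice of the built-in breast cancer dataset, the 0.2 test fraction, the stratified split, and the random seed are assumptions made for illustration; the card itself only assumes some `data` object with `.data` and `.target`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Built-in binary classification dataset (assumption for illustration).
x, y = load_breast_cancer(return_X_y=True)

# Split first so the scaler never sees the test rows.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the train set only, then apply it to both sets.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

model = LogisticRegression()
model.fit(x_train_scaled, y_train)

# For classifiers, score() returns mean accuracy on the given data.
accuracy = model.score(x_test_scaled, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Calling `fit_transform` on the train set and only `transform` on the test set keeps test-set statistics out of training.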

7
Q

What is the process of implementing Polynomial Regression?

A

Essentially the same process as linear regression: identify your independent and dependent variables, then split and reshape them.
This time, also instantiate poly_features = PolynomialFeatures(degree=2, include_bias=False). Note that include_bias is keyword-only, so it cannot be passed positionally.
Then pass your x through poly_features by calling poly_features.fit_transform(x).
In the rest of your training and calculations, use the polynomial-transformed x.
y stays the same.
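
As a hedged illustration of the description above, the following sketch fits a degree-2 polynomial. The synthetic quadratic data, the degree, and the seeds are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): y = 0.5*x^2 + x + 2 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + 2 + rng.normal(0, 0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Expand x into [x, x^2]; the bias column is left to LinearRegression's intercept.
poly_features = PolynomialFeatures(degree=2, include_bias=False)
x_train_poly = poly_features.fit_transform(x_train)
x_test_poly = poly_features.transform(x_test)

# An ordinary linear model on the expanded features is polynomial regression.
model = LinearRegression()
model.fit(x_train_poly, y_train)

r2 = model.score(x_test_poly, y_test)  # R^2 on the held-out set
print(f"coefficients={model.coef_}, intercept={model.intercept_:.2f}, R^2={r2:.3f}")
```

Setting `include_bias=False` avoids a redundant constant column, since `LinearRegression` already fits an intercept by default.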

8
Q

What is the process of implementing KMeans?

A

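First, get your dataset. KMeans is an unsupervised clustering algorithm, so labels are not used for fitting; known classes are only needed if you want to evaluate the clusters against them.
Second, identify your features (x) and, if available, the true labels (y), and save them as such:

from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
x = iris.data
y = iris.target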

Third, scale your independent variables, since KMeans is distance-based:

scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

Fourth, instantiate your model, then fit it:

kmeans = KMeans(n_clusters=3)
kmeans.fit(x_scaled)
labels = kmeans.labels_

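Fifth, visualize your data if need be, and evaluate your model using
a confusion matrix or the adjusted Rand index:

# Compute the confusion matrix (rows are true classes, columns are clusters)
conf_matrix = confusion_matrix(y, labels)

# Calculate purity: each cluster is credited with its most common true class
purity = np.sum(np.amax(conf_matrix, axis=0)) / np.sum(conf_matrix)
print(f"Cluster Purity: {purity:.2f}")

# Adjusted Rand index between the true classes and the cluster labels
ari = adjusted_rand_score(y, labels)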
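
Put together, the five steps might look like the sketch below. The explicit `n_init=10` and `random_state=0` arguments are assumptions added for reproducibility and are not part of the card.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, confusion_matrix
from sklearn.preprocessing import StandardScaler

iris = load_iris(as_frame=True)
x = iris.data
y = iris.target  # used only for evaluation, never for fitting

# Scale features so no single measurement dominates the distances.
x_scaled = StandardScaler().fit_transform(x)

# n_init and random_state are assumptions for a reproducible run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(x_scaled)

# Rows are true classes, columns are cluster ids (cluster ids are arbitrary).
conf_matrix = confusion_matrix(y, labels)

# Purity: credit each cluster with its most common true class.
purity = np.sum(np.amax(conf_matrix, axis=0)) / np.sum(conf_matrix)

# Adjusted Rand index is invariant to how the clusters are numbered.
ari = adjusted_rand_score(y, labels)

print(f"Cluster Purity: {purity:.2f}")
print(f"Adjusted Rand Index: {ari:.2f}")
```

Cluster ids do not correspond to class ids, so the diagonal of the confusion matrix is not meaningful by itself; purity and the adjusted Rand index avoid that problem.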

9
Q

What is the process for reducing dimensionality using PCA in sklearn?

A

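First, scale the data, as PCA is sensitive to the scale of the input features.

Then initialize the PCA object with the desired number of components:

from sklearn.decomposition import PCA

pca = PCA(n_components=n_components)
x_pca = pca.fit_transform(x_scaled)

Now x_pca has the desired number of components (columns); x_scaled itself is unchanged.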
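
A minimal sketch of this process, assuming the iris dataset and a reduction from 4 features to 2 components (both choices are illustrative, not from the card):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

x, _ = load_iris(return_X_y=True)

# Standardize so each feature contributes on the same scale.
x_scaled = StandardScaler().fit_transform(x)

# Project the 4 scaled features onto the top 2 principal components.
pca = PCA(n_components=2)
x_pca = pca.fit_transform(x_scaled)

# Fraction of total variance retained by the kept components.
explained = pca.explained_variance_ratio_.sum()

print(f"shape before: {x_scaled.shape}, shape after: {x_pca.shape}")
print(f"variance retained: {explained:.3f}")
```

Checking `explained_variance_ratio_` is a common way to judge whether the chosen number of components keeps enough of the original information.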

10
Q

What is the process for implementing a decision tree in sklearn?

A

Once you have your dependent and independent variables:

Split into train and test sets.

Initialize and train the classifier:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()

clf.fit(x_train, y_train)

Predict:

y_pred = clf.predict(x_test)

That's the main process.
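
The process can be sketched as follows. The iris dataset, the split parameters, the `random_state` values, and the accuracy check at the end are assumptions added for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# Trees split on thresholds, so feature scaling is not required here.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)

# Accuracy is one simple way to evaluate the predictions.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```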

11
Q

What is the process for implementing a random forest classifier in sklearn?

A

Once you have your dependent and independent variables:

Split into train and test sets.

Initialize and train the classifier:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()

clf.fit(x_train, y_train)

Predict:

y_pred = clf.predict(x_test)

That's the main process.
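
A short sketch of the same flow with a random forest. The wine dataset, `n_estimators=100`, the split parameters, and the seeds are assumptions for illustration only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

x, y = load_wine(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# An ensemble of 100 trees, each trained on a bootstrap sample of the train set.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Only the estimator class changes relative to a single decision tree; the split, fit, and predict calls are the same.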

12
Q

What is the process for implementing a SVM classifier in sklearn?

A

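Once you have your dependent and independent variables:

Split into train and test sets, then scale your independent variables, as SVMs are sensitive to the scale of the data:

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

Instantiate the model and train it:

from sklearn.svm import SVC

svm = SVC(kernel="linear")
svm.fit(x_train_scaled, y_train)

Make predictions on the scaled test features (not on y_test):

y_pred = svm.predict(x_test_scaled)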
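
The full sequence, including the split and scaling steps, might look like this sketch. The iris dataset, the split parameters, and the seed are assumptions made so the example runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the train set only, then reuse it for the test set.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

svm = SVC(kernel="linear")
svm.fit(x_train_scaled, y_train)

# predict() takes features; the true labels are only used for scoring.
y_pred = svm.predict(x_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```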
