ML Algorithm Implementations/Data manipulation Flashcards
Understand and internalize the scikit-learn implementations of major ML algorithms, along with the data manipulation needed for preprocessing.
What is the process of implementing Linear Regression with sklearn?
- Given the dataset, identify the independent and dependent variables
- Do EDA and Data preprocessing
- Reshape x and y to be in the right format (sklearn expects x as a 2D array):
x = x.reshape(-1,1)
y = y.reshape(-1,1)
- Split your train and test sets:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_size)
- Instantiate and train the model
model = LinearRegression()
model.fit(x_train,y_train)
- Get your predictions and loss values to evaluate your model
y_pred = model.predict(x_test)
mae = mean_absolute_error(y_true = y_test,y_pred = y_pred)
mse = mean_squared_error(y_true = y_test,y_pred = y_pred)
rmse = np.sqrt(mse)  # newer sklearn versions deprecate squared=False; the square root of the MSE gives the same value
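The steps above can be sketched end to end as follows. This is a minimal example, not a definitive implementation: the synthetic data (y = 3x + 2 plus noise), the test_size of 0.2, and the random_state values are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 0.5, size=100)

# sklearn expects a 2D feature array: one row per sample, one column per feature
x = x.reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(x_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(x_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of the MSE
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")
```

The fitted slope and intercept are available as model.coef_ and model.intercept_, which is a quick sanity check against the known generating line.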
What libraries need to be imported for most ML algorithms?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler
What are the regression algorithm import lines?
from sklearn.linear_model import LinearRegression, LogisticRegression  # LogisticRegression is a classifier despite its name
Where do you import PolynomialFeatures?
from sklearn.preprocessing import PolynomialFeatures
What are the import lines for KMeans?
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import adjusted_rand_score
import numpy as np
What is the process of implementing Logistic Regression with sklearn?
Identify target and independent variables: (x,y = data.data, data.target)
Split your train and test data: (train_test_split(x, y, test_size=test_size))
Scale your independent variables in both train and test data (fit the scaler on the training data only):
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
instantiate and train your model:
model = LogisticRegression()
model.fit(x_train,y_train)
Evaluate model: model.score(x_test,y_test)
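A minimal sketch of the logistic regression steps above. The breast cancer dataset bundled with sklearn, the test_size of 0.2, and the random_state are assumptions chosen for illustration; any dataset exposing .data and .target would follow the same pattern.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Example dataset (assumed): binary classification with 30 numeric features
data = load_breast_cancer()
x, y = data.data, data.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training data only, then reuse it on the test data
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

model = LogisticRegression()
model.fit(x_train, y_train)

# For classifiers, score() returns mean accuracy on the given data
accuracy = model.score(x_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Calling transform (not fit_transform) on the test set keeps test-set statistics out of the preprocessing step.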
What is the process of implementing Polynomial Regression?
Essentially the same process as linear regression: identify your independent and dependent variables, reshape them, and split them.
This time, instantiate poly_features = PolynomialFeatures(degree=degree, include_bias=include_bias)
Then pass your x through poly_features by calling poly_features.fit_transform(x)
For the rest of your training and calculations, use the polynomial-transformed x with a LinearRegression model
y stays the same
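A short sketch of polynomial regression as described above. The quadratic synthetic data, degree=2, and include_bias=False are illustrative assumptions; include_bias=False is used here because LinearRegression already fits its own intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): y = 0.5x^2 + x + 2 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + 2 + rng.normal(0, 0.3, size=100)

# Expand x into the columns [x, x^2]; y stays the same
poly_features = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly_features.fit_transform(x)

x_train, x_test, y_train, y_test = train_test_split(
    x_poly, y, test_size=0.2, random_state=0
)

# An ordinary linear model on the expanded features fits a polynomial in x
model = LinearRegression()
model.fit(x_train, y_train)

print("Coefficients for [x, x^2]:", model.coef_)
print("Intercept:", model.intercept_)
```

New inputs must go through poly_features.transform before being passed to model.predict, since the model was trained on the expanded columns.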
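What is the process of implementing KMeans?
First, get your dataset and make sure it suits clustering. KMeans is unsupervised, so true class labels are only needed to evaluate the clusters afterwards
Second, identify your dependent and independent variables and save them as such
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)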
x = iris.data
y = iris.target
Third, scale your independent variables
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)
Fourth, instantiate your model, then fit it
kmeans = KMeans(n_clusters=3)
kmeans.fit(x_scaled)
labels = kmeans.labels_
Fifth, visualize your data if need be, and evaluate your model using
a confusion matrix or a Rand index
Compute the confusion matrix (rows are true classes, columns are clusters)
conf_matrix = confusion_matrix(y, labels)
Calculate purity
purity = np.sum(np.amax(conf_matrix, axis=0)) / np.sum(conf_matrix)
print(f"Cluster Purity: {purity:.2f}")
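The KMeans steps above can be put together roughly like this. The n_init and random_state values are assumptions added for reproducibility, and the true labels y are used only for evaluation, never for fitting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, confusion_matrix
from sklearn.preprocessing import StandardScaler

# Iris: 150 samples, 4 features, 3 known species
x, y = load_iris(return_X_y=True)

# KMeans relies on distances, so put every feature on the same scale
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

# n_init and random_state are assumed values, set for reproducible runs
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(x_scaled)
labels = kmeans.labels_

# Rows are true classes, columns are cluster ids (cluster ids are arbitrary)
conf_matrix = confusion_matrix(y, labels)

# Purity: each cluster is credited with its most common true class
purity = np.sum(np.amax(conf_matrix, axis=0)) / np.sum(conf_matrix)

# Adjusted Rand index: agreement with true labels, corrected for chance
ari = adjusted_rand_score(y, labels)

print(f"Cluster Purity: {purity:.2f}")
print(f"Adjusted Rand Index: {ari:.2f}")
```

Because cluster ids are arbitrary, the confusion matrix diagonal is not meaningful on its own; purity and the adjusted Rand index do not depend on how the clusters are numbered.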
What is the process for reducing dimensionality using PCA in sklearn?
First scale the data, as PCA is sensitive to the scale of the input features.
Then initialize a PCA object with the desired number of components
from sklearn.decomposition import PCA
pca = PCA(n_components=n_components)
x_pca = pca.fit_transform(x_scaled)
Now x_pca should have your desired number of components
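A minimal PCA sketch following the steps above. The iris dataset and n_components=2 are assumptions for illustration. Note that the reduced data lives in x_pca, while x_scaled keeps its original number of columns.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris has 4 features; reduce them to 2 principal components
x, _ = load_iris(return_X_y=True)

# Standardize first so no feature dominates because of its units
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

pca = PCA(n_components=2)
x_pca = pca.fit_transform(x_scaled)

print("Scaled shape:", x_scaled.shape)
print("Reduced shape:", x_pca.shape)

# Fraction of the total variance kept by each component
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Summing explained_variance_ratio_ gives a quick check on how much information the chosen number of components retains.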
What is the process for implementing a decision tree in sklearn?
Once you have your dependent and independent variables,
split train and test sets
Initialize and train classifier
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)
Predict:
y_pred = clf.predict(x_test)
That's the main process
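As a rough sketch of the decision tree process above, assuming the iris dataset, a stratified 80/20 split, and accuracy as the evaluation metric (none of which are specified in the card):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)

# stratify=y keeps class proportions the same in train and test (assumed choice)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# Trees split on feature thresholds, so no feature scaling is needed
clf = DecisionTreeClassifier(random_state=0)
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
print("Tree depth:", clf.get_depth())
```

An unconstrained tree can overfit; max_depth or min_samples_leaf are the usual parameters to tune if test accuracy lags well behind training accuracy.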
What is the process for implementing a random forest classifier in sklearn?
Once you have your dependent and independent variables,
split train and test sets
Initialize and train classifier
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(x_train, y_train)
Predict:
y_pred = clf.predict(x_test)
That's the main process
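The random forest process above might look like this in practice. The wine dataset, n_estimators=100, and the split settings are assumptions for illustration; the feature importance printout is an extra step beyond what the card lists.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Wine: 178 samples, 13 features, 3 classes
data = load_wine()
x, y = data.data, data.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# An ensemble of decision trees, each trained on a bootstrap sample
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")

# Impurity-based importances, one per feature, summing to 1
importances = clf.feature_importances_
for name, score in zip(data.feature_names, importances):
    print(f"{name}: {score:.3f}")
```

The API is identical to DecisionTreeClassifier, which is why the two cards read almost the same; only the estimator class changes.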
What is the process for implementing an SVM classifier in sklearn?
Once you have your dependent and independent variables,
split train and test sets
Scale your independent variables, as SVMs are sensitive to the scale of the data
Instantiate the model and train it
from sklearn.svm import SVC
svm = SVC(kernel="linear")
svm.fit(x_train_scaled, y_train)
Make predictions:
y_pred = svm.predict(x_test_scaled)
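A minimal sketch of the SVM steps above, assuming the iris dataset and a stratified 80/20 split. Note that predict takes the scaled test features, not the test labels.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

svm = SVC(kernel="linear")
svm.fit(x_train_scaled, y_train)

# Predict from the scaled test features and compare against the true labels
y_pred = svm.predict(x_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Swapping kernel="linear" for kernel="rbf" is a common next experiment when the classes are not linearly separable.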