ML Flashcards
Key classification metrics
Accuracy - number of correct predictions divided by the total number of predictions (useful when the target classes are well balanced)
Precision - number of true positives divided by the number of true positives plus false positives
Recall - number of true positives divided by the number of true positives plus false negatives
F1 Score - harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). The harmonic mean punishes extreme values, so F1 is low whenever either precision or recall is low.
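A minimal sketch of the four definitions above, computed by hand. The counts (tp, fp, fn, tn) are made up for illustration; the second part compares the arithmetic and harmonic mean on an extreme precision/recall pair.

```python
# Hypothetical counts for a binary classifier (illustration only).
tp, fp, fn, tn = 8, 2, 4, 6

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# Extreme case: perfect precision but very low recall.
p, r = 1.0, 0.1
arithmetic_mean = (p + r) / 2
harmonic_mean = 2 * p * r / (p + r)

# The harmonic mean stays close to the smaller of the two values.
print(f"arithmetic={arithmetic_mean:.3f} harmonic={harmonic_mean:.3f}")
```

With these assumed numbers the arithmetic mean is 0.55 while the harmonic mean is about 0.18, which is why F1 is preferred when one of the two metrics might be very low.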
What is type I error?
false positive
What is type II error?
false negative
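To make the two error types concrete, here is a small sketch that counts them from a pair of label lists. The labels are hypothetical, with 1 as the positive class and 0 as the negative class.

```python
# Hypothetical true and predicted labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]

# Type I error: predicted positive, actually negative.
false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Type II error: predicted negative, actually positive.
false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print("type I errors (false positives):", false_positives)
print("type II errors (false negatives):", false_negatives)
```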
Confusion matrix in scikit-learn
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test,predictions))
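A self-contained sketch of reading the result, assuming a binary problem with made-up labels. In scikit-learn's output the rows are the true classes and the columns are the predicted classes, so for binary labels ravel() yields tn, fp, fn, tp in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Flatten the 2x2 matrix row by row: [[tn, fp], [fn, tp]].
tn, fp, fn, tp = cm.ravel()
print("tn:", tn, "fp:", fp, "fn:", fn, "tp:", tp)
```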
How to use a pipeline in scikit-learn
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
text_clf = Pipeline([('tfidf',TfidfVectorizer()),('clf',LinearSVC())])
text_clf.fit(X_train,y_train)
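A runnable sketch of the same pipeline end to end. The tiny text corpus and its labels are invented for illustration; once fitted, the pipeline accepts raw strings in predict because the vectorizer step runs automatically.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (illustration only).
X_train = [
    "great movie, loved it",
    "terrible plot and bad acting",
    "loved the acting",
    "bad and boring movie",
]
y_train = ["pos", "neg", "pos", "neg"]

# Each step is a (name, estimator) pair; the last step is the classifier.
text_clf = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
text_clf.fit(X_train, y_train)

# Raw text goes in; TF-IDF transformation happens inside the pipeline.
predictions = text_clf.predict(["loved it", "boring plot"])
print(predictions)
```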
Classification task using Keras
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
from keras.utils import to_categorical
y = to_categorical(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state=42)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
scaled_X_train = scaler.transform(X_train)
scaled_X_test = scaler.transform(X_test)
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(8,input_dim=4,activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(3,activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.fit(scaled_X_train,y_train,epochs=150,verbose=2)
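# predict_classes is not available in newer Keras releases; take the argmax of the predicted probabilities instead
predictions = model.predict(scaled_X_test).argmax(axis=1)
# convert the one-hot encoded test labels back to class indices
y_test.argmax(axis=1)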
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
confusion_matrix(y_test.argmax(axis=1),predictions)
print(classification_report(y_test.argmax(axis=1),predictions))
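The card above one-hot encodes the labels and later calls argmax(axis=1) to get class indices back. A minimal NumPy sketch of that round trip follows, using made-up labels; np.eye is used here as a stand-in to illustrate the one-hot layout, not as the Keras implementation.

```python
import numpy as np

# Hypothetical integer class labels for a 3-class problem (illustration only).
y = np.array([0, 2, 1, 2])

# Indexing rows of the identity matrix produces one-hot vectors.
one_hot = np.eye(3)[y]

# argmax along axis 1 maps each one-hot row back to its class index.
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(recovered)
```

This is why the scikit-learn metrics are given y_test.argmax(axis=1) rather than y_test: they are called here with class indices on both sides, matching the argmax-based predictions.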