Machine Learning Flashcards
Logistic Regression
Given a data point, we calculate the probability of that data point belonging to a particular label.
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
# Load scikit-learn's small 8x8 digits dataset
mnist = datasets.load_digits()
images = mnist.images
data_size = len(images)
# Reshape each image to a 1D array
images = images.reshape(data_size, -1)
labels = mnist.target
# Initialize LR; the l1 penalty needs a solver that supports it, e.g. liblinear
LR_classifier = LogisticRegression(C=0.01, penalty='l1', solver='liblinear', tol=0.01)
# Train on the first 75%
split = int((data_size / 4) * 3)
LR_classifier.fit(images[:split], labels[:split])
# Test on the remaining 25%
predictions = LR_classifier.predict(images[split:])
target = labels[split:]
print("Performance: %s" % metrics.classification_report(target, predictions))
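The probability this card describes can be sketched by hand: a linear score is passed through the sigmoid function. This is a minimal sketch on made-up 1D toy data (the arrays `X`, `y`, the point `x_new`, and the helper `sigmoid` are illustrative assumptions, not part of the card's example), comparing the manual value with scikit-learn's `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1D data (made up for illustration): small values are label 0, large are label 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

x_new = np.array([[3.5]])
# Linear score w*x + b, then squash it with the sigmoid
score = x_new @ clf.coef_.T + clf.intercept_
p_manual = sigmoid(score).ravel()[0]
# scikit-learn's own estimate of P(label == 1 | x_new)
p_sklearn = clf.predict_proba(x_new)[0, 1]
print(p_manual, p_sklearn)
```

For a two-label problem the two numbers should agree, and a probability above 0.5 means the point is assigned label 1.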
Support vector machines
Used in supervised ML. SVMs try to find hyperplanes that divide the given data into regions, with each region representing a particular label. They perform very well with high-dimensional data.
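from sklearn import datasets, metrics, svm
mnist = datasets.load_digits()
# Reshape each image to a 1D array
images = mnist.images.reshape(len(mnist.images), -1)
labels = mnist.target
# Train on the first 75%, hold out the rest as test_images (assumed split)
split = int((len(images) / 4) * 3)
test_images = images[split:]
SVM_classifier = svm.SVC(gamma=0.001)
SVM_classifier.fit(images[:split], labels[:split])
predictions = SVM_classifier.predict(test_images)
print("Performance: %s" % metrics.classification_report(labels[split:], predictions))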
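To make the hyperplane idea concrete, here is a minimal sketch on made-up 2D toy data (the arrays `X`, `y` and the names `w`, `b`, `side`, `region_labels` are illustrative assumptions). It uses a linear kernel, unlike the snippet above, so the hyperplane `w.x + b = 0` can be read directly from the fitted model.

```python
import numpy as np
from sklearn import svm

# Toy 2D data (made up for illustration): two linearly separable groups
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel makes the separating hyperplane directly readable
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset
# Points with w.x + b > 0 fall in the region for label 1, otherwise label 0
side = X @ w + b
region_labels = (side > 0).astype(int)
print(region_labels, clf.predict(X))
```

The side of the hyperplane a point falls on decides its label, which is what `predict` reports.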
K-means clustering
A type of unsupervised ML that works with unlabeled data; it can also be used as a step in semi-supervised settings where only part of the data is labeled. A clustering algorithm that forms k clusters from the given data points based on a similarity metric, e.g. the distance between two points in the given space.
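A minimal sketch of this card using scikit-learn's `KMeans`, assuming made-up 2D points and k = 2 (the array `points` and the chosen parameter values are illustrative assumptions). The similarity metric here is Euclidean distance to the cluster centers.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy unlabeled 2D points (made up for illustration): two visibly separate groups
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# k = 2 clusters; each point is assigned to its nearest cluster center
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

print(cluster_ids)               # cluster index per point (numbering is arbitrary)
print(kmeans.cluster_centers_)   # one center per cluster
```

No labels are passed to `fit_predict`; the cluster indices come only from how close the points are to each other.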