Machine Learning Flashcards

Question 1

Q

Logistic Regression

Answer

A

A supervised classification method. Given a data point, the model estimates the probability that the point belongs to a particular label and predicts the most probable label.

from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression

# scikit-learn's 8x8 digits dataset (not the full MNIST)
mnist = datasets.load_digits()
images = mnist.images
data_size = len(images)
# Reshape each 8x8 image to a 1D array of 64 features
images = images.reshape(data_size, -1)
labels = mnist.target
split = (data_size * 3) // 4
# Initialize LR; the 'l1' penalty needs a solver that supports it, such as 'saga'
LR_classifier = LogisticRegression(C=0.01, penalty='l1', solver='saga', tol=0.01)
# Train on the first 75%
LR_classifier.fit(images[:split], labels[:split])
# Test on the remaining 25%
predictions = LR_classifier.predict(images[split:])
target = labels[split:]

print("Performance: %s" % metrics.classification_report(target, predictions))
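The probability view described in this card can be inspected directly. The following is a minimal sketch, assuming the same digits dataset and a default LogisticRegression with max_iter raised to 5000 (an illustrative choice, not from the card). It prints the estimated probability of each label for one held-out image.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target
split = (len(X) * 3) // 4

# max_iter=5000 is an illustrative choice so the default solver has room to converge
clf = LogisticRegression(max_iter=5000)
clf.fit(X[:split], y[:split])

# One row per sample, one column per label; each row sums to 1
probs = clf.predict_proba(X[split:split + 1])
predicted_label = clf.classes_[probs.argmax(axis=1)][0]
print(probs.round(3))
print("Most probable label:", predicted_label)
```

The predicted label is simply the column with the highest probability, which is what predict() returns as well.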

Question 2

Q

Support vector machines

Answer

A

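Used in supervised ML. SVMs try to find hyperplanes that divide the given data into regions, with each region representing a particular label. They perform very well with high-dimensional data.

from sklearn import datasets, metrics, svm

mnist = datasets.load_digits()
images = mnist.images.reshape(len(mnist.images), -1)
labels = mnist.target
split = (len(images) * 3) // 4
# Train on the first 75%, test on the remaining 25%
SVM_classifier = svm.SVC(gamma=0.001)
SVM_classifier.fit(images[:split], labels[:split])
test_images = images[split:]
predictions = SVM_classifier.predict(test_images)
print(metrics.classification_report(labels[split:], predictions))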
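
To make the hyperplane idea concrete, here is a minimal sketch on synthetic 2D data. It assumes a linear kernel (the card's own example uses the default RBF kernel, where the boundary is not a flat hyperplane in the input space) and checks which side of w . x + b = 0 each point falls on. The data and variable names are illustrative.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Two 2D clusters (synthetic data, for illustration only)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel the separating hyperplane is w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
scores = X @ w + b

# Points with a positive score fall on the side assigned to clf.classes_[1]
side = np.where(scores > 0, clf.classes_[1], clf.classes_[0])
print("w =", w, "b =", b)
print("Matches clf.predict:", np.array_equal(side, clf.predict(X)))
```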

Question 3

Q

K-means clustering

Answer

A

An unsupervised clustering algorithm that works with unlabeled data (it can also be applied when only part of the data is labeled). It forms k clusters from the given data points based on a similarity metric, e.g. the distance between two points in the given space: each point is assigned to the cluster whose center is closest.
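
The other cards include a scikit-learn example, so here is a comparable minimal sketch for k-means. It assumes synthetic 2D data from make_blobs and k = 3 (both illustrative choices, not from the card), and uses Euclidean distance as the similarity metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic unlabeled data: 300 points around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Distance from every point to every centroid; the nearest centroid defines the cluster
distances = np.linalg.norm(X[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
nearest = distances.argmin(axis=1)

print("Centroids:\n", kmeans.cluster_centers_)
print("Cluster sizes:", np.bincount(cluster_ids, minlength=k))
print("Agreement with kmeans.predict:", np.mean(nearest == kmeans.predict(X)))
```

In practice k is not known in advance and has to be chosen, for example by comparing results for several values.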
