4.2 Instance-based Learning Flashcards
What is the difference between soft and hard margin in SVM?
A hard margin requires every training point to lie on the correct side of the separating hyperplane, outside the margin. A soft margin allows some data points to violate the margin (or even cross the hyperplane), with each violation penalised via a slack variable.
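A minimal sketch of the slack idea, on hypothetical points and a hand-picked hyperplane (both made up for illustration): for a hyperplane w·x + b, the slack of a labelled point is max(0, 1 − y(w·x + b)). A hard margin demands every slack be zero; a soft margin tolerates positive slacks.

```python
def slack(w, b, x, y):
    """Hinge slack of one labelled point (y is +1 or -1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

w, b = [1.0, 0.0], 0.0          # hyperplane x1 = 0, margin edges at x1 = +/-1
points = [([2.0, 1.0], +1),     # well outside the margin: slack 0
          ([0.5, 0.0], +1),     # inside the margin: slack 0.5
          ([-0.5, 1.0], +1)]    # on the wrong side of the hyperplane: slack 1.5

slacks = [slack(w, b, x, y) for x, y in points]
print(slacks)  # [0.0, 0.5, 1.5]
```

Only the first point would be acceptable under a hard margin; a soft margin accepts all three and charges the objective for the two nonzero slacks.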
What's the difference between KNN and k-means clustering?
KNN is a supervised classification algorithm that labels a new data point according to the majority label of its k nearest data points.
k-means is an unsupervised clustering algorithm that partitions the data into k clusters.
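The contrast can be sketched with toy implementations on made-up 2-D points (data and labels are illustrative, not from the cards): KNN needs labelled training data and answers a prediction query; k-means sees only unlabelled points and groups them.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """KNN (supervised): majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def kmeans(points, centers, iters=10):
    """k-means (unsupervised): alternate cluster assignment and centroid update."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        centers = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else ctr
                   for pts, ctr in zip(clusters, centers)]
    return centers

train = [((0, 0), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (0.4, 0.2), k=3))           # "A"
print(kmeans([p for p, _ in train], [(0, 0), (5, 5)]))  # [(0.5, 0.0), (5.5, 5.0)]
```

Note the different inputs: KNN consumes (point, label) pairs, while k-means consumes bare points plus initial centres.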
Using a nearest-neighbor (NN) classifier with K = 1 to predict discrete-valued labels, q1 and q2 in Fig. 1 are classified as:
With K = 1, q1 is predicted as a circle and q2 as a diamond, since those are the classes of their nearest data points under 1-NN.
Setting K = 3 for the case of Fig. 1, q1 and q2 would be classified as:
With K = 3, q1 is predicted as a circle because all three of its nearest neighbours are circles; q2 is also predicted as a circle because the majority of its three nearest neighbours are circles.
Value of K in K-NN is preferably chosen to be an odd value to avoid ties in voting
If you choose an even K, there is a risk of a tie when voting on the label of a new instance, so K is usually set to an odd value; with two classes, an odd K guarantees a clear majority.
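A small illustration of the tie risk, using a hypothetical list of neighbour labels ordered by distance (the labels are made up for this example):

```python
from collections import Counter

# Hypothetical neighbour labels of a query point, nearest first.
neighbours = ["circle", "diamond", "circle", "diamond", "circle"]

def vote(labels, k):
    """Count the labels among the k nearest neighbours."""
    return Counter(labels[:k])

print(vote(neighbours, 4))  # even K: circle 2 vs diamond 2 -> tie
print(vote(neighbours, 3))  # odd K:  circle 2 vs diamond 1 -> clear majority
```

With more than two classes an odd K can still tie (e.g. votes of 2/2/1 with K = 5), so some implementations fall back to distance-based tie-breaking.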
Consider the dataset shown below with real-valued features. We would like to train KNN (K = 1) and Naïve Bayes on this dataset, with 70% of the data for training and 30% for testing. Which algorithm is likely to have higher accuracy, and why? Check all that apply.
Naïve Bayes is a linear classifier, and there is no line we could draw to separate the positive instances from the negative instances in this dataset.
If we split the data into training and test sets, Naïve Bayes would likely get some test instances wrong, while KNN would perform better because the data is clustered.
With K = 1 in KNN, if the test set includes ALL the instances in one of these clusters, the entire cluster would be misclassified.
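The failure mode in the last card can be demonstrated with a tiny made-up dataset (the coordinates, labels, and split below are all illustrative, standing in for the clustered dataset the card refers to): two '+' clusters flank one '-' cluster, and the entire second '+' cluster lands in the test set.

```python
import math

def one_nn(train, query):
    """1-NN: return the label of the single closest training point."""
    return min(train, key=lambda p: math.dist(p[0], query))[1]

# Hypothetical clustered data: two '+' clusters flanking one '-' cluster.
cluster_pos_a = [((0.0, 0.0), "+"), ((0.1, 0.1), "+")]
cluster_neg   = [((1.0, 0.0), "-"), ((1.1, 0.1), "-")]
cluster_pos_b = [((2.0, 0.0), "+"), ((2.1, 0.1), "+")]

# If the split puts ALL of cluster_pos_b into the test set, every training
# neighbour of those points belongs to the '-' cluster.
train = cluster_pos_a + cluster_neg
preds = [one_nn(train, x) for x, _ in cluster_pos_b]
print(preds)  # ['-', '-']: the entire held-out cluster is misclassified
```

This is why a random 70/30 split (rather than one that drops a whole cluster from training) matters for 1-NN on clustered data.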