Similarity-Based Learning Flashcards
True or False: k-NN is robust to outliers.
True, provided k is large enough.
With k > 1, a single outlying instance is outvoted by the other neighbors, so it has little influence on the prediction.
True or False: k-NN is always robust to missing values.
False.
k-NN is somewhat robust to missing values, but only if there are not too many of them across the descriptive features.
True or False: The sensitivity of k-NN to noise is dependent on the value of k.
True.
If k is small, k-NN is very sensitive to noise.
Increasing k makes the algorithm less sensitive to noise, but a k that is too large smooths the prediction over too wide a neighborhood and reduces accuracy (a sketch of the trade-off follows below).
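A minimal sketch of this trade-off, assuming scikit-learn and a hypothetical dataset with 20% label noise injected via flip_y:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical dataset: flip_y=0.2 randomly flips 20% of the labels (label noise).
X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)

for k in (1, 5, 25):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:>2}: mean CV accuracy = {acc:.2f}")
# Expect k=1 to chase the noisy labels; a moderate k smooths them out,
# while a very large k can oversmooth and hurt accuracy again.
```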
True or False: The sensitivity of k-NN to imbalanced data is independent of k.
False.
When k is small, k-NN is less sensitive to imbalance.
When k is large, k-NN is more sensitive to imbalance: as k increases, the majority class tends to outvote the minority class (see the sketch below).
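A small sketch of this effect, assuming scikit-learn and a hypothetical toy set with 95 majority and 5 minority instances:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Majority class 0 around the origin, minority class 1 around (2.5, 2.5).
X = np.vstack([rng.normal(0.0, 1.0, (95, 2)), rng.normal(2.5, 0.5, (5, 2))])
y = np.array([0] * 95 + [1] * 5)

for k in (1, 25):
    pred = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict([[2.5, 2.5]])
    print(f"k={k:>2}: predicted class = {pred[0]}")
# With k=25, at most 5 of the 25 neighbors can belong to the minority class,
# so the majority class is guaranteed to win the vote.
```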
What is inductive bias in the context of k-NNs? How can inductive bias be mitigated?
The inductive bias of k-NN is the assumption that instances close to each other in the feature space belong to the same class.
Undersample the majority class using Tomek links.
Oversample the minority class using SMOTE.
These resampling steps resolve the class imbalance.
Another method is Shepard’s method (inverse distance weighting): each neighbor’s vote is weighted by the inverse of its distance from the query instance, so distant neighbors contribute less (a sketch of both mitigations follows below).
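A sketch of both mitigations, assuming the third-party imbalanced-learn package for the resampling and using scikit-learn's built-in weights="distance" option as a stand-in for Shepard's inverse distance weighting:

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical imbalanced dataset: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=500, weights=[0.9], random_state=0)

# Undersample: remove majority instances that form Tomek links near the class boundary.
X_res, y_res = TomekLinks().fit_resample(X, y)
# Oversample: synthesize new minority instances with SMOTE.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_res, y_res)

# Inverse distance weighting: nearer neighbors get proportionally larger votes.
clf = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_res, y_res)
```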
True or False: There is no training involved in k-NN.
True.
k-NN is a lazy learner: there is no explicit training phase. It simply stores the training instances, the stored dataset itself serves as the model, and all computation is deferred to prediction time.
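A from-scratch sketch (a hypothetical class, not a library API) that makes the lazy-learning point concrete: fit only stores the data, and all distance computation happens at query time:

```python
import numpy as np
from collections import Counter

class LazyKNN:
    """Hypothetical minimal k-NN: no model is induced during 'training'."""

    def fit(self, X, y):
        # "Training" is just storing the instances.
        self.X, self.y = np.asarray(X), np.asarray(y)
        return self

    def predict_one(self, query, k=3):
        # All the work is deferred to prediction time.
        dists = np.linalg.norm(self.X - query, axis=1)
        nearest = np.argsort(dists)[:k]
        return Counter(self.y[nearest]).most_common(1)[0][0]  # majority vote
```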
Complete: The K in KD trees represents ____ but the K in the KNN algorithm represents ____
K in KD trees represents the number of descriptive features used to represent each instance.
The K in the KNN algorithm represents the number of stored training instances most similar to the query instance, whose target values are used to make the prediction.
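A short illustration of the two meanings of K, assuming SciPy's KDTree:

```python
import numpy as np
from scipy.spatial import KDTree

X = np.random.rand(100, 4)   # K = 4 descriptive features, so a 4-dimensional KD tree
tree = KDTree(X)

query = np.random.rand(4)
dists, idxs = tree.query(query, k=3)  # k = 3 nearest stored instances (the KNN sense)
print(idxs)  # indices of the 3 most similar instances
```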
True or False: If the k-NN algorithm is applied on a dataset with instances that only have continuous descriptive features and one nominal target feature, the resulting model is a regression model.
False.
The k-NN algorithm applied to a dataset with continuous descriptive features and a nominal (categorical) target feature results in a classification model, not a regression model. In this case, the algorithm predicts the category (or class) of the target feature based on the majority class among the k-nearest neighbors.
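A sketch of the distinction in scikit-learn, using hypothetical toy data: a nominal target calls for KNeighborsClassifier, a continuous target for KNeighborsRegressor:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])

y_nominal = np.array(["spam", "spam", "ham", "ham"])  # nominal target -> classification
print(KNeighborsClassifier(n_neighbors=3).fit(X, y_nominal).predict([[1.2, 1.9]]))

y_continuous = np.array([1.1, 1.3, 7.9, 9.2])         # continuous target -> regression
print(KNeighborsRegressor(n_neighbors=3).fit(X, y_continuous).predict([[1.2, 1.9]]))
```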