KNN Flashcards
DS
Briefly explain how the K-Nearest Neighbors (KNN) algorithm works.
KNN makes a prediction for a given data point by finding its k nearest neighbors and aggregating their target values: averaging them for regression or taking a majority vote for classification.
e.g., if we wanted to predict how much money a potential customer would spend at our store, we could find the 5 customers most similar to her and average their spending to make the prediction.
The average can also be weighted by each neighbor's similarity to the query point, and the similarity measure itself, i.e. the "distance" metric (Euclidean, Manhattan, etc.), can be modified as well.
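A minimal sketch of the store-spending example using scikit-learn's KNeighborsRegressor; the customer features and spending values are made up purely for illustration, and the distance weighting and metric are just the options mentioned above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical customer features (e.g., age, visits per month) and past spending
X_train = np.array([[25, 2], [32, 5], [41, 1], [29, 4], [35, 3], [52, 6]])
y_train = np.array([40.0, 120.0, 30.0, 90.0, 75.0, 150.0])

# k=5 neighbors, distance-weighted average, Euclidean metric;
# both the weighting scheme and the metric can be swapped out.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance", metric="euclidean")
knn.fit(X_train, y_train)

new_customer = np.array([[30, 4]])
print(knn.predict(new_customer))  # predicted spending for the new customer
```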
Is KNN a parametric or non-parametric algorithm? Is it used as a classifier or for regression?
KNN is non-parametric and can be used for either classification or regression.
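A short illustration that the same neighbor-based idea handles both tasks; the data below is random and purely for demonstration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Classification: predict a discrete label by majority vote among neighbors
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y_class)
print(clf.predict(X[:3]))

# Regression: predict a continuous value by averaging neighbors' targets
y_reg = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
reg = KNeighborsRegressor(n_neighbors=5).fit(X, y_reg)
print(reg.predict(X[:3]))
```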
How do you select the ideal number of neighbors for KNN?
There is no closed-form solution for choosing k, so various heuristics are often used. The simplest approach is to run cross-validation, testing several candidate values of k and choosing the one that produces the smallest CV error.
As k increases, the KNN model becomes less flexible, so variance decreases and bias increases.
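A sketch of picking k by cross-validation with scikit-learn's GridSearchCV; the synthetic dataset and the candidate k values are placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

param_grid = {"n_neighbors": [1, 3, 5, 10, 20, 50]}
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,  # 5-fold cross-validation
)
search.fit(X, y)

print(search.best_params_)  # the k with the lowest CV error
# Small k -> flexible model (low bias, high variance);
# large k -> smoother model (higher bias, lower variance).
```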