Part 2. K Nearest Neighbors Flashcards
Name examples of using K nearest neighbors
Detecting abnormalities
Email spam/not spam
Classifying credit card transactions
What is a training set?
Each record in a training set contains a set of attributes, one of which is the class.
What is a validation set?
A set of records used to determine the accuracy of the model.
What is usually the case with the training and validation sets?
The training set is used to build the model, and the validation set is used to validate it.
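As a rough illustration of how the two sets relate, here is a minimal sketch that shuffles a list of records and holds part of it out for validation. The function name `split_records`, the 25% validation fraction, and the sample records are all assumptions made for this example, not something the cards prescribe.

```python
import random

def split_records(records, validation_fraction=0.25, seed=0):
    """Shuffle the records and return (training, validation) lists."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Each record is (attributes, class_label); the values are made up.
records = [((i, i % 3), "spam" if i % 2 else "not spam") for i in range(12)]
training, validation = split_records(records)
print(len(training), "training records,", len(validation), "validation records")
```

The model is built only from `training`; `validation` is kept aside so that accuracy is measured on records the model has not seen.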
What does a Nearest-Neighbor Classifier require?
A set of stored records
A distance metric to compute the distance between records
The value of K, the number of nearest neighbors to use.
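The cards do not say which distance metric is meant. As an assumption, the sketch below shows two common choices for numeric attributes, Euclidean and Manhattan distance; the function names are illustrative only.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two records with numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute attribute differences between two records."""
    return sum(abs(x - y) for x, y in zip(a, b))

print(euclidean_distance((0, 0), (3, 4)))
print(manhattan_distance((0, 0), (3, 4)))
```

Both functions assume the two records have the same number of attributes, measured on comparable scales.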
How is an unknown record classified?
Compute the distance to the training records.
Identify the K nearest neighbors.
Use the class labels of the nearest neighbors to determine the class label of the unknown record.
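The three steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes Euclidean distance and a simple majority vote among the neighbors, and the name `knn_classify` and the sample records are made up for the example.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, unknown, k):
    """Classify `unknown` from a list of (attributes, class_label) records."""
    # Step 1: compute the distance from the unknown record to every training record.
    distances = [(euclidean_distance(attrs, unknown), label)
                 for attrs, label in training]
    # Step 2: identify the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: take a majority vote over the neighbors' class labels (assumed rule).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training records: two well-separated groups.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]
print(knn_classify(training, (2.0, 2.0), k=3))
```

Here the three records closest to (2.0, 2.0) all belong to class A, so the vote is unanimous.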
What does a small distance imply?
The discriminating attributes are nearly equal, which in turn implies that the records are probably in the same class.
Choosing the value of K
If K is too small, the classifier is sensitive to noise points.
If K is too large, the neighborhood may include points from other classes.
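Both failure modes can be seen on a small made-up data set. The sketch below assumes Euclidean distance and majority voting; the records, including one deliberately mislabeled noise point inside the class A group, are invented for the illustration.

```python
import math
from collections import Counter

def knn_classify(training, unknown, k):
    """Majority vote among the k training records closest to `unknown`."""
    distances = [(math.sqrt(sum((x - y) ** 2 for x, y in zip(attrs, unknown))), label)
                 for attrs, label in training]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [
    # Class A group.
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    # A single noise point labeled B that sits inside the class A group.
    ((1.6, 1.6), "B"),
    # Class B group, far away and larger than the class A group.
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"),
    ((9, 9), "B"), ((9, 10), "B"), ((10, 9), "B"),
]
unknown = (1.5, 1.5)  # lies in the middle of the class A group

for k in (1, 3, 11):
    print("K =", k, "->", knn_classify(training, unknown, k))
```

With K = 1 the single nearest record is the noise point, so the answer is B. With K = 3 the two class A neighbors outvote it. With K = 11 every record votes, and the larger, distant class B group wins again.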
How to find the weights?
With gradient descent.
How to choose K?
Try many different values of K and look for the optimum, for example the K with the best accuracy on the validation set.
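One way to carry out this search, sketched under assumptions: score each candidate K by its accuracy on a validation set and keep the best one. The candidate values, the records, and the helper names below are all made up for the example, and the classifier again assumes Euclidean distance with majority voting.

```python
import math
from collections import Counter

def knn_classify(training, unknown, k):
    """Majority vote among the k training records closest to `unknown`."""
    distances = [(math.sqrt(sum((x - y) ** 2 for x, y in zip(attrs, unknown))), label)
                 for attrs, label in training]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_accuracy(training, validation, k):
    """Fraction of validation records whose predicted class matches the true class."""
    correct = sum(knn_classify(training, attrs, k) == label
                  for attrs, label in validation)
    return correct / len(validation)

# Made-up records; ((1.6, 1.6), "B") is a noise point inside the class A group.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    ((1.6, 1.6), "B"),
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"),
    ((9, 9), "B"), ((9, 10), "B"), ((10, 9), "B"),
]
validation = [
    ((1.5, 1.5), "A"), ((1.2, 1.8), "A"),
    ((8.5, 8.5), "B"), ((9.5, 9.0), "B"),
]

candidate_ks = [1, 3, 5, 11]
accuracy_by_k = {k: validation_accuracy(training, validation, k) for k in candidate_ks}
# On a tie, max() keeps the first (here: smallest) candidate.
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(accuracy_by_k)
print("best K:", best_k)
```

On this toy data the smallest and largest candidates score worse than the middle ones, which mirrors the two failure modes of choosing K too small or too large.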
What is the curse of dimensionality?
In high dimensions (many attributes), all points are far apart. No record has truly nearby neighbors unless a very large amount of data is available.
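A small experiment can make this visible. The sketch below is an illustration under assumptions: it draws points uniformly at random in the unit hypercube and compares the distance from a query point to its nearest and farthest neighbor. The function name, the point count, and the chosen dimensions are arbitrary.

```python
import math
import random

def nearest_to_farthest_ratio(n_dims, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(point, query))))
    return min(distances) / max(distances)

ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, ratio in ratios.items():
    print(f"{d:>5} dimensions: nearest/farthest = {ratio:.3f}")
```

As the number of dimensions grows, the ratio moves toward 1: the nearest neighbor is almost as far away as the farthest one, so "nearest" carries little information unless far more data is available.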