KNN Flashcards
k-NN process overview
Goal
given a set of labeled items, automatically label a new item
Idea
Consider the most similar other items (similarity is defined in terms of their attributes), look at their labels, and give the unlabeled item the label that wins the majority vote. Ties are broken randomly.
To automate k-NN, what two decisions need to be made?
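The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_predict`, the `(attributes, label)` data layout, the toy data, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k=3, rng=random):
    """Label `query` by majority vote among its k nearest labeled items.

    train: list of (attributes, label) pairs (hypothetical layout).
    query: tuple of numeric attributes with the same length.
    """
    # Sort labeled items by Euclidean distance to the query; keep the k nearest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count how many of the k neighbors carry each label.
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Ties are broken randomly among the labels with the most votes.
    winners = sorted(label for label, count in votes.items() if count == top)
    return rng.choice(winners)

# Toy data, invented for illustration only.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((5.5, 4.5), "b")]
print(knn_predict(train, (1.1, 1.0), k=3))
```

With k=3 the query near (1, 1) has two "a" neighbors and one "b" neighbor, so the vote goes to "a". An even k can produce ties, which is where the random tie-break applies.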
- How to define similarity?
- How many should vote? (what is k?)
Euclidean distance
Cosine similarity
Jaccard distance
Hamming distance
Manhattan distance
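The five measures listed above can each be written as a short Python function. This is a sketch using the standard textbook definitions; the function names and the choice of plain tuples, sets, and strings as inputs are assumptions for illustration. Note that cosine similarity is a similarity (larger means closer), while the others are distances.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences along each attribute.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; undefined for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard_distance(a, b):
    # 1 minus (size of intersection / size of union); undefined if both are empty.
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

def hamming(a, b):
    # Number of positions at which two equal-length sequences differ.
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))               # 5.0
print(manhattan((0, 0), (3, 4)))               # 7
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
print(hamming("karolin", "kathrin"))           # 3
```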
Regarding distance metrics…what if attributes are a mixture of kinds of data?
Define your own custom-designed metric
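One possible custom metric for mixed attributes, sketched below, averages a per-attribute dissimilarity: numeric attributes contribute their absolute difference scaled by the attribute's range, and categorical attributes contribute 0 on a match and 1 on a mismatch. This is in the spirit of Gower's distance and is only one of many reasonable designs. The name `mixed_distance`, the `numeric_ranges` parameter, and the sample records are hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity for mixed numeric/categorical data.

    a, b: equal-length tuples of attribute values.
    numeric_ranges: dict mapping the index of each numeric attribute to its
        (min, max) over the data set; all other indexes are treated as
        categorical. (Hypothetical interface, for illustration only.)
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            lo, hi = numeric_ranges[i]
            # Scale by the range so every attribute contributes a value in [0, 1].
            total += abs(x - y) / (hi - lo) if hi > lo else 0.0
        else:
            # Categorical attribute: simple match / mismatch.
            total += 0.0 if x == y else 1.0
    return total / len(a)

# Invented records: (age, favorite color, income).
ranges = {0: (20, 70), 2: (20000, 120000)}
p = (30, "red", 50000)
q = (40, "blue", 50000)
print(mixed_distance(p, q, ranges))
```

Here age contributes 10/50, color contributes 1 (mismatch), and income contributes 0, so the result is about 0.4. Scaling matters: without it, income differences in the thousands would swamp every other attribute.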
synonymous terms
Evaluation metrics
- Accuracy
- Precision
- Recall
- F-score
Evaluation Metric : Accuracy
number of correct labels / (total number of labels)
Evaluation Metric : Precision
number of true positives /
(number of true positives + number of false positives)
Evaluation Metric : Recall
number of true positives /
(number of true positives + number of false negatives)
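The formulas above translate directly into code. The sketch below assumes a binary setting in which one label is designated "positive"; the function name `evaluate`, the sample labels, and the convention of returning 0.0 when a denominator is zero are assumptions. The cards list F-score without defining it, so the code uses the common F1 variant (the harmonic mean of precision and recall).

```python
def evaluate(true_labels, predicted_labels, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Assumed convention: report 0.0 when the denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented labels for illustration.
truth = ["spam", "spam", "ham", "ham", "spam"]
preds = ["spam", "ham", "ham", "spam", "spam"]
print(evaluate(truth, preds, positive="spam"))
```

In this example there are 2 true positives, 1 false positive, and 1 false negative, so accuracy is 3/5 while precision and recall are both 2/3.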