Chapter 7 - Distance-Based Methods Flashcards
give the k-nearest neighbour (KNN) algorithm
for each test sample x:
find the k training examples closest to x under the chosen distance
predict the label of x as the most common label among those k
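A minimal sketch of the steps above, assuming Euclidean distance and a simple majority vote. The function name knn_predict and the toy data are hypothetical and only for illustration.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Assumption: Euclidean distance; any other distance function could be swapped in.
    distances = [(math.dist(p, x), label) for p, label in zip(train_X, train_y)]
    # Order the training points from nearest to farthest.
    distances.sort(key=lambda pair: pair[0])
    # Keep the labels of the k nearest points and take the most common one.
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["a", "a", "a", "b", "b"]
prediction = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
print(prediction)
```

Note that every prediction computes a distance to every training point and then sorts them, which is where the cost of KNN comes from.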
what is hamming distance?
a distance method for categorical data
how is hamming distance calculated?
count the positions at which the two vectors have different entries
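A short sketch of that count, assuming both inputs are sequences of the same length. The function name hamming_distance and the example records are hypothetical.

```python
def hamming_distance(a, b):
    """Count the positions at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v)

# Two categorical records that differ only in the second feature.
d_records = hamming_distance(["red", "small", "round"], ["red", "large", "round"])
# Two strings that differ in three positions.
d_strings = hamming_distance("karolin", "kathrin")
print(d_records, d_strings)
```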
what is the main disadvantage of KNN?
computationally intensive at prediction time: every query needs a distance to each training example
give two modifications to KNN
condensed KNN
octree data structure
what feature properties can severely affect KNN (2)?
scaling of the features
presence of irrelevant features
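A small sketch of the scaling problem, assuming Euclidean distance. The data (height and income for two people, A and B) are made up for illustration. Changing only the units of the features changes which point is the nearest neighbour.

```python
import math

def nearest(points, query):
    """Return the name of the point closest to query under Euclidean distance."""
    return min(points, key=lambda name: math.dist(points[name], query))

# Hypothetical data: (height in metres, income in dollars).
raw = {"A": (1.50, 50500.0), "B": (1.80, 52000.0)}
query_raw = (1.80, 50000.0)

# Same data in different units: centimetres and thousands of dollars.
rescaled = {name: (h * 100, inc / 1000) for name, (h, inc) in raw.items()}
query_rescaled = (query_raw[0] * 100, query_raw[1] / 1000)

nearest_raw = nearest(raw, query_raw)                  # income dominates the distance
nearest_rescaled = nearest(rescaled, query_rescaled)   # height now dominates
print(nearest_raw, nearest_rescaled)
```

In the raw units the income differences swamp the height differences, so A is nearest. After the change of units the height difference dominates and B becomes nearest.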
A large value of k in a KNN model is likely to …fit
under
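An illustrative sketch of why a large k underfits, assuming Euclidean distance and majority voting. The helper knn_predict and the one-dimensional toy data are hypothetical. When k equals the size of the training set, the prediction is the overall majority class no matter where the query lies.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Hypothetical data: three "a" points near 0 and two "b" points near 5.
train_X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
train_y = ["a", "a", "a", "b", "b"]
query = (5.05,)  # sits in the middle of the "b" cluster

small_k = knn_predict(train_X, train_y, query, k=1)             # uses local structure
large_k = knn_predict(train_X, train_y, query, k=len(train_X))  # ignores it
print(small_k, large_k)
```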
what does the triangle inequality say?
the sum of the lengths of any two sides of a triangle is greater than or equal to the length of the third side
give the triangle inequality
||x + y|| <= ||x|| + ||y||
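A quick numeric check of the inequality for one pair of vectors, assuming the Euclidean norm. The vectors are arbitrary examples. A single check illustrates the statement but does not prove it.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a tuple of floats."""
    return math.sqrt(sum(c * c for c in v))

x = (3.0, 4.0)
y = (-1.0, 2.0)
x_plus_y = tuple(a + b for a, b in zip(x, y))

lhs = norm(x_plus_y)      # ||x + y||
rhs = norm(x) + norm(y)   # ||x|| + ||y||
holds = lhs <= rhs
print(lhs, rhs, holds)
```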
give three problems with distance-based methods
Tractability - each query requires a distance calculation to every training point, followed by an ordering of those distances
Scaling - features with large numeric ranges dominate the distance
Meaningfulness - does distance make sense for the data at all?