K Nearest Flashcards
K Nearest
Store the training records only; no model is computed.
Simplest form of learning
Lazy evaluation
Accurate but Slow
Elements
Training Dataset
Similarity Function
Value of k: the number of neighbors
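A minimal sketch of how these elements combine in k-NN classification (the toy dataset, Euclidean distance, and k = 3 below are illustrative assumptions, not part of the original cards):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    # Lazy learning: no model is computed; the query is compared
    # directly against the stored training records.
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                          # indices of the k closest records
    votes = Counter(y_train[i] for i in nearest)             # majority vote among neighbors
    return votes.most_common(1)[0][0]

# Toy example (assumed data):
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([4.9, 5.1]), k=3))  # -> 1
```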
Similarity Measure
Euclidean = sqrt(sum((pi - qi)^2))
L_r-norm (Minkowski) = (sum(|xi - yi|^r))^(1/r)
Manhattan (L1) = sum(|xi - yi|)
L-inf-norm = max_i |xi - yi|
Cosine distance = arccos(sum(xi*yi) / (sqrt(sum(xi^2)) * sqrt(sum(yi^2)))), range [0, 180] degrees
Jaccard distance = 1 - J(x, y), where J(x, y) = |x ∩ y| / |x ∪ y|
Hamming distance = (n - m) / n, where n = number of components, m = number of matching components
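Hedged NumPy sketches of the distance measures above (function names and the set/vector representations are illustrative assumptions):

```python
import numpy as np

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def minkowski(x, y, r):           # L_r-norm; r=1 gives Manhattan, r=2 Euclidean
    return np.sum(np.abs(x - y) ** r) ** (1 / r)

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def linf(x, y):                   # L-infinity norm
    return np.max(np.abs(x - y))

def cosine_distance(x, y):        # angle between the vectors, in degrees [0, 180]
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def jaccard_distance(a, b):       # a, b are Python sets
    return 1 - len(a & b) / len(a | b)

def hamming_distance(x, y):       # fraction of non-matching components
    matches = sum(xi == yi for xi, yi in zip(x, y))
    return (len(x) - matches) / len(x)
```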
KD-Trees
Split the space hierarchically using a tree
Split point close to the mean
Inefficient when the number of attributes is too large
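A sketch of nearest-neighbor lookup with a KD-tree, here using scipy.spatial.KDTree (the random data and query point are assumptions for illustration):

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))          # 1000 points in 3 dimensions

tree = KDTree(X_train)                   # space is split hierarchically at build time
dists, idx = tree.query([0.5, 0.5, 0.5], k=5)   # k nearest neighbors of the query
print(idx)                               # indices of the 5 closest training points
```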
Ball Trees
Split the space hierarchically using nested hyperspheres
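A comparable sketch using scikit-learn's BallTree, which encloses points in nested hyperspheres and tends to cope better with higher-dimensional data (the random data and k below are assumptions):

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X_train = rng.random((1000, 20))         # higher-dimensional data, where KD-trees degrade

tree = BallTree(X_train)                 # points are grouped into nested hyperspheres
dists, idx = tree.query(X_train[:1], k=5)   # 5 nearest neighbors of the first point
print(idx)
```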
K-nearest neighbor Regression
Other regression models are parametric
They learn a global approximation function with a weight vector
Nearest Neighbor fits each point locally
Given a dataset (X1, Y1), (X2, Y2), ..., (Xn, Yn) and a query point Xq, Yq is computed as a local interpolation of the targets of its nearest neighbors
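A minimal sketch of k-NN regression as local interpolation, here a simple average of the k nearest targets (the toy dataset and k = 2 are assumptions; distance-weighted averaging is a common variant):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    # Non-parametric: no weight vector is learned; the prediction is a
    # local interpolation of the targets of the k nearest neighbors.
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()       # simple average of neighbor targets

# Toy example (assumed data): y = 2x, queried at x = 2.5
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))   # -> 5.0
```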