Chapter 2: K Nearest Neighbours Flashcards
give the KNN classification rule
compute the distances d(xte, xtr) from the test point to every training point; sort the distances; select the k nearest; assign yte the most common class among those k neighbours
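The rule above can be sketched in Python (a minimal illustration; the function and variable names are my own, not from the chapter):

```python
from collections import Counter
import math

def euclidean(p, q):
    # square root of the sum of squared component differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(x_te, X_tr, y_tr, k=3):
    # 1. distance from the test point to every training point
    dists = [(euclidean(x_te, x), y) for x, y in zip(X_tr, y_tr)]
    # 2. sort by distance, 3. keep the k nearest labels
    dists.sort(key=lambda t: t[0])
    nearest = [y for _, y in dists[:k]]
    # 4. assign the most common class among the k neighbours
    return Counter(nearest).most_common(1)[0][0]

X_tr = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_tr = ["a", "a", "b", "b"]
print(knn_classify([0.5, 0.5], X_tr, y_tr, k=3))  # → a
```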
what are two distance measures
Euclidean distance
Minkowski distance
explain Euclidean distance
|| p - q ||_2
the square root of the sum of the squared differences between the components of the two vectors
explain Minkowski distance
[ sum_{i=1}^{d} | p_i - q_i | ^ t ] ^ (1/t)
if t = 2, same as Euclidean distance
if t = 1, city block (Manhattan) distance
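A small sketch of the Minkowski distance, showing that t = 2 recovers Euclidean distance and t = 1 recovers the city block distance (illustrative code, not from the chapter):

```python
def minkowski(p, q, t=2):
    # [ sum_i |p_i - q_i|^t ] ^ (1/t)
    return sum(abs(pi - qi) ** t for pi, qi in zip(p, q)) ** (1 / t)

p, q = [0, 0], [3, 4]
print(minkowski(p, q, t=2))  # Euclidean: 5.0
print(minkowski(p, q, t=1))  # city block: 7.0
```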
give two similarity measures
inner product
cosine
describe inner product
sum_{i=1}^{d} p_i q_i
describe cosine as a similarity measure
sum_{i=1}^{d} p_i q_i / ( ||p||_2 ||q||_2 )
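Both similarity measures can be written in a few lines; cosine is just the inner product normalised by the two vector lengths (illustrative sketch):

```python
import math

def inner_product(p, q):
    # sum_i p_i * q_i
    return sum(pi * qi for pi, qi in zip(p, q))

def cosine_similarity(p, q):
    # inner product divided by the product of the L2 norms
    norm = math.sqrt(inner_product(p, p)) * math.sqrt(inner_product(q, q))
    return inner_product(p, q) / norm

p, q = [1, 0], [1, 1]
print(inner_product(p, q))                 # 1
print(round(cosine_similarity(p, q), 4))   # 0.7071
```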
what is the effect of increasing the number of training samples
generally more accurate
but too many samples, or too much noise in them, can lead to overfitting
what is the effect of increasing k
a small k may model noise
a large k will include too many points from other classes
in binary classification, k must be…
odd, to avoid ties in the majority vote
give the KNN regression rule
measure the distances d(xte, xtr)
sort the distances
select the k nearest
assign yte as the average of their target values
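The regression rule above is the classification rule with the majority vote replaced by an average (illustrative sketch; names are my own, and `math.dist` requires Python 3.8+):

```python
import math

def knn_regress(x_te, X_tr, y_tr, k=3):
    # distance from the test point to each training point, paired with its target
    dists = sorted((math.dist(x_te, x), y) for x, y in zip(X_tr, y_tr))
    nearest = [y for _, y in dists[:k]]
    # prediction is the mean target value of the k nearest neighbours
    return sum(nearest) / k

X_tr = [[0], [1], [2], [10]]
y_tr = [0.0, 1.0, 2.0, 10.0]
print(knn_regress([1.5], X_tr, y_tr, k=2))  # → 1.5
```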
what does it mean that KNN is non parametric
there are no model parameters to be optimised during training; the training data itself serves as the model
what is the neighbour search algorithm
if B is far from A and C is close to B, then C is far from A — a triangle-inequality bound that lets distant points be pruned without computing their distances
what is instance based learning
the output is computed directly from the similarity or distance between the test point and stored training instances, rather than from a fitted model