Chapter 7 Quiz Flashcards
model that finds similar records in the training data and then derives the classification or prediction for a new record by voting or averaging over those records
k nearest neighbor
what type of model is k nearest neighbor?
clear box, non-parametric
most popular measure of distance between records based on their predictor values
euclidean distance
how to calculate euclidean distance?
square each difference (x - u) between the two records' predictor values, add the squares together, then take the square root of the sum
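A minimal Python sketch of this calculation, to make the order of operations concrete: the squared differences are summed over all predictors first, and the square root is taken once at the end. The function name `euclidean_distance` and the sample records are illustrative assumptions, not part of the chapter.

```python
import math

def euclidean_distance(x, u):
    """Distance between two records given as equal-length sequences of predictor values."""
    if len(x) != len(u):
        raise ValueError("records must have the same number of predictors")
    # Sum the squared differences first, then take the square root of the total.
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

# Differences are 3 and 4, so the distance is sqrt(9 + 16).
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```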
when should predictors be standardized?
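before computing distances, using means and standard deviations computed from the training set only (the same values are then applied to validation and new records)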
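A small sketch of standardizing one predictor, assuming z-score scaling: the mean and standard deviation are computed from the training records only and then reused for validation records. The helper names `fit_scaler` and `standardize` and the income values are hypothetical.

```python
import statistics

def fit_scaler(train_column):
    """Compute the mean and standard deviation from the training set only."""
    mean = statistics.mean(train_column)
    std = statistics.pstdev(train_column)  # population std; the sample std is also common
    return mean, std

def standardize(values, mean, std):
    """Rescale values with statistics that were fitted on the training set."""
    if std == 0:
        return [0.0 for _ in values]  # a constant predictor carries no distance information
    return [(v - mean) / std for v in values]

train_income = [40.0, 60.0, 80.0]   # made-up training values
valid_income = [50.0, 100.0]        # made-up validation values

mean, std = fit_scaler(train_income)
train_z = standardize(train_income, mean, std)
valid_z = standardize(valid_income, mean, std)  # reuses the training mean and std
print(train_z, valid_z)
```

Without this step, a predictor measured on a large scale (such as income) would dominate the distance over one measured on a small scale.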
k=1, look for closest record and classify record as belonging to same class as closest neighbor
1 nearest neighbor
what does choosing k>1 do?
provides smoothing, which reduces the risk of overfitting to noise in the training data
what should a k value be?
typically between 1 and 20
an odd number, to avoid ties in the vote
what is the ideal k?
the k that minimizes the misclassification rate in the validation set
a new record is classified as a member of the majority class among its k nearest neighbors
majority rule
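The majority rule can be sketched in a few lines of Python: compute the distance from the new record to every training record, keep the k closest, and return the most common class among them. The function name `knn_classify`, the class labels, and the toy data are assumptions for illustration; predictors are assumed to be already standardized.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, new_record, k):
    """Classify new_record by majority vote among its k nearest training records."""
    # Pair each training record's Euclidean distance to the new record with its class.
    distances = sorted(
        (math.dist(x, new_record), label) for x, label in zip(train_X, train_y)
    )
    top_k_labels = [label for _, label in distances[:k]]
    # The most frequent class among the k neighbors wins the vote.
    return Counter(top_k_labels).most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["owner", "owner", "owner", "nonowner", "nonowner"]

print(knn_classify(train_X, train_y, (1.1, 1.0), k=3))  # owner
print(knn_classify(train_X, train_y, (5.1, 5.0), k=1))  # nonowner
```

With k=1 this reduces to the 1 nearest neighbor rule: the new record simply takes the class of its single closest record.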
source data, map features, standard partition, rescale continuous data, k nearest neighbor, score
workflow of k nearest neighbor
records are grouped into buckets so that records in each are close to each other
bucketing
the expected distance to the nearest neighbor goes up dramatically with p (the number of predictors) unless the size of the training set increases exponentially with p
curse of dimensionality
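A rough simulation can illustrate the effect, assuming records drawn uniformly at random from the unit cube: with the number of records held fixed, the average distance to the nearest neighbor grows as the number of predictors p grows. The function name `mean_nearest_distance` and the sample sizes are illustrative choices.

```python
import math
import random

def mean_nearest_distance(n_records, p, seed=0):
    """Average nearest-neighbor distance among n_records random points with p predictors."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(p)] for _ in range(n_records)]
    total = 0.0
    for i, a in enumerate(points):
        # Distance from record i to its closest other record.
        total += min(math.dist(a, b) for j, b in enumerate(points) if j != i)
    return total / n_records

# Same number of records, increasing number of predictors.
for p in (2, 10, 50):
    print(p, round(mean_nearest_distance(200, p), 3))
```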
what is k?
the number of neighbors that combine to take a vote; a hyperparameter chosen using the validation set
what does small and large k cause?
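small k: high variance (the model fits noise in the training data); large k: oversmoothing that approaches the naive rule (every record is assigned to the majority class)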
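A toy sketch of this tradeoff and of choosing k with a validation set. The data are made up: the training set contains one deliberately mislabeled ("noisy") record, and the names `knn_predict`, `validation_error`, and `best_k` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, record, k):
    """train is a list of (predictors, label) pairs; returns the majority class of the k nearest."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], record))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_error(train, valid, k):
    """Misclassification rate on the validation set for a given k."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in valid)
    return wrong / len(valid)

def best_k(train, valid, candidates):
    """Pick the candidate k with the lowest validation error (first one wins ties)."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))

train = [
    ((1.0, 1.0), "A"), ((1.5, 1.2), "A"), ((0.8, 1.4), "A"),
    ((1.2, 0.9), "B"),  # noisy record sitting inside the "A" cluster
    ((4.0, 4.0), "B"), ((4.5, 3.8), "B"), ((3.9, 4.4), "B"),
]
valid = [((1.15, 0.9), "A"), ((4.2, 4.1), "B"), ((1.3, 1.3), "A")]

for k in (1, 3, 5, 7):
    print(k, validation_error(train, valid, k))
print("chosen k:", best_k(train, valid, [1, 3, 5, 7]))  # chosen k: 3
```

In this toy data, k=1 is misled by the noisy record, while k=7 uses every training record and so always predicts the overall majority class, which is the naive rule.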
preference towards simpler models
parsimony