Chapter 7 Quiz Flashcards

Question 1

Q

model that finds similar records in the training data and then derives the classification (by voting) or prediction (by averaging) for the new record from those neighbors

Answer

A

k nearest neighbor

Question 2

Q

what type of model is k nearest neighbor?

Answer

A

clear box, non-parametric

Question 3

Q

most popular measure of distance between records based on their predictor values

Answer

A

Euclidean distance

Question 4

Q

how is Euclidean distance calculated?

Answer

A

square each difference (x - u) between the two records' predictor values, add the squares together, then take the square root of the total
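A minimal sketch of the calculation, assuming each record is a plain list of numeric predictor values. The function name euclidean_distance is only illustrative.

```python
import math

def euclidean_distance(x, u):
    """Distance between two records given as equal-length sequences of predictor values."""
    if len(x) != len(u):
        raise ValueError("records must have the same number of predictors")
    # Sum the squared differences first, then take the square root of the total.
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

# Differences are 3 and 4, so the distance is sqrt(9 + 16).
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```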

Question 5

Q

when should predictors be standardized?

Answer

A

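before distances are computed; the means and standard deviations come from the training set only, and the same values are then applied to validation and new records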
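A minimal sketch of this, assuming z-score standardization and records stored as lists of numbers. The helper names and the age/income values are made up for illustration. The means and standard deviations are computed from the training rows only and then reused for the validation rows.

```python
import statistics

def fit_standardizer(train_rows):
    """Compute per-predictor mean and sample standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(c) for c in columns]
    # Assumes no predictor is constant in the training set (stdev would be 0).
    stdevs = [statistics.stdev(c) for c in columns]
    return means, stdevs

def standardize(rows, means, stdevs):
    """Apply previously fitted means and standard deviations to any set of rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

# Hypothetical records: [age, income]
train = [[20.0, 30000.0], [30.0, 50000.0], [40.0, 70000.0]]
valid = [[30.0, 90000.0]]

means, stdevs = fit_standardizer(train)
train_z = standardize(train, means, stdevs)
valid_z = standardize(valid, means, stdevs)  # uses training statistics, not its own
print(valid_z)
```

Without this step the income column would dominate every distance simply because its scale is larger.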

Question 6

Q

k=1, look for closest record and classify record as belonging to same class as closest neighbor

Answer

A

1 nearest neighbor

Question 7

Q

what does choosing k>1 do?

Answer

A

provides smoothing, which reduces the risk of overfitting to noise in the training data

Question 8

Q

what should a k value be?

Answer

A

typically between 1 and 20, and an odd number so that votes cannot tie

Question 9

Q

what is the ideal k?

Answer

A

minimizes misclassification rate in the validation set
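A rough sketch of picking k this way, assuming a tiny made-up data set with one predictor and two classes. The function names are hypothetical and the classifier is a bare-bones k-NN, not any particular library's implementation.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, record, k):
    """Classify one record by majority vote among its k nearest training records."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], record))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def misclassification_rate(train_X, train_y, valid_X, valid_y, k):
    """Share of validation records whose predicted class differs from the true class."""
    wrong = sum(knn_classify(train_X, train_y, x, k) != y for x, y in zip(valid_X, valid_y))
    return wrong / len(valid_y)

# Made-up data: the training record at 2.5 is a noisy "B" among the "A" records.
train_X = [[1.0], [2.0], [2.5], [3.0], [7.0], [8.0], [9.0]]
train_y = ["A", "A", "B", "A", "B", "B", "B"]
valid_X = [[2.4], [2.7], [1.2], [7.5], [8.5]]
valid_y = ["A", "A", "A", "B", "B"]

# Try odd values of k and keep the one with the lowest validation error.
errors = {k: misclassification_rate(train_X, train_y, valid_X, valid_y, k) for k in (1, 3, 5, 7)}
best_k = min(errors, key=errors.get)  # the smallest k wins if several share the minimum
print(errors, best_k)
```

In this toy data k=1 follows the noisy record and k=7 uses every training record, so both do worse than the middle values.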

Question 10

Q

a new record is classified as a member of the majority class among its k nearest neighbors

Answer

A

majority rule
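A small sketch of the vote itself, assuming the class labels of the k nearest neighbors have already been collected. The function name and labels are illustrative. It also flags ties, which is the reason an odd k is preferred with two classes.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label and whether the top two labels are tied."""
    counts = Counter(neighbor_labels).most_common()
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    return top_label, tied

print(majority_vote(["owner", "nonowner", "owner"]))              # ('owner', False)
print(majority_vote(["owner", "nonowner", "owner", "nonowner"]))  # ('owner', True)
```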

Question 11

Q

source data, map features, standard partition, rescale continuous data, k nearest neighbor, score

Answer

A

workflow of k nearest neighbor

Question 12

Q

records are grouped into buckets so that the records within each bucket are close to each other

Answer

A

bucketing

Question 13

Q

expected distance to the nearest neighbor goes up dramatically with p unless the size of the training set increases exponentially with p

Answer

A

curse of dimensionality
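A quick simulation sketch of this effect, assuming records drawn uniformly at random from the unit hypercube with a fixed training-set size of 100. The setup is an illustration chosen here, not taken from the chapter.

```python
import math
import random

def mean_nearest_neighbor_distance(n, p, rng):
    """Average distance from each of n random records to its nearest other record, with p predictors."""
    points = [[rng.random() for _ in range(p)] for _ in range(n)]
    total = 0.0
    for i, a in enumerate(points):
        total += min(math.dist(a, b) for j, b in enumerate(points) if j != i)
    return total / n

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {p: mean_nearest_neighbor_distance(100, p, rng) for p in (1, 2, 5, 10, 20)}
for p, d in results.items():
    print(f"p={p:2d}  mean nearest-neighbor distance={d:.3f}")
```

With the number of records held fixed, the nearest neighbor gets farther away as predictors are added, so "nearest" records become less similar.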

Question 14

Q

what is k?

Answer

A

the number of neighbors that combine to take a vote; a hyperparameter chosen using the validation set

Question 15

Q

what do small and large k cause?

Answer

A

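small k: high variance (fits noise in the training data); large k: oversmoothing that approaches the naive decision rule of always predicting the majority class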

Question 16

Q

preference towards simpler models

Answer

A

parsimony