Chapter 7 Quiz Flashcards

1
Q

model that finds similar records in training data and then derives the classification/prediction for the new record from voting/averaging

A

k nearest neighbor

2
Q

what type of model is k nearest neighbor?

A

clear box, non-parametric

3
Q

most popular measure of distance between records based on their predictor values

A

euclidean distance

4
Q

how to calculate euclidean distance?

A

square root of the sum of squared differences between the two records: sqrt((x1 - u1)^2 + (x2 - u2)^2 + ... + (xp - up)^2)
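
As a minimal sketch of this calculation (sum the squared differences across predictors first, then take the square root), assuming each record is a plain list of numeric predictor values. The function name `euclidean_distance` is illustrative, not from the chapter.

```python
import math

def euclidean_distance(x, u):
    """Distance between two records given as equal-length sequences of predictor values."""
    if len(x) != len(u):
        raise ValueError("records must have the same number of predictors")
    # Sum the squared differences over all predictors, then take the square root.
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

print(euclidean_distance([3, 4], [0, 0]))  # 5.0
```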

5
Q

when should predictors be standardized?

A

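before computing distances; the means and standard deviations come only from the training set and are then applied to training, validation, and new records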
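
A minimal sketch of that idea, assuming the usual practice of computing the mean and standard deviation from the training set only and reusing them for every other record. The helper names and the income values are made up for illustration.

```python
import statistics

def fit_standardizer(train_column):
    """Return the mean and sample standard deviation of one training-set predictor."""
    return statistics.mean(train_column), statistics.stdev(train_column)

def standardize(values, mean, std):
    """Convert values to z-scores using statistics learned from the training set."""
    return [(v - mean) / std for v in values]

train_income = [40.0, 60.0, 80.0]             # hypothetical training values
mean, std = fit_standardizer(train_income)    # learned from training data only
z_train = standardize(train_income, mean, std)
z_new = standardize([100.0], mean, std)       # a new record reuses the training statistics
print(z_train, z_new)
```

Without rescaling, a predictor measured in large units would dominate the distance calculation.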

6
Q

k=1, look for closest record and classify record as belonging to same class as closest neighbor

A

1 nearest neighbor

7
Q

what does choosing k>1 do?

A

provides smoothing, which reduces the risk of overfitting to noise in the training data

8
Q

what should a k value be?

A

typically between 1 and 20, and an odd number to avoid ties in the vote

9
Q

what is the ideal k?

A

the k that minimizes the misclassification rate in the validation set
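
A minimal sketch of picking k this way, assuming a tiny hand-made data set with one noisy training record. The function names and the data are illustrative only, and predictors are assumed to be already on comparable scales.

```python
import math
from collections import Counter

def knn_predict(train, new_record, k):
    """Classify new_record by majority vote among its k nearest training records."""
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], new_record))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def validation_error(train, valid, k):
    """Misclassification rate on the validation set for a given k."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in valid)
    return wrong / len(valid)

def best_k(train, valid, candidates):
    """Return the candidate k with the lowest validation error (first one wins ties)."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))

# Toy data: class "a" near the origin, class "b" near (5.5, 5.5),
# plus one noisy "a" record sitting inside the "b" cluster.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
         ((5.0, 5.0), "a"),
         ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b")]
valid = [((0.5, 0.5), "a"), ((5.2, 5.2), "b"), ((5.6, 5.6), "b")]

for k in (1, 3, 5, 7):
    print(k, validation_error(train, valid, k))
print("chosen k:", best_k(train, valid, [1, 3, 5, 7]))
```

In this toy set, k=1 follows the noisy record, while k=7 uses every training record and so always predicts the overall majority class.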

10
Q

a new record is classified as a member of the majority class among its k nearest neighbors

A

majority rule

11
Q

source data, map features, standard partition, rescale continuous data, k nearest neighbor, score

A

workflow of k nearest neighbor

12
Q

records are grouped into buckets so that records in each are close to each other

A

bucketing

13
Q

expected distance to the nearest neighbor goes up dramatically with p unless the size of the training set increases exponentially with p

A

curse of dimensionality
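
A small simulation can make this concrete. The sketch below assumes records drawn uniformly at random from the unit cube, keeps the training set size fixed, and measures the average distance to the nearest neighbor as the number of predictors p grows. The function name and the sample size of 200 are arbitrary choices for illustration.

```python
import math
import random

def mean_nearest_neighbor_distance(n, p, seed=0):
    """Average distance from each of n random records to its nearest neighbor in p dimensions."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(p)] for _ in range(n)]
    total = 0.0
    for i, a in enumerate(points):
        # Nearest neighbor of record i among all other records.
        total += min(math.dist(a, b) for j, b in enumerate(points) if j != i)
    return total / n

for p in (1, 2, 5, 10, 20):
    print(p, round(mean_nearest_neighbor_distance(200, p), 3))
```

With n held fixed, the printed distances grow with p, so the "nearest" neighbor becomes less and less similar to the record being classified.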

14
Q

what is k?

A

the number of neighbors that combine to take a vote; a hyperparameter chosen using the validation set

15
Q

what do small and large values of k cause?

A

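small k: high variance (the model fits noise); large k: oversmoothing that approaches the naive rule of always predicting the majority class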

16
Q

preference towards simpler models