Chapter 7 Flashcards
k-nearest neighbors
Relies on finding “similar” records in training data
These neighbors are used to derive a classification or prediction
by voting (for classification) or averaging (for numerical prediction)
how it works
Identify k records in the training data that are similar to a new record that we wish to classify
Then use the similar (neighboring) records to classify the new record into a class, assigning the new record to the predominant class among these neighbors
Look for records in the training data that are similar or “near” the record to be classified
Based on the class (of the proximate records), we assign a class to the new record
“Near” means
records with similar predictor values X1, X2, …, Xp (see the sketch below)
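A minimal from-scratch sketch of the idea; the training records, labels, and new record below are made up purely for illustration:

```python
import numpy as np
from collections import Counter

# Hypothetical training data: two predictors per record, plus a class label
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.8]])
y_train = np.array(["A", "A", "B", "B", "A"])

def knn_classify(x_new, X_train, y_train, k=3):
    """Classify x_new by majority vote among its k nearest training records."""
    # Euclidean distance from the new record to every training record
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Majority class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

print(knn_classify(np.array([1.1, 1.5]), X_train, y_train, k=3))  # -> "A"
```

For a numerical outcome, the majority vote would be replaced by the average of the neighbors' outcome values.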
characteristics of the model
Data-driven, not model-driven
Makes no assumptions about the form of the relationship between the class membership Y and the predictors X1, X2, … Xp
Nonparametric method because it does not involve estimation of parameters (coefficients) in an assumed functional form, such as the linear form assumed in linear regression
How to measure “nearby”?
Central issue: How to measure distance between records based on their predictor values.
The most popular distance measure: Euclidean distance
The Euclidean distance between two records (x1, x2, …, xp) and (u1, u2, …, up) is:
sqrt[(x1 − u1)^2 + (x2 − u2)^2 + … + (xp − up)^2]
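For example (made-up numbers), the distance between the records (1, 2) and (4, 6) is sqrt((1 − 4)^2 + (2 − 6)^2) = sqrt(25) = 5, which a short numpy check confirms:

```python
import numpy as np

record_a = np.array([1.0, 2.0])
record_b = np.array([4.0, 6.0])

# Euclidean distance between the two records
print(np.linalg.norm(record_a - record_b))  # 5.0
```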
The one-nearest-neighbor scheme has a misclassification rate
that is no more than twice the error rate achievable when we know exactly the probability density functions for each class
The one-nearest-neighbor idea can be extended to k > 1 as follows
Find the k nearest neighbors of the record to be classified
Use a majority decision rule to classify the record, where the record is classified as a member of the majority class of the k neighbors
Choosing k
Advantage of choosing k > 1: higher values of k provide smoothing that reduces the risk of overfitting
Balance between overfitting and ignoring predictor information
Typical values of k fall in the range 1–20 (often an odd number, to avoid ties)
Typically we choose the value of k with the lowest error rate on the validation data (see the sketch after this list)
k too low: may be fitting the noise in the data
captures local structure in the data (but also the noise)
k too high: may not capture the local structure of the data (one of k-NN's key advantages)
more smoothing, less noise, but may miss local structure
Extreme case: k = number of records in the training data: assign all records to the majority class
This is the naïve rule (a case of oversmoothing)
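A minimal sketch of choosing k by validation error with scikit-learn; the data here are synthetic stand-ins, and the odd k values from 1 to 19 follow the guideline above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic data standing in for real training/validation sets
X, y = make_classification(n_samples=500, n_features=5, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, random_state=1)

# Try odd k from 1 to 19 and keep the one with the lowest validation error
best_k, best_err = None, float("inf")
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    err = 1 - accuracy_score(y_valid, knn.predict(X_valid))
    if err < best_err:
        best_k, best_err = k, err

print(best_k, best_err)
```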
Advantages
Simple
No distributional assumptions (e.g., normality) required
Effective at capturing complex interactions among variables without having to define a statistical model
Shortcomings
Classifying a new record is slow: distances to all training records must be computed before the nearest one(s) can be identified
Required size of training set increases exponentially with # of predictors, p
“curse of dimensionality”
One remedy: reduce the dimension of the predictors (e.g., with PCA)
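A hedged sketch of that remedy: standardizing the predictors, projecting onto a few principal components, and then running k-NN on the reduced data. The dataset is synthetic and the choices of 2 components and k = 5 are arbitrary, for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic high-dimensional data standing in for a real predictor matrix
X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# Standardize, project onto the first 2 principal components, then run k-NN
model = make_pipeline(StandardScaler(), PCA(n_components=2),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.score(X, y))  # accuracy of the reduced-dimension k-NN on the same data
```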