Chapter 7 Flashcards

1
Q

k-nearest neighbors

A

Relies on finding “similar” records in training data

2
Q

These neighbors are used to derive a classification or prediction

A

voting (for classification) or averaging (for prediction)

3
Q

how it works

A

Identify k records in the training data that are similar to a new record that we wish to classify

Then use the similar (neighboring) records to classify the new record into a class, assigning the new record to the predominant class among these neighbors

Look for records in the training data that are similar or “near” the record to be classified

Based on the class (of the proximate records), we assign a class to the new record
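The steps above can be sketched in plain Python (a minimal illustration; the data, labels, and value of k below are made up):

```python
from collections import Counter
import math

def knn_classify(new_record, training_records, training_labels, k):
    """Classify new_record by majority vote among its k nearest training records."""
    # Distance from the new record to every training record
    distances = [
        (math.dist(new_record, rec), label)
        for rec, label in zip(training_records, training_labels)
    ]
    # Keep the k closest ("nearest") records
    neighbors = sorted(distances)[:k]
    # Assign the predominant class among these neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-predictor training data with classes "A" and "B"
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
y = ["A", "A", "B", "B"]
print(knn_classify((1.1, 0.9), X, y, k=3))  # prints A (record sits in the "A" cluster)
```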

4
Q

“Near” means

A

records with similar predictor values X1, X2, … Xp

5
Q

characteristics of the model

A

Data-driven, not model-driven

Makes no assumptions about the form of the relationship between the class membership Y and the predictors X1, X2, … Xp

Nonparametric method because it does not involve estimation of parameters (coefficients) in an assumed functional form, such as the linear form assumed in linear regression

6
Q

How to measure “nearby”?

A

Central issue: How to measure distance between records based on their predictor values.

The most popular distance measure: Euclidean distance

The Euclidean distance between two records
(x1, x2, …, xp) and (u1, u2, …, up) is:

sqrt[(x1 - u1)^2 + (x2 - u2)^2 + … + (xp - up)^2]
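In code, this distance can be computed directly (a minimal sketch; the two records below are made up):

```python
import math

def euclidean_distance(x, u):
    """Euclidean distance between records x = (x1, ..., xp) and u = (u1, ..., up)."""
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

# Two hypothetical records with p = 3 predictors
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # sqrt(9 + 16 + 0) = 5.0
```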

7
Q

The one-nearest-neighbor scheme has a misclassification rate

A

that is no more than twice the error rate attained when the probability density functions for each class are known exactly (the Bayes error rate)

8
Q

one-nearest-neighbor idea can be extended to k > 1 as follows

A

Find the k nearest neighbors to the record to be classified

Use a majority decision rule to classify the record, where the record is classified as a member of the majority class of the k neighbors
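The majority decision rule itself is just a vote count; a minimal sketch with made-up neighbor labels:

```python
from collections import Counter

def majority_class(neighbor_labels):
    """Return the majority class among the labels of the k nearest neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]

# Labels of k = 5 hypothetical nearest neighbors
print(majority_class(["B", "A", "B", "B", "A"]))  # "B" wins, 3 votes to 2
```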

9
Q

Choosing k: advantage of choosing k > 1

A

Higher values of k provide smoothing that reduces the risk of overfitting

Balance between overfitting (k too low) and ignoring predictor information (k too high)
Typically, values of k fall in the range 1-20 (an odd number, to avoid ties)

Typically choose the value of k that gives the lowest error rate on the validation data
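Selecting k by validation error can be sketched as follows (a toy illustration; the training/validation partitions and candidate k values below are made up):

```python
from collections import Counter
import math

def knn_predict(record, X_train, y_train, k):
    """k-NN classification by majority vote among the k nearest training records."""
    dists = sorted((math.dist(record, x), y) for x, y in zip(X_train, y_train))
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]

def validation_error(k, X_train, y_train, X_valid, y_valid):
    """Fraction of validation records that k-NN misclassifies."""
    wrong = sum(knn_predict(r, X_train, y_train, k) != y
                for r, y in zip(X_valid, y_valid))
    return wrong / len(y_valid)

# Hypothetical training and validation partitions
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["A", "A", "A", "B", "B", "B"]
X_valid = [(0.5, 0.5), (5.5, 5.5)]
y_valid = ["A", "B"]

# Try odd values of k (odd to avoid ties) and keep the one with lowest validation error
best_k = min([1, 3, 5],
             key=lambda k: validation_error(k, X_train, y_train, X_valid, y_valid))
print(best_k)
```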

10
Q

k too low:

A

May be fitting the noise in the data

Captures local structure in the data (but also the noise)

11
Q

k too high

A

May not capture the local structure of the data (one of k-NN's key advantages)
More smoothing and less noise, but local structure may be missed

12
Q

Extreme k=

A
n, the number of records: all records are assigned to the majority class
This is the naïve rule (a case of oversmoothing)
13
Q

Advantages

A

Simple
No distributional assumptions (e.g., normality) required
Effective at capturing complex interactions among variables without having to define a statistical model

14
Q

Shortcomings

A

Takes a long time to compute the distances to all the training records and identify the nearest one(s)
The required size of the training set increases exponentially with the number of predictors, p:
the “curse of dimensionality”

15
Q

“curse of dimensionality”

A

Remedy: reduce the dimension of the predictors (e.g., with PCA)
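With scikit-learn (assuming it is available), PCA and k-NN can be chained in one pipeline; the synthetic dataset and parameter choices below are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Synthetic data with many predictors (p = 30), most of them uninformative
X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

# Project onto a few principal components, then classify with k-NN
model = make_pipeline(PCA(n_components=4), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(round(model.score(X_valid, y_valid), 2))  # validation accuracy
```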

16
Q

Nonparametric method

A

because it does not involve estimation of parameters (coefficients) in an assumed functional form, such as the linear form assumed in linear regression