Lecture 2 Flashcards

1
Q

k-Nearest Neighbors (k-NN)

A

Given a set of labeled instances (the training set), a new instance (from the test set) is classified according to the majority label of its k nearest neighbors in the training set
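
A minimal sketch of this rule in Python/NumPy, assuming numeric features and plain Euclidean distance (variable names are illustrative, not from the lecture):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by the majority label of its k nearest training instances."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

# Toy example: two classes in 2-D
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([0.1, 0.2]), k=3))  # -> A
```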

2
Q

Decision Boundary

A

A model of the separation between two classes. It can be a straight or a wiggly line

3
Q

What is the complexity of the k-NN model proportional to?

A

The complexity is proportional to the wiggliness of the decision boundary. The more complex the model, the more wiggly the boundary

4
Q

What does a model do with data in a classification problem?

A

In classification, a model trained from data defines a decision boundary that separates the data

5
Q

What does a model do with data in a regression problem?

A

In regression, a model fits the data to describe the relation between (i) two features or (ii) a feature and the label

6
Q

What happens when there is an equal number of classes of the nearest neighbors (a tie) or two neighbors are equidistant to the new data point?

A

Either the class assigned to the new point is chosen at random, or the value of k is changed to break the class tie or the distance tie

7
Q

What is the label (class) of a point on the decision boundary?

A

It’s ambiguous

8
Q

Uniform Weighted k-NN

A

Each of the k nearest neighbors gets an equal vote; the majority class among them determines the class of the new data point

9
Q

Distance Weighted k-NN

A

Each neighbor has a weight based on its distance to the new data point

10
Q

Inverse Distance Weighted k-NN

A

Each neighbor has a weight based on the inverse of its distance to the new data point, so closer neighbors have a higher vote
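
A sketch of the inverse-distance vote, assuming Euclidean distance and a small epsilon to avoid dividing by zero when a neighbor coincides with the query point (names are illustrative):

```python
import numpy as np

def knn_predict_inverse_weighted(X_train, y_train, x_new, k=3, eps=1e-12):
    """Each of the k nearest neighbors votes with weight 1 / distance."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    votes = {}
    for i in np.argsort(dists)[:k]:
        w = 1.0 / (dists[i] + eps)                 # closer neighbors get a larger vote
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```

Replacing the weight 1/distance with a constant 1 gives back the uniform-weighted rule of card 8.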

11
Q

What are two types of kernel functions?

A

Gaussian kernel (bell curve) and tricube kernel
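
Sketches of the two weighting functions; the tricube version assumes distances are normalized by the distance to the farthest neighbor considered (d_max), which is one common convention:

```python
import numpy as np

def gaussian_kernel(d, sigma=1.0):
    """Bell-curve weight: close to 1 for small distances, decays smoothly."""
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def tricube_kernel(d, d_max):
    """Weight (1 - |d/d_max|^3)^3 inside the neighborhood, 0 outside."""
    u = np.abs(d / d_max)
    return np.where(u <= 1, (1 - u ** 3) ** 3, 0.0)
```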

12
Q

Euclidean distance

A

The length of the straight line between two points (the square root of the sum of squared coordinate differences)

13
Q

Manhattan distance

A

The sum of the distances between the projections onto each axis, i.e. the sum of absolute coordinate differences (you can't walk through walls/buildings, you have to go around them)
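
The two distances side by side (a toy example, not from the lecture):

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance: square root of the sum of squared differences."""
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    """Grid distance: sum of the absolute differences along each axis."""
    return np.sum(np.abs(a - b))

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(euclidean(a, b))  # 5.0 (straight line)
print(manhattan(a, b))  # 7.0 (3 blocks one way + 4 blocks the other)
```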

14
Q

When does a k-NN model have a danger of overfitting?

A

When k is too low; the model has a high complexity

15
Q

When does a k-NN model have a danger of underfitting?

A

When k is too high; the model has a low complexity

16
Q

How do you determine the model complexity?

A

Depends on the complexity of the separation between classes. Start with the simplest model (large k) and increase complexity (smaller k)
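
A minimal sketch of that search using scikit-learn's k-NN classifier on synthetic data; the dataset, the split, and the list of k values are only illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Start simple (large k) and move toward more complex models (smaller k),
# keeping the k that does best on the held-out validation set.
for k in [51, 25, 11, 5, 3, 1]:
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val)
    print(k, round(acc, 3))
```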

17
Q

How do you choose k?

A

Typically an odd number k for an even number of classes (to avoid ties). The data miner's rule of thumb is k = sqrt(n), where n is the number of training instances
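
The rule of thumb in code (the training-set size is made up for the example):

```python
import math

n_train = 400                         # hypothetical number of training instances
k = int(round(math.sqrt(n_train)))    # sqrt(n) rule of thumb -> 20
if k % 2 == 0:
    k += 1                            # prefer an odd k to avoid ties
print(k)                              # 21
```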

18
Q

Nearest centroid classification

A

For each class, compute a centroid: the mean of that class's training samples. A new sample is compared to each class centroid, and the class whose centroid is closest in squared distance is the predicted class for that sample.
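
A minimal sketch of this rule (function names are illustrative; scikit-learn ships the same idea as NearestCentroid):

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_new):
    """Assign x_new to the class whose centroid is closest in squared distance."""
    classes = np.unique(y_train)
    # Per-class centroid = mean of that class's training samples
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    sq_dists = np.sum((centroids - x_new) ** 2, axis=1)
    return classes[np.argmin(sq_dists)]
```
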
19
Q

Nearest shrunken centroid classification

A

"Shrinks" each class centroid toward the overall centroid of all classes by an amount called the threshold. For each coordinate, the difference between the class centroid and the overall centroid is moved toward zero by the threshold, and set to zero if it would cross zero.

After shrinking the centroids, the new sample is classified by the usual nearest centroid rule, but using the shrunken class centroids.
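
A sketch of the shrinkage step only, as described above (the full method, e.g. Tibshirani's nearest shrunken centroids, also standardizes the differences by within-class variability; scikit-learn exposes this idea via NearestCentroid's shrink_threshold parameter):

```python
import numpy as np

def shrunken_centroids(X_train, y_train, threshold):
    """Shrink each class centroid toward the overall centroid by `threshold`."""
    classes = np.unique(y_train)
    overall = X_train.mean(axis=0)
    shrunk = []
    for c in classes:
        diff = X_train[y_train == c].mean(axis=0) - overall
        # soft-threshold: move each component toward zero, clamp at zero
        diff = np.sign(diff) * np.maximum(np.abs(diff) - threshold, 0.0)
        shrunk.append(overall + diff)
    return classes, np.array(shrunk)
```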

20
Q

k-NN Advantages

A

• The cost of the learning process is zero
• No assumptions about the characteristics of the concepts to learn have to be made
• Complex concepts can be learned by local approximation using simple procedures

21
Q

k-NN Disadvantages

A

• The model cannot be interpreted (there is no description of the learned concepts)
• It is computationally expensive to find the k nearest neighbors when the dataset is very large
• Performance depends on the number of dimensions we have (curse of dimensionality)

22
Q

Curse of Dimensionality

A

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.

For example: when the dimensionality increases, the volume of the space increases so fast that the available data become sparse.
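
A small numerical illustration of that sparsity effect, assuming points drawn uniformly from the unit hypercube: as the dimension grows, the nearest and farthest neighbors of a point end up almost equally far away.

```python
import numpy as np

rng = np.random.default_rng(0)
for dim in [2, 10, 100, 1000]:
    X = rng.random((1000, dim))                  # 1000 random points in [0, 1]^dim
    d = np.linalg.norm(X - X[0], axis=1)[1:]     # distances from the first point
    print(dim, round(d.min() / d.max(), 3))      # ratio creeps toward 1 as dim grows
```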

23
Q

When does the k-NN algorithm require more computation?

A

The k-NN algorithm requires more computation for testing than for training.

24
Q

What does training the kNN algorithm consist of?

A

Only storing the training data

25
Q

What does testing the kNN algorithm consist of?

A

Testing involves comparing every test instance to all the instances in the training set and calculating which training instances are closest, before assigning a class label.

26
Q

Is the kNN algorithm used for classification or regression?

A

It can be used for both classification and regression.

27
Q

What is the relationship between k and the complexity of the model?

A

As you increase k, the model gets less complex (risk of overfitting decreases).

28
Q

If the model performs well on the training data but poorly on the test data, what's the issue?

A

The model is overfitting

29
Q

Can k in kNN be negative or a float?

A

No, k is always a positive integer