Lecture 2 Flashcards
k-Nearest Neighbors (k-NN)
Given a set of labeled instances (the training set), new instances (the test set) are classified according to the majority label of their k nearest neighbors in the training set
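A minimal sketch of this idea in Python (the toy data, function name, and choice of k are illustrative, not from the lecture):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by the majority label of its k nearest training points."""
    # Euclidean distance from x_new to every training instance
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy labeled training set: two classes in 2-D
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["red", "red", "blue", "blue"])
print(knn_classify(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> "blue"
```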
Decision Boundary
A model of the separation between two classes; it can be a straight line or a wiggly curve
What is the complexity of the k-NN model proportional to?
The complexity is proportional to the wiggliness of the decision boundary: the more complex the model, the more wiggly the boundary
What does a model do with data in a classification problem?
In classification, a model trained from data defines a decision boundary that separates the data
What does a model do with data in a regression problem?
In regression, a model fits the data to describe the relation between i) two features or ii) a feature and the label
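As a hedged illustration of the regression case, a k-NN regressor can predict the mean label of the k nearest neighbors (the toy data and function name are assumptions for the example):

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=3):
    # Distance from x_new to every training instance
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    # Predict the average label value of the k nearest neighbors
    return y_train[nearest].mean()

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([1.2, 1.9, 3.1, 4.0])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))  # -> 2.5
```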
What happens when there is an equal number of classes of the nearest neighbors (a tie) or two neighbors are equidistant to the new data point?
Either the class assigned to the new point is selected at random from the tied classes, or k is changed (increased or decreased) until the class or distance tie is broken
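A sketch of the first strategy, random selection among tied classes (illustrative; the helper name is made up):

```python
import random
from collections import Counter

def break_class_tie(neighbor_labels):
    counts = Counter(neighbor_labels).most_common()
    top_count = counts[0][1]
    tied = [label for label, c in counts if c == top_count]
    return random.choice(tied)  # option 1: random choice among tied classes

# With k=2 the vote below is tied 1-1, so a tied class is picked at random;
# option 2 would instead change k (e.g. to 3) until the tie disappears.
print(break_class_tie(["red", "blue"]))
```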
What is the label (class) of a point on the decision boundary?
It’s ambiguous
Uniform Weighted k-NN
Each neighbor's vote counts equally: the majority class of the k nearest neighbors determines the class of the new data point
Distance Weighted k-NN
Each neighbor has a weight based on its distance to the new data point
Inverse Distance Weighted k-NN
Each neighbor has a weight based on the inverse of its distance to the new data point, so closer neighbors get a higher vote
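A small sketch contrasting the uniform and inverse-distance voting schemes (the values are illustrative; the epsilon guard against division by zero is an added assumption):

```python
import numpy as np

def weighted_vote(dists, labels, weighting="uniform", eps=1e-9):
    if weighting == "uniform":
        weights = np.ones_like(dists)   # every neighbor counts equally
    else:
        weights = 1.0 / (dists + eps)   # closer neighbors vote louder
    votes = {}
    for w, label in zip(weights, labels):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

dists = np.array([0.1, 0.8, 0.9])
labels = ["blue", "red", "red"]
print(weighted_vote(dists, labels, "uniform"))  # -> "red"  (2 votes vs 1)
print(weighted_vote(dists, labels, "inverse"))  # -> "blue" (10 vs ~2.36)
```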
What are two types of Kernel Functions?
Gaussian kernel (bell curve) and tricube kernel
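As weighting functions of a distance d, the two kernels can be sketched as follows (these are the standard textbook forms; the bandwidth parameter h is an assumption, and normalization constants are omitted):

```python
import numpy as np

def gaussian_kernel(d, h=1.0):
    # Bell curve: weight decays smoothly and never reaches exactly zero
    return np.exp(-(d / h) ** 2 / 2.0)

def tricube_kernel(d, h=1.0):
    # Compact support: weight is exactly zero for neighbors farther than h
    u = np.abs(d / h)
    return np.where(u < 1, (1 - u ** 3) ** 3, 0.0)

print(gaussian_kernel(np.array([0.0, 0.5, 2.0])))
print(tricube_kernel(np.array([0.0, 0.5, 2.0])))
```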
Euclidean distance
The length of the straight line between two points (as the crow flies)
Manhattan distance
The sum of the distances between the two points' projections on each axis (as on a city grid: you can't walk through walls/buildings, you have to go around them)
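Both metrics on the same pair of points, as a quick sketch:

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

euclidean = np.linalg.norm(a - b)  # straight-line distance: 5.0
manhattan = np.abs(a - b).sum()    # grid / "around the block" distance: 7.0
print(euclidean, manhattan)
```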
When does a k-NN model have a danger of overfitting?
When k is too small: the model has high complexity (a very wiggly decision boundary)
When does a k-NN model have a danger of underfitting?
When k is too large: the model has low complexity (an overly smooth decision boundary)
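A hedged sketch of how one might pick k to balance the two dangers, assuming scikit-learn is available (the dataset and the candidate k values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 15, 51]:
    model = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:>2}  mean CV accuracy={score:.3f}")
# Very small k -> wiggly boundary, risk of overfitting; very large k ->
# overly smooth boundary, risk of underfitting; a middle k usually scores best.
```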