Week 3 - Supervised learning: Instance based learning Flashcards
When presented with an input x that
is not in the database: (2)
- Find nearest x in the database.
- Output associated y.
- What if no xs are close?
- We find k nearest xs!
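A minimal sketch of this lookup in Python (the database values and names here are illustrative, not from the course):

database = [(1.0, "red"), (2.5, "blue"), (4.0, "red")]  # hypothetical (x, y) pairs

def nearest_neighbor(q):
    # Return the y associated with the stored x closest to the query q.
    x, y = min(database, key=lambda pair: abs(pair[0] - q))
    return y

print(nearest_neighbor(2.2))  # -> blue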
k-Nearest Neighbors (k-NN) algorithm
- Training (2)
You have a dataset of houses with features like size and number of bedrooms, along with each house's price (the target).
The algorithm “learns” by storing all these instances in memory.
k-Nearest Neighbors (k-NN) algorithm
- Prediction
When you want to predict the price of a new house, the algorithm finds the k houses in the training set that are closest in terms of features.
It then averages the prices of these k houses to predict the price for the new house.
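As a sketch, the whole prediction step might look like this in Python, assuming hypothetical (size, bedrooms) feature tuples and Euclidean distance:

import math

# Hypothetical training set: ((size_sqm, bedrooms), price) pairs.
houses = [((70.0, 2), 200_000), ((75.0, 3), 210_000),
          ((72.0, 2), 205_000), ((120.0, 4), 350_000)]

def predict_price(query, k=3):
    # Distance between the query's features and a stored house's features.
    def dist(features):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query, features)))
    # Take the k houses closest to the query ...
    nearest = sorted(houses, key=lambda pair: dist(pair[0]))[:k]
    # ... and average their prices.
    return sum(price for _, price in nearest) / k

print(predict_price((73.0, 2)))  # -> 205000.0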
k-Nearest Neighbors (k-NN) algorithm
- Example in action
If the new house is similar in size, bedroom count, and other features to houses A, B, and C from the training set, and they have prices $200,000, $210,000, and $205,000 respectively, the algorithm might predict the new house’s price to be around $205,000.
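With k = 3 and the arithmetic mean, that works out as ($200,000 + $210,000 + $205,000) / 3 = $205,000.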
k-Nearest Neighbors (k-NN) algorithm:
Training data D = {(xi, yi)}*
D = {(xi, yi)}
D represents the training dataset, where each (xi, yi) pair consists of:
- an input instance xi
- its corresponding output or label yi.
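As a sketch, D can be held as a plain Python list of pairs (the feature values here are hypothetical):

D = [
    ([70.0, 2], 200_000),  # (x1, y1)
    ([75.0, 3], 210_000),  # (x2, y2)
    ([72.0, 2], 205_000),  # (x3, y3)
]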
k-Nearest Neighbors (k-NN) algorithm:
Number of neighbors = k*
k represents the number of nearest neighbours to consider when making a prediction for a new instance.
For example, if k=5, the algorithm will look at the 5 training instances that are closest to the query point.
k-Nearest Neighbors (k-NN) algorithm:
Query point = q*
q is the instance for which you want to make a prediction.
It’s the input for which you’re trying to find the closest neighbors in the training set.
k-Nearest Neighbors (k-NN) algorithm:
Distance metric = d(q, x)*
d(q,x) represents the distance metric used to measure the similarity between the query point q and a training instance x.
Common choices include Euclidean distance and Manhattan distance; other distance or similarity measures can be used depending on the problem.
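Sketches of the two named metrics in Python, for equal-length numeric feature vectors:

import math

def euclidean(q, x):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(q, x)))

def manhattan(q, x):
    # Sum of absolute coordinate differences.
    return sum(abs(qi - xi) for qi, xi in zip(q, x))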
k-Nearest Neighbors (k-NN) algorithm:
NN = {i : d(q, xi) among the k smallest}*
NN = {i : d(q, xi) among the k smallest}
NN represents the set of indices i corresponding to the k training instances with the smallest distances to the query point q.
These are the k nearest neighbors.
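One way to compute this index set in Python, with a distance function d (such as manhattan from the previous card) and the training inputs xs:

import heapq

def k_nearest_indices(q, xs, d, k):
    # Indices i of the k training inputs xi with the smallest d(q, xi).
    return heapq.nsmallest(k, range(len(xs)), key=lambda i: d(q, xs[i]))

print(k_nearest_indices([1.5], [[1.0], [2.0], [5.0]], manhattan, 2))  # -> [0, 1]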
k-Nearest Neighbors (k-NN) algorithm:
Classification (Plurality vote of the yi ∈ NN)
Classification involves predicting the class label based on the most common class among the k nearest neighbors.
This applies to classification tasks, where the goal is to categorize input instances into different classes or categories.
“yi” represents the class labels of the k nearest neighbors (NN) of the new instance.
The algorithm looks at the class labels of these neighbors and predicts the class label for the new instance based on the most common (plurality) class among them.
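A minimal sketch of the vote in Python, using collections.Counter:

from collections import Counter

def plurality_vote(neighbor_labels):
    # Predict the most common class label among the k nearest neighbors.
    return Counter(neighbor_labels).most_common(1)[0][0]

print(plurality_vote(["spam", "ham", "spam"]))  # -> spam

In this sketch, ties between equally common labels fall to whichever label Counter encountered first.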
k-Nearest Neighbors (k-NN) algorithm:
Regression (Arithmetic mean of the yi ∈ NN)
Regression involves predicting a numerical value based on the average of the target values among the k nearest neighbors.
This applies to regression tasks, where the goal is to predict a continuous numerical value.
“yi” represents the target values (e.g., house prices) of the k nearest neighbors (NN) of the new instance.
The algorithm calculates the average (arithmetic mean) of these target values and predicts this average as the numerical value for the new instance.
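A one-function sketch of this averaging step, assuming the neighbors' target values have already been collected:

def knn_regress(neighbor_targets):
    # Predict the arithmetic mean of the k nearest neighbors' target values.
    return sum(neighbor_targets) / len(neighbor_targets)

print(knn_regress([200_000, 210_000, 205_000]))  # -> 205000.0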