Week 10 - KNN Flashcards
What are features?
Features are individual measurable properties or characteristics of data used in machine learning models to make predictions or classifications. They represent the input variables that the model uses to learn patterns and relationships within the data.
Give an example of how the KNN algorithm can be used.
An example of using the KNN algorithm is classifying a new fruit based on its size and color. By plotting known fruits like grapefruits and oranges on a graph with these attributes, you can classify a new, mysterious fruit by finding its closest neighbors in the dataset. If the majority of these nearest neighbors are oranges, the KNN algorithm would classify the new fruit as an orange.
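The fruit example above can be sketched in a few lines of Python. The feature values below are made-up illustrations: each fruit is a `(diameter_cm, redness)` pair, and the new fruit is classified by majority vote among its k closest neighbors.

```python
from math import dist
from collections import Counter

# Hypothetical labeled fruits: ((diameter_cm, redness), label)
known_fruits = [
    ((10.0, 0.2), "grapefruit"),
    ((11.0, 0.3), "grapefruit"),
    ((7.0, 0.8), "orange"),
    ((7.5, 0.7), "orange"),
    ((8.0, 0.9), "orange"),
]

def classify(point, k=3):
    # Sort known fruits by distance to the new point, take the k closest,
    # and return the label that appears most often among them.
    nearest = sorted(known_fruits, key=lambda f: dist(point, f[0]))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

print(classify((7.8, 0.75)))  # nearest neighbors are all oranges -> "orange"
```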
What is KNN?
The k-nearest neighbors (KNN) algorithm is a classification method that assigns a label to a new data point based on the majority label of its k closest neighbors in the feature space. It classifies data by comparing the distance between the new point and existing points in the dataset.
How can the distance between two points be calculated in relation to KNN?
To use KNN, you need to determine what “nearest” means, typically by calculating Euclidean distance. This distance metric measures the straight-line distance between two points in the feature space, helping to identify the closest neighbors.
What is the formula for Euclidean distance?
For two points \(\mathbf{p} = (p_1, p_2)\) and \(\mathbf{q} = (q_1, q_2)\) in a 2-dimensional space, the Euclidean distance is calculated using the formula:
\[ \text{Euclidean Distance} = \sqrt{ (p_1 - q_1)^2 + (p_2 - q_2)^2 } \]
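The formula translates directly into code. A minimal sketch that generalizes it to any number of feature dimensions:

```python
from math import sqrt

def euclidean_distance(p, q):
    # Sum the squared differences over each feature dimension,
    # then take the square root of the total.
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))  # 3-4-5 right triangle -> 5.0
```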
What is KNN for classification?
KNN for classification is a method that assigns a category to a new data point based on the majority vote from its k-nearest neighbors in the feature space. The class that appears most frequently among the closest neighbors determines the classification of the new point.
What is KNN for regression?
KNN for regression predicts a value for a new data point by averaging the target variable values of its k-nearest neighbors. The predicted value is based on the average of the target values from the closest data points.
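A minimal regression sketch, using hypothetical one-dimensional training data: the prediction is simply the mean of the target values of the k nearest samples.

```python
from math import dist

# Hypothetical training data: (feature vector, target value) pairs.
samples = [
    ((1.0,), 10.0),
    ((2.0,), 12.0),
    ((3.0,), 14.0),
    ((8.0,), 40.0),
]

def knn_regress(point, k=3):
    # Average the target values of the k nearest training samples.
    nearest = sorted(samples, key=lambda s: dist(point, s[0]))[:k]
    return sum(target for _, target in nearest) / k

print(knn_regress((2.5,)))  # mean of targets 10, 12, 14 -> 12.0
```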
What is the difference between KNN for classification and KNN for regression?
The difference between KNN for classification and KNN for regression lies in their objectives:
KNN for Classification: Assigns a class label based on the majority vote of the k-nearest neighbors.
KNN for Regression: Predicts a continuous value by averaging the target values of the k-nearest neighbors.
What is the importance of feature selection for KNN?
Feature selection is crucial for KNN because the choice of features directly affects the algorithm’s performance. Relevant features ensure accurate predictions, while irrelevant or biased features can lead to poor results. Properly selecting features helps in building a model that reflects true preferences or behaviors, enhancing the effectiveness of the KNN algorithm.
What does feature extraction involve in data analysis?
Feature extraction involves converting an item, such as a fruit or a user, into a list of numbers. These numerical representations can then be compared and analyzed to identify patterns or make predictions.
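As an illustration, a hypothetical extractor that turns a fruit record into a numeric feature vector (the field names and redness scores are assumptions, not from the source):

```python
def extract_features(fruit):
    # Map a descriptive record to a numeric vector: [diameter_cm, redness].
    # Color is encoded as a made-up "redness" score so it can be compared
    # numerically alongside size.
    color_redness = {"yellow": 0.1, "orange": 0.7, "red": 1.0}
    return [fruit["diameter_cm"], color_redness[fruit["color"]]]

print(extract_features({"diameter_cm": 7.5, "color": "orange"}))  # [7.5, 0.7]
```

Once every item is represented this way, any two items can be compared with a distance metric such as Euclidean distance.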