Machine Learning Flashcards
Cosine Similarity
Measures the cosine of the angle between two vectors to determine the similarity between two items; it depends on the vectors' direction, not their magnitude.
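A minimal sketch of the definition above, assuming plain Python lists as vectors. The function name `cosine_similarity` is chosen here for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Parallel vectors such as `[1, 2, 3]` and `[2, 4, 6]` score close to 1, and orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0.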
Manhattan Distance
Calculates the distance between points in a grid-based layout as the sum of the absolute differences of their Cartesian coordinates.
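As a small illustration, assuming points are given as equal-length sequences of numbers (the function name is hypothetical):

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))
```

For the points (1, 2) and (4, 6), the differences are 3 and 4, so the distance is 7.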
Jaccard Similarity
Compares the similarity and diversity of sample sets, calculating the size of the intersection divided by the size of the union of the sets.
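A short sketch of the intersection-over-union formula. Returning 1.0 for two empty sets is a convention assumed here, not something the card specifies.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)
```

For {1, 2, 3} and {2, 3, 4}, the intersection has 2 elements and the union has 4, giving 0.5.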
Spearman’s Rank Correlation
A measure of rank correlation that assesses how well the relationship between two variables can be described using a monotonic function.
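One way to compute it, sketched here under the assumption that ties receive their average rank, is to rank both variables and take the Pearson correlation of the ranks. All function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any strictly increasing relationship (for example x against x squared on positive values) yields a coefficient of 1 even though it is not linear.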
K-Nearest Neighbors (KNN)
An instance-based algorithm that stores all training cases and classifies a new case by a majority vote of its k nearest neighbors; for regression, it averages the neighbors' values instead.
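A minimal classification sketch, assuming Euclidean distance and training data given as `(features, label)` pairs. Both choices are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs with numeric feature tuples.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Sorting the whole training set is fine for small data; larger datasets typically use a spatial index instead.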
Matrix Factorization
A collaborative filtering technique using decompositions like SVD to predict missing entries in a user-item interaction matrix.
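The sketch below is not an exact SVD. It assumes a common alternative: learning two low-rank factor matrices by stochastic gradient descent on the observed ratings only. The hyperparameters and function names are illustrative assumptions.

```python
import random

def factorize(ratings, n_users, n_items, k=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating.

    ratings: dict mapping (user_index, item_index) -> observed rating.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict_rating(P, Q, u, i):
    """Predicted rating for user u and item i, including unobserved pairs."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

After training, `predict_rating` can be called on (user, item) pairs that were missing from `ratings`, which is how the missing entries get filled in.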
Content-Based Filtering
Recommends items based on their similarity to items previously liked by the user, using the features of the items themselves.
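One simple way to realize this, assuming each item is described by a numeric feature vector: average the liked items' vectors into a user profile, then rank the remaining items by cosine similarity to that profile. The names and the averaging step are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(item_features, liked, top_n=2):
    """Rank items the user has not liked yet by similarity to their profile.

    item_features: dict mapping item id -> feature vector.
    liked: non-empty list of item ids the user liked.
    """
    dims = len(next(iter(item_features.values())))
    # User profile: mean feature vector of the liked items.
    profile = [sum(item_features[i][d] for i in liked) / len(liked) for d in range(dims)]
    candidates = [i for i in item_features if i not in liked]
    ranked = sorted(candidates, key=lambda i: cosine(profile, item_features[i]), reverse=True)
    return ranked[:top_n]
```

No other users' ratings are needed, which is the main contrast with collaborative filtering.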
Cold Start Problem
A challenge in recommendation systems where there is insufficient data on new users or items to make accurate recommendations.
Item-to-Item Collaborative Filtering
A form of collaborative filtering based on calculating the similarity between items using ratings given by users.
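A minimal sketch of the similarity step, assuming ratings are stored as `{user: {item: rating}}` and that cosine similarity is computed over users who rated both items. Other similarity measures are also used in practice.

```python
import math

def item_similarity(ratings, item_a, item_b):
    """Cosine similarity between two items over users who rated both."""
    common = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    if not common:
        return 0.0  # assumed convention when no user rated both items
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    na = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    nb = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (na * nb) if na and nb else 0.0
```

Items that the same users rate similarly score close to 1, so a user who liked one is a candidate for the other.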
Hamming Distance
Measures the distance between two strings of equal length by counting the number of positions at which the corresponding symbols differ.
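A short sketch of the position-by-position count (the function name is illustrative):

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(a != b for a, b in zip(s, t))
```

For "karolin" and "kathrin", positions 3, 4, and 5 differ, so the distance is 3.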
Supervised Learning
A type of machine learning where the model is trained on a labeled dataset, learning to predict the output from the input data.
Unsupervised Learning
A type of machine learning where the model is trained on data that has not been labeled, categorized, or classified, aiming to discover structure such as clusters or recurring patterns.
Regression
A statistical method used in machine learning for predicting a continuous outcome from input features, using a relationship learned from previously observed data.
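As one concrete instance, the sketch below fits a straight line by ordinary least squares. Simple linear regression is only one form of regression and is assumed here for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return slope, intercept
```

For the points (0, 1), (1, 3), (2, 5), (3, 7), the fit recovers slope 2 and intercept 1, and a new x is predicted as `slope * x + intercept`.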
Classification
A process in machine learning for categorizing data into predefined classes or categories.
Decision Trees
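A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes. In machine learning, it predicts a label or value by following feature-based splits from the root to a leaf.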
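To illustrate the tree-like structure, here is a hand-built tree stored as nested dictionaries, with a function that walks it to a leaf. The features (`outlook`, `windy`) and labels are hypothetical, and the tree is written by hand rather than learned from data.

```python
# Internal nodes are dicts naming a feature to test; leaves are plain labels.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}

def predict_tree(node, sample):
    """Follow the branch matching each tested feature until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node
```

A sample such as `{"outlook": "sunny", "windy": False}` follows the "sunny" branch, then the not-windy branch, and ends at the leaf "play".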