Basic Definitions Flashcards
Unsupervised learning
A type of machine learning in which the algorithm is trained on unlabeled data and must discover structure in the inputs (such as clusters or patterns) on its own.
Clustering
The task of grouping similar data points together so that points within a cluster are more similar to one another than to points in other clusters; a common form of unsupervised learning.
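For example, a minimal clustering sketch using k-means from scikit-learn (the generated blobs and the choice of 3 clusters are illustrative assumptions, not part of the definition):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three blobs of points around different centers
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# Group the points into 3 clusters based on distance to the cluster centers
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
print(kmeans.cluster_centers_)   # learned cluster centers
```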
Supervised learning
A type of machine learning in which the algorithm is trained on a labeled dataset, learning a mapping from inputs to known outputs so it can predict the labels of new, unseen data.
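A minimal supervised-learning sketch with scikit-learn: fit a classifier on labeled examples, then score it on held-out data (the iris dataset and logistic regression are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                  # features X, labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))                   # accuracy on unseen labeled data
```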
Reinforcement Learning
A type of machine learning where an agent interacts with an environment by performing actions and receives rewards or penalties. The agent’s goal is to learn a policy that maximizes the cumulative reward over time
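As an illustration, the sketch below uses tabular Q-learning, one common RL algorithm, on a made-up five-state chain environment; the environment, reward values, and hyperparameters are assumptions for demonstration only:

```python
import numpy as np

n_states, n_actions = 5, 2      # actions: 0 = move left, 1 = move right
goal = n_states - 1             # reaching the right end ends the episode with reward 1

def step(state, action):
    """Toy environment: the agent moves along a chain and is rewarded at the goal."""
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))        # estimated value of each (state, action) pair
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

for _ in range(500):                       # episodes of interaction with the environment
    state, done = 0, False
    for _ in range(100):                   # cap episode length
        # epsilon-greedy policy: explore at random, otherwise act greedily on Q
        if rng.random() < epsilon or Q[state, 0] == Q[state, 1]:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if done:
            break

print(Q.argmax(axis=1)[:goal])   # learned greedy policy: move right (1) in every non-terminal state
```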
Overfitting
A situation where a machine learning model performs well on the training data but poorly on unseen test data because it has learned noise and irrelevant patterns in the training data.
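A small sketch of overfitting, assuming scikit-learn and the breast-cancer dataset as illustrative choices: an unconstrained decision tree memorizes the training set but does worse on held-out data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.score(X_train, y_train))   # typically 1.0: noise in the training set is memorized
print(tree.score(X_test, y_test))     # noticeably lower on unseen data
```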
Underfitting
A situation where a machine learning model is too simple to capture the underlying pattern of the data, resulting in poor performance on both the training and test datasets.
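A small sketch of underfitting under similar illustrative assumptions: a depth-1 decision tree ("stump") is too simple for the 10-class digits dataset, so both training and test accuracy stay low:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)
print(stump.score(X_train, y_train))   # low even on the training data
print(stump.score(X_test, y_test))     # similarly low on unseen data
```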
Cross-Validation
A technique for evaluating a machine learning model by dividing the dataset into multiple subsets (folds), training the model on some folds, and validating it on the remaining fold. The process is repeated with a different held-out fold each time, so the performance estimate does not depend on a single train/test split.
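A minimal cross-validation sketch with scikit-learn (the model, dataset, and the choice of 5 folds are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Split the data into 5 folds; train on 4 and validate on the held-out fold,
# rotating the held-out fold so every sample is used for validation once.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```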
Hyperparameters
Parameters whose values are set before the learning process begins and control the behavior of the learning algorithm. Unlike model parameters, hyperparameters are not learned from the data.
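A short sketch of the distinction, using a decision tree as an illustrative model: the constructor arguments are hyperparameters set before training, while the tree's split thresholds are parameters learned from the data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen before learning starts and passed to the constructor.
model = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

# Model parameters (the tree's splits and thresholds) are learned during fit().
model.fit(X, y)
print(model.get_depth(), model.tree_.node_count)
```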
Neural Network
A machine learning model inspired by the human brain’s structure, consisting of layers of interconnected nodes (neurons) that process input data to produce output
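A minimal neural-network sketch: a small multilayer perceptron trained with scikit-learn (the layer sizes and the digits dataset are illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of interconnected neurons transform the inputs into class outputs.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```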
Gradient Descent
An optimization algorithm used to minimize the loss function in machine learning models by iteratively adjusting the model parameters in the direction that reduces the error.
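A minimal gradient-descent sketch in NumPy: fitting a line y ≈ w*x + b by repeatedly stepping the parameters against the gradient of a mean-squared-error loss (the synthetic data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=100)   # noisy line

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(500):
    y_pred = w * x + b
    error = y_pred - y
    grad_w = 2 * np.mean(error * x)   # d(loss)/dw for the MSE loss
    grad_b = 2 * np.mean(error)       # d(loss)/db
    w -= learning_rate * grad_w       # step in the direction that reduces the error
    b -= learning_rate * grad_b

print(w, b)   # should end up close to 3.0 and 2.0
```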
Feature Engineering
The process of selecting, modifying, or creating new features from raw data to improve the performance of a machine learning model
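A small feature-engineering sketch with pandas; the column names and derived features are made-up illustrations, not a prescribed recipe:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "signup_date": pd.to_datetime(["2023-01-05", "2023-03-17", "2023-07-30"]),
    "total_spent": [120.0, 0.0, 87.5],
    "num_orders": [4, 0, 3],
})

features = pd.DataFrame({
    "signup_month": raw["signup_date"].dt.month,                                   # extracted calendar feature
    "avg_order_value": raw["total_spent"] / raw["num_orders"].replace(0, np.nan),  # derived ratio
    "is_active": (raw["num_orders"] > 0).astype(int),                              # binary flag
})
print(features)
```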
Confusion Matrix
A table used to evaluate the performance of a classification model, showing the actual vs. predicted classifications, including true positives, false positives, true negatives, and false negatives
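A minimal confusion-matrix sketch for a binary classifier (the label vectors are made-up illustrative data); scikit-learn places actual classes in rows and predicted classes in columns:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```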
k-Nearest Neighbors (k-NN)
k-Nearest Neighbors is an instance-based learning method in which the class of a new instance is determined by a majority vote of its ‘k’ closest neighbors in the training dataset. The distance between instances is typically measured with metrics such as Euclidean or Manhattan distance, depending on the feature types.
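A compact k-NN sketch in NumPy that classifies a query point by a majority vote of its k nearest training points under Euclidean distance (the toy data and k = 3 are illustrative assumptions):

```python
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(query, k=3):
    distances = np.linalg.norm(X_train - query, axis=1)    # Euclidean distance to each stored instance
    nearest = np.argsort(distances)[:k]                    # indices of the k closest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote among their labels

print(knn_predict(np.array([1.1, 0.9])))   # expected: class 0
print(knn_predict(np.array([5.1, 5.0])))   # expected: class 1
```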
k-NN: Lazy learning
k-NN is a lazy learner: it does not build an explicit model during training, but simply stores the training instances and defers computation to prediction time, when it measures the distance from the query point to the stored instances.
k-NN: Decision surface
The decision boundary of k-NN is often irregular and heavily influenced by the choice of ‘k’: a small ‘k’ produces a jagged boundary that follows individual training points (low bias, high variance), while a larger ‘k’ smooths the boundary at the cost of more bias.