General Cards Flashcards
Classification
A supervised learning task where the goal is to assign input data points to predefined discrete categories or classes
Regression
A supervised learning task where the goal is to predict a continuous numerical output value for a given input data point
Supervised Learning
A machine learning paradigm where an algorithm learns from labeled data (input-output pairs) to map inputs to outputs
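A minimal sketch of the supervised workflow, assuming scikit-learn; the toy dataset and the choice of LogisticRegression are illustrative, not part of the card:

```python
from sklearn.linear_model import LogisticRegression

# Labeled training data: inputs X paired with known outputs y.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

# Learn a mapping from inputs to outputs.
model = LogisticRegression()
model.fit(X_train, y_train)

# Apply the learned mapping to new, unseen inputs.
print(model.predict([[0.5], [2.5]]))  # likely [0 1]
```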
Feature Scaling
A preprocessing step that transforms features to a similar scale, which can be essential for some machine learning models
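A brief sketch using scikit-learn's StandardScaler (one common scaling method; the toy feature values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (age in years, income in dollars).
X = np.array([[25.0, 40_000.0],
              [35.0, 90_000.0],
              [45.0, 60_000.0]])

# Standardization rescales each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```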
Bias-Variance Decomposition
A way to analyze the generalization error of a model by breaking it down into noise, bias squared, and variance
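For squared-error loss with noisy targets $y = f(x) + \varepsilon$, where $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, the expected error of a learned predictor $\hat{f}$ at a point $x$ decomposes as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
```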
Generalization
The ability of a trained machine learning model to perform well on new, unseen data
Overfitting
A phenomenon where a model learns the training data too well, including the noise, leading to poor performance on new data
Regularization
Techniques used to reduce overfitting by adding a penalty to the model’s loss function, which discourages overly complex models
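In symbols, regularization replaces the plain training loss $L(\theta)$ with a penalized objective, where $\lambda$ controls the strength of the penalty $\Omega$:

```latex
\min_{\theta} \; L(\theta) + \lambda \, \Omega(\theta), \qquad \lambda \ge 0
```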
L1 Regularization (LASSO)
A type of regularization that adds a penalty proportional to the absolute value of the model’s coefficients, often leading to sparse models
L2 Regularization (Ridge)
A type of regularization that adds a penalty proportional to the squared value of the model’s coefficients, shrinking them towards zero
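A short sketch contrasting the two penalties, assuming scikit-learn; the synthetic data and alpha values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features matter; the other eight are pure noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty

# L1 tends to drive irrelevant coefficients exactly to zero (sparse model);
# L2 shrinks all coefficients toward zero but rarely to exactly zero.
print(np.round(lasso.coef_, 2))
print(np.round(ridge.coef_, 2))
```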
Support Vector Machine (SVM)
A supervised learning model which aims to find a hyperplane to separate data points with the largest margin between classes
Margin (SVM)
The distance between the separating hyperplane and the closest data points (the support vectors)
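For a hyperplane $w^\top x + b = 0$ under the canonical scaling $y_i(w^\top x_i + b) \ge 1$, the distance to the closest points is $1/\lVert w \rVert$, so maximizing the margin amounts to minimizing $\lVert w \rVert$:

```latex
\text{distance to closest points} = \frac{1}{\lVert w \rVert},
\qquad \text{total margin width} = \frac{2}{\lVert w \rVert}
```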
Kernel Trick
A technique used in kernel methods, such as SVMs, that implicitly maps data into a higher-dimensional feature space via kernel functions.
Allows non-linear decision boundaries to be learned without explicitly computing the transformation
Kernel Function
A function which computes the inner product between two data points in a potentially high-dimensional feature space
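A minimal sketch of the idea using the RBF (Gaussian) kernel, which corresponds to an inner product in an infinite-dimensional feature space that is never constructed explicitly; the gamma value is an illustrative assumption:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2).

    Equals the inner product <phi(x), phi(z)> for an implicit,
    infinite-dimensional feature map phi.
    """
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-gamma * np.dot(diff, diff))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0 for identical points
print(rbf_kernel([1.0, 2.0], [3.0, 0.0]))  # decays toward 0 as points move apart
```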
Decision Tree
A tree-like supervised learning model where each internal node represents a test on an attribute, each branch represents the outcome of that test, and each leaf node represents a class label (classification) or prediction (regression)
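A small illustrative sketch, assuming scikit-learn, that fits a tree and prints its internal-node tests and leaf labels:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: the class depends on whether the single feature exceeds ~1.5.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Each internal node is a test on a feature; each leaf is a class label.
print(export_text(tree, feature_names=["x0"]))
```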
Ensemble Learning
A machine learning technique which combines the predictions of multiple models to improve overall performance and robustness
Bagging (Bootstrap Aggregating)
An ensemble learning technique where each model is trained on a different bootstrap sample of the training data, and their predictions are aggregated
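A minimal from-scratch sketch of the idea, assuming scikit-learn decision trees as the base model; the data, ensemble size, and voting rule are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Each model trains on its own bootstrap sample (drawn with replacement).
models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Aggregate predictions by majority vote across the ensemble.
votes = np.mean([m.predict(X[:5]) for m in models], axis=0)
print((votes > 0.5).astype(int), "vs true:", y[:5])
```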
Boosting
An ensemble learning technique where models are trained sequentially, with each new model attempting to correct the mistakes of the previous models
Random Forest
An ensemble learning method that builds multiple decision trees, each trained on a bootstrap sample and considering a randomly selected subset of features at each split, and merges their predictions for a more accurate and stable result
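A short sketch, assuming scikit-learn; the synthetic dataset and hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Each tree sees a bootstrap sample; each split considers a random
# subset of features (max_features), which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.score(X, y))
```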
Gradient Boosting
A boosting algorithm that builds trees sequentially, where each new tree is fit to the residual errors (the negative gradient of the loss) of the trees built so far
“building a strong learner through the combination of multiple weak learners”
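A minimal sketch of the residual-fitting loop for squared-error loss (where the negative gradient is exactly the residual); the tree depth, learning rate, and number of rounds are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Start from a constant prediction, then let each shallow (weak) tree fit
# the residuals of the ensemble so far and add it in with a small step.
pred = np.full_like(y, y.mean())
learning_rate = 0.1
for _ in range(100):
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)

print(np.mean((y - pred) ** 2))  # training MSE shrinks as rounds accumulate
```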
Image Segmentation
The process of partitioning a digital image into multiple segments to simplify the image or convert it into something easier to analyze
Multiple Kernel Learning (MKL)
A machine learning technique that learns a combination of multiple kernel functions as part of training, often used with multimodal data or heterogeneous feature types
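In one standard formulation (a convex combination of base kernels; other variants exist), MKL learns non-negative weights $\beta_m$ over a set of base kernels $k_m$:

```latex
k(x, x') = \sum_{m=1}^{M} \beta_m \, k_m(x, x'), \qquad \beta_m \ge 0
```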
Sparsity
A property of a model where many of its parameters are zero, meaning only a subset of features are relevant for making predictions
Structured Sparsity
A sparse model whose non-zero parameters exhibit some form of structure or pattern, often based on prior knowledge about the relationships between features
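A common instance (an assumption here, since the card does not name one) is the group-lasso penalty, which zeroes out whole predefined groups $g$ of coefficients together rather than individual entries:

```latex
\Omega(\theta) = \sum_{g \in \mathcal{G}} \lVert \theta_g \rVert_2
```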