Machine Learning Flashcards
To cover ML models and vocabulary I need to know
Mamba
- a selective state-space model (SSM) architecture for sequence modeling; its state-space parameters are input-dependent (“selective”), letting the model keep or filter information along the sequence, and it scales linearly with sequence length
State-space models
Models that describe the relationship between some hidden (unknown) variables and their observed measurements. They help us analyze time series problems that involve dynamical systems, and are widely used in statistics, econometrics, engineering, computer science, and finance.
Ex: GPS measures time-of-arrival of signals from satellites and uses it to infer two hidden variables: position and velocity
State equation and measurement equation
Two important equations in state-space models. State equation describes the development of the hidden variable over time. Measurement equation describes the relationship between the measurement (observed variable) and the state (hidden variable). The variables in these equations can be scalars or vectors.
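As a concrete (and common) special case, the linear-Gaussian form of these two equations is sketched below; the symbols F, H, w, and v are generic placeholders, not taken from any particular model.

```latex
% State equation: the hidden state evolves from the previous state; w_t is process noise
x_t = F x_{t-1} + w_t
% Measurement equation: the observation depends on the current hidden state; v_t is measurement noise
y_t = H x_t + v_t
```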
Kalman gain
In state-space modeling, the Kalman gain is the weight given to the measurements vs. the model’s current-state estimate. It can be “tuned” to optimize performance.
Kalman Filter
The Kalman Filter is an algorithm that merges noisy measurements with a predictive model to estimate the state of a system over time. It involves two primary steps: prediction, using the state transition matrix (F) and process noise covariance (Q) to forecast the next state and its uncertainty (P); and update, where the prediction is refined using new measurements and their uncertainty (R), adjusted by the Kalman Gain (K). This process iteratively refines the state estimate (x) and its uncertainty (P), making it essential for real-time estimation in systems with uncertainty, like navigation and tracking.
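A minimal numerical sketch of the predict/update cycle, assuming a made-up 1-D constant-velocity model (F, H, Q, R and the measurements below are illustrative values, not from any real system):

```python
import numpy as np

# State x = [position, velocity]; we only measure position.
F = np.array([[1.0, 1.0],   # state transition: pos += vel * dt (dt = 1)
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])  # measurement matrix: we observe position only
Q = 0.01 * np.eye(2)        # process noise covariance
R = np.array([[0.5]])       # measurement noise covariance

x = np.zeros(2)             # initial state estimate
P = np.eye(2)               # initial state uncertainty

def kalman_step(x, P, z):
    # Predict: propagate the state and its uncertainty through the model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: blend the prediction with the measurement z via the Kalman gain K
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain: weight of measurement vs. prediction
    y = z - H @ x_pred                       # innovation (measurement residual)
    x_new = x_pred + K @ y
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

for z in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:  # noisy position readings
    x, P = kalman_step(x, P, z)
print(x)  # estimated [position, velocity]
```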
RNN (Recurrent Neural Network) 4 principal structures
one-to-one; one-to-many; many-to-one; many-to-many (with matching or non-matching input/output lengths)
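A small NumPy sketch of a vanilla RNN cell, showing that many-to-one and many-to-many differ only in which hidden states are read out (sizes, weights, and inputs are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 5
W_xh = rng.normal(size=(d_hid, d_in))   # input-to-hidden weights
W_hh = rng.normal(size=(d_hid, d_hid))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(d_hid)

def run_rnn(xs):
    """Run the cell over a sequence; return all hidden states."""
    h = np.zeros(d_hid)
    hs = []
    for x in xs:                         # one recurrent step per input element
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

xs = rng.normal(size=(7, d_in))          # a length-7 input sequence
hs = run_rnn(xs)
many_to_one = hs[-1]                     # e.g. sequence classification: use only the last state
many_to_many = hs                        # e.g. tagging: one output per input step
```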
Geometric Deep Learning
Umbrella term for approaches that consider a broad class of ML problems from the perspectives of symmetry and invariance. It provides a common blueprint that allows deriving, from first principles, neural network architectures as diverse as CNNs, GNNs, and Transformers. Physical measurements can have low-dimensional geometries (e.g. grids in images, sequences in time series, position and momentum in molecules) and associated symmetries (e.g. translation, rotation).
Representation learning/feature learning
Techniques that let a model automatically discover the features (representations) needed for a task directly from raw data, rather than relying on hand-engineered features. The learned representations can then be reused for downstream tasks such as classification.
SE(3)
Special Euclidean Group in 3 dimensions; the group of all possible simultaneous rotations and translations of a vector. SE(3) is often used as a mathematical framework to model the complex spatial arrangements of proteins. SE(3) invariance (where features remain unchanged under these transformations) and equivariance (where features change in a predictable way under these transformations) are important properties.
Invariance and Equivariance
Invariance: I want the output to stay constant no matter how I transform the input
Equivariance: I want the output to undergo exactly the same transformation as applied to the input
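A toy NumPy check of both notions under a rotation + translation (i.e. an SE(3) transform); the points, angle, and translation are made up for illustration:

```python
import numpy as np

points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],   # rotation about the z-axis
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([5.0, -1.0, 2.0])                        # translation

transformed = points @ R.T + t                        # apply the transform to every point

def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Invariant feature: pairwise distances are unchanged by rotation + translation
assert np.allclose(pairwise_distances(points), pairwise_distances(transformed))

# Equivariant feature: the centroid moves in exactly the same way as the input points
centroid = points.mean(axis=0)
assert np.allclose(transformed.mean(axis=0), centroid @ R.T + t)
```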
Zero-shot learning
Zero-shot learning involves training a model in such a way that it can perform tasks or make predictions on data it has never seen during training. It learns abstract representations that can generalize to new, unseen tasks. This is typically achieved by training the model on a diverse set of tasks or data and using techniques that encourage the learning of generalizable features.
Image classification example: in traditional machine learning, a model is trained to classify images of animals it has seen during training like cats, dogs, and birds. It can accurately identify these animals in the test set, but fails to recognize an animal like a zebra that wasn’t in the training set. In zero-shot design, the model could classify even animals not seen in training like a zebra. This is possible because it learns higher-level features (e.g. stripes, four legs) that can generalize beyond the training set.
Translation example: in traditional ML, you may train a model for English-to-French and French-to-English translation. This model cannot do English-to-German translation. A zero-shot model could perform cross-lingual translation: if trained on English-French and English-German pairs, it may be able to translate between French and German by exploiting the abstract linguistic concepts it has learned.
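As a hands-on illustration, the Hugging Face transformers zero-shot classification pipeline scores arbitrary candidate labels that were never training targets. This sketch assumes the `transformers` package is installed and downloads a pretrained model on first run; the sentence and labels are invented examples.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The striped animal grazed on the savanna.",
    candidate_labels=["zebra", "dog", "car"],   # labels the model was never trained to predict
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score
```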
Autoencoders (AEs)
A type of feedforward neural network designed to reconstruct the input data through an encoder-decoder mechanism, with a bottleneck layer in between that captures the essential features of the data
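A minimal PyTorch sketch of the encoder–bottleneck–decoder structure (layer sizes are arbitrary assumptions):

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=128, d_bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_bottleneck),   # bottleneck captures the essential features
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_bottleneck, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_in),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)                          # a fake batch of flattened inputs
loss = nn.functional.mse_loss(model(x), x)       # reconstruction error vs. the input itself
loss.backward()
```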
Denoising Autoencoders (DAEs)
Autoencoders that introduce noise to the input data during training (before it is passed through the encoder). The decoder attempts to reconstruct the original data from the noisy representation, minimizing the reconstruction error between the output and the original (clean) input.
The primary objective is to learn robust data representations by forcing the network to reconstruct the original, clean data from noisy versions of it. This process encourages the autoencoder to capture meaningful and salient features while filtering out the noise, resulting in a more generalizable and informative representation.
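A sketch of one denoising training step, highlighting that the loss compares the reconstruction with the clean input; the stand-in model, noise level, and sizes are arbitrary assumptions.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784))  # stand-in encoder-decoder

x_clean = torch.rand(16, 784)
x_noisy = x_clean + 0.2 * torch.randn_like(x_clean)      # corrupt the input before encoding

reconstruction = model(x_noisy)
loss = nn.functional.mse_loss(reconstruction, x_clean)   # target is the clean data, not the noisy copy
loss.backward()
```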
Variational Autoencoder (VAE)
A VAE is an autoencoder whose encoding distribution is regularized during training so that its latent space has good properties, allowing us to generate new data by sampling from it. The term “variational” comes from the close relation between this regularization and the variational inference method in statistics.
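A sketch of the reparameterization trick and the VAE loss (reconstruction term plus a KL regularizer that pulls the encoding distribution toward a standard normal prior); sizes and the single-layer encoder/decoder are illustrative assumptions.

```python
import torch
from torch import nn

d_in, d_latent = 784, 16
encoder = nn.Linear(d_in, 2 * d_latent)   # outputs mean and log-variance of q(z|x)
decoder = nn.Linear(d_latent, d_in)

x = torch.rand(16, d_in)
mu, logvar = encoder(x).chunk(2, dim=-1)

# Reparameterization trick: sample z = mu + sigma * eps so gradients flow through mu and sigma
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps

x_hat = decoder(z)
recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
# KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I): this is the regularizer
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + kl
loss.backward()
```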
Dimensionality reduction
The process of reducing the number of features that describe some data. Can be done by:
- selection (only some of the existing features are kept)
- extraction (a reduced number of new features are created based on old features)
Useful for data visualization, reducing data storage requirements, lightening heavy computations, etc.
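A quick feature-extraction sketch with scikit-learn’s PCA (the random data and the choice of two components are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples described by 10 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)      # same 100 samples, now described by 2 new features
print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # how much variance each new feature preserves
```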