5: Machine Learning Basics Flashcards
Machine Learning Algorithm
An algorithm that is able to learn from data
Learning (definition)
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E
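A toy sketch of the E/T/P definition. It assumes a hypothetical task T of predicting a fixed quantity (`TRUE_VALUE`) from noisy readings, uses absolute error as the performance measure P, and treats the number of readings seen as the experience E. The "learner" is just an average, chosen for brevity rather than as a realistic model.

```python
import random

# Hypothetical task T: predict the true value of a quantity from noisy sensor readings.
TRUE_VALUE = 10.0

def learn(readings):
    """Use experience E (the readings seen so far) to produce a prediction."""
    return sum(readings) / len(readings)

def performance(prediction):
    """Performance measure P: absolute error of the prediction (lower is better)."""
    return abs(prediction - TRUE_VALUE)

rng = random.Random(42)  # fixed seed so the run is repeatable
readings = [rng.gauss(TRUE_VALUE, 2.0) for _ in range(10_000)]

for n in (1, 10, 100, 1_000, 10_000):
    print(f"E = {n:>6} readings -> P = {performance(learn(readings[:n])):.4f}")
```

For a particular random sequence the error need not fall at every step, but it shrinks on average as E grows, which is the sense of "improves with experience" here.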
Central challenge in ML
ML algorithms must perform well on new, previously unseen inputs, not just on the ones used to train the model. This ability is called generalization.
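A toy illustration, assuming a hypothetical task of labeling an integer x as 1 when x >= 5. A model that merely memorizes its training pairs is perfect on the training inputs but falls back to a default guess on unseen ones, while a model that learns a simple rule (a threshold halfway between the two classes) also gets the new inputs right.

```python
# Hypothetical task: label an integer x as 1 if x >= 5, else 0.
def true_label(x):
    return int(x >= 5)

train_x = [0, 2, 4, 6, 8]
test_x = [1, 3, 5, 7, 9]  # previously unseen inputs
train = [(x, true_label(x)) for x in train_x]

def fit_memorizer(data):
    """Store the training pairs verbatim; unseen inputs get a default guess of 0."""
    table = dict(data)
    return lambda x: table.get(x, 0)

def fit_threshold(data):
    """Learn a threshold halfway between the largest 0-example and the smallest 1-example."""
    hi0 = max(x for x, y in data if y == 0)
    lo1 = min(x for x, y in data if y == 1)
    t = (hi0 + lo1) / 2
    return lambda x: int(x >= t)

def accuracy(model, xs):
    return sum(model(x) == true_label(x) for x in xs) / len(xs)

for name, fit in (("memorizer", fit_memorizer), ("threshold", fit_threshold)):
    model = fit(train)
    print(name, "train:", accuracy(model, train_x), "test:", accuracy(model, test_x))
```

Here the memorizer scores 1.0 on the training inputs but only 0.4 on the unseen ones, while the threshold model scores 1.0 on both.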
Common ML tasks include
Classification, Classification with missing inputs, Regression, Transcription, Machine translation, Structured output, Anomaly detection, Synthesis and sampling, Imputation of missing values, Denoising, and Density estimation/probability mass function estimation
Classification
The computer program is asked to specify which of k categories some input belongs to (other variants output a probability distribution over the classes)
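A minimal sketch of both variants, assuming a hypothetical linear model with hand-picked weights. `linear_scores` computes one score per class, `softmax` turns the scores into a probability distribution over the k classes, and the predicted category is the most probable one. Classes are numbered from 0 in the code.

```python
import math

def linear_scores(weights, biases, x):
    """One score per class: w_k . x + b_k."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weights, biases, x):
    """Return (most probable class index, probability distribution over classes)."""
    probs = softmax(linear_scores(weights, biases, x))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

# Hypothetical 3-class model over 2-D inputs
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
label, probs = classify(W, b, [2.0, 0.5])
print(label, [round(p, 3) for p in probs])
```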
Classification with missing inputs
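Like classification, but without a guarantee that every measurement in the input vector is provided (common in medical diagnosis, where some tests are expensive or invasive). Instead of a single classification function, the algorithm must learn a set of functions, each corresponding to classifying x with a different subset of its inputs missing. This can be done efficiently by learning a single probability distribution over all the relevant variables and then classifying by marginalizing out whichever inputs are missing: with n inputs there are 2^n possible subsets of missing inputs, but one joint distribution covers them all.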
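A minimal sketch of the joint-distribution approach, assuming a hypothetical table `JOINT` of probabilities over two binary inputs (x1, x2) and a binary label y. `posterior` sums out every input that is not observed, so the same table answers all 2^2 = 4 missing-input patterns.

```python
# Hypothetical joint distribution P(x1, x2, y) over binary variables,
# keyed by (x1, x2, y). The eight probabilities sum to 1.
JOINT = {
    (0, 0, 0): 0.40, (0, 1, 0): 0.15, (1, 0, 0): 0.10, (1, 1, 0): 0.05,
    (0, 0, 1): 0.02, (0, 1, 1): 0.08, (1, 0, 1): 0.05, (1, 1, 1): 0.15,
}

def posterior(observed):
    """Return [P(y=0 | observed), P(y=1 | observed)].

    `observed` maps an input index (0 for x1, 1 for x2) to its value.
    Any index not present is treated as missing and summed out.
    """
    unnorm = [0.0, 0.0]
    for (x1, x2, y), p in JOINT.items():
        x = (x1, x2)
        if all(x[i] == v for i, v in observed.items()):
            unnorm[y] += p
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Every missing-input pattern is handled by the same joint model.
for pattern in ({}, {0: 1}, {1: 1}, {0: 1, 1: 1}):
    print(pattern, [round(p, 3) for p in posterior(pattern)])
```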
Regression
The computer program is asked to predict a numerical value given some input
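A minimal sketch, assuming a single input feature and hypothetical data: ordinary least squares fits a line y ≈ slope * x + intercept, and the fitted line then predicts a numerical value for any new input.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical noisy measurements
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(XS, YS)
print("prediction for x = 10:", slope * 10 + intercept)
```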
Transcription
The ML system is asked to observe a relatively unstructured representation of some kind of data and transcribe the information into discrete textual form (e.g. speech recognition, optical character recognition)
Machine translation
The input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language (e.g. English to French, or decompilation!)
Structured output
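Involves any task where the output is a vector (or other data structure containing multiple values) with important relationships between the different elements. This is a broad category: transcription and translation both fall within it, along with tasks such as parsing (mapping a sentence to a tree of its grammatical structure) and annotating photos (e.g. image captioning).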
Anomaly detection
The computer program sifts through a set of events or objects and flags some of them as being unusual (e.g. credit card fraud detection, intrusion detection systems)
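A minimal sketch using a simple z-score rule (one of many possible detectors, chosen here for brevity), assuming hypothetical transaction amounts. The mean and standard deviation are estimated from past events believed to be normal, and a new event is flagged when it lies more than `threshold` standard deviations from that mean.

```python
import statistics

def fit(reference):
    """Estimate what 'normal' looks like from past events."""
    return statistics.mean(reference), statistics.stdev(reference)

def flag_anomalies(events, mean, std, threshold=3.0):
    """Return the events whose z-score magnitude exceeds `threshold`."""
    return [e for e in events if abs(e - mean) / std > threshold]

# Hypothetical past card transactions (assumed normal) and new ones to check
REFERENCE = [20, 25, 22, 30, 27, 24, 26, 23, 28, 25]
NEW_EVENTS = [24, 31, 480, 19, 2]

mean, std = fit(REFERENCE)
print(flag_anomalies(NEW_EVENTS, mean, std))
```

With these numbers, 480 and 2 are flagged, while 31 and 19 stay within about two standard deviations of the mean and are not.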
Synthesis and sampling
The ML algorithm is asked to generate new examples that are similar to those in the training data (e.g. generating video game textures, speech synthesis)
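A minimal sketch of sampling from a learned generative model, assuming a character-level bigram (Markov chain) model trained on a hypothetical list of words. `fit_bigrams` records which character follows which, and `sample` walks those transitions at random to produce new words built from the same local patterns as the training data.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # markers for the beginning and end of a word

def fit_bigrams(words):
    """Map each character to the list of characters observed right after it."""
    transitions = defaultdict(list)
    for w in words:
        chars = [START] + list(w) + [END]
        for a, b in zip(chars, chars[1:]):
            transitions[a].append(b)
    return transitions

def sample(transitions, rng, max_len=20):
    """Generate one new word by following randomly chosen observed transitions."""
    out = []
    current = START
    while len(out) < max_len:
        current = rng.choice(transitions[current])
        if current == END:
            break
        out.append(current)
    return "".join(out)

WORDS = ["lava", "lorna", "nova", "vala", "anna"]  # hypothetical training data
model = fit_bigrams(WORDS)
rng = random.Random(0)
print([sample(model, rng) for _ in range(5)])
```

Because repeated successors are kept in the lists, `rng.choice` picks each next character in proportion to how often it was observed.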
Imputation of missing values
The ML algorithm is given a new example x, but with some entries x_i of x missing. The algorithm must provide a prediction of the values of the missing entries
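A minimal sketch of one simple imputation method: regression on the observed entry, which equals the conditional mean if the data are jointly Gaussian. It assumes hypothetical 2-D examples in which `None` marks a missing entry, and estimates the statistics from the fully observed rows only.

```python
def fit_stats(rows):
    """Estimate the mean and covariance of 2-D data from the fully observed rows."""
    complete = [r for r in rows if None not in r]
    n = len(complete)
    mu = [sum(r[j] for r in complete) / n for j in range(2)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in complete) / n
            for b in range(2)] for a in range(2)]
    return mu, cov

def impute(row, mu, cov):
    """Replace each None with its predicted value given the other entry."""
    x = list(row)
    for miss, obs in ((0, 1), (1, 0)):
        if x[miss] is None:
            if x[obs] is None:
                x[miss] = mu[miss]  # other entry also missing: use the mean
            else:
                slope = cov[miss][obs] / cov[obs][obs]
                x[miss] = mu[miss] + slope * (x[obs] - mu[obs])
    return x

# Hypothetical data; None marks a missing entry
ROWS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (5.0, None), (None, 8.0)]
mu, cov = fit_stats(ROWS)
print([impute(r, mu, cov) for r in ROWS])
```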
Denoising
The ML algorithm is given a corrupted example x̃, obtained from a clean example x by an unknown corruption process, and must predict the clean example x from its corrupted version x̃ (or, more generally, the conditional probability distribution p(x | x̃))
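A minimal sketch of learning a denoiser from (corrupted, clean) example pairs, assuming hypothetical data in which clean values are drawn from N(0, 1) and then corrupted by additive Gaussian noise. The learner never sees the noise model; it only fits a linear map from corrupted to clean values by least squares, a deliberately simple stand-in for the models used in practice.

```python
import random

def make_pairs(n, rng, noise_std=1.0):
    """Hypothetical data: clean values ~ N(0, 1) plus additive Gaussian noise."""
    pairs = []
    for _ in range(n):
        clean = rng.gauss(0.0, 1.0)
        corrupted = clean + rng.gauss(0.0, noise_std)
        pairs.append((corrupted, clean))
    return pairs

def fit_denoiser(pairs):
    """Least-squares fit of clean ≈ a * corrupted + b, using only the example pairs."""
    n = len(pairs)
    mean_c = sum(c for c, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    cov = sum((c - mean_c) * (x - mean_x) for c, x in pairs)
    var = sum((c - mean_c) ** 2 for c, _ in pairs)
    a = cov / var
    return a, mean_x - a * mean_c

def mse(pairs, denoise):
    """Mean squared error between denoised outputs and the clean targets."""
    return sum((denoise(c) - x) ** 2 for c, x in pairs) / len(pairs)

rng = random.Random(0)
train_pairs = make_pairs(20_000, rng)
test_pairs = make_pairs(5_000, rng)
a, b = fit_denoiser(train_pairs)
print("fitted a, b:", round(a, 3), round(b, 3))
print("raw MSE:     ", round(mse(test_pairs, lambda c: c), 3))
print("denoised MSE:", round(mse(test_pairs, lambda c: a * c + b), 3))
```

With equal signal and noise variance, the best linear map shrinks each input by about half, so the fitted `a` should come out close to 0.5 and the denoised error well below the raw error.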
Density estimation/probability mass function estimation
The ML algorithm is asked to learn a function p_model that can be interpreted as a probability density function (if x is continuous) or a probability mass function (if x is discrete) on the space that the examples were drawn from. Many of the other tasks require the algorithm to capture this distribution at least implicitly; density estimation captures it explicitly.
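A minimal sketch of both cases, assuming hypothetical data. `estimate_pmf` estimates a probability mass function from relative frequencies, and `gaussian_kde` estimates a density with a Gaussian kernel density estimator, where `bandwidth` is a smoothing parameter chosen by hand here.

```python
import math
from collections import Counter

def estimate_pmf(values):
    """Discrete case: probability of each value = its relative frequency."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def gaussian_kde(samples, bandwidth):
    """Continuous case: average of Gaussian bumps centred on the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

SAMPLES = [1.0, 1.2, 1.9, 4.8, 5.1]  # hypothetical continuous observations
p = gaussian_kde(SAMPLES, bandwidth=0.5)
print(estimate_pmf(["a", "b", "a", "c"]))
print(round(p(1.1), 3), round(p(3.0), 3))
```

Because each kernel is itself a normalized Gaussian, the estimated density integrates to 1, and it is higher near clusters of observed samples than in the gap between them.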