Basic Concepts in Machine Learning Flashcards
Learning
Learning is the acquisition of new information or knowledge, or the process of acquiring knowledge or skill through systematic study or trial and error.
What is Machine Learning?
Machine learning is “the field of study that gives computers the ability to learn without being explicitly programmed”.
A machine learning system is comprised of the following four components:
- Dataset S: a set of samples generated by some system or process; the samples can be single data points or pairs of input and output values
- Model M: an adjustable and compact representation of a certain class of input/output relationships that is hypothesized to be capable of modeling the system or process which generates S
- Objective Function L: a function that encodes the current performance of M (e.g. loss or reward)
- Algorithm A: the learning algorithm that adjusts M based on S and L
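As a minimal sketch of how the four components fit together, the following example fits a line to a tiny dataset. The data, the linear model, the mean squared error, and gradient descent (with the hypothetical names `model`, `loss`, `train`, and the chosen learning rate) are illustrative assumptions, not part of the definition above.

```python
# Dataset S: pairs of input and output values (generated here by y = 2x + 1).
S = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

# Model M: a line y = w*x + b with adjustable parameters w and b.
def model(params, x):
    w, b = params
    return w * x + b

# Objective function L: mean squared error of M on S.
def loss(params, data):
    return sum((model(params, x) - y) ** 2 for x, y in data) / len(data)

# Algorithm A: gradient descent, which adjusts M based on S and L.
def train(data, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (model((w, b), x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (model((w, b), x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

params = train(S)
print(params, loss(params, S))
```

With these settings the parameters should approach w = 2 and b = 1, the values that generated S.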
Machine learning is an important prerequisite for the implementation of a broad range of cognitive functions in artificial cognitive systems:
- Learning and Development: modeling and implementation of biological learning mechanisms (operant conditioning, implicit learning, explicit learning, perception etc.)
- Memory, Knowledge, and Internal Simulation: modeling and implementation of the encoding, storage and retrieval of facts, experiences, and actions (e.g. associative memory)
- Perception: learning basic features to detect and categorize perceptual stimuli (e.g. unsupervised learning of visual features)
- Autonomy: dynamic adaptation to changes in the environment (e.g. continuous online learning from a live data stream)
Examples of Practical Applications of Machine Learning
- Image classification
- Speech recognition
- Autonomous driving
- Recommendation systems
- Threat protection
- Control systems
Definition of the Machine Learning Task
Train a model M in a hypothesis space H using a learning algorithm A so that M minimizes the loss L.
Types of Machine Learning
Unsupervised Learning
* Solely unlabeled data
* Discovery of structural features in the data set
Reinforcement Learning
* Interaction with the environment
* Reward signal encodes feedback for the policy
Semi-Supervised Learning
* Labeled and unlabeled training samples
* A priori assumptions on input data required
Supervised Learning
* All training samples are labeled
* Desired output is specified exactly
Combining Hypotheses to Ensembles
Ensemble methods in machine learning are a simple way to extend hypothesis spaces by combining a set of hypotheses h_1, h_2, ..., h_n ∈ H into a new hypothesis h* ∈ H^n.
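One simple way to combine hypotheses is a majority vote. The sketch below assumes three hypothetical threshold classifiers `h1`, `h2`, `h3` and a hypothetical helper `majority_vote`; other combination rules (averaging, weighted voting) are equally valid.

```python
from collections import Counter

# Three hypothetical base hypotheses (threshold classifiers on a number x).
def h1(x): return 1 if x > 2 else 0
def h2(x): return 1 if x > 4 else 0
def h3(x): return 1 if x > 6 else 0

def majority_vote(hypotheses):
    """Combine hypotheses into a new hypothesis h* by majority vote."""
    def h_star(x):
        votes = Counter(h(x) for h in hypotheses)
        return votes.most_common(1)[0][0]
    return h_star

h_star = majority_vote([h1, h2, h3])
print([h_star(x) for x in [1, 3, 5, 7]])  # → [0, 0, 1, 1]
```

The combined hypothesis predicts 1 only when at least two of the three base hypotheses agree, so here it switches from 0 to 1 above x = 4.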
Boosting
Boosting algorithms compute a strong learner by incrementally constructing an
ensemble of hypotheses:
* Every training sample s_i ∈ S is assigned a weight w_i; initially, all weights are set to the same value
* Weights of incorrectly learned samples are increased
* The training of new hypotheses focusses on samples with high weights
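The reweighting step can be sketched as follows. The update rule here is the AdaBoost one, chosen as an assumption because the text above does not name a specific boosting algorithm; the function name `reweight` and the sample outcomes are hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round: raise the weights of misclassified samples.

    weights: current sample weights (summing to 1)
    correct: correct[i] is True if the current hypothesis got sample i right
    """
    # Weighted error of the current hypothesis.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    # Vote weight of the hypothesis (assumes 0 < err < 1).
    alpha = 0.5 * math.log((1 - err) / err)
    # Increase weights of misclassified samples, decrease the others.
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha

weights = [0.25, 0.25, 0.25, 0.25]   # initially, all weights are equal
correct = [True, True, True, False]  # hypothetical outcome of the first round
weights, alpha = reweight(weights, correct)
print(weights)
```

After this round the single misclassified sample carries half of the total weight, so the next hypothesis is pushed to get it right.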
Underfitting vs. Overfitting
- Underfitting: h fits the training data poorly and does not model the underlying process because h is not expressive enough
- Overfitting: h fits the training data very well but does not model the underlying process because it does not generalize well
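The difference shows up when training error and error on unseen data are compared. The sketch below uses hypothetical data (y = 2x with alternating noise on the training samples) and three illustrative hypotheses: a constant (too simple), a least-squares line, and a nearest-neighbor memorizer that reproduces the training set including its noise.

```python
# Hypothetical data: y = 2x, with alternating noise on the training samples.
train = [(0, 0.5), (1, 1.5), (2, 4.5), (3, 5.5), (4, 8.5), (5, 9.5)]
test = [(0.25, 0.5), (1.25, 2.5), (2.25, 4.5), (3.25, 6.5), (4.25, 8.5)]

def mse(h, data):
    """Mean squared error of hypothesis h on data."""
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)

def fit_constant(data):
    # Not expressive enough: always predicts the mean output (underfits).
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    # Least-squares line: matches the structure of the underlying process.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    return lambda x: my + (sxy / sxx) * (x - mx)

def fit_memorizer(data):
    # Returns the output of the nearest training sample, noise included (overfits).
    return lambda x: min(data, key=lambda s: abs(s[0] - x))[1]

for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    h = fit(train)
    print(name, round(mse(h, train), 3), round(mse(h, test), 3))
```

On this data the constant has a high error on both sets, while the memorizer reaches zero training error but a higher test error than the line.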
Generalization
The predictive performance of h on data that were not considered during the training phase.
Occam’s Razor
Of two competing theories, the simpler explanation of an
entity is to be preferred
Generative and Discriminative Models
- Discriminative models are based on the posterior probabilities P(y|x)
- Generative models are based on the class-conditional probabilities P(x|y) together with the prior P(y); predictions can be computed by applying Bayes’ theorem: P(y|x) = P(x|y)P(y)/P(x).
Generative models are compact representations of the training data that have considerably fewer parameters than the dataset S
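A short worked example of the Bayes step, using made-up numbers for two classes and one observed feature value x (the class names and all probabilities are assumptions for illustration):

```python
# Hypothetical numbers: two classes y and one observed feature value x.
prior = {"spam": 0.2, "ham": 0.8}        # P(y)
likelihood = {"spam": 0.6, "ham": 0.05}  # P(x | y) for the observed x

# Evidence: P(x) = sum over y of P(x | y) P(y).
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Bayes' theorem: P(y | x) = P(x | y) P(y) / P(x).
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}
print(posterior)
```

Here P(x) = 0.6 · 0.2 + 0.05 · 0.8 = 0.16, so P(spam|x) = 0.12 / 0.16 = 0.75 and P(ham|x) = 0.25.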
Training Validation and Test
- Training set: the samples used in the training phase by the learning algorithm to search for a hypothesis h in the hypothesis space H
- Validation set: a set of samples that are used to assess the performance of a hypothesis h that was computed in the training phase; based on the performance of h, the parameters of the training phase can be adjusted
- Test set: a set of samples (or real-world data) that is used to assess the performance of the final model
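A minimal sketch of such a split, assuming a random shuffle and a 60/20/20 ratio; the function name `split_dataset`, the fractions, and the seed are illustrative choices, not prescribed by the definitions above.

```python
import random

def split_dataset(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle samples and split them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train_set, val_set, test_set = split_dataset(range(10))
print(len(train_set), len(val_set), len(test_set))  # → 6 2 2
```

The three sets are disjoint, so the test set stays untouched while the validation set is used to adjust the training phase.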
Cross-Validation
- The dataset is partitioned into k subsets, and the model is trained in k iterations
- In every iteration, a different subset is selected as the validation set; the remaining k-1 subsets are used for training
- The overall performance is the average of the performances over the k iterations
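The procedure can be sketched with index lists. The helper names `k_fold_indices` and `cross_validate` and the `evaluate` callback are hypothetical; `evaluate(train, val)` is assumed to train on the first index list and return a score on the second.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k iterations."""
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

def cross_validate(n_samples, k, evaluate):
    """Average the validation scores returned by evaluate(train, val)."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

folds = list(k_fold_indices(6, 3))
print(folds[0])  # → ([2, 3, 4, 5], [0, 1])
```

Each sample appears in exactly one validation set, so every sample contributes once to the averaged score.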