The Machine Learning Landscape Flashcards
What is a definition for “Machine Learning”?
Machine Learning is the field of study that gives computers the ability to learn how to solve a specific problem, without being explicitly programmed.
What is the definition of “Training Set” and “Test Set”?
- The Training Set is the set of examples/instances that the system learns from.
- The Test Set is the set of instances that you use to test the system on what it learned from the training set.
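A minimal sketch of splitting a dataset into the two sets. The function name `train_test_split`, the 20% test ratio, and the fixed seed are illustrative assumptions, not something the definitions above prescribe.

```python
import random

def train_test_split(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the instances and split it into (train, test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]

instances = list(range(10))  # stand-in for real examples
train_set, test_set = train_test_split(instances)
```

Every instance ends up in exactly one of the two sets, so the system is tested on instances it never learned from.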
What is the definition of a “Model”?
The part of the machine learning system that learns and makes predictions.
Examples: Neural Networks, Random Forest.
In which cases is Machine Learning a great solution?
- Problems for which existing solutions require a lot of fine-tuning or complex rules;
- Complex problems for which using a traditional approach yields no good solution;
- Fluctuating environments (an ML system can easily be re-trained on new data);
- Getting insights about complex problems and large amounts of data.
What are the main criteria used to classify Machine Learning systems?
- Training Supervision (supervised, unsupervised, semi-supervised, self-supervised…);
- Incremental/Online or Batch Learning;
- Learning Approach (instance-based or model-based).
What is the definition of “Supervised Learning”?
The model is trained using instances with their features (called predictors or attributes) and the desired solutions (called labels).
Examples: classification (predict a class), regression (predict a target numeric value).
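As an illustration of regression, here is a small sketch that fits a straight line to labeled instances with ordinary least squares. The helper `fit_line` and the toy data are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept on labeled instances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

features = [1.0, 2.0, 3.0, 4.0]   # one feature (predictor) per instance
labels = [3.0, 5.0, 7.0, 9.0]     # desired solutions, here y = 2x + 1
slope, intercept = fit_line(features, labels)
prediction = slope * 5.0 + intercept  # predict the target for a new instance
```

The labels are what make this supervised: training uses both the features and the desired numeric targets.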
What is the definition of “Unsupervised Learning”?
The model is trained on unlabeled data, without any desired solutions.
Examples: clustering, anomaly detection, association rule learning, dimensionality reduction.
What is the definition of “Clustering”?
The task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some way) to each other than to those in other groups (clusters).
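One possible way to group points is k-means, sketched below for one-dimensional data. The choice of k-means, the function name `kmeans_1d`, and the starting centroids are assumptions for illustration; the definition above does not name a specific algorithm.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny k-means for 1D points, starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 10.0])
```

No labels are involved: the groups emerge only from how close the points are to each other.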
What is the definition of “Dimensionality Reduction”?
The task of reducing the number of features in a dataset while retaining as much information as possible. It is a process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data.
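A rough sketch of the idea for the simplest case, reducing 2D points to 1D by projecting them onto the direction of greatest variance (the idea behind principal component analysis). The function name and the toy points are assumptions, and the closed-form angle used here only works for two features.

```python
import math

def project_onto_first_component(points):
    """Reduce 2D points to 1D along the direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n
    # Closed-form angle of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    ux, uy = math.cos(theta), math.sin(theta)
    return [x * ux + y * uy for x, y in centered]

points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # two perfectly correlated features
reduced = project_onto_first_component(points)
```

Because the two features here carry the same information, a single number per point preserves all of the spread in the data.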
What is the definition of “Anomaly Detection”?
The task of identifying rare instances or observations which can raise suspicions by being statistically different from the rest of the observations.
Examples: credit card fraud, manufacturing defects.
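A simple sketch of "statistically different", flagging values whose z-score exceeds a threshold. The z-score rule, the threshold of 2.0, and the sample amounts are assumptions chosen for illustration; real systems often use more robust methods.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]

amounts = [20, 22, 19, 21, 20, 23, 18, 500]  # made-up transaction amounts
anomalies = find_anomalies(amounts)
```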
What is the definition of “Self-Supervised Learning”?
The model trains itself to learn one part of the input from another part of the input: it generates a fully labeled dataset from a fully unlabeled one.
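The sketch below shows only the label-generation step: turning an unlabeled sentence into (input, label) pairs by hiding one word at a time. The function name `make_masked_pairs` and the `[MASK]` token are hypothetical choices for this example, and no model is trained here.

```python
def make_masked_pairs(sentence, mask_token="[MASK]"):
    """Build (masked sentence, hidden word) pairs from one unlabeled sentence."""
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), word))  # the hidden word becomes the label
    return pairs

pairs = make_masked_pairs("the cat sat")
```

Each pair asks the model to recover one part of the input from the rest, with no human labeling involved.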
What is the definition of “Reinforcement Learning”?
The learning system, called an agent, can observe the environment, select and perform actions, and get rewards in return (or penalties as negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.
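A toy sketch of this loop, assuming a very simple environment with two actions and fixed rewards (a two-armed bandit) and an epsilon-greedy agent. The function name `run_bandit`, the reward values, and the exploration rate are all illustrative assumptions.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent estimating the value of each action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(steps):
        if step < len(rewards):
            action = step                          # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))   # explore
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]                   # the environment returns a reward
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = run_bandit(rewards=[0.2, 1.0])
policy = max(range(len(estimates)), key=lambda a: estimates[a])  # best action found
```

Here the learned policy is simply "pick the action with the highest estimated reward".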
What is the definition of “Batch/Offline Learning”?
The system is incapable of learning incrementally: it must be trained using all the available data. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned.
What is the definition of “Association Rule Learning”?
The task of detecting how one data item depends on another, so that these dependencies can be exploited (for example, to increase profit). It tries to find interesting relations or associations among the variables of a dataset.
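As an illustration, the sketch below scores a candidate rule with two commonly used measures, support and confidence. These measures, the helper `rule_metrics`, and the shopping baskets are assumptions added for this example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)  # how often both sides occur together
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
```

In this made-up data, two of the three baskets containing bread also contain butter, which is the kind of association the task looks for.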
What is the definition of “Semi-Supervised Learning”?
The model is trained with partially labeled instances. Most of these models are combinations of unsupervised and supervised algorithms. This type of learning is interesting because labeling is usually time-consuming and costly.
What is the definition of “Model Rot” or “Data Drift”?
The phenomenon by which an offline model’s performance tends to decay slowly over time because the world continues to evolve while the model remains unchanged.
What is the definition of “Incremental/Online Learning”?
The system is trained incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
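A minimal sketch of incremental training, using a deliberately trivial "model" that only learns the mean of the values it has seen. The names `minibatches` and `OnlineMean` and the batch size of 2 are hypothetical, chosen to show the data flow rather than a realistic learner.

```python
def minibatches(data, size):
    """Yield consecutive small groups of instances."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

class OnlineMean:
    """Toy model that updates its estimate one instance at a time."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def learn(self, batch):
        for value in batch:
            self.count += 1
            self.mean += (value - self.mean) / self.count  # incremental update

stream = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
model = OnlineMean()
batches_seen = 0
for batch in minibatches(stream, size=2):
    model.learn(batch)   # each mini-batch can be discarded after this step
    batches_seen += 1
```

The model never needs the whole dataset in memory, since each update uses only the current mini-batch and the state learned so far.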
What is the “Learning Rate”? And what is the difference between a high and low one?
The speed at which an online learning model adapts to new and changing data:
- High: adapts rapidly to new data BUT tends to quickly forget the old data;
- Low: learns more slowly BUT is less sensitive to noise and outliers.
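The trade-off can be sketched with a simple online estimate that moves toward each new value by a fraction equal to the learning rate. The update rule, the function name `online_estimate`, and the stream below are assumptions for illustration only.

```python
def online_estimate(stream, learning_rate, start=0.0):
    """Move the estimate toward each new value by a fraction of the error."""
    estimate = start
    for value in stream:
        estimate += learning_rate * (value - estimate)
    return estimate

stream = [10.0] * 20 + [100.0]  # steady data followed by a single outlier
high = online_estimate(stream, learning_rate=0.9, start=10.0)
low = online_estimate(stream, learning_rate=0.1, start=10.0)
```

With the high rate, the single outlier almost replaces everything learned from the earlier values, while with the low rate the estimate barely moves.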