The Machine Learning Landscape Flashcards
What is a definition for “Machine Learning”?
Machine Learning is the field of study that gives computers the ability to learn how to solve a specific problem, without being explicitly programmed.
What is the definition of “Training Set” and “Test Set”?
- The Training Set is the examples/instances that the system uses to learn from.
- The Test Set is the instances that you use to test the system on what it learned from the training set.
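A minimal sketch of how a dataset might be split into the two sets. The function name `split_train_test`, the 80/20 ratio and the fixed seed are illustrative assumptions, not something the card prescribes.

```python
import random

def split_train_test(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the instances and split it into (train, test)."""
    shuffled = list(instances)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]

# Ten toy instances: eight end up in the training set, two in the test set.
train_set, test_set = split_train_test(range(10))
```

The important property is that no instance appears in both sets, so the test set measures performance on data the system has never seen.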
What is the definition of a “Model”?
The part of the machine learning system that learns and makes predictions.
Examples: Neural Networks, Random Forest.
In which cases is Machine Learning a great solution?
- Problems for which existing solutions require a lot of fine-tuning or complex rules;
- Complex problems for which using a traditional approach yields no good solution;
- Fluctuating environments (an ML system can easily be re-trained on new data);
- Getting insights about complex problems and large amounts of data.
What are the main criteria used to classify Machine Learning systems?
- Training Supervision (supervised, unsupervised, semi-supervised, self-supervised…);
- Incremental/Online or Batch Learning;
- Learning Approach (instance-based or model-based).
What is the definition of “Supervised Learning”?
The model is trained using instances with their features (called predictors or attributes) and the desired solutions (called labels).
Examples: classification (predict a class), regression (predict a target numeric value).
What is the definition of “Unsupervised Learning”?
The model is trained without any solutions (unlabeled).
Examples: clustering, anomaly detection, association rule learning, dimensionality reduction.
What is the definition of “Clustering”?
The task of grouping a set of objects in such a way that objects in the same group (called cluster) are more similar (in some way) to each other than to those in other groups (clusters).
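As one possible illustration, here is a tiny k-means sketch on one-dimensional values. K-means is only one of many clustering algorithms; the function name `kmeans_1d`, the starting centroids and the toy values are assumptions made for this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small 1-D k-means: returns (centroids, assignments)."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, assignments = kmeans_1d(values, centroids=[0.0, 5.0])
# assignments is [0, 0, 0, 1, 1, 1]: the low values and the high values
# end up in separate clusters, without any labels being provided.
```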
What is the definition of “Dimensionality Reduction”?
The task of reducing the number of features in a dataset while retaining as much information as possible. It is a process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data.
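A rough sketch of the idea, assuming two features reduced to one by projecting onto the direction of largest variance (the principle behind PCA). The closed-form angle used here only works for two features, and the function name and data points are made up for illustration.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their direction of largest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centered) / n
    var_y = sum(y * y for _, y in centered) / n
    cov_xy = sum(x * y for x, y in centered) / n
    # Angle of the principal axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))
    # Each point is now described by a single number instead of two.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return projected, direction

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
projected, direction = pca_2d_to_1d(points)
```

Because the two features are strongly correlated in this toy data, the single projected value keeps almost all of the variance.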
What is the definition of “Anomaly Detection”?
The task of identifying rare instances or observations which can raise suspicions by being statistically different from the rest of the observations.
Examples: credit card fraud, manufacturing defects.
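One simple way to flag "statistically different" observations is a z-score threshold. The sketch below assumes that approach; the threshold of 2 standard deviations and the transaction amounts are illustrative choices, and real systems often use more robust methods.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
anomalies = find_anomalies(amounts)
```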
What is the definition of “Self-Supervised Learning”?
The model trains itself to learn one part of the input from another part of the input: it generates a fully labeled dataset from a fully unlabeled one.
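A small sketch of how labels can be generated from unlabeled data, assuming a next-word prediction task on raw text. The function name `make_next_word_pairs` and the context size are hypothetical choices for this example.

```python
def make_next_word_pairs(sentence, context_size=2):
    """Turn raw text into (context, target) pairs without human labels."""
    words = sentence.split()
    return [
        # The words before position i + context_size are the input,
        # the word at that position becomes the label.
        (tuple(words[i:i + context_size]), words[i + context_size])
        for i in range(len(words) - context_size)
    ]

pairs = make_next_word_pairs("the cat sat on the mat")
# The first pair is (("the", "cat"), "sat"): part of the input serves
# as the label for another part.
```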
What is the definition of “Reinforcement Learning”?
The learning system, called an agent, can observe the environment, select and perform actions, and get rewards in return (or penalties as negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.
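The loop of acting, receiving a reward and improving the policy can be sketched with an epsilon-greedy agent on a toy environment. Everything here is an assumption for illustration: the environment has no state, each action always returns the same reward, and negative rewards play the role of penalties.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy environment with one reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # estimated value of each action
    counts = [0] * len(rewards)
    total_reward = 0.0
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore
        else:
            # Exploit: pick the action with the best estimated value.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # feedback from the environment
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    # The learned policy: always choose the best-valued action.
    policy = max(range(len(rewards)), key=lambda a: estimates[a])
    return policy, total_reward

policy, total_reward = run_bandit(rewards=[-1.0, 0.5, 2.0])
```

Here the agent ends up preferring the action with reward 2.0; real reinforcement learning problems add states and delayed rewards on top of this loop.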
What is the definition of “Batch/Offline Learning”?
The system is incapable of learning incrementally: it must be trained using all the available data. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned.
What is the definition of “Association Rule Learning”?
The task of detecting how the presence of one data item depends on the presence of another, so that the relation can be exploited (for example, to place related products together). It tries to find interesting relations or associations among the variables of a dataset.
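Two common ways to quantify such a relation are support and confidence. The sketch below computes both for a single candidate rule; the function name `rule_metrics` and the shopping baskets are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    # Support: fraction of all transactions containing both sides.
    support = len(with_both) / len(transactions)
    # Confidence: fraction of antecedent transactions that also
    # contain the consequent.
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
# Two of four baskets contain both items, and two of the three bread
# baskets also contain butter.
```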
What is the definition of “Semi-Supervised Learning”?
The model is trained with partially labeled instances. Most of these models are combinations of unsupervised and supervised algorithms. This type of learning is interesting because labeling is usually time-consuming and costly.
What is the definition of “Model Rot” or “Data Drift”?
The phenomenon by which an offline model’s performance tends to decay slowly over time because the world continues to evolve while the model remains unchanged.
What is the definition of “Incremental/Online Learning”?
The system is trained incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
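A minimal sketch of incremental training, assuming a one-feature linear model updated by gradient descent on each mini-batch. The card does not specify a model or an update rule; the helper names, the batch size and the synthetic stream (generated from y = 2x + 1) are all assumptions.

```python
def sgd_update(weight, bias, batch, learning_rate=0.05):
    """One gradient step of y ~ weight * x + bias on one mini-batch (MSE loss)."""
    n = len(batch)
    grad_w = sum(2 * (weight * x + bias - y) * x for x, y in batch) / n
    grad_b = sum(2 * (weight * x + bias - y) for x, y in batch) / n
    return weight - learning_rate * grad_w, bias - learning_rate * grad_b

def mini_batches(stream, size):
    """Group a stream of instances into small batches as they arrive."""
    batch = []
    for instance in stream:
        batch.append(instance)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Hypothetical data stream following y = 2x + 1.
stream = ((i % 5, 2 * (i % 5) + 1) for i in range(3000))
weight, bias = 0.0, 0.0
for batch in mini_batches(stream, size=4):
    # The model is updated after every mini-batch and the batch can
    # then be discarded; the full dataset is never held in memory.
    weight, bias = sgd_update(weight, bias, batch)
```

After the stream is consumed, `weight` and `bias` are close to 2 and 1, even though the model only ever saw four instances at a time.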
What is the “Learning Rate”? And what is the difference between a high and low one?
The speed at which the online learning model adapts to new and changing data:
- High: adapts rapidly to new data BUT tends to quickly forget the old data;
- Low: learns more slowly BUT is less sensitive to noise and outliers.
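The trade-off can be sketched with a running estimate that moves toward each new value by a fraction equal to the learning rate. This simple estimator stands in for a full model, and the two rates (0.9 and 0.1) and the data are illustrative assumptions.

```python
def online_estimate(stream, learning_rate):
    """Update an estimate one value at a time, online-learning style."""
    estimate = stream[0]
    for value in stream[1:]:
        # Move the estimate toward the new value by a fraction of the error.
        estimate += learning_rate * (value - estimate)
    return estimate

# Case 1: a single outlier at the end of otherwise stable data.
with_outlier = [10.0] * 20 + [100.0]
high_outlier = online_estimate(with_outlier, learning_rate=0.9)  # close to 91
low_outlier = online_estimate(with_outlier, learning_rate=0.1)   # close to 19

# Case 2: the data genuinely shifts from 10 to 100.
with_shift = [10.0] * 20 + [100.0] * 5
high_shift = online_estimate(with_shift, learning_rate=0.9)
low_shift = online_estimate(with_shift, learning_rate=0.1)
```

With the high rate, the estimate jumps almost all the way to the outlier but also tracks the real shift quickly; with the low rate, the outlier barely moves it, but it is still far from 100 after five shifted values.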
What is the definition of “Instance-based Learning”?
One of the 2 main approaches to generalization (how the system handles new instances).
The system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples (or a subset of them).
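k-nearest neighbors is a classic instance-based method. The sketch below assumes Euclidean distance as the similarity measure and a majority vote among the 3 nearest examples; the labeled points are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(examples, new_point, k=3):
    """Predict the label of new_point from its k most similar examples.

    examples: list of (features, label) pairs kept in memory as-is.
    """
    # Similarity measure: smaller Euclidean distance means more similar.
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

examples = [
    ((1, 1), "small"), ((1, 2), "small"), ((2, 1), "small"),
    ((8, 8), "large"), ((9, 8), "large"), ((8, 9), "large"),
]
prediction = knn_predict(examples, (2, 2))
```

There is no training step beyond storing the examples: all the work happens when a new instance is compared with them.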
What is the definition of “Model-based Learning”?
One of the 2 main approaches to generalization (how the system handles new instances).
The system builds a model (like a linear regression) from the examples and then uses that model to make predictions.
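A short sketch of the model-based approach, assuming a one-feature linear regression fitted with the ordinary least squares formulas. The function name `fit_line` and the data are illustrative.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training examples that happen to follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Once the parameters are learned, the examples are no longer needed:
# the model alone is used to predict for a new instance.
prediction = slope * 5.0 + intercept
```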
What is the “Performance Measure”?
The metric used to determine how well or how badly the model is performing.
What is the difference between a “Utility Function” and a “Cost Function”?
A “Utility Function” determines how GOOD the model is while a “Cost Function” determines how BAD it is.
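To make the contrast concrete, the sketch below evaluates two sets of predictions with one function of each kind. Mean squared error is used as the cost function and the coefficient of determination (R squared) as the utility function; both are illustrative choices, and the values are invented.

```python
def mse_cost(y_true, y_pred):
    """Cost function: mean squared error, lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_utility(y_true, y_pred):
    """Utility function: R squared, higher is better (1.0 is perfect)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
good_pred = [3.1, 4.9, 7.2, 8.8]   # close to the true values
bad_pred = [6.0, 6.0, 6.0, 6.0]    # always predicts the mean

good_cost, bad_cost = mse_cost(y_true, good_pred), mse_cost(y_true, bad_pred)
good_utility, bad_utility = r2_utility(y_true, good_pred), r2_utility(y_true, bad_pred)
```

Both functions rank the two models the same way: the better model has the lower cost and the higher utility.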
What type of algorithm would you use to allow a robot to walk in various unknown terrains?
Reinforcement Learning is typically the best fit: the robot (the agent) learns from the feedback it gets from the terrain (rewards and penalties) and improves its walking policy over time.
What type of algorithm would you use to segment your customers into multiple groups?
If the groups are not known in advance, use a clustering algorithm (unsupervised learning) to find groups of similar customers. If the groups are already defined and labeled examples exist, use a classification algorithm (supervised learning).