Machine Learning Landscape Flashcards
Supervised/Unsupervised Learning
Question 01
Define supervised learning.
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
Question 02
Supervised learning
What are the two main tasks of supervised learning?
Classification (for example, spam or ham)
Regression (predicting a numeric target value from features called predictors)
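The two tasks differ mainly in the kind of label attached to each training instance. The sketch below is only an illustration: the datasets, the single-feature setup, and the helper names (fit_threshold, fit_line) are assumptions made for this card, not part of the original material.

```python
from statistics import mean

# Classification: every training instance carries a class label.
# Hypothetical feature: number of suspicious words found in an email.
emails = [(0, "ham"), (1, "ham"), (2, "ham"), (7, "spam"), (8, "spam"), (9, "spam")]

def fit_threshold(labeled):
    """Learn a decision threshold halfway between the two class means."""
    ham_mean = mean(x for x, label in labeled if label == "ham")
    spam_mean = mean(x for x, label in labeled if label == "spam")
    return (ham_mean + spam_mean) / 2

def classify(x, threshold):
    """Predict a class: the output is a category."""
    return "spam" if x > threshold else "ham"

# Regression: every training instance carries a numeric target.
# Hypothetical data: (car age in years, price in thousands).
cars = [(1, 18.0), (2, 16.0), (3, 14.0), (4, 12.0)]

def fit_line(labeled):
    """Least-squares line through the labeled points (slope, intercept)."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in labeled)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

threshold = fit_threshold(emails)
slope, intercept = fit_line(cars)

print(classify(6, threshold))   # a category
print(slope * 5 + intercept)    # a number
```

In both cases the labels (the second element of each pair) are what make the learning supervised.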
Question 03
Define unsupervised learning.
In unsupervised learning, as you might guess, the training data is unlabeled.
The system tries to learn without a teacher.
Question 04
Give an example of an unsupervised learning algorithm.
For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.
At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help.
For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends, and so on.
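A minimal sketch of the clustering idea, using k-means as one possible algorithm (the card does not name a specific one). The visitor features, the data values, and the simplistic choice of the first k points as initial centroids are all assumptions made for illustration.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters; no labels are ever provided."""
    # Simplification (assumption): start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return assignments, centroids

# Hypothetical visitors: (posts read per week, comments per week).
visitors = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignments, centroids = k_means(visitors, k=2)
print(assignments)
```

The algorithm only receives the feature values; the grouping emerges from distances between visitors.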
Question 05
Unsupervised learning
What is the second application of unsupervised learning?
Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.
Question 06
Unsupervised learning
What is the fourth application of unsupervised learning?
Dimensionality reduction, in which the goal is to simplify the data without losing too much information.
One way to do this is to merge several correlated features into one.
For example, a car’s mileage may be very correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
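One way to picture this merge is to project the two standardized features onto their direction of greatest variance (the first principal component in two dimensions). This is a sketch under assumptions: the car data is made up, and PCA is just one of several techniques that could perform the merge.

```python
from math import atan2, cos, sin
from statistics import mean, pstdev

def standardize(values):
    """Rescale to zero mean and unit variance so units do not dominate."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def principal_direction(xs, ys):
    """Unit vector along the axis of greatest variance (inputs centered)."""
    sxx = mean(x * x for x in xs)
    syy = mean(y * y for y in ys)
    sxy = mean(x * y for x, y in zip(xs, ys))
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    return cos(theta), sin(theta)

# Hypothetical cars: age in years and mileage in kilometers.
ages = [1, 3, 5, 8, 10]
mileages = [15_000, 40_000, 70_000, 110_000, 150_000]

z_age = standardize(ages)
z_mileage = standardize(mileages)
direction = principal_direction(z_age, z_mileage)

# Two correlated features become a single "wear and tear" feature.
wear = [direction[0] * a + direction[1] * m for a, m in zip(z_age, z_mileage)]
print(wear)
```

Each car is now described by one number instead of two, and older, higher-mileage cars receive a larger value.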
Question 07
Unsupervised learning
What is the first application of unsupervised learning?
For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.
If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
Question 08
Define semi-supervised learning.
Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
Question 09
Semi-supervised learning
Give an application of semi-supervised learning.
Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
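The labeling step in the photo example can be sketched as label propagation over clusters. The clustering output is taken as given here, and the names and helper functions (propagate_labels, search) are hypothetical; this is not how any particular photo service is implemented.

```python
from collections import defaultdict

# Assumed output of the unsupervised (clustering) step: each detected face
# is a (photo_id, cluster_id) pair. Cluster ids carry no meaning yet.
detected_faces = [(1, "A"), (5, "A"), (11, "A"), (2, "B"), (5, "B"), (7, "B")]

# The supervised part: one label per cluster, supplied by the user.
# The names are hypothetical.
user_labels = {"A": "Alice", "B": "Bob"}

def propagate_labels(faces, labels):
    """Give every face in a cluster the single label attached to that cluster."""
    names_by_photo = defaultdict(set)
    for photo_id, cluster_id in faces:
        if cluster_id in labels:
            names_by_photo[photo_id].add(labels[cluster_id])
    return dict(names_by_photo)

def search(names_by_photo, name):
    """Return the photos in which the named person appears."""
    return sorted(p for p, names in names_by_photo.items() if name in names)

names_by_photo = propagate_labels(detected_faces, user_labels)
print(search(names_by_photo, "Alice"))
print(search(names_by_photo, "Bob"))
```

Two labels are enough to name all six detected faces, because the clusters did most of the work beforehand.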
Question 10
What is a batch learning system?
In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data.
This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
Question 11
What is an online learning system?
In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
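A minimal sketch of incremental training on mini-batches, assuming a one-feature linear model updated by gradient steps. The class name OnlineLinearRegressor, the partial_fit method, the learning rate, and the simulated noise-free stream are all assumptions made for illustration.

```python
class OnlineLinearRegressor:
    """Linear model y = w * x + b, updated one mini-batch at a time."""

    def __init__(self, learning_rate=0.2):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, batch):
        """One cheap learning step: a gradient step on the batch's squared error."""
        n = len(batch)
        grad_w = sum(2 * (self.predict(x) - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.predict(x) - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

def data_stream(n_instances):
    """Simulated stream (an assumption): noise-free points on y = 2x + 1."""
    for i in range(n_instances):
        x = (i % 10) / 10
        yield x, 2 * x + 1

model = OnlineLinearRegressor()
batch = []
for instance in data_stream(2000):
    batch.append(instance)
    if len(batch) == 5:          # mini-batch is full: learn, then discard it
        model.partial_fit(batch)
        batch = []

print(model.w, model.b)
```

The model never sees the whole dataset at once; each mini-batch can be thrown away as soon as it has been used.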
Question 12
What is out-of-core learning?
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
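The load-train-repeat loop can be sketched with a generator that reads a file chunk by chunk. Everything specific here is an assumption: the CSV layout, the file name, the chunk size, and the choice of a "training step" that updates running sums for a least-squares line (picked because it gives an exact fit in a single pass; a gradient step per chunk is a common alternative).

```python
import csv
import os
import tempfile

def read_chunks(path, chunk_size):
    """Yield lists of (x, y) rows so only one chunk is in memory at a time."""
    with open(path, newline="") as f:
        chunk = []
        for row in csv.reader(f):
            chunk.append((float(row[0]), float(row[1])))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

class RunningLineFit:
    """Least-squares line fitted from running sums, one chunk at a time."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def partial_fit(self, chunk):
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def parameters(self):
        slope = ((self.n * self.sxy - self.sx * self.sy)
                 / (self.n * self.sxx - self.sx ** 2))
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

with tempfile.TemporaryDirectory() as folder:
    # Stand-in for a dataset too large for memory (hypothetical file name).
    path = os.path.join(folder, "big_dataset.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(1000):
            x = i / 100
            writer.writerow([x, 3 * x + 2])

    model = RunningLineFit()
    chunks_seen = 0
    for chunk in read_chunks(path, chunk_size=64):
        model.partial_fit(chunk)   # training step on the loaded part only
        chunks_seen += 1

slope, intercept = model.parameters()
print(chunks_seen, slope, intercept)
```

Memory use is bounded by the chunk size rather than by the size of the file.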
Question 13
What type of learning algorithm relies on a similarity measure to make predictions?
Instance-based learning systems.
The system learns the examples by heart, then generalizes to new cases using a similarity measure (for example, counting the number of words a new email has in common with known spam emails).
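The word-counting similarity mentioned on this card can be sketched as a nearest-example classifier. The stored emails and the function names are hypothetical, and real spam filters use far richer similarity measures.

```python
def words(text):
    """Split a message into a set of lowercase words."""
    return set(text.lower().split())

def similarity(a, b):
    """Similarity measure: number of words the two messages share."""
    return len(words(a) & words(b))

def classify(message, examples):
    """Return the label of the stored example most similar to the message."""
    _, best_label = max(examples, key=lambda ex: similarity(message, ex[0]))
    return best_label

# "Learning by heart": the training examples are simply stored.
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

print(classify("claim your free prize now", examples))
print(classify("agenda for the team meeting", examples))
```

No parameters are fitted during training; the work happens at prediction time, when the new message is compared with the stored ones.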
Question 14
What do model-based learning algorithms search for?
They generalize from a set of examples by building a model of those examples, then use that model to make predictions.
Question 15
What strategy do model-based machine learning systems most commonly follow? How do they make predictions?
You studied the data.
You selected a model.
You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).
Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
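The four steps can be sketched end to end with a linear model and a mean squared error cost. The data is made up and noise-free, and a brute-force grid search stands in for a real optimizer purely to make "searching for the parameter values that minimize a cost function" visible; all names are assumptions.

```python
# 1. Study the data (hypothetical, noise-free points on y = 1 + 2x).
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# 2. Select a model: a line with two parameters.
def model(x, theta0, theta1):
    return theta0 + theta1 * x

def cost(theta0, theta1, data):
    """Mean squared error of the model on the data."""
    return sum((model(x, theta0, theta1) - y) ** 2 for x, y in data) / len(data)

# 3. Train: search for the parameter values that minimize the cost.
def train(data, candidates):
    return min(
        ((t0, t1) for t0 in candidates for t1 in candidates),
        key=lambda params: cost(params[0], params[1], data),
    )

candidates = [i * 0.5 for i in range(-10, 11)]   # -5.0, -4.5, ..., 5.0
theta0, theta1 = train(training_data, candidates)

# 4. Inference: apply the trained model to a new case.
prediction = model(4.0, theta0, theta1)
print(theta0, theta1, prediction)
```

Once training is done, only the two parameter values are needed to make predictions; the training data itself is no longer required.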