Machine Learning Landscape Flashcards

Question 1

Q

Supervised/Unsupervised Learning

Question 1

Define supervised learning.

Answer

A

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.

Question 2

Q

Question 02

Supervised learning

What are the two main tasks of supervised learning?

Answer

A

Classification (e.g., spam or ham)

Regression (predicting a numeric target value from features called predictors)
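The difference between the two tasks can be sketched with a tiny k-nearest-neighbors example. This is only an illustration, not the method the card refers to: the data, the single numeric feature, and the function names are all hypothetical. The same labeled (feature, label) pairs drive both tasks; only the kind of label and the way neighbors are combined change.

```python
from collections import Counter

def nearest_labels(train, x, k=3):
    """Return the labels of the k training instances closest to x (one numeric feature)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [label for _, label in ranked[:k]]

def classify(train, x, k=3):
    # Classification: predict the most common class among the neighbors.
    return Counter(nearest_labels(train, x, k)).most_common(1)[0][0]

def regress(train, x, k=3):
    # Regression: predict the mean target value of the neighbors.
    values = nearest_labels(train, x, k)
    return sum(values) / len(values)

# Hypothetical labeled data: (feature, label) pairs.
# emails: number of suspicious words -> class label
emails = [(0, "ham"), (1, "ham"), (2, "ham"), (7, "spam"), (8, "spam"), (9, "spam")]
# cars: age in years -> price (a numeric target)
cars = [(1, 20000.0), (2, 18000.0), (3, 16000.0), (8, 7000.0), (9, 6000.0), (10, 5000.0)]

print(classify(emails, 8))
print(regress(cars, 2))
```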

Question 3

Q

Question 03

Define unsupervised learning.

Answer

A

In unsupervised learning, as you might guess, the training data is unlabeled.

The system tries to learn without a teacher.

Question 4

Q

Question 04

Give an example of an unsupervised learning algorithm.

Answer

A

For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.

At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help.

For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends, and so on.
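A minimal sketch of such a clustering algorithm, assuming plain k-means (the card does not name a specific algorithm). The visitor features (hour of visit, age) and the starting centroids are hypothetical. Note that no group labels are ever passed in.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        centroids = new_centroids
    return assignments, centroids

# Hypothetical visitors: (hour of visit, age). No labels are provided.
visitors = [(20, 30), (21, 35), (22, 32), (10, 18), (11, 20), (9, 19)]
assignments, centroids = kmeans(visitors, centroids=[(20, 30), (10, 18)])
print(assignments)
```

The algorithm only returns group indices; deciding what each group means (evening readers, younger readers, and so on) is left to whoever inspects the result.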

Question 5

Q

Question 05

Unsupervised learning

What is the second application of unsupervised learning?

Answer

A

Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.

Question 6

Q

Question 06

Unsupervised learning

What is the fourth application of unsupervised learning?

Answer

A

Dimensionality reduction, in which the goal is to simplify the data without losing too much information.

One way to do this is to merge several correlated features into one.

For example, a car’s mileage may be very correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
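One way to picture the merge, as a sketch under stated assumptions: both features are standardized and positively correlated, in which case the first principal component lies along the direction (1, 1)/sqrt(2). The car data and the function names are hypothetical, and a real pipeline would typically use a library PCA implementation instead.

```python
import math
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def merge_correlated(feature_a, feature_b):
    """Project two positively correlated features onto a single axis."""
    za, zb = standardize(feature_a), standardize(feature_b)
    # For two standardized, positively correlated features, the first
    # principal component points along (1, 1) / sqrt(2).
    return [(a + b) / math.sqrt(2) for a, b in zip(za, zb)]

# Hypothetical cars: age and mileage rise together.
age_years = [1, 3, 5, 8, 10]
mileage_km = [15000, 40000, 70000, 110000, 150000]

wear_and_tear = merge_correlated(age_years, mileage_km)
print(wear_and_tear)
```

The two input columns are replaced by one derived column, which keeps the ordering of the cars from least to most worn.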

Question 7

Q

Question 07

Unsupervised learning

What is the first application of unsupervised learning?

Answer

A

For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.

If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.

Question 8

Q

Question 08

Define semi-supervised learning.

Answer

A

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.

Question 9

Q

Question 09

Semi-supervised learning

Give an application of semi-supervised learning.

Answer

A

Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
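The label-propagation half of this idea can be sketched as follows. The clustering step is assumed to have already run and is represented by a plain dictionary using the photo numbers from the example; the names "Alice" and "Bob" and the function name are hypothetical. This is not how any particular photo service is implemented.

```python
from collections import defaultdict

# Assumed output of the clustering step: face cluster -> photos it appears in.
clusters = {"A": [1, 5, 11], "B": [2, 5, 7]}

# The only labels the user supplies: one name per cluster (hypothetical names).
names = {"A": "Alice", "B": "Bob"}

def name_photos(clusters, names):
    """Propagate each cluster's single label to every photo in that cluster."""
    people_in_photo = defaultdict(list)
    for cluster_id, photos in clusters.items():
        for photo in photos:
            people_in_photo[photo].append(names[cluster_id])
    return dict(people_in_photo)

tagged = name_photos(clusters, names)

# Searching photos by person becomes a simple lookup.
photos_with_bob = sorted(p for p, people in tagged.items() if "Bob" in people)
print(photos_with_bob)
```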

Question 10

Q

Question 10

What is a batch learning system?

Answer

A

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data.

This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

Question 11

Q

Question 11

What is an online learning system?

Answer

A

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
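A minimal sketch of incremental training, assuming a one-feature linear model updated by stochastic gradient descent. The class name, the learning rate, and the synthetic data stream are hypothetical choices made for illustration.

```python
class OnlineLinearModel:
    """Linear model y = w * x + b, updated one instance at a time."""

    def __init__(self, learning_rate=0.01):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # One cheap gradient step on the squared error of this single instance.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error

model = OnlineLinearModel(learning_rate=0.05)

# Hypothetical data stream generated from y = 2x + 1, fed sequentially.
stream = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)] * 500
for x, y in stream:
    model.learn_one(x, y)

print(model.w, model.b)
```

Each call to `learn_one` touches only the current instance, so the model never needs the full dataset in memory and can keep learning after deployment.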

Question 12

Q

Question 12

What is out-of-core learning?

Answer

A

Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
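The load-a-part, train, repeat loop might look like the sketch below. An in-memory buffer stands in for a file too large for memory so the example stays runnable; the chunk size, the learning rate, and the function names are hypothetical, and the training step assumes a simple linear model fitted by gradient descent.

```python
import csv
import io
from itertools import islice

def read_chunks(file_obj, chunk_size):
    """Yield lists of (x, y) rows so only one chunk is held in memory at a time."""
    reader = csv.reader(file_obj)
    while True:
        chunk = [(float(x), float(y)) for x, y in islice(reader, chunk_size)]
        if not chunk:
            return
        yield chunk

def train_step(params, chunk, learning_rate=0.01):
    """One mini-batch gradient step for y = w * x + b on the given chunk."""
    w, b = params
    grad_w = sum((w * x + b - y) * x for x, y in chunk) / len(chunk)
    grad_b = sum((w * x + b - y) for x, y in chunk) / len(chunk)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Stand-in for a huge file on disk: ten rows of "x,y" with y = 3x.
big_file = io.StringIO("".join(f"{x},{3 * x}\n" for x in range(10)))

params = (0.0, 0.0)
chunks_seen = 0
for chunk in read_chunks(big_file, chunk_size=4):
    params = train_step(params, chunk)  # train on this part, then discard it
    chunks_seen += 1

print(chunks_seen, params)
```

A single pass like this only moves the parameters part of the way toward a good fit; in practice the pass over the data is usually repeated several times.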

Question 13

Q

Question 13

What type of learning algorithm relies on a similarity measure to make predictions?

Answer

A

Instance-based learning system

The system learns the examples by heart, then generalizes to new cases using a similarity measure (for example, a spam filter could count the number of words two emails have in common).
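A minimal sketch of this idea, using the word-overlap similarity mentioned above. The memorized messages and function names are hypothetical, and a real spam filter would use a more robust similarity measure.

```python
def shared_words(a, b):
    """Similarity measure: number of distinct words two messages have in common."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def predict(message, memory):
    # Give the new message the label of the most similar memorized example.
    _, label = max(memory, key=lambda example: shared_words(message, example[0]))
    return label

# "Learning by heart": the training examples are simply stored.
memory = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("lunch meeting moved to noon", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]

print(predict("claim your free prize", memory))
print(predict("is the meeting at noon", memory))
```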

Question 14

Q

Question 14

What do model-based learning algorithms search for?

Answer

A

They generalize from a set of examples by building a model of those examples, then use that model to make predictions.

Question 15

Q

Question 15

What is the most common strategy that model-based learning algorithms use to succeed? How do they make predictions?

Answer

A

You studied the data.

You selected a model.

You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).

Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
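The four steps above can be sketched with a one-feature linear model. The data points are hypothetical, and the closed-form least-squares solution is used here as one possible way to find the parameters that minimize a mean-squared-error cost function.

```python
def train(xs, ys):
    """Find the slope and intercept that minimize mean squared error (closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def cost(params, xs, ys):
    """Cost function: mean squared error of the model on the given data."""
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def predict(params, x):
    """Inference: apply the trained model to a new case."""
    slope, intercept = params
    return slope * x + intercept

# Hypothetical training data that looks roughly linear (model selection step).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

params = train(xs, ys)
print(params, cost(params, xs, ys), predict(params, 5.0))
```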

Question 16

Q

Question 16

Can you name four of the main challenges in Machine Learning?

Answer

A

Insufficient Quantity of Training Data

Nonrepresentative Training Data

Poor-Quality Data

Irrelevant Features

Question 17

Q

Question 17

Define overfitting.

Answer

A

It means that the model performs well on the training data, but it does not generalize well to new instances.

Question 18

Q

Question 18

What are the possible solutions to overfitting?

Answer

A

To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model.
To gather more training data.
To reduce the noise in the training data (e.g., fix data errors and remove outliers).

Question 19

Q

Question 19

Constraining a model to make it simpler and reduce the risk of overfitting is called …

Answer

A

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.

Question 20

Q

Question 20

The amount of regularization to apply during learning can be controlled by a …

Answer

A

The amount of regularization to apply during learning can be controlled by a hyperparameter. A hyperparameter is a parameter of a learning algorithm (not of the model).
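The distinction can be illustrated with a one-parameter model and an L2 penalty (ridge regression, used here as an assumed example of regularization). The slope `w` is a model parameter found by training, while `alpha` is a hyperparameter fixed before training. The names and data are hypothetical.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Slope w of y = w * x that minimizes sum((w*x - y)^2) + alpha * w^2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + alpha).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# alpha is chosen before training; the learned slope shrinks as alpha grows.
for alpha in (0.0, 14.0, 1000.0):
    print(alpha, fit_ridge_slope(xs, ys, alpha))
```

With `alpha = 0` there is no constraint and the slope matches the data exactly; a very large `alpha` pushes the slope toward zero, giving a nearly flat and much simpler model.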

Question 21

Q

Question 21

What is a test set and why would you want to use it?

Answer

A

A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.
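A minimal sketch of holding out a test set and measuring the error on it. The 20% ratio, the fixed seed, and the placeholder "model" (which always predicts the mean training target) are assumptions made for illustration only.

```python
import random

def split_train_test(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out a fraction as the test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def mean_squared_error(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical dataset of (x, y) pairs.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train_set, test_set = split_train_test(data)

# Placeholder model: always predict the mean target seen during training.
baseline = sum(y for _, y in train_set) / len(train_set)

# The test set is never used for training, only for this final estimate.
test_error = mean_squared_error(lambda x: baseline, test_set)
print(len(train_set), len(test_set), test_error)
```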

Question 22

Q

Question 22

What is the purpose of a validation set?

Answer

A

A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.
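As a sketch of how a validation set supports model selection: several candidate hyperparameter values are trained on the training set, compared on the validation set, and only the winner is scored once on the test set. The data, the candidate values, and the ridge-style slope model are hypothetical.

```python
def fit_slope(data, alpha):
    """Regularized slope of y = w * x (alpha is the hyperparameter being tuned)."""
    return sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + alpha)

def mse(slope, data):
    return sum((slope * x - y) ** 2 for x, y in data) / len(data)

# Hypothetical splits; the training targets are noisier than the rest.
train_set = [(1.0, 2.4), (2.0, 4.6), (3.0, 6.9), (4.0, 9.5)]
validation_set = [(5.0, 10.1), (6.0, 11.9)]
test_set = [(7.0, 14.2), (8.0, 15.8)]

# Compare candidates on the validation set, never on the test set.
candidates = [0.0, 1.0, 5.0, 10.0, 100.0]
best_alpha = min(candidates, key=lambda a: mse(fit_slope(train_set, a), validation_set))

# The test set is used once, at the end, to estimate the generalization error.
test_error = mse(fit_slope(train_set, best_alpha), test_set)
print(best_alpha, test_error)
```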

Question 23

Q

Question 23

What can go wrong if you tune hyperparameters using the test set?

Answer

A

If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).

Question 24

Q

Question 24

What is cross-validation and why would you prefer it to a validation set?

Answer

A

Cross-validation is a technique that makes it possible to compare models (for model selection and hyperparameter tuning) without the need for a separate validation set. This saves precious training data.
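A minimal sketch of k-fold cross-validation, assuming contiguous folds and the same hypothetical regularized slope model used for illustration. Each instance serves as validation data exactly once and as training data in the other folds, so no data is set aside permanently.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, validation
        start += size

def cross_val_mse(data, alpha, k=3):
    """Average validation error of a regularized slope model over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        train = [data[i] for i in train_idx]
        val = [data[i] for i in val_idx]
        slope = sum(x * y for x, y in train) / (sum(x * x for x, _ in train) + alpha)
        scores.append(sum((slope * x - y) ** 2 for x, y in val) / len(val))
    return sum(scores) / len(scores)

# Hypothetical data following y = 2x exactly.
data = [(float(x), 2.0 * x) for x in range(1, 7)]
for alpha in (0.0, 10.0):
    print(alpha, cross_val_mse(data, alpha))
```

The cost is extra computation: every candidate model is trained k times instead of once.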

Question 25

Q