Ch 1 Hands-On Machine Learning Flashcards

1
Q

How would you define machine learning?

A

Machine learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

2
Q

Can you name four types of problems where ML shines?

A

Machine learning is great for complex problems for which we have no algorithmic solution, to replace long lists of hand-tuned rules, to build systems that adapt to fluctuating environments, and finally to help humans learn.

3
Q

What is a labeled training set?

A

A labeled training set is a training set that contains the desired solution (a.k.a. a label) for each instance.

4
Q

What are the two most common supervised tasks?

A

The two most common supervised tasks are regression and classification.

5
Q

Can you name four common unsupervised tasks?

A

Common unsupervised tasks include clustering, visualization, dimensionality reduction, and association rule learning.

6
Q

What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?

A

Reinforcement learning is likely to perform best if we want a robot to learn to walk in various unknown terrains.

7
Q

What type of algorithm would you use to segment your customers into multiple groups?

A

If you don’t know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.
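
The clustering case can be sketched in a few lines. This is a minimal 1-D k-means written for illustration only, assuming each customer is described by a single number (say, yearly spend); the function name `k_means_1d`, the sample values, and the naive "first k values" initialization are all assumptions, not something the deck prescribes.

```python
def k_means_1d(values, k, iterations=10):
    """Group 1-D values into k clusters; no labels are needed (unsupervised)."""
    centroids = list(values[:k])  # naive initialization: first k values (assumption)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical yearly spend for six customers.
spend = [10, 12, 11, 100, 105, 98]
labels, centroids = k_means_1d(spend, k=2)
print(labels, centroids)
```

If the groups were known in advance, the same customers would instead carry a label each and be fed to a classifier.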

8
Q

Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

A

Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).

9
Q

What is an online learning system?

A

An online learning system can learn incrementally, as opposed to a batch learning system. This lets it adapt rapidly to changing data, makes it well suited to autonomous systems, and allows it to train on very large quantities of data.
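
To make "learning incrementally" concrete, here is a minimal sketch of an online learner: a 1-D linear model that takes one stochastic gradient step per incoming instance. The class name `OnlineLinearModel`, the learning rate, and the simulated stream are assumptions made for illustration.

```python
class OnlineLinearModel:
    """1-D linear model updated one instance at a time."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # assumed value, chosen for this toy stream
        self.slope = 0.0
        self.intercept = 0.0

    def predict(self, x):
        return self.slope * x + self.intercept

    def learn_one(self, x, y):
        """Take one gradient step on the squared error for a single (x, y)."""
        error = self.predict(x) - y
        self.slope -= self.learning_rate * error * x
        self.intercept -= self.learning_rate * error


model = OnlineLinearModel()
# Simulated stream following y = 2x + 1; each instance is seen, used, and discarded.
stream = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(10)] * 500
for x, y in stream:
    model.learn_one(x, y)
print(model.slope, model.intercept)
```

A batch learner would need the whole dataset up front and a full retrain to absorb new data; here each new instance nudges the existing parameters.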

10
Q

What is out-of-core learning?

A

Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer’s main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
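
The mini-batch idea might look like the sketch below. It is illustrative only: an in-memory `io.StringIO` stands in for a file too large for main memory, and the helper names `iter_mini_batches` and `partial_fit`, the batch size, and the learning rate are assumptions.

```python
import csv
import io
from itertools import islice


def iter_mini_batches(rows, batch_size):
    """Yield lists of at most batch_size rows without materializing all rows."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def partial_fit(params, batch, learning_rate):
    """One gradient step on the squared error of a single mini-batch."""
    slope, intercept = params
    grad_slope = grad_intercept = 0.0
    for x, y in batch:
        error = slope * x + intercept - y
        grad_slope += error * x
        grad_intercept += error
    n = len(batch)
    return (slope - learning_rate * grad_slope / n,
            intercept - learning_rate * grad_intercept / n)


# Stand-in for a huge CSV on disk; rows follow y = 3x - 1.
fake_file = io.StringIO(
    "".join(f"{(i % 10) / 10},{3.0 * ((i % 10) / 10) - 1.0}\n" for i in range(1000))
)

params = (0.0, 0.0)
for epoch in range(50):
    fake_file.seek(0)  # re-read the data from the start on each pass
    rows = ((float(x), float(y)) for x, y in csv.reader(fake_file))
    for batch in iter_mini_batches(rows, batch_size=50):
        params = partial_fit(params, batch, learning_rate=0.5)
print(params)
```

Only one mini-batch of parsed rows is held at a time, which is what allows the same loop to run over data that does not fit in memory.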

11
Q

What is the difference between a model parameter and a learning algorithm’s hyperparameter?

A

A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope of a linear model). A learning algorithm tries to find optimal values for these parameters such that the model generalizes well to new instances. A hyperparameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply).
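
A small sketch of the distinction, assuming a linear model through the origin with an L2 penalty: the slope is the model parameter (found by the algorithm), while `regularization` is a hyperparameter (set before training). The function name `fit_slope` and the data are made up for illustration.

```python
def fit_slope(xs, ys, regularization=0.0):
    """Return the slope minimizing sum((slope*x - y)**2) + regularization * slope**2.

    Setting the derivative to zero gives slope = sum(x*y) / (sum(x*x) + regularization).
    """
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + regularization
    return numerator / denominator


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Same data, same algorithm; only the hyperparameter changes.
slope_plain = fit_slope(xs, ys)                              # 28 / 14
slope_regularized = fit_slope(xs, ys, regularization=14.0)   # 28 / 28
print(slope_plain, slope_regularized)  # 2.0 1.0
```

The learned slope changes because the hyperparameter changed how the algorithm searches, not because the data did.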

12
Q

a) What do model-based learning algorithms search for?
b) What is the most common strategy they use to succeed?
c) How do they make predictions?

A

a) Model-based learning algorithms search for an optimal value for the model parameters such that the model will generalize well to new instances.
b) We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on the training data, plus a penalty for model complexity if the model is regularized.
c) To make predictions, we feed the new instance’s features into the model’s prediction function, using the parameter values found by the learning algorithm.
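
The three parts of this answer can be sketched together. This is one possible toy setup, not the only way to do it: a linear model, a mean-squared-error cost with an optional L2 penalty on the slope, and plain gradient descent as the minimizer. The names `predict`, `cost`, and `train`, the learning rate, and the step count are assumptions.

```python
def predict(params, x):
    """c) Prediction function: plug the instance's feature into the model."""
    slope, intercept = params
    return slope * x + intercept


def cost(params, data, penalty=0.0):
    """b) Mean squared error on the training data plus an L2 penalty on the slope."""
    mse = sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)
    return mse + penalty * params[0] ** 2


def train(data, penalty=0.0, learning_rate=0.1, steps=2000):
    """a) Search for parameter values that minimize the cost (gradient descent)."""
    slope, intercept = 0.0, 0.0
    n = len(data)
    for _step in range(steps):
        errors = [(predict((slope, intercept), x) - y, x) for x, y in data]
        grad_slope = 2 * sum(e * x for e, x in errors) / n + 2 * penalty * slope
        grad_intercept = 2 * sum(e for e, x in errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
params = train(training_data)
prediction = predict(params, 4.0)  # a new instance not in the training data
print(params, prediction)
```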

13
Q

Can you name six of the main challenges in Machine Learning?

A

Some of the main challenges in machine learning are the lack of data, poor data quality, nonrepresentative data, uninformative features, excessively simple models that underfit the training data, and excessively complex models that overfit the data.

14
Q

If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?

A

If a model performs great on the training data but generalizes poorly to new instances, the model is likely overfitting the training data (or we got extremely lucky on the training data). Possible solutions to overfitting are getting more data, simplifying the model, or reducing the noise in the training data.

15
Q

What is a test set, and why would you want to use it?

A

A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.
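
As a rough illustration, the sketch below holds out 20% of a synthetic dataset, fits a model on the rest, and measures the error on the held-out part only. The helper names, the 80/20 ratio, the seeds, and the synthetic data are all assumptions.

```python
import random


def train_test_split(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and split it into (train_set, test_set)."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(slope, instances):
    return sum((slope * x - y) ** 2 for x, y in instances) / len(instances)


# Synthetic data: y = 2x plus Gaussian noise.
rng = random.Random(0)
data = [(float(x), 2.0 * x + rng.gauss(0.0, 1.0)) for x in range(100)]

train_set, test_set = train_test_split(data)

# Fit using the training set only (least-squares slope through the origin).
slope = sum(x * y for x, y in train_set) / sum(x * x for x, y in train_set)

# The test set plays the role of "new instances" the model has never seen.
generalization_estimate = mean_squared_error(slope, test_set)
print(len(train_set), len(test_set), generalization_estimate)
```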

16
Q

What is the purpose of a validation set?

A

A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.
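
A minimal sketch of hold-out validation, assuming a linear model through the origin and a handful of candidate regularization values: each candidate is trained on the training set, compared on the validation set, and the test set is touched only once at the end. All names and numbers here are made up for illustration.

```python
def fit_slope(instances, regularization):
    """Least-squares slope through the origin with an L2 penalty on the slope."""
    numerator = sum(x * y for x, y in instances)
    denominator = sum(x * x for x, y in instances) + regularization
    return numerator / denominator


def mean_squared_error(slope, instances):
    return sum((slope * x - y) ** 2 for x, y in instances) / len(instances)


train_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
validation_set = [(4.0, 8.0), (5.0, 10.1)]
test_set = [(6.0, 12.1)]

candidates = [0.0, 1.0, 10.0]  # hyperparameter values to compare (assumed)

# Compare the candidate models on the validation set, never on the test set.
validation_errors = {
    alpha: mean_squared_error(fit_slope(train_set, alpha), validation_set)
    for alpha in candidates
}
best_alpha = min(validation_errors, key=validation_errors.get)

# Only the selected model is evaluated on the test set, and only once.
final_slope = fit_slope(train_set, best_alpha)
test_error = mean_squared_error(final_slope, test_set)
print(best_alpha, test_error)
```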

17
Q

What can go wrong if you tune hyperparameters using the test set?

A

If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic.