QnA Flashcards

Question 1

Q

A common unsupervised task is association rule learning,

Answer

A

in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
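A minimal sketch of how such a rule could be scored, assuming a tiny made-up sales log. The transactions and the `support` and `confidence` helpers are hypothetical illustrations, not taken from any particular library:

```python
# Hypothetical sales log: each transaction is the set of items in one basket.
transactions = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"barbecue sauce", "potato chips"},
    {"potato chips", "soda"},
    {"steak", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule: {barbecue sauce, potato chips} -> {steak}
rule_support = support({"barbecue sauce", "potato chips", "steak"}, transactions)
rule_confidence = confidence({"barbecue sauce", "potato chips"}, {"steak"}, transactions)
```

In this toy log the rule holds in 2 of the 3 baskets that contain both barbecue sauce and potato chips, so its confidence is about 0.67.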

Question 2

Q

semi-supervised learning

Answer

A

Some algorithms can deal with data that’s partially labeled.

Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just add one label per person and it is able to name everyone in every photo, which is useful for searching photos.
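The labeling step can be sketched as follows, assuming the clustering has already been done. The cluster contents mirror the photo numbers in this card; the names "Alice" and "Bob" are hypothetical:

```python
# Assumed output of the unsupervised step: cluster id -> photos where that face appears.
clusters = {"A": [1, 5, 11], "B": [2, 5, 7]}

# The supervised part: one label per cluster, supplied by the user (hypothetical names).
names = {"A": "Alice", "B": "Bob"}

# Propagate each cluster's single label to every photo in that cluster.
people_in_photo = {}
for cluster_id, photos in clusters.items():
    for photo in photos:
        people_in_photo.setdefault(photo, []).append(names[cluster_id])
```

Two labels are enough to name everyone in all five photos, including photo 5, where both people appear.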

Question 3

Q

Self-supervised learning

Answer

A

Another approach to machine learning involves actually generating a fully labeled dataset from a fully unlabeled one. Again, once the whole dataset is labeled, any supervised learning algorithm can be used. This approach is called self-supervised learning.
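One way to picture this is word masking: hide part of each unlabeled example and use the hidden part as the label. This is only an illustrative sketch; the `make_masked_examples` helper and the `[MASK]` token are assumptions, not something the card specifies:

```python
def make_masked_examples(sentences, mask_token="[MASK]"):
    """Turn unlabeled sentences into (masked sentence, hidden word) training pairs."""
    examples = []
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            masked = words[:i] + [mask_token] + words[i + 1:]
            examples.append((" ".join(masked), word))
    return examples

# Every pair is an (input, label) example that was generated without human labeling.
examples = make_masked_examples(["the cat sat"])
```

The resulting pairs can then be fed to any supervised learner that predicts the hidden word from the masked sentence.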

Question 4

Q

Reinforcement learning

Answer

A

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-13). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
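As a rough illustration of rewards, penalties, and a policy, here is a sketch that applies the tabular Q-learning update to a tiny made-up corridor. The environment, the reward values, and the systematic sweep over all state-action pairs (instead of an agent exploring on its own) are simplifying assumptions:

```python
ACTIONS = ("left", "right")
GOAL = 3  # states are 0, 1, 2, 3; state 3 is terminal

def step(state, action):
    """Hypothetical environment: reaching the goal pays +1, any other move costs 0.1."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward

# Estimated long-term reward of taking each action in each non-terminal state.
q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
alpha, gamma = 0.5, 0.9  # step size and discount factor (arbitrary choices)

for _ in range(50):
    for s in range(GOAL):
        for a in ACTIONS:
            s2, r = step(s, a)
            best_next = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

# The policy maps each situation (state) to the action with the highest estimate.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

In this corridor the learned policy is to move right from every state, since that reaches the reward with the fewest penalties.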

Question 5

Q

Batch learning

Answer

A

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

Question 6

Q

model rot or data drift.

Answer

A

Unfortunately, a model’s performance tends to decay slowly over time, simply because the world continues to evolve while the model remains unchanged. This phenomenon is often called model rot or data drift. The solution is to regularly retrain the model on up-to-date data.

Question 7

Q

Online learning

Answer

A

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
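A minimal sketch of incremental training, assuming a one-feature linear model updated by one gradient step per mini-batch. The `OnlineLinearRegressor` class, the `stream` generator, and the synthetic data (following y = 2x + 1) are all hypothetical:

```python
class OnlineLinearRegressor:
    """Linear model y = w * x + b, trained one mini-batch at a time."""

    def __init__(self, learning_rate=0.5):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def partial_fit(self, batch):
        """Take one gradient step on the mean squared error of a batch of (x, y) pairs."""
        n = len(batch)
        grad_w = sum(2 * (self.w * x + self.b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.w * x + self.b - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

def stream(n_instances, batch_size):
    """Hypothetical data stream following y = 2x + 1, yielded in mini-batches."""
    batch = []
    for i in range(n_instances):
        x = (i % 10) / 10
        batch.append((x, 2 * x + 1))
        if len(batch) == batch_size:
            yield batch
            batch = []

model = OnlineLinearRegressor()
for mini_batch in stream(2000, batch_size=10):
    model.partial_fit(mini_batch)  # the batch can be discarded after this step
```

After the stream is consumed, `model.w` and `model.b` end up close to 2 and 1, even though the model never held more than ten instances at once.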

Question 8

Q

Additionally, online learning algorithms can be used to train models on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
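The load, train, repeat loop can be sketched like this. An in-memory `io.StringIO` stands in for a file too large to load at once, and the "model" is deliberately trivial (a running mean, which is the constant prediction that minimizes squared error); both are assumptions made for illustration:

```python
import io
from itertools import islice

def read_chunks(file, chunk_size):
    """Yield lists of at most chunk_size floats (one value per line) without reading the whole file."""
    while True:
        chunk = [float(line) for line in islice(file, chunk_size)]
        if not chunk:
            return
        yield chunk

# Stand-in for a dataset on disk that does not fit in memory (hypothetical values 1..10).
data_file = io.StringIO("\n".join(str(v) for v in range(1, 11)))

mean, seen = 0.0, 0
for chunk in read_chunks(data_file, chunk_size=3):
    seen += len(chunk)
    # Training step: move the running mean toward the values in this chunk.
    mean += (sum(chunk) - len(chunk) * mean) / seen
```

Only one chunk of three values is in memory at any time, yet the final estimate equals the mean of the full dataset.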

Question 9

Q

One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data (and you don’t want a spam filter to flag only the latest kinds of spam it was shown). Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).
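The trade-off can be illustrated with a one-number "model" that is nudged toward each new value in proportion to the learning rate. The `track` helper and the two synthetic streams are hypothetical:

```python
def track(stream, learning_rate, initial):
    """Update an estimate incrementally: move it toward each new value by learning_rate."""
    estimate = initial
    for value in stream:
        estimate += learning_rate * (value - estimate)
    return estimate

# A steady stream followed by a single outlier.
with_outlier = [1.0] * 20 + [5.0]
high_after_outlier = track(with_outlier, learning_rate=0.9, initial=1.0)  # jumps to about 4.6
low_after_outlier = track(with_outlier, learning_rate=0.1, initial=1.0)   # barely moves, about 1.4

# A steady stream whose level genuinely changes from 1.0 to 3.0.
with_shift = [1.0] * 20 + [3.0] * 5
high_after_shift = track(with_shift, learning_rate=0.9, initial=1.0)  # almost 3.0 already
low_after_shift = track(with_shift, learning_rate=0.1, initial=1.0)   # still about 1.8
```

The high learning rate adapts to the real shift almost immediately but is also thrown off by a single outlier; the low learning rate shows the opposite behavior.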

Question 10

Q

PIPELINES

Answer

A

A sequence of data processing components is called a data pipeline.
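A minimal sketch of the idea, assuming each component is a plain function that takes data and returns transformed data. The component names and the `run_pipeline` helper are hypothetical:

```python
from functools import reduce

def drop_missing(values):
    """Component 1: remove missing entries."""
    return [v for v in values if v is not None]

def scale_to_unit(values):
    """Component 2: divide by the largest value so the maximum becomes 1.0."""
    largest = max(values)
    return [v / largest for v in values]

def run_pipeline(data, components):
    """Feed the output of each component into the next one, in order."""
    return reduce(lambda current, component: component(current), components, data)

result = run_pipeline([3, None, 6, 9], [drop_missing, scale_to_unit])
```

Because each component only depends on the output of the previous one, components can be swapped or reordered without touching the others.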

Question 11

Q

First, determine what kind of training supervision the model will need: is it a supervised, unsupervised, semi-supervised, self-supervised, or reinforcement learning task? And is it a classification task, a regression task, or something else? Should you use batch learning or online learning techniques?

Question 12

Q

since the model can be trained with labeled examples

Answer

A

a typical supervised learning task

Question 13

Q

since the model will be asked to predict a value

Answer

A

It is a typical regression task,

Question 14

Q

since the system will use multiple features to make a prediction

Answer

A

this is a multiple regression problem

Question 15

Q

since we are only trying to predict a single value for each district

Answer

A

It is also a univariate regression problem,

Question 16

Q

there is no continuous flow of data coming into the system

Answer

A

plain batch learning should do just fine.

Question 17

Q

typical performance measure for regression problems

Answer

A

RMSE (root mean square error)

Question 18

Q

Equation 2-1. Root mean square error (RMSE)

$$\mathrm{RMSE}(\mathbf{X}, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h\left(\mathbf{x}^{(i)}\right) - y^{(i)} \right)^2}$$
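Equation 2-1 translates directly into a few lines of code. In this sketch, `predictions` plays the role of the values h(x^(i)) and `targets` the values y^(i); the function name and the sample numbers are made up:

```python
import math

def rmse(predictions, targets):
    """Root mean square error: square root of the mean squared prediction error."""
    m = len(targets)
    squared_errors = ((p - y) ** 2 for p, y in zip(predictions, targets))
    return math.sqrt(sum(squared_errors) / m)

# Errors are 1, 0, and -2, so the RMSE is sqrt((1 + 0 + 4) / 3).
error = rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
```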

Answer

A

Question 19

Q

If there are many outlier districts, can the RMSE still be used?

Answer

A

It may not be the best choice. The RMSE is more sensitive to outliers than the MAE, so the mean absolute error (MAE) may be preferable in that case.
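A small made-up example shows how one outlier affects the two measures. The error values below are hypothetical prediction errors, not data from any real model:

```python
import math

def rmse_of(errors):
    """Root mean square of a list of prediction errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

def mae_of(errors):
    """Mean absolute value of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

typical = [1.0, 1.0, 1.0, 1.0]       # every prediction is off by 1
with_outlier = [1.0, 1.0, 1.0, 9.0]  # one prediction is off by 9

rmse_typical, mae_typical = rmse_of(typical), mae_of(typical)            # both 1.0
rmse_outlier, mae_outlier = rmse_of(with_outlier), mae_of(with_outlier)  # about 4.58 vs 3.0
```

Squaring gives the single large error much more weight, so the RMSE grows faster than the MAE when outliers are present.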

Question 20

Q

Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values.

Answer

A

Question 21

Q

Various distance measures, or norms, are possible:

Computing the root of a sum of squares (RMSE) corresponds to the Euclidean norm: this is the notion of distance we are all familiar with. It is also called the ℓ₂ norm, noted ∥ · ∥₂ (or just ∥ · ∥).

Computing the sum of absolutes (MAE) corresponds to the ℓ₁ norm, noted ∥ · ∥₁. This is sometimes called the Manhattan norm because it measures the distance between two points in a city if you can only travel along orthogonal city blocks.
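The link between the two norms and the two error measures can be checked numerically. In this sketch the helper names and the error vector are hypothetical; the vector stands for predictions minus targets:

```python
import math

def l1_norm(vector):
    """Manhattan norm: sum of absolute values."""
    return sum(abs(v) for v in vector)

def l2_norm(vector):
    """Euclidean norm: square root of the sum of squares."""
    return math.sqrt(sum(v ** 2 for v in vector))

errors = [3.0, -4.0]  # hypothetical vector of prediction errors
m = len(errors)

manhattan = l1_norm(errors)  # 3 + 4 = 7
euclidean = l2_norm(errors)  # sqrt(9 + 16) = 5

# The MAE and the RMSE are these norms rescaled by the number of instances.
mae = manhattan / m
rmse = euclidean / math.sqrt(m)
```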

Answer

A

(21 cards)