QnA Flashcards

1
Q

A common unsupervised task is association rule learning

A

in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
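A minimal sketch of how one such rule could be scored. It assumes the two metrics most association-rule miners (for example, Apriori) report, support and confidence; neither metric nor the transaction data comes from the text above, and the sales log is hypothetical. A real miner would search over many candidate rules, while this only scores the one from the example.

```python
# Hypothetical sales log: each transaction is the set of items in one basket.
transactions = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "beer"},
    {"barbecue sauce", "potato chips"},
    {"potato chips", "soda"},
    {"steak", "beer"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent): how often the rule holds when it applies."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

rule_if = {"barbecue sauce", "potato chips"}
rule_then = {"steak"}
print(support(rule_if | rule_then, transactions))    # 0.4
print(confidence(rule_if, rule_then, transactions))  # 2/3, about 0.67
```

Here two out of the three baskets containing barbecue sauce and potato chips also contain steak, which is the kind of relation that could justify placing these items close together.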

2
Q

Semi-supervised learning

A

Some algorithms can deal with data that’s partially labeled.

Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just add one label per person and it is able to name everyone in every photo, which is useful for searching photos.
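A toy sketch of the two steps, under loudly stated assumptions: each detected face is represented by a hypothetical 2D embedding (a real service would compute embeddings with a face-recognition model), plain k-means stands in for whatever clustering the service actually uses, and the names "Alice" and "Bob" are invented. Clustering groups the faces without any labels; one user-provided label per cluster is then propagated to every face in it.

```python
import math

# Hypothetical face detections: (photo_id, embedding). Photo 5 contains both people.
faces = [
    (1, (0.10, 0.20)), (5, (0.15, 0.10)), (11, (0.05, 0.15)),
    (2, (5.00, 5.10)), (5, (5.20, 4.90)), (7, (4.90, 5.00)),
]

def kmeans(points, init_centroids, n_iter=10):
    """Unsupervised part: plain k-means, returning a cluster index per point."""
    centroids = list(init_centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        assign = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Move each centroid to the mean of the points assigned to it
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assign

embeddings = [emb for _, emb in faces]
clusters = kmeans(embeddings, init_centroids=[embeddings[0], embeddings[3]])

# Supervised part: the user labels just one face per person (hypothetical names)
user_labels = {0: "Alice", 3: "Bob"}            # face index -> name
cluster_to_name = {clusters[i]: name for i, name in user_labels.items()}
names = [cluster_to_name[c] for c in clusters]  # label propagated to every face

def photos_of(person):
    return sorted({photo for (photo, _), name in zip(faces, names) if name == person})

print(photos_of("Alice"))  # [1, 5, 11]
print(photos_of("Bob"))    # [2, 5, 7]
```

The initial centroids are picked by hand here to keep the output deterministic; real k-means implementations usually initialize them randomly or with a smarter seeding scheme.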

3
Q

Self-supervised learning

A

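Another approach to machine learning involves actually generating a fully labeled dataset from a fully unlabeled one. For example, if you have a large dataset of unlabeled images, you can randomly mask a small part of each image and train a model to recover the original image: the masked images are the inputs and the original images are the labels. Once the whole dataset is labeled, any supervised learning algorithm can be used. This approach is called self-supervised learning.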
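One common way to generate labels from unlabeled data is masking: hide part of each input and use the hidden part as the label. The sketch below applies that idea to short word sequences instead of images; the `[MASK]` token and the tiny corpus are hypothetical choices for illustration.

```python
MASK = "[MASK]"  # hypothetical placeholder token

def make_labeled_dataset(unlabeled):
    """Turn each unlabeled sequence into (masked input, target) training pairs."""
    dataset = []
    for seq in unlabeled:
        for i, token in enumerate(seq):
            masked = seq[:i] + [MASK] + seq[i + 1:]
            dataset.append((masked, token))  # the hidden token becomes the label
    return dataset

corpus = [["people", "buy", "steak"], ["people", "buy", "chips"]]  # no labels anywhere
dataset = make_labeled_dataset(corpus)
print(len(dataset))  # 6
print(dataset[0])    # (['[MASK]', 'buy', 'steak'], 'people')
```

Every pair in `dataset` is now an ordinary supervised example, so any supervised learning algorithm could be trained on it.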

4
Q

Reinforcement learning

A

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-13). It must then learn by itself the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
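A small sketch of an agent learning a policy by trial and error. It swaps in tabular Q-learning, which is only one of many reinforcement learning algorithms, and uses a hypothetical environment: a five-state corridor where reaching the last state gives a reward of 1. The hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are arbitrary assumptions.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment: deterministic moves along a corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]: estimated future reward

    def greedy(state):
        best = max(q[state])
        return rng.choice([a for a in ACTIONS if q[state][a] == best])  # random tie-break

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, otherwise exploit current estimates
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    # The learned policy: the best-known action in each non-terminal state
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    return policy, q

policy, q = train_q_learning()
print(policy)  # [1, 1, 1, 1]: always move right, toward the reward
```

The agent is never told that "right" is correct; it only receives rewards, and the policy emerges from the value estimates it builds over many episodes.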

5
Q

Batch learning

A

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
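A minimal sketch of the batch workflow, assuming a simple least-squares line fit as the model and hypothetical data: the model is trained once on all available data, then frozen. Learning from new data means retraining from scratch on the full dataset.

```python
def train_batch(xs, ys):
    """Fit y ~ a*x + b by ordinary least squares, using the whole dataset at once."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b  # frozen model: it no longer learns once deployed

# Offline: train on all available data, then launch into production
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
model_v1 = train_batch(xs, ys)
print(model_v1(10))  # 21.0

# New data arrives: model_v1 does not change. To learn from it, batch learning
# retrains a new version from scratch on the old data plus the new data.
new_xs, new_ys = [4, 5], [10, 12]
model_v2 = train_batch(xs + new_xs, ys + new_ys)
```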

6
Q

Model rot or data drift

A

Unfortunately, a model’s performance tends to decay slowly over time, simply because the world continues to evolve while the model remains unchanged. This phenomenon is often called model rot or data drift. The solution is to regularly retrain the model on up-to-date data.
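A toy illustration of model rot, with a hypothetical baseline model that always predicts the mean of its training targets and hypothetical daily sales figures. The error is measured as the mean absolute error; retraining on recent data brings it back down.

```python
from statistics import mean

def train(targets):
    """Hypothetical baseline model: always predicts the mean of its training targets."""
    m = mean(targets)
    return lambda: m

def mae(model, targets):
    """Mean absolute error of the model on a list of targets."""
    return mean(abs(model() - t) for t in targets)

old_sales = [98, 102, 100, 101, 99]       # hypothetical data the model was trained on
recent_sales = [128, 131, 130, 129, 132]  # the world has moved on since then

model = train(old_sales)
error_before = mae(model, recent_sales)   # 30: the unchanged model has decayed

model = train(recent_sales)               # retrain on up-to-date data
error_after = mae(model, recent_sales)    # 1.2
```

For simplicity the retrained model is evaluated on the same recent data it was trained on; in practice you would check it on a held-out slice of recent data.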

7
Q

Online learning

A

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
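A minimal sketch of online learning, assuming a linear model trained with stochastic gradient descent (one common choice, not the only one) and a hypothetical stream of instances drawn from y = 2x + 1. Each instance triggers one cheap update and is then discarded.

```python
import random

def stream(n, seed=0):
    """Hypothetical data stream: instances of y = 2x + 1 arriving one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        yield x, 2 * x + 1

w, b = 0.0, 0.0
learning_rate = 0.1
for x, y in stream(5000):
    error = (w * x + b) - y
    # One cheap SGD step on the squared error of this single instance
    w -= learning_rate * 2 * error * x
    b -= learning_rate * 2 * error
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Because the loop never needs more than the current instance, the same code would keep learning if the stream never ended.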

8
Q

Online learning algorithms can also be used to train models on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.

A
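A sketch of out-of-core training under stated assumptions: an in-memory `io.StringIO` stands in for a CSV file too large for RAM (in practice you would `open()` a real file), the data follows y = 2x + 1, and the model is a linear model updated with one mini-batch gradient step per chunk. The chunk size, learning rate, and number of passes are arbitrary.

```python
import csv
import io
from itertools import islice

# Stand-in for a huge file on disk; in practice: open("big.csv", newline="")
big_file = io.StringIO("".join(f"{i / 100},{2 * (i / 100) + 1}\n" for i in range(100)))

def chunks(f, size):
    """Yield successive lists of (x, y) rows without loading the whole file."""
    f.seek(0)
    reader = csv.reader(f)
    while True:
        rows = list(islice(reader, size))
        if not rows:
            return
        yield [(float(x), float(y)) for x, y in rows]

w, b, lr = 0.0, 0.0, 0.2
for _ in range(300):  # several passes over the data
    for batch in chunks(big_file, size=10):
        # One mini-batch gradient step on the MSE of this chunk only
        n = len(batch)
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in batch) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in batch) / n
        w, b = w - lr * grad_w, b - lr * grad_b
print(round(w, 2), round(b, 2))  # should approach 2 and 1
```

Only one chunk of 10 rows is held in memory at any time, which is the whole point of out-of-core learning.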
9
Q

One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data (and you don’t want a spam filter to flag only the latest kinds of spam it was shown). Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).

A
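A minimal sketch of the trade-off, assuming the simplest possible online model: a running estimate that moves a fraction `learning_rate` toward each new observation. The data values are hypothetical.

```python
def adapt(estimate, observations, learning_rate):
    """Online update: move the estimate a fraction `learning_rate` toward each new value."""
    for x in observations:
        estimate += learning_rate * (x - estimate)
    return estimate

new_data = [10.0] * 5  # the data suddenly shifts from 0 to 10
fast = adapt(0.0, new_data, learning_rate=0.5)   # 9.6875: adapts quickly
slow = adapt(0.0, new_data, learning_rate=0.05)  # about 2.26: more inertia

spike = [100.0]                 # a single nonrepresentative data point (outlier)
print(adapt(0.0, spike, 0.5))   # jumps to 50.0
print(adapt(0.0, spike, 0.05))  # moves only to about 5
```

The same learning rate that lets the fast model track the shift also lets one outlier drag it far away, while the slow model resists both.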
10
Q

Pipelines

A

A sequence of data processing components is called a data pipeline.
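A minimal sketch of a data pipeline as a chain of components, where each component's output becomes the next one's input. The two components (`fill_missing`, `min_max_scale`) and the `make_pipeline` helper are hypothetical examples.

```python
from functools import reduce

def fill_missing(values):
    """Replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def make_pipeline(*components):
    """Chain components so that each one's output feeds the next."""
    return lambda data: reduce(lambda acc, component: component(acc), components, data)

pipeline = make_pipeline(fill_missing, min_max_scale)
print(pipeline([1.0, None, 3.0]))  # [0.0, 0.5, 1.0]
```

Libraries offer the same idea with more features; scikit-learn, for instance, provides a `Pipeline` class for chaining transformers and a final estimator.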

11
Q

First, determine what kind of training supervision the model will need: is it a supervised, unsupervised, semi-supervised, self-supervised, or reinforcement learning task? And is it a classification task, a regression task, or something else? Should you use batch learning or online learning techniques?

A
12
Q

The model can be trained with labeled examples. What kind of learning task is this?

A

A typical supervised learning task.

13
Q

The model will be asked to predict a value. What kind of task is this?

A

A typical regression task.

14
Q

The system will use multiple features to make a prediction. What kind of regression problem is this?

A

A multiple regression problem.

15
Q

We are only trying to predict a single value for each district. What kind of regression problem is this?

A

It is also a univariate regression problem.

16
Q

There is no continuous flow of data coming into the system. Should you use batch or online learning?

A

Plain batch learning should do just fine.

17
Q

Typical performance measure for regression problems

A

Root mean square error (RMSE)

18
Q

Equation 2-1. Root mean square error (RMSE)

$$\mathrm{RMSE}(\mathbf{X}, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h\left(\mathbf{x}^{(i)}\right) - y^{(i)} \right)^{2}}$$

A
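Equation 2-1 translates almost term by term into code. In this sketch the predictions h(x^(i)) are assumed to be already computed, and the district values are hypothetical.

```python
import math

def rmse(predictions, targets):
    """RMSE: square root of the mean of the squared prediction errors."""
    m = len(targets)
    return math.sqrt(sum((h - y) ** 2 for h, y in zip(predictions, targets)) / m)

predictions = [110, 96, 100, 100]  # h(x^(i)) for each district (hypothetical values)
targets = [107, 100, 100, 100]     # y^(i)
print(rmse(predictions, targets))  # 2.5
```

The errors are 3, -4, 0, and 0, so the mean squared error is 25 / 4 = 6.25 and its square root is 2.5.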
19
Q

If there are many outlier districts, is the RMSE still a good choice?

A

Not necessarily. The RMSE is more sensitive to outliers than the mean absolute error (MAE), so you may prefer the MAE instead.
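A quick comparison on hypothetical prediction errors shows why: squaring makes a single large error dominate the RMSE, while the MAE grows only linearly with it.

```python
import math

def rmse(errors):
    """Root mean square error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

typical = [1, -1, 1, -1, 1]        # hypothetical errors, no outliers
with_outlier = [1, -1, 1, -1, 50]  # same, but one outlier district

print(rmse(typical), mae(typical))            # 1.0 1.0
print(rmse(with_outlier), mae(with_outlier))  # about 22.38 vs 10.8
```

Without outliers the two measures agree; with one outlier the RMSE ends up roughly twice the MAE.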

20
Q

Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values.

A
21
Q

Various distance measures, or norms, are possible:

Computing the root of a sum of squares (RMSE) corresponds to the Euclidean norm: this is the notion of distance we are all familiar with. It is also called the ℓ2 norm, noted ∥ · ∥₂ (or just ∥ · ∥).

Computing the sum of absolute values (MAE) corresponds to the ℓ1 norm, noted ∥ · ∥₁. This is sometimes called the Manhattan norm because it measures the distance between two points in a city if you can only travel along orthogonal city blocks.

A
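A short sketch connecting the two norms to the two error measures, using a hypothetical prediction vector and target vector. For m instances with error vector e, the RMSE equals ∥e∥₂ divided by √m, and the MAE equals ∥e∥₁ divided by m.

```python
import math

def l2_norm(v):
    """Euclidean (l2) norm: root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))

def l1_norm(v):
    """Manhattan (l1) norm: sum of absolute values."""
    return sum(abs(x) for x in v)

predictions = [1.0, 2.0]
targets = [4.0, 6.0]
diff = [p - t for p, t in zip(predictions, targets)]  # [-3.0, -4.0]

print(l2_norm(diff), math.dist(predictions, targets))  # 5.0 5.0 (straight-line distance)
print(l1_norm(diff))                                   # 7.0 (3 blocks + 4 blocks)

m = len(diff)
print(l2_norm(diff) / math.sqrt(m))  # RMSE, about 3.54
print(l1_norm(diff) / m)             # MAE, 3.5
```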