CHAPTER 2 End-to-End Machine Learning Project Flashcards

1
Q

What are the main steps we need to take for an End-to-End ML project? P 65

A
  1. Look at the big picture.
  2. Get the data.
  3. Discover and visualize the data to gain insights.
  4. Prepare the data for Machine Learning algorithms.
  5. Select a model and train it.
  6. Fine-tune your model.
  7. Present your solution.
  8. Launch, monitor, and maintain your system.
    On page 67 there’s a link to an appendix explaining the details of each step.
2
Q

What is the first question to ask when beginning an ML project? P 67

A

The first question to ask your boss is what exactly the business objective is.

3
Q

What are upstream and downstream systems? External

A

An upstream system is any system that sends data to the system under consideration, while a downstream system is any system that receives data from it. For example, a data warehouse that feeds your training pipeline is upstream of it, and a dashboard that consumes your model’s predictions is downstream.

4
Q

What’s a pipeline? P 68

A

A sequence of data processing components is called a data pipeline.
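
For instance, here is a minimal sketch of the idea using scikit-learn’s Pipeline class (the specific steps are illustrative assumptions, not from this page of the book):

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# each component transforms the data and passes its output to the next one
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill in missing values
    ("scaler", StandardScaler()),                   # standardize the features
])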

5
Q

What’s the difference between online learning and batch learning? External

A

In computer science, online machine learning is a method in which data becomes available in sequential order and is used to update the best predictor for future data at each step. This is in contrast to batch learning techniques, which generate the best predictor by learning on the entire training data set (or on batches of it). (Mahsa: basically, online learning updates the weights one instance at a time, e.g. with stochastic gradient descent, while batch learning updates the weights using batches of data.)
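
A minimal sketch of the contrast using scikit-learn’s SGDRegressor (the synthetic data, model choice, and chunk size are illustrative assumptions):

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(42)
X = rng.rand(1000, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(1000)

# Batch learning: train on the entire dataset at once.
batch_model = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42)
batch_model.fit(X, y)

# Online learning: update the model incrementally as data arrives.
online_model = SGDRegressor(random_state=42)
for i in range(0, len(X), 100):  # simulate data arriving in chunks
    online_model.partial_fit(X[i:i+100], y[i:i+100])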

6
Q

What should we do after defining the problem? P 69

A

Your next step is to select a performance measure.

7
Q

How is Root Mean Square Error (RMSE) calculated? P 70

A

It’s the square root of the Mean Squared Error:

RMSE = sqrt( (1/m) * Σ_{i=1..m} ( ŷ(i) - y(i) )² )

where m is the number of instances, ŷ(i) is the prediction for the i-th instance, and y(i) is the corresponding label.
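
For example, a quick NumPy computation (the arrays are made-up values):

import numpy as np

y = np.array([3.0, 5.0, 2.5])            # labels
predictions = np.array([2.5, 5.0, 4.0])  # model predictions
rmse = np.sqrt(np.mean((predictions - y) ** 2))  # sqrt of the mean squared error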

8
Q

What is the most common performance metric for regression problems? What performance metric do we use when we have many outliers? P 71

A

Even though the RMSE is generally the preferred performance measure for regression tasks, in some contexts you may prefer to use another function. For example, suppose that there are many outlier districts. In that case, you may consider using the mean absolute error (MAE, also called the average absolute deviation).

9
Q

Computing the root of a sum of squares (RMSE) corresponds to the … norm: this is the notion of distance you are familiar with. It is also called the …, denoted as … P 71

A

Euclidean, ℓ2 norm, ||·||2 (or just ||·||)

10
Q

Computing the sum of absolutes (MAE) corresponds to the …, denoted as …. This is sometimes called the … norm because it measures the distance between two points in a city if you can only travel along orthogonal city blocks. P 71

A

ℓ1 norm, ||·||1, Manhattan

11
Q

How is MAE calculated? P 71

A

MAE = (1/m) * Σ_{i=1..m} | ŷ(i) - y(i) |
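
As a quick NumPy check (made-up arrays, mirroring the RMSE example above):

import numpy as np

y = np.array([3.0, 5.0, 2.5])            # labels
predictions = np.array([2.5, 5.0, 4.0])  # model predictions
mae = np.mean(np.abs(predictions - y))   # mean of the absolute errors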

12
Q

More generally, the ℓk norm of a vector v containing n elements is defined as: ____ P 71

A

||v||_k = ( |v_0|^k + |v_1|^k + … + |v_{n-1}|^k )^(1/k)

13
Q

Why the RMSE is more sensitive to outliers than the MAE? P 71

A

The higher the norm index k (recall ||v||_k = ( |v_0|^k + |v_1|^k + … + |v_{n-1}|^k )^(1/k)), the more it focuses on large values and neglects small ones. Since the RMSE corresponds to the ℓ2 norm and the MAE to the ℓ1 norm, the RMSE gives more weight to large errors and is therefore more sensitive to outliers. But when outliers are exponentially rare (like in a bell-shaped curve), the RMSE performs very well and is generally preferred.
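
A small NumPy illustration of this effect (the vector is a made-up example with one large component):

import numpy as np

v = np.array([1.0, 1.0, 10.0])
print(np.linalg.norm(v, ord=1))  # l1 norm: 12.0 (all components count equally)
print(np.linalg.norm(v, ord=2))  # l2 norm: about 10.1 (dominated by the large value)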

14
Q

Besides using sklearn, what can we do to have a stable train/test split even after updating the dataset? Code P 82

A

To have a stable train/test split even after updating the dataset, a common solution is to use each instance’s identifier to decide whether it should go in the test set (assuming instances have a unique and immutable identifier). For example, you could:
1- ✨ compute a hash of each instance’s identifier ✨
2- ✨ put that instance in the test set if the hash is lower than or equal to 20% of the maximum hash value. ✨
This ensures that the test set will remain consistent across multiple runs, even if you refresh the dataset.
The new test set will contain 20% of the new instances, but it will not contain any instance that was previously in the training set.
Here is a possible implementation:

from zlib import crc32
import numpy as np

def test_set_check(identifier, test_ratio):
    # keep instances whose 32-bit hash falls below test_ratio of the hash range
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32
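
A possible companion helper for pandas DataFrames (a sketch only; it assumes the test_set_check() above and a unique, immutable id column):

def split_train_test_by_id(data, test_ratio, id_column):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio))
    return data.loc[~in_test_set], data.loc[in_test_set]

# usage sketch: train_set, test_set = split_train_test_by_id(housing, 0.2, "index")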

15
Q

When a company wants to sample 1000 people, do they do it purely randomly? Use an example to explain P 83

A

When a survey company decides to call 1,000 people to ask them a few questions, they don’t just pick 1,000 people randomly from a phone book. They try to ensure that these 1,000 people are representative of the whole population. For example, the US population is 51.3% female and 48.7% male, so a well-conducted survey in the US would try to maintain this ratio in the sample: 513 female and 487 male. This is called stratified sampling.

1/ The population is divided into homogeneous subgroups called ✨strata✨,
2/ The right number of instances is sampled from each stratum to guarantee that the test set is representative of the overall population.

“Stratum” refers to a single subgroup or category, while “Strata” can mean several, or all, groups
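
A minimal sketch with scikit-learn (assuming the book’s housing DataFrame and an “income_cat” column to stratify on):

from sklearn.model_selection import train_test_split

strat_train_set, strat_test_set = train_test_split(
    housing, test_size=0.2, stratify=housing["income_cat"], random_state=42)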

16
Q

In stratified sampling, “Stratum” refers to a single subgroup or category, while “Strata” can mean several, or all, groups.
You should have many strata, and each stratum should be large enough. True/False P 84

A

False. You should not have too many strata, and each stratum should be large enough.

17
Q

How can we turn continuous features into categorical ones using Pandas and sklearn? External

A

Pandas: pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)

Sklearn: sklearn.preprocessing.KBinsDiscretizer(n_bins=5, *, encode='onehot', strategy='quantile', dtype=None, subsample='warn', random_state=None)
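
For example (the bin edges and labels follow the book’s income-category example; the housing DataFrame is an assumption here):

import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# pandas: explicit bin edges and labels
housing["income_cat"] = pd.cut(housing["median_income"],
                               bins=[0., 1.5, 3.0, 4.5, 6., np.inf],
                               labels=[1, 2, 3, 4, 5])

# scikit-learn: five equal-frequency bins, ordinal-encoded
discretizer = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
income_cat = discretizer.fit_transform(housing[["median_income"]])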

18
Q

What is the parameter in a scatter plot that helps us see high-density parts? P 86

A

Setting the alpha option to 0.1 makes it much easier to visualize the places where there is a high density of data points. For example:
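
A one-line illustration (assuming the book’s housing DataFrame is loaded):

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)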

19
Q

How can we set the radius and color of the circles in a scatter plot? P 88

A

✨ The radius of each circle represents the district’s population (option s),
✨ The color represents the price (option c).
✨ We will use a predefined color map (option cmap) called jet, which ranges from blue (low prices) to red (high prices):

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True)
plt.legend()

20
Q

Let’s say we have achieved a result on the test set using the tuned model. How can we be sure that it’s better than the previous model in use and that the result is not based on chance? P 110

A

In some cases, such a point estimate of the generalization error will not be quite enough to convince you to launch: what if it is just 0.1% better than the model currently in production? You might want to have an idea of how precise this estimate is.
For this, you can compute a 95% confidence interval for the generalization error using ✨ scipy.stats.t.interval(): ✨
>>> from scipy import stats
>>> confidence = 0.95
>>> squared_errors = (final_predictions - y_test) ** 2
>>> np.sqrt(stats.t.interval(confidence, len(squared_errors) - 1,
...                          loc=squared_errors.mean(),
...                          scale=stats.sem(squared_errors)))
array([45685.10470776, 49691.25001878])

21
Q

What data do we need to compute the confidence interval? External

A

To compute a 95% confidence interval, you need three pieces of data:
✨ The mean (for continuous data) or the proportion (for binary data).
✨ The standard deviation, which describes how dispersed the data is around the average.
✨ The sample size.
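
A minimal sketch computing a 95% interval from those three ingredients (the sample values are made up):

import numpy as np
from scipy import stats

sample = np.array([2.9, 3.1, 3.0, 3.4, 2.8, 3.2])
mean = sample.mean()               # 1) the mean
sem = stats.sem(sample)            # combines 2) the std and 3) the sample size
interval = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)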