Educative Machine Learning System Design - ML Primer Flashcards

1
Q

What should you expect in a machine learning interview?

A

Most major companies expect Machine Learning engineers to have solid engineering foundations and hands-on machine learning experience. Candidates typically go through a similar set of interview rounds: problem solving (LeetCode style), system design, machine learning knowledge, and machine learning system design.

2
Q

What’s the standard development cycle of machine learning?

A

It includes data collection, problem formulation, model creation, implementation of models, and enhancement of models.

3
Q

What are the 6 basic steps to approach Machine Learning System Design?

A

The 6 steps are:

  1. Problem statement
  2. Identify metrics
  3. Identify requirements
  4. Train and evaluate models
  5. Design high level system
  6. Scale the design

4
Q

What do you do during the “problem statement” step to approach Machine Learning System Design?

A

Asking questions is crucial to filling in any gaps and agreeing on goals.

It’s important to state the problem correctly. It is the candidate’s job to understand the intention of the design and why it is being optimized. It’s also important to make the right assumptions and discuss them explicitly with the interviewer.

Once we are clear on the problem statement, for example designing a Feed Ranking system, we can then start talking about relevant metrics such as user engagement.

5
Q

What do you do during the “identify metrics” step to approach Machine Learning System Design?

A

During the development phase, we need to quickly test model performance using offline metrics. You can start with popular metrics like log loss and AUC for binary classification, or RMSE and MAPE for forecasting problems.
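
For illustration, a minimal sketch of computing such offline metrics (assuming scikit-learn and NumPy, with toy arrays):

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score, mean_squared_error

# Binary classification: compare predicted probabilities against true labels.
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9])
print("log loss:", log_loss(y_true, y_prob))
print("AUC:", roc_auc_score(y_true, y_prob))

# Forecasting: RMSE and MAPE on continuous targets.
y_actual = np.array([100.0, 150.0, 200.0])
y_forecast = np.array([110.0, 140.0, 190.0])
rmse = np.sqrt(mean_squared_error(y_actual, y_forecast))
mape = np.mean(np.abs((y_actual - y_forecast) / y_actual)) * 100
print("RMSE:", rmse, "MAPE (%):", mape)
```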

6
Q

What do you do during the “identify requirements” step to approach Machine Learning System Design? Which two requirements do you need to identify?

A
  • Training requirements
  • Inference requirements

Training Requirements

There are many components required to train a model end to end. These components include data collection, feature engineering, feature selection, and the loss function. For example, if we design a YouTube video recommendation model, users naturally watch only a small fraction of the recommended videos, so we end up with far more negative examples than positive ones. This raises the question:

How do we train models to handle class imbalance?

Once we deploy models in production, we will receive feedback in real time.

How do we monitor and make sure models don’t go stale?

Inference Requirements

Once models are deployed, we want to run inference with low latency (<100ms) and scale our system to serve millions of users.

How do we design inference components to provide high availability and low latency?

8
Q

What do you do during the “train and evaluate model” step to approach Machine Learning System Design?

A

3 components:

  • feature engineering
  • feature selection
  • models

For example, in Rental Search Ranking, we will discuss if we should use ListingID as embedding features. In Estimate Food Delivery Time, we will discuss how to handle the latitude and longitude features efficiently.

10
Q

What do you do during the “design high level system” step to approach Machine Learning System Design?

A

Goal: identify a minimal, viable design to demonstrate a working system.

In this stage, we need to think about the system components and how data flows through each of them. We need to explain why we decided to have these components and what their roles are.

  • For example, when designing Video Recommendation systems, we would need two separate components: the Video Candidate Generation Service and the Ranking Model Service.
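
As a rough sketch of that data flow (the service objects and their method names here are hypothetical):

```python
def recommend_videos(user_id, candidate_service, ranking_service, k=10):
    """Two-stage flow: generate candidates cheaply, then rank a small set precisely."""
    # Stage 1: the Video Candidate Generation Service narrows millions of videos
    # down to a few hundred candidates.
    candidates = candidate_service.generate(user_id, limit=500)

    # Stage 2: the Ranking Model Service scores each candidate with a heavier model.
    scored = ranking_service.score(user_id, candidates)

    # Return the top-k highest scoring videos to the caller.
    return sorted(scored, key=lambda item: item["score"], reverse=True)[:k]
```
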
11
Q

What do you do during the “scale the design” step to approach Machine Learning System Design?

A

In this stage, it’s crucial to understand the system’s bottlenecks and how to address them.

You can start by identifying:

  • Which components are likely to be overloaded?
  • How can we scale the overloaded components?
  • Is the system good enough to serve millions of users?
  • How would we handle it if some components become unavailable?
12
Q

Where in the 6 step process to approach machine learning system design is the “problem statement” step located?

A

The “problem statement” step is the 1st step of the process.

It is to be done before the “identify metrics” step.

13
Q

Where in the 6-step process to approach machine learning system design is the “identify metrics” step located?

A

It is the 2nd step of the 6-step process to approach machine learning system design.

It is to be done after the “problem statement” step and before the “identify requirements” step.

14
Q

Where in the 6-step process to approach machine learning system design is the “identify requirements” step located?

A

It is the 3rd step of the 6-step process to approach the machine learning system design interview.

It is to be done after the “identify metrics” step and before the “train and evaluate models” step.

15
Q

Where in the 6-step process to approach machine learning system design is the “train and evaluate models” step located?

A

It is the 4th step of the process.

It is to be done after the “identify requirements” step and before the “design high level system” step.

16
Q

Where in the 6-step process to approach machine learning system design is the “design high level system” step located?

A

It is the 5th step in the process.

It is to be done after the “train and evaluate models” step and before the “scale the design” step.

17
Q

Where in the 6-step process to approach machine learning system design is the “scale the design” step located?

A

It is the 6th and final step of the process.

It is to be done after the “design high level system” step.

18
Q

What is one hot encoding?

A

It converts categorical variables into a one-hot numeric array.

One-hot encoding is a very common technique in feature engineering. It is especially popular when dealing with categorical features of medium cardinality.

19
Q

What are common problems with one-hot encoding?

A
  • Expensive computation and high memory consumption are major problems with one-hot encoding. A high number of unique values will create high-dimensional feature vectors. For example, if there are one million unique values in a column, it will produce feature vectors that have a dimensionality of one million.
  • One-hot encoding is not suitable for Natural Language Processing tasks: the word dictionary (vocabulary) is usually very large, and we can’t use one-hot encoding to represent each word because the vectors would be too big to store in memory.
20
Q

What are some best practices for one-hot encoding?

A
  • Depending on the application, levels/categories that are not important can be grouped together in an “Other” class.
  • Make sure that the pipeline can handle unseen data in the test set.
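
A minimal sketch of both practices, assuming pandas and scikit-learn (the column name and rarity threshold are made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"device": ["ios", "android", "ios", "android", "web", "kaios"]})
test = pd.DataFrame({"device": ["android", "fire_os"]})  # "fire_os" never seen in training

# Group rare categories into an "Other" class.
counts = train["device"].value_counts()
rare = counts[counts < 2].index.tolist()
train["device"] = train["device"].replace(rare, "Other")

# handle_unknown="ignore" encodes unseen test values as an all-zero vector
# instead of raising an error at inference time.
encoder = OneHotEncoder(handle_unknown="ignore")
train_vec = encoder.fit_transform(train[["device"]])
test_vec = encoder.transform(test[["device"]])
```
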
21
Q

At which step of the machine learning system design do you discuss feature selection and feature engineering?

A

You discuss it during the “identify requirements” step as well as the “train and evaluate models” step.

22
Q

What is feature hashing?

A

Feature hashing, also known as the hashing trick, converts text data or categorical attributes with high cardinality into a feature vector of arbitrary dimensionality.
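
A minimal sketch using scikit-learn’s FeatureHasher (the n_features value and event strings are arbitrary choices for illustration):

```python
from sklearn.feature_extraction import FeatureHasher

# Hash a high-cardinality categorical feature into a fixed 16-dimensional vector.
hasher = FeatureHasher(n_features=16, input_type="string")
events = [["user_login"], ["user_logout"], ["purchase_completed"]]
hashed = hasher.transform(events)   # sparse matrix of shape (3, 16)
print(hashed.toarray())
```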

23
Q

What are the benefits of feature hashing?

A

Feature hashing is very useful for features with high cardinality (hundreds or thousands of unique values). The hashing trick limits the growth in dimensionality and memory by allowing multiple values to be encoded as the same value (bucket).

24
Q

What are some problems associated with feature hashing?

A
  • One problem with hashing is collision. If the hash size is too small, more collisions will happen and negatively affect model performance. On the other hand, the larger the hash size, the more memory it consumes.
  • Collisions also affect model performance: with many collisions, a model won’t be able to differentiate coefficients between feature values. For example, the coefficients for “user login” and “user logout” might end up being the same, which makes no sense.
25
Q

What is a crossed feature?

A

A crossed feature, or conjunction, between two categorical variables of cardinality c1 and c2 is just another categorical variable of cardinality c1 × c2. If c1 and c2 are large, the conjunction feature has high cardinality, and the use of the hashing trick is even more critical in this case. Crossed features are therefore usually combined with the hashing trick to keep the dimensionality manageable.

As an example, suppose we have Uber pick-up data with latitude and longitude stored in the database, and we want to predict demand at a certain location. If we just use the latitude feature for learning, the model might learn that a city block at a particular latitude is more likely to have higher demand than others; the same holds for the longitude feature. However, a feature cross of longitude by latitude represents a well-defined city block, so the model can learn more accurately.
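
A rough sketch of crossing bucketized latitude and longitude and then hashing the result (the bucket size and hash space are arbitrary choices):

```python
import hashlib

HASH_SIZE = 10_000  # arbitrary hash space for the crossed feature

def crossed_location_feature(lat, lon, bucket_size=0.01):
    # Bucketize latitude and longitude, then cross them into one "city block" value.
    lat_bucket = int(lat // bucket_size)
    lon_bucket = int(lon // bucket_size)
    crossed = f"{lat_bucket}_x_{lon_bucket}"
    # Hashing trick: map the high-cardinality crossed value into a fixed-size space.
    # (hashlib gives a stable hash across processes, unlike Python's built-in hash().)
    return int(hashlib.md5(crossed.encode()).hexdigest(), 16) % HASH_SIZE

feature_index = crossed_location_feature(37.7749, -122.4194)
```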

26
Q

What is embedding?

A

Feature embedding is an emerging technique that aims to transform features from the original space into a new space to support effective machine learning. The purpose of embedding is to capture semantic meaning of features; for example, similar features will be close to each other in the embedding vector space.

27
Q

How do you generate/learn an embedding vector?

A

In popular deep learning frameworks like TensorFlow, you need to define the embedding dimension and the network architecture. Once defined, the network can learn the embeddings automatically.
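
A minimal Keras sketch (the vocabulary size, embedding dimension, and downstream layers are illustrative assumptions):

```python
import tensorflow as tf

NUM_VIDEOS = 100_000   # vocabulary size, e.g. number of distinct video IDs (illustrative)
EMBEDDING_DIM = 32     # dimension of the learned embedding space (illustrative)

model = tf.keras.Sequential([
    # Maps each integer video ID to a dense 32-dimensional vector; the vectors
    # are learned automatically while training the rest of the network.
    tf.keras.layers.Embedding(input_dim=NUM_VIDEOS, output_dim=EMBEDDING_DIM),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```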

28
Q

For numeric features, what can be done?

A

Normalization and Standardization

For numeric features, normalization rescales values into a fixed range, typically [-1, 1]. In some cases we instead want to normalize data into the range [0, 1].

If a feature’s distribution resembles a normal distribution, we can apply standardization (zero mean, unit variance).
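
A minimal sketch of both transforms with scikit-learn (the input values are toy numbers):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[2.0], [5.0], [10.0], [50.0]])

# Normalization: rescale values into a fixed range, here [-1, 1].
normalized = MinMaxScaler(feature_range=(-1, 1)).fit_transform(x)

# Standardization: zero mean and unit variance, useful when the feature
# distribution resembles a normal distribution.
standardized = StandardScaler().fit_transform(x)
```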

29
Q

What are some problems associated with normalization?

A

In practice, normalization can cause an issue because the min and max values are usually outliers. One possible solution is “clipping”, where we choose “reasonable” values for the min and max.
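
A minimal clipping sketch with NumPy (the chosen bounds are arbitrary):

```python
import numpy as np

values = np.array([-250.0, 3.0, 7.0, 12.0, 9000.0])   # contains outliers

# Clip to "reasonable" bounds first so the outliers don't dominate min and max,
# then apply min-max normalization to the clipped values.
clipped = np.clip(values, a_min=0.0, a_max=100.0)
normalized = (clipped - clipped.min()) / (clipped.max() - clipped.min())
```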

30
Q

How do you handle imbalance class distribution?

A
  • Use class weights in loss function
  • Use naive resampling
  • Use synthetic resampling
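
A minimal sketch of the first two options, assuming a binary label column (synthetic resampling would typically use a library such as imbalanced-learn’s SMOTE):

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

df = pd.DataFrame({"label": [0] * 95 + [1] * 5})   # heavily imbalanced toy data

# Class weights in the loss function: weight classes inversely to their frequency.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=df["label"])

# Naive resampling: oversample the minority class (or undersample the majority).
minority = df[df["label"] == 1]
oversampled = pd.concat([df, minority.sample(n=90, replace=True, random_state=0)])
```
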
31
Q

Training Pipeline

What kind of format should you store data in to enable high throughput at low cost?

A

One common solution is to store data in a column-oriented format like Parquet or ORC. These formats enable high throughput for ML and analytics use cases. In other use cases, the TFRecord format is widely used in the TensorFlow ecosystem.

32
Q

Training Pipeline

Why do we need data partitioning?

A

Parquet and ORC files are usually partitioned by time for efficiency, so we can avoid scanning through the whole dataset.

In the example below, we partition data by year and then by month. In practice, common AWS services such as Redshift and Athena support Parquet and ORC. In comparison to formats like CSV, Parquet can make queries up to 30x faster, save 99% of the cost, and reduce the amount of data scanned by 99%.

[Figure: Partition training data in Parquet format]
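
A minimal sketch of writing time-partitioned Parquet with pandas/pyarrow (the column names and output path are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2023, 2023, 2024],
    "month": [11, 12, 1],
    "user_id": [1, 2, 3],
    "label": [0, 1, 0],
})

# Writes one directory per year/month (e.g. year=2024/month=1/...), so queries
# filtered by time only scan the relevant partitions.
df.to_parquet("training_data", engine="pyarrow", partition_cols=["year", "month"])
```
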
33
Q

Training Pipeline: Choose the right loss function

What are common metrics for forecast problems?

A

The most common metrics are the Mean Absolute Percentage Error (MAPE) and the Symmetric Mean Absolute Percentage Error (SMAPE).
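
Their definitions, sketched in NumPy:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def smape(y_true, y_pred):
    # Symmetric Mean Absolute Percentage Error
    return np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))) * 100
```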

34
Q

Training Pipeline: Choose the right loss function

What’s the most popular loss function for binary classification?

A

Cross-entropy
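
For reference, binary cross-entropy sketched in NumPy:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    # Average negative log-likelihood of the true labels under the predicted probabilities.
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
```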

35
Q

Training Pipeline: Retraining requirements

What’s a common design pattern for retraining?

A

A common design pattern is to use a scheduler to retrain models on a regular basis, usually many times per day.

36
Q

Training Pipeline: Retraining requirements

What do you need to balance between when designing a retraining system?

A

Machine learning engineers need to make the training pipeline run fast and scale well with big data. When you design such a system, you need to balance between model complexity and training time.

37
Q

Inference

What’s a common pattern when designing an inference system?

A

During inference, one common pattern is to split the workload across multiple inference servers. The architecture is similar to the one used by load balancers, and this component is sometimes called an Aggregator Service.

[Figure: Dispatcher diagram]
38
Q

Inference

How does the aggregator service for inferencing work?

A
  1. Clients (the upstream process) send requests to the Aggregator Service. If the workload is too high, the Aggregator Service splits the workload and sends it to workers in the worker pool. The Aggregator Service can pick workers in one of the following ways:
    1. Workload
    2. Round Robin
    3. Request Parameter
  2. Wait for responses from the workers.
  3. Forward the response to the client.
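
A rough sketch of the round-robin option (the worker objects and their predict method are hypothetical):

```python
import itertools

class AggregatorService:
    """Splits inference requests across a pool of workers using round robin."""

    def __init__(self, workers):
        self._workers = itertools.cycle(workers)   # cycle endlessly over the pool

    def handle(self, request):
        worker = next(self._workers)            # pick the next worker (round robin)
        response = worker.predict(request)      # wait for the worker's response
        return response                         # forward the response to the client
```
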
39
Q

What is the non-stationary problem in inferencing?

A

In an online setting, data is always changing, so data distribution shift is common. Keeping models fresh is therefore crucial to achieving sustained performance. Based on how quickly model performance degrades, we can decide how often models need to be updated/retrained. One common algorithm that can be used in this setting is Bayesian Logistic Regression.

40
Q

What is Thompson Sampling?

A

In an Ad Click prediction use case, it’s beneficial to allow some exploration when recommending new ads. However, if there are too few ad conversions, it can reduce company revenue. This is the well-known exploration-exploitation trade-off. One common technique is Thompson Sampling, where at each time t we decide which action to take by sampling from the posterior distribution of each action’s reward.
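
A minimal Beta-Bernoulli Thompson Sampling sketch for picking an ad (the click/impression counts are toy numbers):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Beta posterior over each ad's click-through rate: (clicks + 1, impressions - clicks + 1).
ads = {
    "ad_a": (12 + 1, 400 - 12 + 1),
    "ad_b": (3 + 1, 90 - 3 + 1),
    "ad_c": (0 + 1, 10 - 0 + 1),   # barely explored, so its posterior is wide
}

# At time t, sample a plausible CTR for every ad and show the ad with the highest
# sample; uncertain ads occasionally win, which provides exploration.
samples = {ad: rng.beta(a, b) for ad, (a, b) in ads.items()}
chosen_ad = max(samples, key=samples.get)
```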

41
Q

What kind of metrics are used in offline environments?

A

During offline training and evaluation, we use metrics like log loss, MAE, and R2 to measure the goodness of fit. Once the model shows improvement, the next step is to move it to a staging/sandbox environment to test on a small percentage of real traffic.

42
Q

Online metrics

How do we evaluate models in online environments?

A

A/B Testing

This diagram shows one way to allocate traffic to different models in production. In reality, there may be a few dozen models, each getting a share of real traffic to serve online requests. This is one way to verify whether or not a model actually generates lift in the production environment.

[Figure: Allocate traffic for multiple models in production]
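
A minimal sketch of one such allocation scheme, hashing user IDs into buckets so each user consistently sees the same model (model names and traffic shares are illustrative):

```python
import hashlib

# Each model owns a slice of 100 traffic buckets; a given user always lands in
# the same bucket, so they consistently see the same model.
ALLOCATION = [("model_a", 90), ("model_b", 5), ("model_c", 5)]

def assign_model(user_id: str) -> str:
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    threshold = 0
    for model, share in ALLOCATION:
        threshold += share
        if bucket < threshold:
            return model
    return ALLOCATION[-1][0]
```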