ML System Design Flashcards

1
Q

Elements of ML System Design

A

Clarify functional requirements (business objective)
Performance requirements
Frame as an ML Problem (inputs and outputs)
What data do we have access to for training
Feature engineering
Choose a model
Prediction pipeline
Training pipeline
Offline & Online Metrics

2
Q

Questions to ask

A
  • How much training data do we have access to?
  • What is the state of the data (already in the form of features, event data, log data)?
  • What’s more important: accuracy or response time?
  • Hardware constraints?
  • Time constraints?
  • How often will the model need retraining (e.g. a spam filter vs. a recommendation system)?
3
Q

confusion matrix

A
  • summary of prediction results from a classification model (see the sketch below)
  • predicted values on y axis
  • actual values on x axis
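A minimal sketch, assuming scikit-learn is available (note that scikit-learn’s convention puts actual classes on the rows and predicted classes on the columns):

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0]   # actual labels
    y_pred = [1, 0, 0, 1, 0, 1]   # model predictions

    cm = confusion_matrix(y_true, y_pred)
    # rows = actual class, columns = predicted class:
    # [[TN, FP],
    #  [FN, TP]]
    print(cm)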
4
Q

false positive rate

A

false positives / total negatives
FP / (FP + TN)

5
Q

Imbalanced dataset

A

A classification data set with skewed class proportions (far more positives than negatives or vice versa)

6
Q

Difference between precision and FPR

A

Precision measures the probability that a sample classified as positive is actually positive, while the FPR measures the ratio of false positives to total negatives.

Precision is the more informative metric for datasets with a large number of negative samples, because a huge true-negative count keeps the FPR low even when false positives are common.
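A worked example with hypothetical counts, TP = 50, FP = 50, FN = 10, TN = 9,890:

Precision = TP / (TP + FP) = 50 / 100 = 0.50
FPR = FP / (FP + TN) = 50 / 9,940 ≈ 0.005

The FPR looks excellent only because negatives dominate the dataset, while precision reveals that half of the positive predictions are wrong.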

7
Q

Training set

A

Examples used for learning to fit the parameters of the model

8
Q

Validation set

A

Set of examples used to tune the model’s hyperparameters, for example the number of layers of a neural network or the batch size

9
Q

Test set

A

Used to assess the performance of a fully trained model

10
Q

sensitivity

A

measures the model’s ability to correctly identify the actual positives in each available category
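In the notation of the FPR card, sensitivity (also called recall or the true positive rate) is:

TP / (TP + FN)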

11
Q

specificity

A

measures the model’s ability to correctly identify the actual negatives for each available category
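In the same notation, specificity (the true negative rate) is:

TN / (TN + FP)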

12
Q

System design: data

A

Identify target variables
implicit (putting an item in your shopping cart) vs explicit (buying an item)

13
Q

Example features

A

user-location
user-age
aggregate features like user-candidate total likes

14
Q

What to do about missing data and outliers

A

If the dataset is large enough, you can drop them
If you can’t afford to drop any data, you can impute feature values by replacing them with a default (typically the mean, median, or mode)
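A minimal sketch, assuming pandas is available and a hypothetical numeric "age" column:

    import pandas as pd

    df = pd.DataFrame({"age": [25, 31, None, 29, 40]})

    # option 1: drop the affected rows (fine if the dataset is large enough)
    dropped = df.dropna()

    # option 2: impute the missing values with a default such as the median
    df["age"] = df["age"].fillna(df["age"].median())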

15
Q

sample bias

A

Happens when the collected data doesn’t accurately represent the environment the program is expected to run in.

e.g. training a facial recognition model only on images taken in daytime lighting conditions

16
Q

exclusion bias

A

Happens as a result of excluding some feature(s) from our dataset, usually under the umbrella of cleaning our data.

Use feature importance tools. Don’t guess.

17
Q

Measurement bias

A

Systematic value distortion that happens when there’s an issue with the device used to observe or measure.

18
Q

Prejudice bias

A

Happens as a result of cultural influences or stereotypes.

e.g. training images of nurses or wedding dresses that reflect cultural stereotypes

19
Q

Ranking model (recommendation systems)

A

Estimates the probability a video will be watched

20
Q

Feature Scaling Techniques

A
  • Normalization (Min/Max Scaling)
  • Standardization (Z-score normalization)
  • Log scaling
  • Discretization (Bucketing)
  • Encoding categorical features (integer encoding, one-hot encoding, embedding learning)
21
Q

Normalization

A
  • Min/max scaling
  • Values are mapped into the range 0 to 1
  • Normalization does not change the shape of a feature’s distribution (see the formula below)
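Min/max scaling of a value x, given the feature’s observed minimum and maximum:

x' = (x - min) / (max - min)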
22
Q

Standardization (Z-Score Normalization)

A
  • Rescales a feature so that it has a mean of 0 and a standard deviation of 1 (see the formula below)
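Given the feature’s mean and standard deviation:

z = (x - mean) / std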
23
Q

Log Scaling

A
  • Applies a logarithm to compress a long right tail, mitigating skew in heavy-tailed features (see the sketch below)
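A minimal sketch, assuming NumPy is available and a hypothetical heavy-tailed feature such as view counts; log1p is used so zero values are handled:

    import numpy as np

    view_counts = np.array([0, 3, 50, 1_200, 980_000])

    # log(1 + x) compresses the long right tail
    scaled = np.log1p(view_counts)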
24
Q

Discretization

A
  • Bucketing
  • The process of converting a continuous feature into a categorical feature
  • Age -> age bucket (reduces a continuous feature to a small number of categories); see the sketch below
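A minimal sketch, assuming pandas is available; the bin edges and labels are illustrative:

    import pandas as pd

    ages = pd.Series([4, 17, 25, 42, 68])

    # map the continuous ages into labeled buckets
    age_bucket = pd.cut(
        ages,
        bins=[0, 18, 35, 55, 120],
        labels=["minor", "young_adult", "middle_aged", "senior"],
    )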
25
Q

Techniques for Encoding Categorical Features

A
  • Integer encoding
  • One-hot encoding
  • Embedding learning
26
Q

When is integer encoding a good choice?

A

When there is an ordinal relationship between the categories, e.g. Excellent = 5 stars, Good = 4 stars.
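A minimal sketch with a hypothetical ordinal rating feature:

    # the integer codes preserve the ordering of the categories
    rating_order = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}

    ratings = ["good", "excellent", "fair"]
    encoded = [rating_order[r] for r in ratings]   # [3, 5, 2]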

27
Q

One hot encoding

A

A new binary feature is created for each unique value
If your feature has 3 colors, convert that to three booleans (isRed, isGreen, isBlue).
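A minimal sketch, assuming pandas is available:

    import pandas as pd

    colors = pd.Series(["red", "green", "blue", "green"])

    # one new binary column per unique value: is_blue, is_green, is_red
    one_hot = pd.get_dummies(colors, prefix="is")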

28
Q

Embedding Learning

A
  • Map a categorical feature into an N-dimensional vector
  • Useful when the number of unique values of a feature is very large
  • Learning an N-dimensional vector for each unique value the categorical feature may take; the resulting vectors are much lower-dimensional than the one-hot encoded versions would be (see the sketch below)
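A minimal sketch, assuming PyTorch is available; the vocabulary size and embedding dimension are illustrative:

    import torch
    import torch.nn as nn

    num_categories = 10_000   # e.g. number of unique item IDs
    embedding_dim = 32        # far smaller than a 10,000-wide one-hot vector

    embedding = nn.Embedding(num_categories, embedding_dim)

    # look up the learned vectors for a batch of category indices
    ids = torch.tensor([3, 42, 7])
    vectors = embedding(ids)   # shape: (3, 32)
    # the embedding weights are trained jointly with the rest of the model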
29
Q

Harmonic mean

A

More heavily weighted toward the smallest of the numbers being averaged, which mitigates the impact of large outliers.
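For two numbers a and b:

harmonic mean = 2ab / (a + b)

For example, the harmonic mean of 0.5 and 1.0 is 1.0 / 1.5 ≈ 0.67, versus an arithmetic mean of 0.75; this is why the F1 score is defined as the harmonic mean of precision and recall.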