ML System Design Flashcards
Elements of ML System Design
Clarify functional requirements (business objective)
Performance requirements
Frame as an ML Problem (inputs and outputs)
What data do we have access to for training
Feature engineering
Choose a model
Prediction Pipeline
Training pipeline
Offline & Online Metrics
Questions to ask
- How much training data do we have access to?
- What is the state of the data (already in the form of features, event data, log data)?
- What’s more important: accuracy or response time?
- Hardware constraints?
- Time constraints?
- How often does the model need retraining? (e.g., spam detection vs. recommendation systems)
confusion matrix
- summary of prediction results from a classification model
- predicted values on y axis
- actual values on x axis
false positive rate
false positive / total negatives
FP / (FP + TN)
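A minimal sketch computing the confusion matrix and FPR from toy labels with scikit-learn (the labels are illustrative):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 0]  # actual labels (toy data)
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]  # model predictions

# Note: scikit-learn's convention puts actual values on rows
# and predicted values on columns.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

fpr = fp / (fp + tn)  # FPR = FP / (FP + TN)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}, FPR={fpr:.2f}")
```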
Imbalanced dataset
A classification data set with skewed class proportions (far more positives than negatives or vice versa)
Difference between precision and FPR
Precision measures the probability that a sample classified as positive is actually positive, while the FPR measures the ratio of false positives to total negatives.
Precision is the better metric for datasets with a large number of negative samples, because a huge pool of true negatives makes even many false positives yield a tiny FPR (see the worked example below).
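A hypothetical worked example on a heavily imbalanced dataset:

```python
# Hypothetical counts: 1,000,000 negatives, 100 positives.
tp, fn = 90, 10
fp, tn = 1_000, 999_000

precision = tp / (tp + fp)  # 90 / 1,090 ≈ 0.083 -> clearly weak
fpr = fp / (fp + tn)        # 1,000 / 1,000,000 = 0.001 -> looks deceptively good
```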
Training set
Examples used for learning to fit the parameters of the model
Validation set
Set of examples used to tune hyperparameters, e.g., the number of layers of a neural network or the batch size
Test set
Used to assess the performance of a fully trained model
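A minimal sketch of a three-way split using scikit-learn's train_test_split (the 60/20/20 ratios and toy arrays are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy feature matrix
y = np.arange(10) % 2             # toy binary labels

# 60% train, then split the remaining 40% evenly into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
```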
sensitivity
measures the model’s ability to correctly identify the positives in each available category; also called recall or the true positive rate: sensitivity = TP / (TP + FN)
specificity
measures the model’s ability to correctly identify the negatives for each available category; also called the true negative rate: specificity = TN / (TN + FP)
System design: data
Identify target variables
implicit (putting an item in your shopping cart) vs explicit (buying an item)
Example features
user-location
user-age
aggregate features like user-candidate total likes
What to do about missing data and outliers
If the dataset is large enough, you can drop them
If you can’t afford to drop any data, you can impute feature values by replacing them with a default (typically the mean, median, or mode)
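A minimal sketch with pandas showing both options (the column name and values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35, np.nan]})

dropped = df.dropna()                             # large dataset: drop missing rows
df["age"] = df["age"].fillna(df["age"].median())  # otherwise: impute the median
```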
sample bias
Happens when the collected data doesn’t accurately represent the environment the program is expected to run in.
e.g. training a facial recognition model on daytime lighting conditions only
exclusion bias
Happens as a result of excluding some feature(s) from the dataset, usually during data cleaning.
Use feature importance tools. Don’t guess.
Measurement bias
Systematic value distortion that happens when there’s an issue with the device used to observe or measure.
Prejudice bias
Happens as a result of cultural influences or stereotypes.
e.g., image datasets where nurses are mostly pictured as women, or wedding dresses are all Western-style
Ranking model (recommendation systems)
Scores each candidate, e.g., by estimating the probability that a video will be watched
Feature Scaling Techniques
- Normalization (Min/Max Scaling)
- Standardization (Z-score normalization)
- Log scaling
- Discretization (Bucketing)
- Encoding categorical features (integer encoding, one-hot encoding, embedding learning)
Normalization
- Min/max scaling
- Values are mapped to the range [0, 1]: x' = (x - min) / (max - min)
- Normalization does not change the shape of a feature’s distribution
Standardization (Z-Score Normalization)
- Rescales a feature to have a mean of 0 and a standard deviation of 1: z = (x - μ) / σ
Log Scaling
- Mitigates skew in long-tailed features, e.g., x' = log(1 + x) (see the combined sketch below)
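A NumPy sketch of all three transforms on a toy feature with one large outlier (values are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])  # toy values with one large outlier

normalized = (x - x.min()) / (x.max() - x.min())  # min/max -> [0, 1]
standardized = (x - x.mean()) / x.std()           # z-score -> mean 0, std 1
log_scaled = np.log1p(x)                          # log(1 + x) compresses the tail
```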
Discretization
- Bucketing
- The process of converting a continuous feature into a categorical feature
- Age -> age bucket (reduces a continuous feature to a small number of categories)
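A sketch of age bucketing with pandas.cut (the bucket edges and labels are illustrative assumptions):

```python
import pandas as pd

ages = pd.Series([4, 17, 25, 42, 70])
# Each continuous age falls into exactly one labeled bucket.
buckets = pd.cut(ages, bins=[0, 12, 18, 35, 60, 120],
                 labels=["child", "teen", "young_adult", "adult", "senior"])
```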
Techniques for Encoding Categorical Features
- Integer encoding
- One-hot encoding
- Embedding learning
When is integer encoding a good choice?
When there is an ordinal relationship between the categories. Integer encoding works well for ratings, e.g., Excellent = 5 stars, Good = 4 stars.
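A minimal sketch of integer-encoding an ordinal rating scale (the full mapping is an assumed extension of the example above):

```python
# Ordinal mapping: the integer order mirrors the category order.
rating_to_int = {"Terrible": 1, "Poor": 2, "Fair": 3, "Good": 4, "Excellent": 5}
encoded = [rating_to_int[r] for r in ["Good", "Excellent", "Fair"]]  # [4, 5, 3]
```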
One hot encoding
A new binary feature is created for each unique value
If your feature has 3 colors, convert that to three booleans (isRed, isGreen, isBlue).
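A sketch using pandas.get_dummies on the three-color example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "red"]})
# One binary column per unique value: is_blue, is_green, is_red
one_hot = pd.get_dummies(df["color"], prefix="is")
```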
Embedding Learning
- Map a categorical feature into an N-dimensional vector
- Useful when the number of unique values of a feature is very large
- Learn an N-dimensional vector for each unique value the categorical feature may take; the resulting vectors are far smaller than their one-hot encoded equivalents
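A minimal sketch with PyTorch's nn.Embedding (the vocabulary size and dimension are illustrative assumptions):

```python
import torch
import torch.nn as nn

# 10,000 unique category values, each mapped to a learned 4-d vector.
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=4)

ids = torch.tensor([7, 42, 9_999])  # integer-encoded category values
vectors = embedding(ids)            # shape (3, 4); one-hot would be (3, 10000)
```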
Harmonic mean
More heavily weighted toward the smallest numbers, which mitigates the impact of large outliers: H = n / (1/x1 + ... + 1/xn). This is why the F1 score uses the harmonic mean of precision and recall.
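A quick comparison of harmonic vs. arithmetic means on hypothetical precision/recall values:

```python
precision, recall = 0.9, 0.1  # hypothetical: great precision, poor recall

harmonic = 2 / (1 / precision + 1 / recall)  # 0.18 -- dragged toward the 0.1
arithmetic = (precision + recall) / 2        # 0.50 -- hides the weak recall
```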