Machine Learning Systems Flashcards

1
Q

What are the three main components of an ML system that we need to decide on?

A
  1. Training algorithm
  2. Training data
  3. Features
2
Q

What two methods are there to collect training data?

A
  1. Online: Users’ preexisting interactions with the system, e.g., impressions for a search engine
  2. Offline: Human annotators, which are expensive, must be QA’d, etc.
3
Q

What kind of human annotation is there?

A
  1. MTurk/cheap
  2. Specialized/hired annotators
  3. Public datasets
4
Q

What are the two most common ML system SLA metrics?

A
  1. Performance
  2. Capacity

5
Q

What is the ML system performance metric?

A

Latency to return results, usually measured as the 99th-percentile (p99) response time in milliseconds
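A sketch of computing the 99th-percentile latency from logged query latencies using Python's standard library (the latency numbers are hypothetical):

```python
import statistics

# Hypothetical per-query latencies in milliseconds
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 11] * 10

# statistics.quantiles(n=100) returns the 1st..99th percentile cut
# points; the last one is the p99 latency.
p99 = statistics.quantiles(latencies_ms, n=100)[-1]
print(f"p99 latency: {p99:.1f} ms")  # dominated by the slow 240 ms queries
```

Note how a small fraction of slow queries dominates the p99 even though the median is low.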

6
Q

What is the ML system capacity metric?

A

Load the system can handle, usually measured in queries per second (QPS)

7
Q

What are the three types of ML system complexity? (Think what decisions make it slow.)

A
  1. Training: time to train the model
  2. Evaluation/Inference: time to evaluate input at testing/deployment time
  3. Sampling: total number of samples required to learn the target
8
Q

What bad thing happens when you switch from a popularity-based recommendation model to a user-specific one?

A

Position bias: the user-specific recommendations initially rank lower in the list than already-popular items, so they receive fewer clicks, which reinforces the popularity-based results in a self-perpetuating feedback loop.

9
Q

What is a SERP?

A

Search engine results page

10
Q

What are the names for the two types of pages in an A/B test?

A
  1. Control
  2. Variation

11
Q

What are the five (+optional sixth) steps in designing an A/B test?

A
  1. Come up with a hypothesis
  2. Design an experiment to measure the hypothesis using a NHST
  3. Use a power test to determine the number of samples you need to form a conclusion
  4. Run the experiment
  5. Check for a statistically significant result via NHST
  6. (optional) Consider backtesting
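Step 3 above can be sketched with Python's standard library: a normal-approximation sample-size formula for a two-sided two-proportion z-test. The function name and the conversion rates are illustrative assumptions, not from the source:

```python
import math
from statistics import NormalDist

def samples_per_group(p_control, p_variation, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided
    two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for power=0.8
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    effect = p_variation - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical: 5% baseline conversion rate, want to detect a lift to 6%
print(samples_per_group(0.05, 0.06))  # thousands of samples per group
```

Smaller effects need quadratically more samples, which is why the power test comes before, not after, running the experiment.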
12
Q

What is backtesting in the context of an A/B test?

A

After the A/B test is complete and the null hypothesis rejected, swapping the control and variation and confirming that the effect reverses, i.e., that a statistically significant negative effect exists.

13
Q

What is an embedding?

A

A low-dimensional representation of a high-dimensional input (speech, image, text) that captures its important semantic information.

14
Q

How are embeddings used?

A

They allow us to train machine learning models on dense representations of data that capture the dimensions we care about.
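For example, closeness in an embedding space is commonly measured with cosine similarity. A sketch with made-up 3-dimensional embeddings (real embeddings typically have hundreds of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical embeddings: semantically close words end up nearby
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1.0
print(cosine_similarity(cat, car))     # much lower
```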

15
Q

What are two ways of generating text embeddings?

A
  1. word2vec
  2. Context-based embeddings

16
Q

What are two models for word2vec text embeddings?

A
  1. Continuous bag of words (CBOW)
  2. Skipgrams

17
Q

How does continuous bag of words (CBOW) work?

A

It’s a shallow neural network (not a recurrent model) where, given the one-hot encoded context words $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, $w_{t+2}$, it predicts the center word $w_t$; the learned hidden-layer weights become the word embeddings.
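A sketch of how CBOW (context, center) training pairs are built from a toy corpus with window size 2 (the helper name is illustrative):

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, center word) training pairs for CBOW."""
    pairs = []
    for i in range(len(tokens)):
        # Context is every word within `window` positions, excluding center
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, tokens[i]))
    return pairs

toks = "the quick brown fox jumps".split()
print(cbow_pairs(toks)[2])  # (['the', 'quick', 'fox', 'jumps'], 'brown')
```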

18
Q

How do skipgrams work?

A

It’s the inverse of CBOW: the same shallow architecture where, given the one-hot encoded center word $w_t$, it predicts the context words $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, and $w_{t+2}$.
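A sketch of building skip-gram (center, context) training pairs from a toy corpus with window size 2 (the helper name is illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Build (center word, context word) training pairs for skip-grams."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(max(0, i - window),
                           min(len(tokens), i + window + 1))
            if j != i]

toks = "the quick brown fox jumps".split()
# Each center word predicts every word within the window:
print(skipgram_pairs(toks)[:2])  # [('the', 'quick'), ('the', 'brown')]
```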

19
Q

Why do we need context-based text embeddings?

A

Word2vec strips out context-based information. For example, the word “apple” can refer to either Apple the company or an apple the fruit.

20
Q

What are two models for context-based text embeddings?

A
  1. Embeddings from Language Models (ELMo)
  2. Bidirectional Encoder Representations from Transformers (BERT)

21
Q

How does ELMo work?

A

Uses a bi-directional LSTM to capture words that appear before and after the current word.

22
Q

How does BERT work?

A

Uses self-attention to see each word in the context of the whole sequence, weighting the surrounding words by how much they help the prediction.

23
Q

What are two models for generating visual embeddings?

A
  1. Auto-encoders
  2. Visual supervised learning tasks

24
Q

How do auto-encoders generate visual embeddings?

A

An encoder path compresses the image to a low-dimensional representation, and a decoder path re-creates the higher-dimensional image from it. The model is optimized to reconstruct the original image as closely as possible. The decoder is then cut off, and the encoder's low-dimensional output is used as a dense feature vector.
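A minimal sketch of this idea as a linear autoencoder in NumPy (all shapes and hyperparameters are made up for illustration): synthetic 16-dimensional "images" that really live on a 4-dimensional subspace are squeezed through a 4-unit bottleneck, and after training the decoder is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 16-dim points that actually lie in a 4-dim subspace
basis = rng.normal(size=(4, 16))
X = rng.normal(size=(200, 4)) @ basis

W_enc = rng.normal(size=(16, 4)) * 0.1   # encoder weights
W_dec = rng.normal(size=(4, 16)) * 0.1   # decoder weights

def mse():
    """Mean squared reconstruction error of encode-then-decode."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

mse_before = mse()
lr = 0.01
for _ in range(500):
    Z = X @ W_enc                        # low-dimensional codes (bottleneck)
    err = Z @ W_dec - X                  # reconstruction error
    # Gradient steps on the reconstruction error
    # (constant factors folded into the learning rate)
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
mse_after = mse()

embeddings = X @ W_enc                   # decoder discarded; codes = features
print(mse_before, "->", mse_after, embeddings.shape)
```

Real auto-encoders use nonlinear (usually convolutional) encoders and decoders, but the shape of the idea is the same: train to reconstruct, then keep only the bottleneck output.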

25
Q

What kind of supervision do most text-based embedding models use?

A

Self-supervised learning

26
Q

How do visual supervised learning models generate visual embeddings?

A

The layer immediately before the softmax output layer is low-dimensional and can be employed as a dense feature vector.

27
Q

How do you generate embeddings for pairs of entities, like users and videos on YouTube?

A

Build a two-tower neural network that maps both entity types into the same embedding space, then train it on pairwise comparisons with a loss that pulls embeddings of interacting (similar) pairs together and pushes random (disjoint) pairs apart.
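Once both towers are trained, recommendation reduces to similarity search in the shared space. A sketch with made-up 3-dimensional tower outputs (all names and values are illustrative):

```python
# Hypothetical learned tower outputs: both towers map into the same
# 3-dim embedding space.
user_emb = [0.2, 0.9, 0.1]               # output of the user tower
video_embs = {                           # outputs of the video tower
    "cooking_tutorial": [0.1, 0.8, 0.2],
    "car_review": [0.9, 0.1, 0.3],
}

def dot(u, v):
    """Dot-product similarity, the usual two-tower scoring function."""
    return sum(a * b for a, b in zip(u, v))

# Rank videos by similarity to the user in the shared space
ranked = sorted(video_embs,
                key=lambda k: dot(user_emb, video_embs[k]),
                reverse=True)
print(ranked[0])  # cooking_tutorial scores highest for this user
```

In production this sorted-by-dot-product step is replaced by an approximate nearest-neighbor index so millions of candidates can be scored quickly.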

28
Q

How do you do transfer learning with text data?

A

You can use an embedding generator as a feature extractor for text which you then go on to pass into another model like a spam classifier.

29
Q

What are the five steps in the initial building of an ML model?

A
  1. Identify the business problem and what you’re trying to accomplish
  2. Decide on one or more algorithms and techniques you’ll use to solve it
  3. Train the model on the given data and features
  4. Evaluate performance using identified metrics and iterate using hyperparameter searches
  5. Deploy the model once it performs as well as or better than the current model
30
Q

What are three issues that can lead to initial online ML model deployments not performing as well as offline?

A
  1. Online/offline feature distribution drift
  2. Online/offline feature extraction disparity
  3. Under/overfitting