Machine Learning Systems Flashcards

1
Q

What are the three main components of an ML system that we need to decide on?

A
  1. Training algorithm
  2. Training data
  3. Features
2
Q

What two methods are there to collect training data?

A
  1. Online: Users’ preexisting interactions with the system, e.g., impressions for a search engine
  2. Offline: Human annotators, which are expensive, must be QA’d, etc.
3
Q

What kind of human annotation is there?

A
  1. MTurk/cheap
  2. Specialized/hired annotators
  3. Public datasets
4
Q

What are the two most common ML system SLA metrics?

A
  1. Performance
  2. Capacity

5
Q

What is the ML system performance metric?

A

Latency to return results, usually measured as the 99th-percentile (p99) response time in milliseconds
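A sketch of computing the 99th-percentile latency from logged query latencies using Python's standard library (the latency numbers are hypothetical):

```python
import statistics

# Hypothetical per-query latencies in milliseconds
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 11] * 10

# statistics.quantiles(n=100) returns the 1st..99th percentile cut
# points; the last one is the p99 latency.
p99 = statistics.quantiles(latencies_ms, n=100)[-1]
print(f"p99 latency: {p99:.1f} ms")  # dominated by the slow 240 ms queries
```

Note how a small fraction of slow queries dominates the p99 even though the median is low.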

6
Q

What is the ML system capacity metric?

A

Load the system can handle, usually measured in queries per second (QPS)

7
Q

What are the three types of ML system complexity? (Think what decisions make it slow.)

A
  1. Training: time to train the model
  2. Evaluation/Inference: time to evaluate input at testing/deployment time
  3. Sampling: total number of samples required to learn the target
8
Q

What bad thing happens when you switch from a popularity-based recommendation model to a user-specific one?

A

Position bias: the user-specific recommendations initially rank lower in the list than already-popular items, so they receive fewer clicks, which reinforces the popularity-based results in a self-perpetuating feedback loop.

9
Q

What is a SERP?

A

Search engine results page

10
Q

What are the names for the two types of pages in an A/B test?

A
  1. Control
  2. Variation

11
Q

What are the five (+optional sixth) steps in designing an A/B test?

A
  1. Come up with a hypothesis
  2. Design an experiment to measure the hypothesis using a NHST
  3. Use a power test to determine the number of samples you need to form a conclusion
  4. Run the experiment
  5. Check for a statistically significant result via NHST
  6. (optional) Consider backtesting
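Step 3 above can be sketched with Python's standard library: a normal-approximation sample-size formula for a two-sided two-proportion z-test. The function name and the conversion rates are illustrative assumptions, not from the source:

```python
import math
from statistics import NormalDist

def samples_per_group(p_control, p_variation, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided
    two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for power=0.8
    variance = p_control * (1 - p_control) + p_variation * (1 - p_variation)
    effect = p_variation - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical: 5% baseline conversion rate, want to detect a lift to 6%
print(samples_per_group(0.05, 0.06))  # thousands of samples per group
```

Smaller effects need quadratically more samples, which is why the power test comes before, not after, running the experiment.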
12
Q

What is backtesting in the context of an A/B test?

A

After the A/B test is complete and the null hypothesis rejected, swapping the control and variation and confirming that the effect reverses, i.e., that a statistically significant negative effect exists.

13
Q

What is an embedding?

A

A low-dimensional representation of a high-dimensional input (speech, image, text) that captures its important semantic information.

14
Q

How are embeddings used?

A

They allow us to train machine learning models on dense representations of data that capture the dimensions we care about.
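For example, closeness in an embedding space is commonly measured with cosine similarity. A sketch with made-up 3-dimensional embeddings (real embeddings typically have hundreds of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical embeddings: semantically close words end up nearby
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1.0
print(cosine_similarity(cat, car))     # much lower
```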

15
Q

What are two ways of generating text embeddings?

A
  1. word2vec
  2. Context-based embeddings

16
Q

What are two models for word2vec text embeddings?

A
  1. Continuous bag of words (CBOW)
  2. Skipgrams

17
Q

How does continuous bag of words (CBOW) work?

A

It’s a shallow neural network (not a recurrent model) where, given the one-hot encoded context words $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, $w_{t+2}$, it predicts the center word $w_t$; the learned hidden-layer weights become the word embeddings.
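A sketch of how CBOW (context, center) training pairs are built from a toy corpus with window size 2 (the helper name is illustrative):

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, center word) training pairs for CBOW."""
    pairs = []
    for i in range(len(tokens)):
        # Context is every word within `window` positions, excluding center
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, tokens[i]))
    return pairs

toks = "the quick brown fox jumps".split()
print(cbow_pairs(toks)[2])  # (['the', 'quick', 'fox', 'jumps'], 'brown')
```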

18
Q

How do skipgrams work?

A

It’s the inverse of CBOW: the same shallow architecture where, given the one-hot encoded center word $w_t$, it predicts the context words $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, and $w_{t+2}$.
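A sketch of building skip-gram (center, context) training pairs from a toy corpus with window size 2 (the helper name is illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Build (center word, context word) training pairs for skip-grams."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(max(0, i - window),
                           min(len(tokens), i + window + 1))
            if j != i]

toks = "the quick brown fox jumps".split()
# Each center word predicts every word within the window:
print(skipgram_pairs(toks)[:2])  # [('the', 'quick'), ('the', 'brown')]
```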

19
Q

Why do we need context-based text embeddings?

A

Word2vec strips out context-based information. For example, the word “apple” can refer to either Apple the company or an apple the fruit.

20
Q

What are two models for context-based text embeddings?

A
  1. Embeddings from Language Models (ELMo)
  2. Bidirectional Encoder Representations from Transformers (BERT)

21
Q

How does ELMo work?

A

Uses a bi-directional LSTM to capture words that appear before and after the current word.

22
Q

How does BERT work?

A

Uses self-attention to see each word in the context of the whole sequence, weighting the surrounding words by how much they help the prediction.

23
Q

What are two models for generating visual embeddings?

A
  1. Auto-encoders
  2. Visual supervised learning tasks

24
Q

How do auto-encoders generate visual embeddings?

A

An encoder path compresses the image to a low-dimensional representation, and a decoder path re-creates the higher-dimensional image from it. The model is optimized to reconstruct the original image as closely as possible. The decoder is then cut off, and the encoder's low-dimensional output is used as a dense feature vector.
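A minimal sketch of this idea as a linear autoencoder in NumPy (all shapes and hyperparameters are made up for illustration): synthetic 16-dimensional "images" that really live on a 4-dimensional subspace are squeezed through a 4-unit bottleneck, and after training the decoder is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 16-dim points that actually lie in a 4-dim subspace
basis = rng.normal(size=(4, 16))
X = rng.normal(size=(200, 4)) @ basis

W_enc = rng.normal(size=(16, 4)) * 0.1   # encoder weights
W_dec = rng.normal(size=(4, 16)) * 0.1   # decoder weights

def mse():
    """Mean squared reconstruction error of encode-then-decode."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

mse_before = mse()
lr = 0.01
for _ in range(500):
    Z = X @ W_enc                        # low-dimensional codes (bottleneck)
    err = Z @ W_dec - X                  # reconstruction error
    # Gradient steps on the reconstruction error
    # (constant factors folded into the learning rate)
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
mse_after = mse()

embeddings = X @ W_enc                   # decoder discarded; codes = features
print(mse_before, "->", mse_after, embeddings.shape)
```

Real auto-encoders use nonlinear (usually convolutional) encoders and decoders, but the shape of the idea is the same: train to reconstruct, then keep only the bottleneck output.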

25
Q

What kind of supervision do most text-based embedding models use?

A

Self-supervised learning

26
Q

How do visual supervised learning models generate visual embeddings?

A

The layer immediately before the softmax output layer is low-dimensional and can be employed as a dense feature vector.

27
Q

How do you generate embeddings for pairs of entities, like users and videos on YouTube?

A

Build a two-tower neural network that maps both entity types into the same embedding space, then train it on pairwise comparisons with a loss that pulls embeddings of interacting (similar) pairs together and pushes random (disjoint) pairs apart.
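Once both towers are trained, recommendation reduces to similarity search in the shared space. A sketch with made-up 3-dimensional tower outputs (all names and values are illustrative):

```python
# Hypothetical learned tower outputs: both towers map into the same
# 3-dim embedding space.
user_emb = [0.2, 0.9, 0.1]               # output of the user tower
video_embs = {                           # outputs of the video tower
    "cooking_tutorial": [0.1, 0.8, 0.2],
    "car_review": [0.9, 0.1, 0.3],
}

def dot(u, v):
    """Dot-product similarity, the usual two-tower scoring function."""
    return sum(a * b for a, b in zip(u, v))

# Rank videos by similarity to the user in the shared space
ranked = sorted(video_embs,
                key=lambda k: dot(user_emb, video_embs[k]),
                reverse=True)
print(ranked[0])  # cooking_tutorial scores highest for this user
```

In production this sorted-by-dot-product step is replaced by an approximate nearest-neighbor index so millions of candidates can be scored quickly.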

28
Q

How do you do transfer learning with text data?

A

You can use an embedding generator as a feature extractor for text which you then go on to pass into another model like a spam classifier.

29
Q

What are the five steps in the initial building of an ML model?

A
  1. Identify the business problem and what you’re trying to accomplish
  2. Decide on one or more algorithms and techniques you’ll use to solve it
  3. Train the model on the given data and features
  4. Evaluate performance using identified metrics and iterate using hyperparameter searches
  5. Deploy the model once it performs as well as or better than the current model
30
Q

What are three issues that can lead to initial online ML model deployments not performing as well as offline?

A
  1. Online/offline feature distribution drift
  2. Online/offline feature extraction disparity
  3. Under/overfitting