Production Machine Learning Systems Flashcards

1
Q

What is concept drift?

A

It is a change in the relationship between a model's inputs and outputs. It is not necessarily connected to data drift; it can be influenced by many different things, such as hidden context - e.g. user behavior has changed over time under the influence of the strength of the economy, something that is not visible in the data.
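
A minimal illustrative sketch (not from the card; the window size and tolerance are assumptions): one crude way to surface concept drift in production is to compare the model's live error over a sliding window against the error measured at validation time.

```python
from collections import deque

import numpy as np

# Rolling window of per-prediction errors on freshly labelled data.
window = deque(maxlen=1000)

def drift_alert(recent_errors, baseline_error, tolerance=0.05):
    """Flag possible drift when the windowed mean error exceeds the
    baseline (e.g. validation MAE) by more than `tolerance`."""
    return np.mean(recent_errors) > baseline_error + tolerance

# Usage sketch: call window.append(abs(y_true - y_pred)) for each labelled
# prediction, then drift_alert(window, validation_mae) to trigger
# investigation or retraining.
```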

2
Q

What types of concept drift exist?

A

The types usually distinguished are sudden (abrupt) drift, gradual drift, incremental drift, and recurring (seasonal) drift.

3
Q

What are the two main types of distributed training architectures?

A

Data parallelism - split the training data across multiple worker nodes, each of which holds a full replica of the model
Model parallelism - when the model can't fit in memory, split the model itself across devices while every part works on the same data

4
Q

What are 2 common data parallelism approaches? Explain them in detail and when to use each.

A
  • Synchronous AllReduce architecture - each worker node processes a slice of every mini-batch. The workers split the workload but have to stay in sync, waiting for the others to finish their part before proceeding to the next mini-batch. This is a good fit for dense models with a lot of features, e.g. BERT.
  • Asynchronous parameter server architecture - nodes are split into worker nodes and parameter server nodes. The nodes are not in sync: each worker node takes a mini-batch together with the latest parameters from the parameter server nodes, and when training on that mini-batch completes, the parameters are updated. This architecture is a better fit for sparse models, i.e. models with relatively few features.
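
A rough sketch of how these two architectures map onto TensorFlow 2.x strategies (illustration only; the TF_CONFIG / cluster-resolver setup a real cluster needs is omitted):

```python
import tensorflow as tf

# Synchronous AllReduce: every worker holds a model replica and gradients
# are all-reduced across workers after each mini-batch.
sync_strategy = tf.distribute.MultiWorkerMirroredStrategy()

# Asynchronous parameter server: variables live on parameter server tasks;
# workers pull the latest parameters and push updates independently.
# Requires a configured cluster, e.g.:
# ps_strategy = tf.distribute.ParameterServerStrategy(cluster_resolver)
```
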
5
Q

What is model parallelism?

A

It is a distributed training architecture where the model is split into layers (or groups of layers); each part trains on the same mini-batch and passes its activations to the other parts, and the parts have to stay in sync. It is used when the model is too big to fit in a single device's memory.
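
A minimal sketch of the idea in TensorFlow, assuming two visible GPUs (the split point and layer sizes are arbitrary): the first stage runs on GPU 0, the second on GPU 1, and activations cross between the devices.

```python
import tensorflow as tf

class TwoStageModel(tf.keras.Model):
    """Toy model-parallel split across two GPUs."""

    def __init__(self):
        super().__init__()
        self.stage1 = tf.keras.layers.Dense(1024, activation='relu')
        self.stage2 = tf.keras.layers.Dense(10)

    def call(self, x):
        with tf.device('/GPU:0'):
            h = self.stage1(x)     # first half of the model on GPU 0
        with tf.device('/GPU:1'):
            return self.stage2(h)  # activations move to GPU 1 for the rest
```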

6
Q

What are 4 types of TensorFlow distributed training strategies?

A
  • Mirrored strategy
  • Multi-worker mirrored strategy
  • TPU strategy
  • Parameter server strategy
7
Q

How does the Mirrored strategy for distributed training work?

A

It is used when there is a single machine with multiple GPUs. The model is replicated on each GPU, and each mini-batch is split across the GPUs. Parameters have to be kept in sync across the GPUs.
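
A minimal sketch (the model and batch size are placeholders):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # picks up all local GPUs
print('Replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():  # variables created here are mirrored on every GPU
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

# The global batch is split across replicas, so it is common to scale it:
global_batch_size = 64 * strategy.num_replicas_in_sync
```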

8
Q

How does the Multi-worker Mirrored strategy for distributed training work?

A

Almost the same as the Mirrored strategy; the only difference is that there are now multiple machines, each with multiple CPUs or GPUs, and all of them split the mini-batch. You need to define which machine is the chief (master) and which machines are worker nodes.
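
A sketch of the setup; the hostnames and ports are hypothetical, and every machine runs the same script with its own task entry in TF_CONFIG:

```python
import json
import os

import tensorflow as tf

os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'chief':  ['host0:2222'],            # coordinates checkpoints/logging
        'worker': ['host1:2222', 'host2:2222'],
    },
    'task': {'type': 'worker', 'index': 0},  # differs on each machine
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer='adam', loss='mse')
```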

9
Q

How does the TPU strategy for distributed training work?

A

The same as the Mirrored strategy; the only difference is that the workload is split across TPU cores. This strategy is optimised for the biggest workloads, and the main consideration is to make sure enough data can be fed in so that the TPU cores are not sitting idle.
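
A sketch, assuming a Cloud TPU reachable by name (the 'my-tpu' name is hypothetical):

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():  # variables are placed on the TPU cores
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer='adam', loss='mse')

# Keep the input pipeline fast (e.g. tf.data with prefetch) so the
# TPU cores are never starved for data.
```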
