W3-Machine Learning Modeling Pipelines in Production Flashcards
What are the 2 basic ways to perform distributed training?
data parallelism and model parallelism.
How is data parallelism done?
Data parallelism is probably the easier of the two to implement. In this approach, you divide the data into partitions and copy the complete model to all of the workers; each worker operates on a different partition of the data, and the model updates are synchronized across workers.
This type of parallelism is model agnostic and can be applied to any neural network architecture. Usually, the scale of data parallelism corresponds to the batch size.
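The mechanics above can be sketched in a few lines of plain Python. This is an illustrative toy, not a real framework: each "worker" holds a full model copy (here, a single weight `w`), computes a gradient on its own data shard, and the gradients are averaged before a shared update, mimicking an all-reduce.

```python
# Pure-Python sketch of synchronous data parallelism (illustrative only):
# each worker holds a full copy of the model and computes a gradient on
# its own data shard; gradients are averaged before the shared update.

def gradient(w, batch):
    # Toy gradient for fitting y = w * x with squared loss.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, shards, lr=0.05):
    grads = [gradient(w, shard) for shard in shards]   # one per worker
    avg = sum(grads) / len(grads)                      # the "all-reduce" step
    return w - lr * avg                                # identical update on every worker

# Split a dataset following y = 2x across two workers and take a few steps.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges to 2.0
```

Because every worker applies the same averaged gradient, all model copies stay identical, which is exactly why this style scales with the (global) batch size.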
How is model parallelism done?
In model parallelism, by contrast, you segment the model into different parts that train concurrently on different workers, with every worker training on the same data.
Here, workers only need to synchronize the shared parameters, usually once per forward or backward propagation step.
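A minimal sketch of the idea, in plain Python with made-up stage functions: the "model" is split into two stages that would live on different workers, the same input flows through both, and only the intermediate activation crosses the worker boundary.

```python
# Pure-Python sketch of model parallelism (illustrative): the "model" is
# split into two stages that would live on different workers. Both stages
# see the same input data; only the activation crosses worker boundaries.

def stage_1(x, w1):
    # Runs on worker 1, which holds only parameter w1.
    return w1 * x

def stage_2(h, w2):
    # Runs on worker 2, which holds only parameter w2.
    return w2 + h

def forward(x, params):
    h = stage_1(x, params["w1"])      # activation sent from worker 1 to worker 2
    return stage_2(h, params["w2"])

params = {"w1": 3.0, "w2": 1.0}       # each worker stores only its own shard
print(forward(2.0, params))           # 3.0 * 2.0 + 1.0 = 7.0
```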
There are two basic styles of distributed training using data parallelism. Name them, give an example of each and say how they work?
synchronous, asynchronous
In synchronous training, each worker trains on its current mini-batch of data, applies its own updates, communicates its updates to the other workers, and waits to receive and apply all of the updates from the other workers before proceeding to the next mini-batch. The all-reduce algorithm is an example of this.
In asynchronous training, all workers are independently training over their mini batch of data and updating variables asynchronously. Asynchronous training tends to be more efficient, but can be more difficult to implement. Parameter server algorithm is an example of this.
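The two styles can be contrasted with a small sequential simulation (no real concurrency, and staleness effects are not modeled; names are illustrative). In the synchronous version every round ends with one averaged, shared update; in the asynchronous version each worker updates the shared parameter independently, without waiting.

```python
# Sequential simulation (no real concurrency) contrasting the two styles
# on the toy problem of minimizing (w - 2)^2. Illustrative only.

def grad(w):
    return 2 * (w - 2.0)

def synchronous_steps(n, workers=2, lr=0.1):
    w = 0.0
    for _ in range(n):
        # All workers compute gradients, which are averaged (all-reduce),
        # then everyone applies the same update before the next round.
        g = sum(grad(w) for _ in range(workers)) / workers
        w -= lr * g
    return w

def asynchronous_steps(n, workers=2, lr=0.1):
    w = 0.0  # shared parameter, as on a parameter server
    for _ in range(n):
        for _ in range(workers):
            w -= lr * grad(w)  # each worker pushes its update without waiting
    return w

print(synchronous_steps(20), asynchronous_steps(20))
```

In the same number of rounds the asynchronous version gets in more updates, which is the source of its efficiency; the cost in practice is that workers may compute gradients against stale parameters.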
Two major disadvantages of asynchronous training are ____ and ____.
____ may not be a problem, since the speed-up in asynchronous training may be enough to compensate. However, the ____ may be an issue depending on ____ and ____.
In order, the blanks are:
* Reduced accuracy
* Slower convergence (which means that more steps are required to converge)
* Slow convergence
* Accuracy loss
* How much accuracy is lost
* The requirements of the application
To use distributed training, it’s not important that models become distribute-aware. True/False
False: to use distributed training, it is important that models become distribute-aware.
Fortunately, high-level APIs like Keras or Estimators support distributed training.
It requires a minimal amount of extra code to adapt your models for distributed training.
There are many different strategies for performing distributed training with TensorFlow. Name some
The most commonly used are:
* one device strategy,
* mirrored strategy,
* parameter server strategy,
* multi-worker mirrored strategy,
* central storage strategy,
* TPU strategy.
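As a hedged sketch of how these strategies are selected in TensorFlow (assuming TensorFlow 2.x with the Keras API): you create a `tf.distribute` strategy object and build and compile the model inside its scope. `MirroredStrategy` is the single-machine, multi-GPU case; on a CPU-only machine it simply runs with one replica.

```python
import tensorflow as tf

# Sketch: pick a distribution strategy, then build the model in its scope
# so variables are created as mirrored (per-replica) variables.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# One synchronized model copy per device participating in training.
print(strategy.num_replicas_in_sync)
```

Switching to another strategy (e.g. `tf.distribute.OneDeviceStrategy` for testing) generally only changes the line that constructs `strategy`, which is what makes this API low-effort to adopt.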
Typical usage of this strategy is testing your code before switching to other strategies that actually distribute your code.
Which method of data parallelism is this?
one device strategy
This strategy is typically used for training on one machine with multiple GPUs.
Which method of data parallelism is this?
Mirrored Strategy
Some machines are designated as workers and others as parameter servers.
Which method of data parallelism is this?
Parameter Server Strategy
By default, workers read and update these variables independently without synchronizing with each other. This is why sometimes parameter server style training is also referred to as asynchronous training.
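This read-and-update-independently behavior can be sketched in a single process (illustrative class and method names, no real networking): a server object holds the shared variables, and each worker pulls a possibly stale copy, computes a gradient, and pushes its update back without coordinating with other workers.

```python
# Minimal single-process sketch of the parameter-server idea (illustrative):
# a server holds the shared variables; workers pull current values, compute
# an update, and push it back without waiting for each other.

class ParameterServer:
    def __init__(self, params):
        self.params = dict(params)

    def pull(self):
        return dict(self.params)        # worker gets a (possibly stale) copy

    def push(self, grads, lr=0.1):
        for k, g in grads.items():      # applied as soon as it arrives
            self.params[k] -= lr * g

def worker_step(server):
    p = server.pull()
    grads = {"w": 2 * (p["w"] - 2.0)}   # toy gradient of (w - 2)^2
    server.push(grads)

server = ParameterServer({"w": 0.0})
for _ in range(100):                    # interleaved "asynchronous" pushes
    worker_step(server)
print(round(server.params["w"], 3))     # converges to 2.0
```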
____ are a key part of high-performance modeling, training, and inference, but they are also expensive, so it’s important to use them efficiently.
Accelerators
What are the differences and similarities between inference pipelines and input pipelines
Inference pipelines are similar to input pipelines, but instead of being used to feed data to a model during training, they are used to feed new data to an already trained model in order to make predictions or perform other inference tasks.
Like input pipelines, the goal of inference pipelines is to efficiently utilize available hardware resources and minimize the time required to load and process data. Typically, the same issues around pre-processing data and efficiently using compute resources apply to both input and inference pipelines.
Input pipelines are an essential part of training pipelines and inference pipelines, and ____ is one framework that can be used to design an efficient input pipeline
TensorFlow data or tf.data
Why are input pipelines used?
In order to use accelerators efficiently and reduce the time required to load and pre-process data, input pipelines are used, which are an important part of many training and inference pipelines.
By optimizing pipelines, model training time can be significantly reduced, and resource utilization can be greatly improved.
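The core optimization here is overlapping data loading with computation. A stdlib-only sketch of the prefetching idea that `tf.data`'s `Dataset.prefetch` implements (function names and sleep times are illustrative): a background thread loads and pre-processes upcoming batches into a bounded buffer while the consumer works on the current one.

```python
import threading
import queue
import time

# Stdlib sketch of prefetching (the idea behind tf.data's Dataset.prefetch):
# a background thread produces the next batches into a bounded buffer while
# the "accelerator" consumes the current one, so the two overlap in time.

def prefetch(batches, buffer_size=2):
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for b in batches:
            q.put(b)            # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

def load_batch(i):
    time.sleep(0.01)            # stand-in for slow I/O / pre-processing
    return [i] * 4

out = list(prefetch(load_batch(i) for i in range(3)))
print(out)                      # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

With prefetching, the consumer rarely waits on I/O; this is the same reason `tf.data` pipelines are typically ended with a `prefetch` call.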
There is a pressing need for an efficient, scalable infrastructure that can enable large-scale deep learning and overcome the memory limitations on current accelerators. True/False
True
Larger models, though effective, are limited by hardware constraints, especially memory limitations on current accelerators, and the growing gap between model size and hardware improvement has increased the importance of parallelism.