L9 - Systems for ML Flashcards
What are the three stages of the ML ecosystem? (slide 4)
- Model development
- Training
- Inference
What performance metric is important for training?
throughput
What performance metric is important for inference?
latency
What are the two parts of model development?
- Data part
- Model part
What needs to be done in data part?
short: data collection, cleaning and visualisation
- identify sources of data
- join data from multiple sources
- clean data
- plot trends and anomalies
What resource is the bottleneck for preprocessing data?
CPU
Why do we care about performance of preprocessing?
- affects the end-to-end training time.
- consumes significant CPU and power
What needs to be done in model part?
short: feature engineering, model design; then training and validation
- build informative features
- design new model architectures
- tune hyperparameters
- validate prediction accuracy
What are deep neural networks?
neural networks with multiple hidden layers
What are the three steps of feature extraction and model search?
- feature extraction
- model search
- hyperparameter tuning
What are the three steps of DNN training?
- forward pass: compute activations and loss
- backward pass: compute gradients
- update model weights: to minimise loss
these steps are iterated over the training dataset
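The three steps above can be sketched as a minimal training loop. This is an illustrative example using a single linear layer with mean-squared-error loss (not a real DNN); all names and values are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))              # one mini-batch of inputs
true_w = np.array([1.0, -2.0, 0.5, 3.0])  # "ground truth" weights
y = X @ true_w                            # synthetic targets

w = np.zeros(4)                           # model weights
lr = 0.1                                  # learning rate

for step in range(100):                   # iterate over the training data
    pred = X @ w                          # forward pass: compute activations
    loss = np.mean((pred - y) ** 2)       # ... and the loss
    grad = 2 * X.T @ (pred - y) / len(X)  # backward pass: compute gradients
    w -= lr * grad                        # update weights to minimise loss
```

After enough iterations the loss approaches zero and `w` approaches `true_w`.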
What are the characteristics of DNN training?
- computationally intensive
- error tolerance
- large training datasets
What is meant by error tolerance?
trade-off some accuracy for large benefits in cost and/or throughput
How can large training datasets be a problem?
How is this solved?
problem: Data often does not fit in memory
solution: overlap data fetching and training computation
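One hedged sketch of this overlap, using a background producer thread and a bounded queue as a prefetch buffer (the sleeps stand in for slow I/O and for training compute; all functions here are invented for the example):

```python
import queue
import threading
import time

def fetch(batch_id):
    time.sleep(0.01)               # simulate slow I/O / preprocessing
    return list(range(batch_id, batch_id + 4))

def producer(q, n_batches):
    for i in range(n_batches):
        q.put(fetch(i))            # fetch the next batch while training runs
    q.put(None)                    # sentinel: no more data

q = queue.Queue(maxsize=2)         # bounded buffer of prefetched batches
threading.Thread(target=producer, args=(q, 5), daemon=True).start()

trained = 0
while True:
    batch = q.get()
    if batch is None:
        break
    time.sleep(0.01)               # simulate training computation on the batch
    trained += 1
```

Because fetching and training run concurrently, the loop never stalls waiting for data as long as the buffer stays non-empty.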
How is single-node training done for DNNs?
- data preprocessing done on the CPU
- DNN training on GPUs/TPUs
What two types of parallelism can be achieved via distributed training?
- data parallelism
- model parallelism
What is data parallelism?
partition the data and run multiple copies of the model
synchronise weight updates
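A minimal sketch of data parallelism, assuming a linear model with MSE loss: each "worker" holds an identical copy of the weights, computes gradients on its own shard of the batch, and the gradients are averaged before every replica applies the same update.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w0 = np.zeros(3)                            # identical copy on every worker

def grad(Xs, ys, w):                        # gradient of MSE loss on a shard
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

shards = np.array_split(np.arange(8), 2)    # partition data across 2 workers
local = [grad(X[i], y[i], w0) for i in shards]  # each replica: own shard
avg = np.mean(local, axis=0)                # synchronise weight updates
w = w0 - 0.1 * avg                          # all replicas stay identical
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the replicas behave exactly like one model trained on the whole batch.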
Common ways to implement data parallelism?
- parameter server
- AllReduce (allows GPUs to communicate faster with each other)
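The communication pattern behind ring AllReduce can be simulated in plain Python. This is a toy single-process model of the algorithm (real libraries such as NCCL run it between GPUs): each worker splits its gradient into one chunk per worker, a reduce-scatter phase leaves each worker owning the full sum of one chunk, and an all-gather phase circulates the reduced chunks so every worker ends with the total, while only ever talking to its ring neighbour.

```python
import numpy as np

n = 4                                        # number of workers in the ring
grads = [np.full(n, float(i + 1)) for i in range(n)]  # one gradient per worker
chunks = [g.copy() for g in grads]           # chunks[i][c] = worker i's chunk c

# reduce-scatter: after n-1 steps, worker i owns the full sum of one chunk
for s in range(n - 1):
    sends = [(i, (i - s) % n, chunks[i][(i - s) % n]) for i in range(n)]
    for i, c, v in sends:                    # each worker sends to neighbour i+1
        chunks[(i + 1) % n][c] += v

# all-gather: circulate the reduced chunks so every worker has every sum
for s in range(n - 1):
    sends = [(i, (i + 1 - s) % n, chunks[i][(i + 1 - s) % n]) for i in range(n)]
    for i, c, v in sends:
        chunks[(i + 1) % n][c] = v

total = sum(g for g in grads)                # expected elementwise sum
```

Each worker sends and receives only chunk-sized messages, which is why AllReduce scales better than funnelling all gradients through a central node.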
What is model parallelism?
partition the model across multiple nodes
note: more complicated
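A hedged sketch of model parallelism: the layers of a small two-layer network are partitioned across two hypothetical devices, and activations must be communicated between them on every forward pass (the "device" functions here just mark where each layer would live).

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 8))         # layer 1 weights, live on "device 0"
W2 = rng.normal(size=(8, 2))         # layer 2 weights, live on "device 1"

def device0_forward(x):              # runs on device 0
    return np.maximum(x @ W1, 0.0)   # hidden activations (ReLU)

def device1_forward(h):              # runs on device 1, after receiving h
    return h @ W2                    # output logits

x = rng.normal(size=(1, 4))
h = device0_forward(x)               # device 0 computes, then sends h across
out = device1_forward(h)             # device 1 finishes the forward pass
```

The cross-device transfer of `h` (and of its gradient on the backward pass) is what makes model parallelism more complicated than data parallelism.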