7. Model Building Flashcards
What is data parallelism?
Data parallelism is when the dataset is split into parts that are assigned to parallel compute nodes or graphics processing units (GPUs). A mini-batch of data is sent to every node, each node computes gradients on its slice as usual, and the gradients are sent back to the main node to be aggregated.
What are the two strategies in data parallelism?
Synchronous Training: Each accelerator (GPU) has a complete copy of the model and is trained solely on a different part of the data; gradients are aggregated at the end of every step before the weights are updated.
Asynchronous Training: Workers don’t have to wait for each other; each worker trains independently over its input data and updates the shared variables asynchronously.
What is “all-reduce sync” strategy good for?
It is great for Tensor Processing Units (TPUs) and one-machine multi-GPU setups.
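The all-reduce step can be sketched in plain Python: every worker contributes its local gradient and every worker receives the same averaged result. The gradient values below are hypothetical; real systems use fast collectives (e.g., NCCL or TPU interconnects) rather than Python lists.

```python
# All-reduce in data parallelism: average per-parameter gradients
# across workers so every worker applies the same update.
def all_reduce_mean(worker_grads):
    """Return the element-wise mean of the workers' gradient vectors."""
    num_workers = len(worker_grads)
    num_params = len(worker_grads[0])
    return [
        sum(g[i] for g in worker_grads) / num_workers
        for i in range(num_params)
    ]

# Each worker computed gradients on its own mini-batch (hypothetical values).
grads = [
    [0.2, -0.4],  # worker 0
    [0.4, -0.2],  # worker 1
    [0.6, -0.6],  # worker 2
]

avg = all_reduce_mean(grads)  # identical result on every worker
```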
What is tf.distribute.Strategy used for?
It is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs.
What are the TensorFlow distributed training strategies?
MirroredStrategy: Synchronous distributed training on multiple GPUs on one machine.
CentralStorageStrategy: Synchronous training but with no mirroring; model variables are kept on the CPU, and operations are mirrored across the local GPUs.
MultiWorkerMirroredStrategy: Synchronous distributed training across multiple workers (machines), each potentially with multiple GPUs.
TPUStrategy: Synchronous distributed training on multiple TPU cores.
ParameterServerStrategy: Some machines are designated as workers and some as parameter servers.
Hints: Monkeys Climb More Than Pandas.
What is model parallelism?
In model parallelism, the model itself is partitioned into parts (unlike data parallelism, where the data is split). Each part of the model is placed on a separate GPU. It can be used to train a model that is too large to fit on a single GPU.
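The idea can be sketched in plain Python: the "devices" here are just labels in comments, and the layer weights are hypothetical; in TensorFlow you would place layers on devices with `tf.device`.

```python
# Model parallelism: the model's layers are split across devices,
# and activations flow from one device to the next.
def dense(x, weights):
    """A toy fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

# First half of the model lives on "GPU 0", second half on "GPU 1"
# (hypothetical weights chosen so the math is easy to follow).
layer_on_gpu0 = [[1.0, 0.0], [0.0, 1.0]]   # identity layer
layer_on_gpu1 = [[2.0, 0.0], [0.0, 2.0]]   # doubling layer

x = [3.0, 4.0]
h = dense(x, layer_on_gpu0)   # computed on "GPU 0"
y = dense(h, layer_on_gpu1)   # activations transferred, then "GPU 1"
```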
What is the best architecture for synchronous distribution training for TensorFlow?
All-reduce architecture
What is the best architecture for asynchronous distribution training for TensorFlow?
Parameter server architecture
What are the tools for deploying TensorFlow models?
TensorFlow Serving for servers
TFLite for mobile devices
TensorFlow.js for browsers
What are Convolutional Neural Networks usually used for?
Image classification
What are Recurrent Neural Networks usually used for?
They are designed to operate on sequences of data. They can be used for text classification or prediction of values in a sequence, e.g., with a long short-term memory (LSTM) network. They can also be used for time series and speech recognition.
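A single recurrent step can be sketched in plain Python. Scalar weights are used for brevity and are hypothetical; real RNN cells use weight matrices.

```python
import math

# One step of a toy RNN: the new hidden state depends on the current
# input AND the previous hidden state, which carries context forward.
def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Process a short sequence; h accumulates information about earlier inputs.
h = 0.0
for x_t in [1.0, 0.5, -1.0]:
    h = rnn_step(x_t, h)
```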
What do you use to train a neural network?
Stochastic gradient descent
What is the goal of training a neural network?
To find a set of weights and biases that have low loss
What is loss in a neural network used for?
The loss is used to calculate the gradients.
Gradients are used to update the weights of the neural network.
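The loop described above (loss gives gradients, gradients update weights) can be sketched with stochastic gradient descent on a one-weight model. The data and learning rate are hypothetical.

```python
# SGD on a one-weight linear model y = w * x with MSE loss.
# For one sample, loss = (w*x - y)**2, so d(loss)/dw = 2 * x * (w*x - y).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
w = 0.0
lr = 0.05

for _ in range(200):
    for x, y in data:  # one sample at a time: "stochastic"
        grad = 2 * x * (w * x - y)  # loss is used to calculate the gradient
        w -= lr * grad              # gradient is used to update the weight
```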
What are the outputs of regression, binary classification and multiclass classification?
Numerical
Binary
Single-label multiclass
What are the activation functions of regression, binary classification and multiclass classification?
One node with a linear activation unit
Sigmoid activation unit
Softmax activation function
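These three activations can be sketched in plain Python (the formulas are standard; the helper names are our own):

```python
import math

def linear(x):
    """Regression output: identity activation, any real number."""
    return x

def sigmoid(x):
    """Binary classification: squashes a logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Multiclass: converts logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```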
What are the loss functions of regression, binary classification and multiclass classification?
MSE
Binary cross-entropy; also hinge loss and squared hinge loss (in Keras)
Categorical cross-entropy and sparse categorical cross-entropy
When do you use sparse categorical cross-entropy and categorical cross-entropy?
Both apply to mutually exclusive classes; the difference is the label format. Use sparse categorical cross-entropy when your labels are integers (e.g., 2) and categorical cross-entropy when your labels are one-hot encoded (e.g., [0, 0, 1]). When one sample can have multiple labels, use binary cross-entropy instead.
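The label-format distinction can be checked in plain Python: both losses give the same number when the one-hot vector encodes the same class as the integer label. The predicted probabilities below are hypothetical.

```python
import math

def categorical_ce(one_hot, probs):
    """Cross-entropy with a one-hot label, e.g. [0, 0, 1]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_ce(label, probs):
    """Cross-entropy with an integer label, e.g. 2."""
    return -math.log(probs[label])

probs = [0.1, 0.2, 0.7]                       # hypothetical model output
loss_a = categorical_ce([0, 0, 1], probs)     # one-hot label
loss_b = sparse_categorical_ce(2, probs)      # same class as an integer
```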