6. DEEP LEARNING ON EDGE Flashcards
15 Questions
Which of the following conditions is suitable for Pruning?
a. The histogram of the weights centered around zero
b. A bi-modal Gaussian distributed histogram
c. A Poisson distributed histogram
d. None of the above
The histogram of the weights centered around zero
Which of the following is a representation of linear symmetric quantization? (x is input, x_int is output, s is scale, z is zero point, b is the number of bits)
a. x_int = clamp(round(x/s) + z , 0, 2^b – 1)
b. x_int = clamp(round(x/s) + z , -2^b -1, 2^b – 1)
c. x_int = clamp(round(x/s) , 0, 2^b – 1)
d. None of the above
x_int = clamp(round(x/s), 0, 2^b – 1)
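The formula in the answer can be checked with a minimal NumPy sketch (the function name is illustrative); note the symmetric form has no zero point z:

```python
import numpy as np

def quantize_symmetric_unsigned(x, s, b=8):
    """Symmetric (unsigned) quantization: no zero point, clamp to [0, 2^b - 1]."""
    return np.clip(np.round(x / s), 0, 2**b - 1).astype(np.int32)

x = np.array([0.0, 1.0, 200.0])
# With s = 0.5, the value 200 maps to 400, which overflows the 8-bit grid
# and is clamped to 255.
print(quantize_symmetric_unsigned(x, s=0.5))
```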
What is neural network distillation?
a. The learned teacher model is Pruned to form a student model
b. The learned teacher model is Quantized to form a student model
c. Learning a student model with low complexity from a teacher model
d. None of the above
Learning a student model with low complexity from a teacher model
Pruning of neural networks can be done in which of the following ways?
a. Removing some of the weights
b. Reducing the bit-resolution of the weights
c. Reducing the spatial resolution of the input
d. All the possibilities
Removing some of the weights
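Magnitude-based pruning, i.e. removing (zeroing out) the smallest weights, can be sketched as follows (the threshold value is illustrative):

```python
import numpy as np

def magnitude_prune(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.array([0.01, -0.5, 0.002, 0.3, -0.04])
pruned_w = magnitude_prune(w, threshold=0.05)  # only -0.5 and 0.3 survive
```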
Which of the following cases is not useful for pruning of neural networks?
a. A very deep neural network
b. Pruning works in all cases
c. A recursive neural network
d. A modular network in which the output of one neural network is the input to the next
A modular network in which the output of one neural network is the input to the next
How can the rounding error be made smaller?
a. Decrease q_max
b. Increase q_max
c. Decrease q_min
d. All of the above
Decrease q_max
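This follows from the step size s = (q_max − q_min)/(2^b − 1): the worst-case rounding error is s/2, so a smaller q_max gives a smaller step (at the cost of more clipping error). A quick check, with illustrative range values:

```python
# 8-bit grid, q_min fixed at 0; shrinking q_max shrinks the step size s,
# and with it the worst-case rounding error s/2.
b, q_min = 8, 0.0
for q_max in (10.0, 5.0):
    s = (q_max - q_min) / (2**b - 1)
    print(f"q_max={q_max}: step={s:.5f}, max rounding error={s / 2:.5f}")
```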
How do we find the scale factor and zero point of inputs/activations?
a. Not possible
b. Run representative mini-batches
c. Derive from theory
d. Predict from the trained model
Run representative mini-batches
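A minimal calibration sketch, assuming an asymmetric scheme where the range is taken from the min/max observed over representative mini-batches (function name and data are illustrative):

```python
import numpy as np

def calibrate(batches, b=8):
    """Estimate scale and zero point from the activation range seen in calibration batches."""
    lo = min(batch.min() for batch in batches)
    hi = max(batch.max() for batch in batches)
    s = (hi - lo) / (2**b - 1)   # scale: real range spread over 2^b - 1 steps
    z = int(round(-lo / s))      # zero point: the integer that real 0 maps onto
    return s, z

batches = [np.array([-1.0, 0.5]), np.array([0.2, 3.0])]
s, z = calibrate(batches)  # observed range [-1, 3] -> s = 4/255, z = 64
```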
What will be the scale factor for data with a minimum value of -50, a maximum value of 150, and 8-bit resolution?
a. 200
b. 0.7843
c. 0.0039
d. None of the above
0.7843
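The arithmetic behind the answer, s = (max − min)/(2^b − 1):

```python
# Scale factor for the range [-50, 150] at 8-bit resolution.
s = (150 - (-50)) / (2**8 - 1)  # 200 / 255
print(round(s, 4))  # 0.7843
```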
What will be the dequantization function for a non-linear quantization (e.g. weight sharing)?
a. x’ = s (x_int – z)
b. x’ = s*x_int
c. Needs a look up table (or code book)
d. Not available
Needs a look up table (or code book)
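A sketch of codebook dequantization (centroid values are illustrative): the stored integers are indices into a learned codebook rather than points on a uniform grid, so the affine formula x' = s(x_int − z) does not apply.

```python
import numpy as np

codebook = np.array([-0.7, 0.0, 0.4, 1.1])  # learned cluster centroids (illustrative)
indices = np.array([2, 0, 3, 1])            # quantized weights stored as codebook indices
dequantized = codebook[indices]             # table look-up replaces the affine formula
```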
What will be the starting point for weight sharing by combining Pruning with Quantization?
a. k-Means based clustering of weights
b. Huffman Coding of Weights
c. Quantization of Weights
d. Pruning of Weights
k-Means based clustering of weights
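A minimal 1-D k-means weight-sharing sketch (plain Lloyd iterations; the weight values are illustrative): each weight is replaced by the centroid of its cluster, so only k distinct values remain.

```python
import numpy as np

def kmeans_weight_sharing(weights, k, iters=20):
    """Cluster weights into k centroids and replace each weight by its centroid."""
    centroids = np.linspace(weights.min(), weights.max(), k)
    for _ in range(iters):
        assign = np.argmin(np.abs(weights[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = weights[assign == j].mean()
    assign = np.argmin(np.abs(weights[:, None] - centroids[None, :]), axis=1)
    return centroids[assign]

w = np.array([-0.52, -0.48, 0.01, 0.49, 0.51])
shared = kmeans_weight_sharing(w, k=3)  # five weights collapse to three shared values
```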
What will be the typical shape of a histogram after the Pruning process?
a. A Poisson distributed histogram
b. A normal distributed histogram
c. A bi-modal Gaussian distributed histogram
d. Any of the above
A bi-modal Gaussian distributed histogram
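The bi-modal shape arises because pruning removes the weights near zero, leaving one negative and one positive mode. A quick simulation (the distribution parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, 10_000)   # roughly Gaussian weights before pruning
pruned = w[np.abs(w) >= 0.5]       # magnitude pruning removes the central bulk
# What survives has a gap around zero with a mode on each side: bi-modal.
```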
Which of the following has the highest complexity of implementation?
a. Symmetric Signed Quantization
b. Symmetric Unsigned Quantization
c. Asymmetric Unsigned Quantization
d. All the above are equal in complexity
Asymmetric Unsigned Quantization
What will be the effective bit size for a quantization with level set {-1, 0, 1}?
a. 3 bits
b. 1.58 bits
c. 1 bit
d. 2 bits
1.58 bits
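The effective bit size of an n-level quantizer is log2(n); for the three-level set {-1, 0, 1}:

```python
import math

bits = math.log2(3)    # three levels -> log2(3) effective bits
print(round(bits, 2))  # 1.58
```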
Which of the following is useful for implementing a deep neural network on the edge device?
a. Higher number of parameters
b. Neural network weights represented by a 32-bit fixed-point representation
c. None of the answers
d. Both the higher number of parameters and the 32-bit fixed-point representation
None of the answers
Which of the following cases provides the best results?
a. L2 regularization without retrain
b. L1 regularization without retrain
c. L1 regularization with retrain
d. L2 regularization with iterative retraining
L2 regularization with iterative retraining