General Deep Learning and Machine Learning Flashcards
What is the Vanishing Gradient Problem?
In back-propagation, the gradient reaching earlier layers depends on the product of the derivatives of the activation functions of all later layers. If those derivatives are all smaller than 1, the gradient for the earlier layers tends toward 0 and those layers effectively stop learning.
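A minimal numpy sketch (an illustration, not from the original card): the sigmoid derivative is at most 0.25, so multiplying many of them, as the chain rule does across layers, drives the gradient toward 0.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)          # never larger than 0.25

    rng = np.random.default_rng(0)
    pre_activations = rng.normal(size=50)                  # pretend pre-activation values, one per layer
    grad_factor = np.prod(sigmoid_grad(pre_activations))   # chain-rule product of local derivatives
    print(grad_factor)                                     # a vanishingly small number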
What is the most common activation unit nowadays and why?
ReLU, because it helps solve the vanishing gradient problem
What are some ReLU variations?
-ELU (Exponential linear unit)
-Leaky ReLU (Gradient is not 0 for values smaller than 0)
-Swish (similar to ReLU, but more smooth, good for very deep network)
-Maxout (outputs the maximum of several learned linear functions)
-PReLU (Parametric ReLU; the slope for negative inputs is learned during training)
(see the sketch after this list)
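A hedged numpy sketch of the plain element-wise functions (the alpha values are arbitrary example choices):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)                  # small non-zero slope for x < 0

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    def swish(x):
        return x / (1.0 + np.exp(-x))                         # x * sigmoid(x), smooth around 0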
What does the Softmax function do?
It converts a vector of raw scores (logits) into a probability distribution: each output lies between 0 and 1 and the outputs sum to 1. It is typically used as the output layer for multi-class classification.
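A minimal numpy implementation (an illustration, not from the original card):

    import numpy as np

    def softmax(logits):
        z = logits - np.max(logits)    # subtract the max for numerical stability; the result is unchanged
        exp_z = np.exp(z)
        return exp_z / exp_z.sum()

    print(softmax(np.array([2.0, 1.0, 0.1])))  # sums to 1; the largest logit gets the largest probability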
True or False: You should always start with ReLU activation functions for all Neural networks
False. RNNs, for example, tend to perform better with tanh activations.
What are CNN useful for?
For data that cannot easily be described as a fixed set of feature columns, such as images, because convolutions can detect a feature wherever it appears in the input (translation invariance).
What is an example of a classic use for CNNs?
Image processing (e.g., image classification and object detection)
What are some examples of CNN layers and what do they do?
-Conv2D: Performs the actual convolution over 2D inputs such as images
-MaxPooling2D: Takes the maximum value within each local window, downsampling the feature maps
-Flatten: Converts the 2D feature maps into a 1D vector for the dense layers that follow
(see the minimal Keras sketch after this list)
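A minimal sketch assuming tf.keras, 28x28 grayscale inputs, and 10 output classes (the shapes and layer sizes are illustrative, not from the original card):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # the convolution itself
        layers.MaxPooling2D((2, 2)),              # downsample by taking local maxima
        layers.Flatten(),                         # 2D feature maps -> 1D vector
        layers.Dense(10, activation="softmax"),   # class probabilities
    ])
    model.summary()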
What are some famous CNN models?
-LeNet5
-AlexNet
-GoogLeNet
-ResNet
What are the uses of RNN models?
Processing sequential data, for example predicting future behaviour from past behaviour (time series, text, and other sequences)
What are the existing RNN topologies?
-Sequence to Sequence: a sequence of values predicts another sequence of values
-Sequence to Vector: a sequence of values predicts a single vector
-Vector to Sequence: a single vector predicts a sequence of values
-Encoder->Decoder: sequence -> vector -> sequence (e.g., machine translation)
(see the Keras sketch after this list)
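A hedged tf.keras sketch of the first two topologies (the layer sizes and input shape are illustrative):

    from tensorflow.keras import layers, models

    # Sequence to vector: the recurrent layer returns only its final output.
    seq_to_vec = models.Sequential([
        layers.LSTM(32, input_shape=(None, 8)),    # (time steps, features per step)
        layers.Dense(1),
    ])

    # Sequence to sequence: return_sequences=True emits one output per time step.
    seq_to_seq = models.Sequential([
        layers.LSTM(32, return_sequences=True, input_shape=(None, 8)),
        layers.TimeDistributed(layers.Dense(1)),
    ])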
True or False: Performing Back-propagation through time has no significant impact in training performance, with the number of time steps being irrelevant for training speed
False. Training over many time steps makes an RNN behave like a very deep neural network, so back-propagation through time becomes slower and suffers the same gradient problems.
What are some common RNN cell types?
LSTM cell: Maintains separate long-term and short-term memory states
GRU cell: Simplified LSTM that performs about as well
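In tf.keras the GRU layer is a drop-in replacement for the LSTM layer (a small illustrative sketch):

    from tensorflow.keras import layers

    lstm_layer = layers.LSTM(32)   # long-term + short-term state, more parameters
    gru_layer = layers.GRU(32)     # simplified gating, fewer parameters, usually similar accuracy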
What are some EC2/EMR instances appropriate for deep learning?
-P3
-P2
-G5
-G5g (also used for Android game streaming)
-Trn1 (optimized for training)
-Trn1n (more bandwidth than Trn1)
-Inf2 (Powered by AWS Inferentia, optimized for inference)
What are the Pros and Cons of Small and Large learning rates?
Large learning rates train faster but can overshoot the correct solution; small learning rates avoid that problem but train more slowly.
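A toy illustration (made-up numbers): plain gradient descent on f(x) = x^2, whose gradient is 2x. A learning rate above 1 overshoots and diverges here, while a very small one converges only slowly.

    def descend(lr, steps=20, x=1.0):
        for _ in range(steps):
            x = x - lr * 2 * x       # gradient step for f(x) = x**2
        return x

    print(descend(lr=1.1))    # overshoots: |x| grows every step
    print(descend(lr=0.01))   # stable, but still far from the minimum at 0 after 20 steps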
Should you use small or large batch sizes when training?
Small batch sizes, because they tend not to get stuck in local minima, whereas very large batch sizes can converge on a poor solution that depends on the random initialization.
What is the point of regularization techniques?
To prevent overfitting
What are some techniques to prevent vanishing/exploding gradient?
-Use of better activation functions (e.g., ReLU)
-Multi-level hierarchy (training multiple sub-models instead of one large model)
-Long short-term memory (LSTM) cells
-Residual networks (skip connections make the network behave like an ensemble of shallower networks)
What is the difference between L1 and L2 regularization in mathematical terms?
In both cases an extra penalty term is added to the loss function being minimized. In L1 regularization the added term is the sum of the absolute values of the weights, while in L2 it is the sum of the squares of the weights.
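In symbols (a standard formulation; lambda is the regularization strength and the w_i are the weights being learned):

    \text{Loss}_{L1} = \text{Loss}_{\text{data}} + \lambda \sum_i |w_i|
    \text{Loss}_{L2} = \text{Loss}_{\text{data}} + \lambda \sum_i w_i^2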
What are the comparative advantages and disadvantages between L1 and L2 regularization?
L2 regularization is more computationally efficient and produces dense models: weights are shrunk but rarely driven exactly to 0, so irrelevant features keep small weights. L1 is less efficient and produces sparse models by driving some weights exactly to 0. The main advantage of L1 is that this sparsity acts as feature selection, making it possible to drop irrelevant features. If all features are relevant, however, L2 tends to perform better.
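A hedged tf.keras sketch of how each penalty is attached to a layer (the 0.01 strength is an arbitrary example):

    from tensorflow.keras import layers, regularizers

    l1_dense = layers.Dense(64, activation="relu",
                            kernel_regularizer=regularizers.l1(0.01))  # drives some weights to exactly 0 (sparse)
    l2_dense = layers.Dense(64, activation="relu",
                            kernel_regularizer=regularizers.l2(0.01))  # shrinks weights but keeps them dense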
What are some synonyms for Recall?
-Sensitivity
-True Positive rate
-Completeness
What are some synonyms for precision?
-Positive predictive value
-Correct positives rate (the fraction of predicted positives that are truly positive)
What is the True Negative rate?
It is also known as specificity: the analogue of recall computed for the negative class, i.e. TN / (TN + FP).
How do you calculate F1?
(2 x Precision x Recall) / (Precision + Recall)
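A short worked sketch from a confusion matrix (the counts are made up for illustration):

    tp, fp, fn, tn = 80, 20, 10, 90

    precision = tp / (tp + fp)                            # 0.8
    recall = tp / (tp + fn)                               # ~0.889 (sensitivity / true positive rate)
    f1 = 2 * precision * recall / (precision + recall)    # ~0.842
    specificity = tn / (tn + fp)                          # true negative rate, ~0.818
    print(precision, recall, f1, specificity)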