Deep Learning Flashcards
Sigmoid Activation Function
Squashes numbers into the range [0, 1].
Advantage: it has a nice interpretation (the saturating firing rate of a neuron).
Disadvantages: (1) saturated neurons kill the gradient, (2) outputs are not zero-centered, which leads to inefficient, zig-zagging gradient updates, (3) the exponential is computationally expensive.
Very rarely used anymore.
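A minimal NumPy sketch (an illustration, not from the card) showing the squashing into (0, 1) and the near-zero gradient in the saturated regions:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)); squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Local gradient is sigma(x) * (1 - sigma(x)); it is largest (0.25) at x = 0
# and nearly zero for saturated inputs, which is the "killed gradient" problem.
x = np.array([-10.0, 0.0, 10.0])
s = sigmoid(x)
print(s)            # ~[0.00005, 0.5, 0.99995]
print(s * (1 - s))  # ~[0.00005, 0.25, 0.00005]
```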
Tanh(x) Activation Function
Squashes numbers into the range [-1, 1]. Zero-centered, but still kills gradients when saturated.
Generally better than sigmoid.
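A small NumPy sketch (illustrative) of the zero-centered output and the saturating gradient:

```python
import numpy as np

# tanh squashes inputs into (-1, 1) and is zero-centered, but its gradient
# 1 - tanh(x)^2 still goes to zero for large |x| (saturation).
x = np.array([-5.0, 0.0, 5.0])
t = np.tanh(x)
print(t)         # ~[-0.9999, 0.0, 0.9999]
print(1 - t**2)  # ~[0.00018, 1.0, 0.00018]
```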
ReLU (Rectified Linear Unit)
max(0, x)
Does not saturate in the positive region, is computationally efficient, and converges much faster than sigmoid and tanh in practice.
Neurons can still die: a unit whose input is always negative receives zero gradient and stops updating.
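A minimal NumPy sketch (illustrative) of max(0, x) and of why a unit that only sees negative inputs stops learning:

```python
import numpy as np

def relu(x):
    # max(0, x): no saturation for x > 0 and very cheap to compute.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))                # [0. 0. 3.]
# Local gradient is 1 for x > 0 and 0 otherwise; a "dead" unit whose inputs
# stay negative gets zero gradient and never updates again.
print((x > 0).astype(float))  # [0. 0. 1.]
```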
Leaky ReLU
Strictly better than ReLU in principle: it solves the dying-neuron problem by giving negative inputs a small nonzero slope.
Does not saturate, is computationally efficient, and still converges faster than sigmoid and tanh.
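A small NumPy sketch; the negative slope 0.01 is a commonly used default, not a value stated in the card:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # max(alpha * x, x): a small slope for x < 0 keeps the gradient nonzero,
    # so neurons do not die.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))  # [-0.02  0.    3.  ]
```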
Data Preprocessing Techniques
Zero-center the data, then normalize each feature to unit variance (normalization is typically not necessary for images).
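A minimal NumPy sketch of both steps on a hypothetical data matrix X of shape (N examples, D features):

```python
import numpy as np

X = np.random.randn(100, 20) * 5.0 + 3.0   # hypothetical training data

X_centered = X - X.mean(axis=0)            # zero-center each feature
X_normalized = X_centered / X.std(axis=0)  # scale each feature to unit variance
# For images it is common to stop at zero-centering (e.g. subtracting a
# per-channel mean) and skip the per-feature normalization.
```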
Methods for model regularization
Dropout
Batch Norm
Data Augmentation
DropConnect
Fractional Max Pooling
Stochastic Depth
Generally, start with batch norm, and add some of the others if you see the model overfitting the data.
Dropout
The term “dropout” refers to temporarily dropping nodes (in the input and hidden layers) from a neural network. All forward and backward connections of a dropped node are removed, effectively creating a new, thinner architecture out of the parent network. Each node is dropped independently with a dropout probability p.
Generally, for the input layer the keep probability (i.e. 1 − drop probability) is kept close to 1; the original authors suggest about 0.8. For hidden layers, the greater the drop probability, the sparser the model; a keep probability of 0.5 (dropping 50% of the nodes) is the commonly recommended setting.
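A minimal sketch of an inverted-dropout forward pass in NumPy; the rescaling-at-train-time convention and the keep_prob value are assumptions, not details from the card:

```python
import numpy as np

def dropout_forward(h, keep_prob=0.5, train=True):
    # Train time: zero each unit with probability 1 - keep_prob, then divide
    # by keep_prob so the expected activation matches test time.
    # Test time: the layer is the identity.
    if not train:
        return h
    mask = (np.random.rand(*h.shape) < keep_prob) / keep_prob
    return h * mask

h = np.ones((2, 6))
print(dropout_forward(h, keep_prob=0.5))  # roughly half the entries zeroed
```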
Batch Norm
Batch Normalization extends the idea of normalization from the input layer to the activations of each hidden layer throughout the network. By normalizing each layer's activations over the mini-batch, it helps alleviate internal covariate shift, which can hinder convergence during training.
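A minimal NumPy sketch of the batch-norm forward pass for a fully connected layer (gamma and beta are the learned scale and shift; the running statistics used at test time are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then apply a learned
    # scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 10) * 4.0 + 2.0   # hypothetical hidden activations
out = batch_norm_forward(x, np.ones(10), np.zeros(10))
print(out.mean(axis=0).round(3))  # ~0 per feature
print(out.std(axis=0).round(3))   # ~1 per feature
```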
Data Augmentation
Data augmentation is the process of artificially generating new training data from existing data, primarily to improve the training of machine learning (ML) models.
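One common way to do this for images, sketched with torchvision transforms; the specific transforms and parameter values here are illustrative choices, not prescriptions from the card:

```python
from torchvision import transforms

# Each training image is randomly transformed every time it is loaded, so the
# model effectively sees new examples without any new labels being collected.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),   # e.g. 32x32 CIFAR-style images
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```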