Deep Learning Flashcards

1
Q

Sigmoid Activation Function

A

Squashes numbers into the range [0, 1].

Advantage: it has a nice interpretation as the firing rate of a neuron.

Disadvantages: (1) saturated neurons kill the gradient, (2) the output is not zero-centered, leading to inefficient gradient updates, (3) the exponential is computationally expensive.

Very rarely used anymore.
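
A minimal NumPy sketch (function names are illustrative) of the sigmoid and its gradient, showing how the gradient vanishes for large |x| (the saturation problem above):

import numpy as np

def sigmoid(x):
    # squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative is sigmoid(x) * (1 - sigmoid(x)), at most 0.25
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # ~0.25, the maximum
print(sigmoid_grad(10.0))  # ~4.5e-05, gradient has effectively vanished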

2
Q

Tanh(x) Activation Function

A

Squashes numbers into the range [-1, 1]. Zero-centered, but still kills gradients when saturated.

Generally better than sigmoid.
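
A short NumPy comparison (illustrative) of the two output ranges: tanh is zero-centered while sigmoid is not, but both saturate for large |x|:

import numpy as np

x = np.array([-5.0, 0.0, 5.0])
print(np.tanh(x))                # [-0.9999  0.  0.9999] -> zero-centered, range (-1, 1)
print(1.0 / (1.0 + np.exp(-x)))  # [ 0.0067  0.5 0.9933] -> always positive, range (0, 1)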

3
Q

ReLU (Rectified Linear Unit)

A

max(0, x)

Does not saturate in the positive region, is computationally efficient, and converges faster than sigmoid and tanh in practice.

Neurons can still die: if a neuron's input is always negative, its gradient is zero and it stops updating.
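
A minimal sketch of ReLU and its gradient, illustrating why a neuron whose inputs are always negative receives zero gradient (the dying-ReLU problem):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # gradient is 1 for x > 0, and 0 otherwise
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # [0.  0.  0.  1.5]
print(relu_grad(x))  # [0. 0. 0. 1.] -- no gradient flows through negative inputs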

4
Q

Leaky ReLU

A

Strictly better than ReLU: it solves the dying-neuron problem by giving negative inputs a small nonzero slope.

Does not saturate, is computationally efficient, and still converges faster than sigmoid and tanh.
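
A sketch of Leaky ReLU with a small negative slope (0.01 here is an assumed, commonly used default):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # the small slope alpha keeps a nonzero gradient for x < 0
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # [-0.02  -0.005  0.  1.5]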

5
Q

Preprocessing Data techniques

A

Zero-center the data (subtract the mean) and normalize to unit variance (normalization is often not necessary for images, since pixel values already share a common scale).
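
A minimal NumPy sketch (shapes are illustrative) of zero-centering and normalizing a data matrix per feature:

import numpy as np

X = np.random.randn(1000, 20) * 3.0 + 5.0    # fake dataset: 1000 samples, 20 features

X_centered = X - X.mean(axis=0)              # zero-center each feature
X_normalized = X_centered / X.std(axis=0)    # scale to unit variance (often skipped for images)

print(X_normalized.mean(axis=0).round(3))    # ~0 for every feature
print(X_normalized.std(axis=0).round(3))     # ~1 for every feature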

6
Q

Methods for model regularization

A

Dropout
Batch Norm
Data Augmentation
DropConnect
Fractional Max Pooling
Stochastic Depth

Generally, start with batch norm and add some of the others if you see the model overfitting the training data.
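
A hypothetical PyTorch sketch (layer sizes are made up) of that pattern: batch norm paired with the conv layers from the start, dropout added only if overfitting shows up:

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),          # start with batch norm
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Dropout(p=0.5),           # add dropout (or another regularizer) if the model overfits
    nn.Linear(64, 10),
)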

7
Q

Dropout

A

The term “dropout” refers to dropping out nodes (in the input and hidden layers) of a neural network. All forward and backward connections of a dropped node are temporarily removed, creating a thinned sub-network of the parent network. Nodes are dropped with a dropout probability p.

Generally, for the input layer the keep probability (1 - drop probability) is kept close to 1, with 0.8 suggested by the authors. For the hidden layers, the higher the drop probability, the sparser the model; a keep probability of 0.5 (dropping 50% of the nodes) is typically cited as optimal.
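
A minimal NumPy sketch of inverted dropout (an assumed, commonly used formulation): each activation is kept with probability keep_prob and the survivors are rescaled so the expected value matches test time:

import numpy as np

def dropout(activations, keep_prob=0.5, training=True):
    if not training:
        return activations                      # no dropout at test time
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob       # rescale so the expectation is unchanged

h = np.ones((2, 8))
print(dropout(h, keep_prob=0.5))  # roughly half the units zeroed, survivors scaled to 2.0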

8
Q

Batch Norm

A

Batch Normalization extends the concept of normalization from the input layer to the activations of each hidden layer throughout the network. By normalizing the activations of each layer, it helps alleviate the internal covariate shift problem, which can hinder convergence during training.
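
A minimal NumPy sketch of the training-time batch-norm transform for a fully connected layer (gamma, beta, and eps are the usual learnable scale, shift, and stability constant; running averages for test time are omitted):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch_size, features); normalize each feature over the batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # learnable scale and shift

x = np.random.randn(32, 4) * 10 + 3
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1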

9
Q

Data augmentation

A

Data augmentation is the process of artificially generating new training examples from existing data, e.g. by randomly flipping, cropping, rotating, or color-jittering images.
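
A small NumPy sketch (illustrative only) of two common image augmentations, a random horizontal flip and a random crop:

import numpy as np

def random_flip(img):
    # img: (H, W, C); flip left-right with probability 0.5
    return img[:, ::-1, :] if np.random.rand() < 0.5 else img

def random_crop(img, size):
    h, w, _ = img.shape
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size, :]

img = np.random.rand(32, 32, 3)                # fake image
augmented = random_crop(random_flip(img), 28)  # a new 28x28 training example
print(augmented.shape)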
