C4 Flashcards
exploding/vanishing gradients
the deeper the network, the more factors are multiplied when backpropagating the gradient: roughly 2×depth multiplications (a weight factor and an activation-derivative factor per layer)
products of many small numbers become very small (vanishing gradients), products of many big numbers become very big (exploding gradients)
solution: alternative activation functions
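A minimal NumPy sketch (an illustration added here, not part of the original notes) of how a product of roughly 2×depth factors either vanishes or explodes as depth grows:

```python
# Illustrative sketch, assuming NumPy only: multiply ~2*depth random
# per-layer factors and watch the product vanish or explode with depth.
import numpy as np

rng = np.random.default_rng(0)

for scale, label in [(0.5, "small factors -> vanishing"),
                     (1.5, "large factors -> exploding")]:
    for depth in (10, 50, 100):
        # roughly 2*depth multiplications: one weight factor and one
        # activation-derivative factor per layer
        factors = scale * rng.uniform(0.8, 1.2, size=2 * depth)
        print(f"{label}, depth={depth}: gradient magnitude ~ {np.prod(factors):.3e}")
```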
alternative activation functions
- Logistic sigmoid
- Tanh (hyperbolic tangent)
- Linear (identity)
- ReLU (Rectified Linear Unit)
- LReLU (Leaky Rectified Linear Unit)
- ELU (Exponential Linear Unit)
- SELU (Scaled Exponential Linear Unit)
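Minimal NumPy sketches of the listed activations (the SELU constants are the standard published values; everything else is a plain definition, not tied to any particular library):

```python
import numpy as np

def sigmoid(x):                  # logistic sigmoid: squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                     # hyperbolic tangent: squashes to (-1, 1)
    return np.tanh(x)

def identity(x):                 # linear identity: passes values through
    return x

def relu(x):                     # max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):   # small slope alpha for x < 0
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):           # smooth negative part: alpha * (exp(x) - 1)
    return np.where(x > 0, x, alpha * np.expm1(x))

def selu(x):                     # scaled ELU; lambda ~ 1.0507, alpha ~ 1.6733
    lam, alpha = 1.0507, 1.6733
    return lam * np.where(x > 0, x, alpha * np.expm1(x))
```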
batch normalization
When training a network with batches of data, the network “gets confused” because the statistical properties (mean, variance) of the activations vary from batch to batch
Idea 1: normalize each batch => subtract the mean and divide by the std deviation
Idea 2: scale and shift each normalized batch by learnable parameters gamma and beta, chosen to minimize the network loss (error) on the whole training set
Idea 3: the optimal gamma and beta are found with SGD (gradient descent), together with the other network weights
Batch Normalization allows higher learning rates, reducing the number of epochs needed; consequently, training converges much faster
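A sketch of the training-time forward pass implied by Ideas 1 to 3 (NumPy, hypothetical function name; the running statistics used at inference time are left out):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, n_features) activations of one mini-batch
    mean = x.mean(axis=0)                     # Idea 1: per-feature batch mean
    var = x.var(axis=0)                       # ... and batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalize the batch
    return gamma * x_hat + beta               # Ideas 2/3: learnable scale/shift

# toy usage: gamma and beta start at 1 and 0 and would be learned by SGD
x = np.random.randn(32, 4) * 5.0 + 3.0
gamma, beta = np.ones(4), np.zeros(4)
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0), y.std(axis=0))          # ~0 mean, ~1 std per feature
```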
advantages of Batch Normalization
- superior accuracy
- reduces risk of vanishing/exploding gradients
- much faster training than plain backpropagation without normalization
- allows for using “big learning rates” => fewer epochs needed for convergence
- allows for training much deeper networks
- acts as a regularizer: lower risk of overfitting
regularization
an additional mechanism added to the training process to prevent overfitting
L1 or L2 regularization: adds a penalty on too-large weight values to the loss function
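A short sketch of how such a penalty is added to the loss (lambda_reg and the function name are hypothetical, just for illustration):

```python
import numpy as np

def regularized_loss(data_loss, weights, lambda_reg=1e-3, kind="l2"):
    # L2: penalizes squared weight values; L1: penalizes absolute values
    if kind == "l2":
        penalty = lambda_reg * np.sum(weights ** 2)
    else:
        penalty = lambda_reg * np.sum(np.abs(weights))
    return data_loss + penalty

# toy usage: big weights inflate the loss, discouraging overfitting
w = np.array([0.1, -3.0, 2.5])
print(regularized_loss(data_loss=0.42, weights=w, kind="l2"))
```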