batch normalization Flashcards
why should we use BN?
when there are many layers, the inputs to later layers can shift a lot as the earlier weights change –> BN keeps those inputs fairly stable throughout training
–> speed up learning
BN also has a small regularization effect
BN process?
compute the mean mu and variance sigma^2 of z over the mini-batch
znorm = (z - mu) / sqrt(sigma^2 + epsilon)
z~ = gamma × znorm + beta
–> use z~ instead of z
gamma and beta are learnable parameters of the layer, shared across batches; only mu and sigma^2 are recomputed for each mini-batch (see the sketch below)
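a minimal NumPy sketch of that forward step (function and variable names are just illustrative):

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, epsilon=1e-5):
    # z: pre-activations for one mini-batch, shape (batch_size, n_units)
    mu = z.mean(axis=0)                           # per-unit mean over the batch
    var = z.var(axis=0)                           # per-unit variance over the batch
    z_norm = (z - mu) / np.sqrt(var + epsilon)    # ~zero mean, unit variance
    z_tilde = gamma * z_norm + beta               # rescale and shift with learnable params
    return z_tilde

# toy usage with made-up shapes
z = np.random.randn(64, 10) * 3 + 5
gamma, beta = np.ones(10), np.zeros(10)
z_tilde = batchnorm_forward(z, gamma, beta)
```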
why should gamma and beta be learnable?
we might not always want a fixed distribution of inputs (e.g., a std normal dist right before a sigmoid keeps it stuck in its near-linear region)
–> learnable gamma and beta let the network choose the mean and variance of the inputs, even undoing the normalization entirely if that helps (see the check below)
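a quick NumPy check of that last point: with gamma = sqrt(sigma^2 + epsilon) and beta = mu, BN collapses to the identity, so the original distribution is always recoverable.

```python
import numpy as np

epsilon = 1e-5
z = np.random.randn(64, 10) * 3 + 5
mu, var = z.mean(axis=0), z.var(axis=0)
z_norm = (z - mu) / np.sqrt(var + epsilon)

# choosing gamma = sqrt(var + epsilon) and beta = mu exactly undoes the normalization
gamma, beta = np.sqrt(var + epsilon), mu
z_tilde = gamma * z_norm + beta
print(np.allclose(z_tilde, z))   # True
```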
does it make sense to use b (bias) while using BN?
no. adding a constant to z gets subtracted back out with the batch mean when computing znorm (beta takes over that role) –> no point
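a tiny sanity check of that cancellation (the constant b is arbitrary):

```python
import numpy as np

epsilon = 1e-5
z = np.random.randn(64, 10)
b = 7.0   # any constant bias

def normalize(x):
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + epsilon)

# the bias is subtracted back out along with the batch mean
print(np.allclose(normalize(z), normalize(z + b)))   # True
```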
why does BN have a regularization effect?
mu and sigma^2 are estimated on each mini-batch, so every z gets normalized with slightly noisy statistics (a bit like dropout) –> introduces some noise into training
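a small illustration of that noise (batch contents are made up): the same example is normalized differently depending on which mini-batch it lands in.

```python
import numpy as np

epsilon = 1e-5
x = np.ones(5)   # the same example, placed into two different mini-batches
batch_a = np.vstack([x, np.random.randn(63, 5)])
batch_b = np.vstack([x, np.random.randn(63, 5)])

def normalize(z):
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + epsilon)

# x gets a slightly different znorm depending on the rest of the batch
print(normalize(batch_a)[0])
print(normalize(batch_b)[0])
```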
BN at test time?
gamma and beta are learned parameters and stay fixed; for mu and sigma^2, keep an exponentially weighted average across mini-batches during training and use those running values at test time
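a minimal sketch of tracking and reusing those running averages (class name and momentum value are assumptions):

```python
import numpy as np

class BatchNorm:
    def __init__(self, n_units, momentum=0.9, epsilon=1e-5):
        self.gamma = np.ones(n_units)          # learned during training, fixed at test time
        self.beta = np.zeros(n_units)
        self.running_mu = np.zeros(n_units)    # exponentially weighted averages of batch stats
        self.running_var = np.ones(n_units)
        self.momentum = momentum
        self.epsilon = epsilon

    def forward_train(self, z):
        mu, var = z.mean(axis=0), z.var(axis=0)
        # update the exponentially weighted averages of mu and sigma^2
        self.running_mu = self.momentum * self.running_mu + (1 - self.momentum) * mu
        self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        z_norm = (z - mu) / np.sqrt(var + self.epsilon)
        return self.gamma * z_norm + self.beta

    def forward_test(self, z):
        # no fresh batch statistics at test time: use the running averages instead
        z_norm = (z - self.running_mu) / np.sqrt(self.running_var + self.epsilon)
        return self.gamma * z_norm + self.beta
```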