batch normalization Flashcards

1
Q

why should we use batch normalization (BN)?

A

when a network has many layers, the distribution of inputs to later layers can shift a lot as the earlier weights change –> BN keeps those inputs roughly stable throughout training
–> speeds up learning
BN also has a small regularization effect

2
Q

BN process?

A

compute the mean mu and variance sigma^2 of z over the current minibatch
znorm = (z - mu) / sqrt(sigma^2 + epsilon)
z~ = gamma × znorm + beta
–> use z~ instead of z
gamma and beta are learnable parameters of the layer (learned across all batches); mu and sigma^2 are recomputed for each minibatch
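
A minimal sketch of this forward step in NumPy (the function name, shapes, and epsilon value are illustrative assumptions, not from the card):

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-5):
    """Normalize z over the batch dimension, then scale and shift.

    z: (batch_size, n_units) pre-activations for one layer
    gamma, beta: (n_units,) learnable scale and shift
    """
    mu = z.mean(axis=0)                      # per-unit batch mean
    var = z.var(axis=0)                      # per-unit batch variance
    z_norm = (z - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    z_tilde = gamma * z_norm + beta          # learned scale and shift
    return z_tilde

# example: a batch of 4 examples, 3 hidden units
z = np.random.randn(4, 3) * 5 + 2
gamma, beta = np.ones(3), np.zeros(3)
print(batchnorm_forward(z, gamma, beta).mean(axis=0))  # ~0 per unit
```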

3
Q

why should gamma and beta be learnable?

A

we may not always want a fixed distribution of inputs (e.g., a standard normal distribution fed into a sigmoid keeps it in its roughly linear region)
–> gamma and beta let the network learn the mean and variance of a layer's inputs instead of forcing them to 0 and 1
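
A small check (NumPy, names illustrative) that gamma and beta give the network this freedom: choosing gamma = sqrt(sigma^2 + eps) and beta = mu undoes the normalization entirely, so any input distribution remains reachable:

```python
import numpy as np

eps = 1e-5
z = np.random.randn(4, 3) * 5 + 2
mu, var = z.mean(axis=0), z.var(axis=0)
z_norm = (z - mu) / np.sqrt(var + eps)

# with gamma = sqrt(var + eps) and beta = mu we recover the original z,
# so BN does not force a standard-normal input on the next layer
z_tilde = np.sqrt(var + eps) * z_norm + mu
print(np.allclose(z_tilde, z))  # True
```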

4
Q

does it make sense to use a bias b together with BN?

A

no. any constant added to z is canceled out when the batch mean is subtracted in znorm –> the bias is redundant (beta plays that role instead)
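
A quick numeric check of the cancellation (illustrative sketch only):

```python
import numpy as np

eps = 1e-5
z = np.random.randn(4, 3)
b = 7.3  # any constant bias added to every example

def normalize(x):
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# z and z + b normalize to exactly the same values,
# so beta takes over the role of the bias
print(np.allclose(normalize(z), normalize(z + b)))  # True
```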

5
Q

why does BN have a regularization effect?

A

mu and sigma^2 are computed on each minibatch, so every example is normalized with slightly different statistics –> this adds some noise to training, which acts as a mild regularizer
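
A small illustration of where the noise comes from (NumPy, values illustrative): the same example is normalized differently depending on which minibatch it lands in, because mu and sigma^2 are estimated from that minibatch:

```python
import numpy as np

eps = 1e-5
x = np.array([1.0, 2.0, 3.0])  # one example's pre-activations

def normalize(batch):
    return (batch - batch.mean(axis=0)) / np.sqrt(batch.var(axis=0) + eps)

batch_a = np.vstack([x, np.random.randn(7, 3)])
batch_b = np.vstack([x, np.random.randn(7, 3)])

# the first row is the same example, but its normalized value differs
# because the batch statistics differ between the two minibatches
print(normalize(batch_a)[0])
print(normalize(batch_b)[0])
```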

6
Q

BN at test time?

A

during training, keep an exponentially weighted average of mu and sigma^2 across minibatches
–> at test time, normalize each example with those averaged statistics, then apply the learned gamma and beta as usual
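
A sketch of how the running statistics could be tracked and applied (the momentum value, shapes, and initial values are assumptions):

```python
import numpy as np

eps, momentum = 1e-5, 0.9
running_mu, running_var = np.zeros(3), np.ones(3)

# during training: update exponentially weighted averages of the batch stats
for _ in range(100):
    z = np.random.randn(16, 3) * 5 + 2
    mu, var = z.mean(axis=0), z.var(axis=0)
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var

# at test time: normalize a single example with the running statistics,
# then apply the learned gamma and beta as usual
gamma, beta = np.ones(3), np.zeros(3)
z_test = np.random.randn(1, 3) * 5 + 2
z_tilde = gamma * (z_test - running_mu) / np.sqrt(running_var + eps) + beta
print(z_tilde)
```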
