regularization Flashcards
why regularization
prevent overfitting
how do l1 and l2 regularization prevent overfitting?
shrinks the coefficients (w) towards 0 -> discourages a more complex model
what is ridge regression (L2 regularization)
loss + lambda × sum(||w[l]||^2)   (sum of squared weights over layers l)
what is lasso regression (L1 regularization)
loss + lambda × sum(|w|)   (sum of absolute weight values)
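A minimal numpy sketch of the two penalty terms (the data-loss value 0.8 and lambda = 0.01 are just placeholder numbers):

```python
import numpy as np

def l2_penalty(w, lam):
    # ridge: lambda * sum of squared weights
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    # lasso: lambda * sum of absolute weight values
    return lam * np.sum(np.abs(w))

w = np.array([0.5, -1.2, 3.0])
ridge_loss = 0.8 + l2_penalty(w, lam=0.01)  # 0.8 stands in for the unregularized data loss
lasso_loss = 0.8 + l1_penalty(w, lam=0.01)
```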
what is the difference between L1 and L2 (lasso vs ridge)?
why might l1 be better? why might l2 be better?
ridge regression coefficient estimates stay non-zero (it can shrink some coefficients to almost, but never exactly, 0), while lasso coefficients can be exactly 0 (many coefficients can be 0 simultaneously) -> lasso also does feature selection and yields a sparse model (see the sklearn sketch after this card)
l2 puts an extra penalty on large weights (because it squares them), so the penalty strength grows much faster as weights get larger
l1 tends to be more time efficient at prediction time because the sparse model has fewer non-zero weights to compute with
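A quick sklearn sketch (the dataset and alpha values are arbitrary, just for illustration) showing that lasso zeroes out coefficients while ridge only shrinks them:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# toy data: only 5 of the 20 features actually matter
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("ridge zero coefficients:", np.sum(ridge.coef_ == 0))  # typically 0
print("lasso zero coefficients:", np.sum(lasso.coef_ == 0))  # typically many -> sparse model
```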
what does regularization achieve?
reduces variance without significantly increasing bias
how to select lambda
as lambda increases, variance is reduced, but past some point the model starts losing important structure in the data (bias grows) -> find the optimal lambda: high enough to cut variance, but not so high that it underfits (commonly chosen by cross-validation, as in the sketch below)
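A sklearn sketch of picking lambda by cross-validating over a grid of candidates (the alpha grid and toy data are arbitrary examples):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 13)                 # candidate lambdas from 0.001 to 1000
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)  # 5-fold cross-validation over the grid
print("selected lambda:", model.alpha_)
```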
what is dropout regularization
during training, some layer outputs are dropped at random -> the network effectively has a different number of nodes and connections on each pass -> nodes within a layer must take on more or less responsibility for the input (some have to learn more)
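A minimal numpy sketch of (inverted) dropout applied to a layer's activations (p_drop = 0.5 is just an example value):

```python
import numpy as np

def dropout_forward(a, p_drop=0.5, training=True):
    """Apply inverted dropout to activations a; p_drop is the probability of dropping a unit."""
    if not training:
        return a                               # no dropout at test time
    mask = np.random.rand(*a.shape) > p_drop   # keep each unit with probability 1 - p_drop
    return a * mask / (1.0 - p_drop)           # rescale so the expected activation is unchanged

a = np.random.randn(4, 8)                      # toy activations from some layer
print(dropout_forward(a, p_drop=0.5))
```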
why does dropout help with overfitting?
a neuron can't rely on any single input because that input might be randomly dropped out
neurons will not learn redundant details of inputs
how to choose other hyperparams when using dropout?
higher learning rate: together with dropout noise, it may help explore more of the loss surface and find a better minimum
drawbacks of dropout
takes 2-3 times longer to train the nn
where to put dropout layer?
after fully connected layers, not convolutional layers; conv layers already have fewer parameters -> they need less regularisation (see the sketch below)
other regularization techniques, such as batch normalization, have largely overtaken dropout in CNNs
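A PyTorch sketch of a typical placement (the architecture and sizes are illustrative and assume 28×28 single-channel inputs, e.g. MNIST): dropout sits after the fully connected layer, not after the conv layers.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                        # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                        # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 128), nn.ReLU(),  # fully connected layer
    nn.Dropout(p=0.5),                      # dropout goes here, after the FC layer
    nn.Linear(128, 10),
)
```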
what is data augmentation
a regularization technique: modify the training data to create new data (randomly crop, rotate, …)
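An illustrative torchvision augmentation pipeline (the specific transforms and values are arbitrary examples):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # random crop with padding
    transforms.RandomHorizontalFlip(),      # random horizontal flip
    transforms.RandomRotation(15),          # random rotation up to ±15 degrees
    transforms.ToTensor(),
])
```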
what is early stopping
a regularization technique: stop training early, before the model overfits
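A self-contained sketch of the stopping rule with patience (val_losses is a made-up sequence of per-epoch validation losses; patience = 3 is arbitrary):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop."""
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad_epochs = loss, 0      # improvement: reset the patience counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch                # no improvement for `patience` epochs -> stop
    return len(val_losses) - 1

print(early_stop_epoch([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]))  # -> 5
```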
what is the drawback of early stopping
it couples two tasks, optimizing the cost function and avoiding overfitting, so you always have to consider both at once; with L2 regularization you can train as long as possible and just focus on driving the cost down