Regularization Flashcards

1
Q

why use regularization?

A

prevent overfitting

2
Q

how do L1 and L2 regularization prevent overfitting?

A

they shrink the coefficients (w) towards 0 -> this discourages a more complex model

3
Q

what is ridge regression (L2 regularization)

A

Loss + lambda × sum_l ||w^[l]||^2  (the loss plus lambda times the sum of the squared weights, summed over layers l)
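
A minimal numpy sketch of the penalized loss (the names X, y, w, lam are illustrative, not from the card):

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    """MSE loss plus the L2 penalty: lam * sum of squared weights."""
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    return mse + lam * np.sum(w ** 2)
```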

4
Q

what is lasso regression (L1 regularization)

A

Loss + lambda × sum(|w|)  (the loss plus lambda times the sum of the absolute values of the weights)
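
The same illustrative sketch with the L1 penalty swapped in:

```python
import numpy as np

def lasso_loss(X, y, w, lam):
    """MSE loss plus the L1 penalty: lam * sum of absolute weights."""
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    return mse + lam * np.sum(np.abs(w))
```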

5
Q

what is the difference between L1 and L2 (lasso vs ridge)?
when is L1 better? when is L2 better?

A

ridge coefficient estimates are essentially always non-zero (ridge shrinks coefficients towards 0 but almost never exactly to 0), while lasso can set many coefficients to exactly 0 simultaneously -> lasso also performs feature selection and yields a sparse model (see the sketch below).
L2 puts an extra penalty on large weights (because it squares them), so its penalty strength spans a wider range.
L1 is more time-efficient at prediction, since the sparse model has fewer non-zero weights to evaluate.
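
A quick scikit-learn sketch of the sparsity difference (synthetic data; note that sklearn calls lambda `alpha`):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features actually matter.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # small but non-zero everywhere
print(Lasso(alpha=0.1).fit(X, y).coef_)  # irrelevant coefficients exactly 0
```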

6
Q

what does regularization achieve?

A

reduces variance without significantly increasing bias

7
Q

how to select lambda

A

as lambda increases, variance is reduced, but past some point the model starts losing important structure in the data (bias increases) -> find the optimal lambda, typically via cross-validation: high enough to cut variance, but not so high that the model underfits
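
A sketch of the usual approach, cross-validating over a grid of lambdas (sklearn's `alpha`) on synthetic data:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

alphas = np.logspace(-3, 3, 13)                 # candidate lambdas
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print(model.alpha_)                             # lambda with the best CV error
```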

8
Q

what is dropout regularization

A

during training, some layer outputs are dropped (zeroed) at random -> each pass effectively uses a different number of nodes and connections -> nodes within a layer have to take more or less responsibility for the input (some will have to learn more)
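
A minimal numpy sketch of (inverted) dropout applied to a layer's activations; `a` and `p` are illustrative:

```python
import numpy as np

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p during training.

    Scaling the survivors by 1/(1-p) keeps the expected activation
    unchanged, so nothing needs rescaling at test time.
    """
    if not training:
        return a
    mask = (np.random.rand(*a.shape) > p) / (1.0 - p)
    return a * mask
```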

9
Q

why does dropout help with overfitting?

A

a neuron can't rely on any single input, because that input might be randomly dropped out
neurons therefore won't learn redundant details of the inputs

10
Q

how to choose other hyperparams when using dropout?

A

high learning rate: together with the noise from dropout, it may help explore more of the loss surface and find a better minimum

11
Q

drawbacks of dropout

A

it takes 2-3 times longer to train the network

12
Q

where to put dropout layer?

A

after fully connected layers, not convolutional layers: conv layers already have far fewer parameters, so they need less regularization (see the sketch below)
in CNNs, other regularization techniques such as batch normalization have largely overtaken dropout
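
A minimal PyTorch sketch (assuming 32×32 RGB inputs; the layer sizes are illustrative) with dropout only after the fully connected layer:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # no dropout on conv layers
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 128),                # 16 channels x 16 x 16
    nn.ReLU(),
    nn.Dropout(p=0.5),                           # dropout after the FC layer
    nn.Linear(128, 10),
)
```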

13
Q

what is data augmentation

A

a regularization technique: modify the training data to create new examples (random crops, rotations, ...)
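
A torchvision sketch (assuming 32×32 images; the specific transforms are illustrative):

```python
from torchvision import transforms

# Every epoch sees a slightly different version of each training image.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])
```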

14
Q

what is early stopping

A

a regularization technique: stop training early, before the model overfits (typically when validation error stops improving)
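
A self-contained sketch of the stopping rule; the validation losses are made-up numbers for illustration:

```python
# Hypothetical per-epoch validation losses (illustrative only).
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.60]

best, patience, bad = float("inf"), 3, 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, bad = loss, 0              # improvement: reset the counter
    else:
        bad += 1                         # no improvement this epoch
        if bad >= patience:
            print(f"stop at epoch {epoch}; best val loss {best}")
            break
```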

15
Q

what is the drawback of early stopping

A

you can't separate the tasks of optimizing the cost function and preventing overfitting; you always have to consider both at once, whereas with L2 regularization you can train as long as possible and just focus on driving the cost down
