Deep Learning Flashcards
1
Q
A
2
Q
what is stochastic in SGD?
A
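the mini-batches (each step uses a random sample of the data, so its gradient is a noisy estimate of the full-dataset gradient) and the initial weights (drawn at random)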
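A minimal sketch of the two random ingredients named above, using only the Python standard library. The helper names `minibatches` and `init_weights`, the Gaussian initialisation, and the `scale` value are illustrative assumptions, not part of the card.

```python
import random

def minibatches(n_examples, batch_size, rng):
    """Yield lists of example indices in a freshly shuffled order."""
    indices = list(range(n_examples))
    rng.shuffle(indices)  # randomness 1: which examples end up in which batch
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]

def init_weights(n_weights, rng, scale=0.1):
    """Randomness 2: small random starting weights (Gaussian is an assumption)."""
    return [rng.gauss(0.0, scale) for _ in range(n_weights)]

rng = random.Random(0)                    # seeded so the run is repeatable
epoch_a = list(minibatches(10, 4, rng))   # one pass over 10 examples
epoch_b = list(minibatches(10, 4, rng))   # a second pass, reshuffled
weights = init_weights(3, rng)
```

Every epoch still visits each example exactly once; only the order and batch grouping change, which is what makes each gradient step a noisy estimate.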
3
Q
what does momentum do?
A
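makes SGD more stable by smoothing the gradient with an exponential moving average: m = 0.99*m + 0.01*gradient; the weight update then uses m instead of the raw gradient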
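A small sketch of that moving average, to show why it stabilises the updates. The function names, the learning rate `lr`, and the toy gradient sequences are assumptions made for illustration; only the 0.99 / 0.01 weighting comes from the card.

```python
def momentum_step(w, m, grad, lr=0.1, beta=0.99):
    """One update: refresh the running average m, then step along m."""
    m = beta * m + (1 - beta) * grad   # m = 0.99*m + 0.01*gradient
    w = w - lr * m                     # step uses the smoothed gradient
    return w, m

def running_average(grads, beta=0.99):
    """Return the final value of m after seeing a sequence of gradients."""
    m = 0.0
    for g in grads:
        m = beta * m + (1 - beta) * g
    return m

# Gradients that flip sign every step mostly cancel out in m ...
noisy = running_average([1.0, -1.0] * 500)
# ... while a consistent gradient is gradually passed through in full.
steady = running_average([1.0] * 1000)
```

With alternating gradients of size 1, `m` never exceeds 0.01 in magnitude, so the weights barely move; with a constant gradient of 1, `m` approaches 1 after enough steps.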
4
Q
why is ReLU so popular?
A
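fast to compute, and its gradient is exactly 1 for positive inputs, so it does not saturate there the way sigmoid or tanh do (this greatly reduces vanishing gradients but does not eliminate them: a unit whose input stays negative gets zero gradient)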
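A rough illustration of the gradient argument, comparing ReLU with sigmoid. It is a deliberately simplified sketch: `chained_grad` is a hypothetical helper that multiplies the activation derivative at the same input across `depth` layers and ignores the weight matrices a real network would also contribute.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (the value at exactly 0 is a convention)
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # never larger than 0.25

def chained_grad(grad_fn, x, depth):
    """Product of the activation derivative over `depth` layers (weights ignored)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(x)
    return g

relu_chain = chained_grad(relu_grad, 1.0, 20)        # stays at 1.0
sigmoid_chain = chained_grad(sigmoid_grad, 0.0, 20)  # 0.25 ** 20, vanishingly small
```

Even at its best point (input 0), the sigmoid derivative shrinks the signal by a factor of 4 per layer, whereas ReLU passes it through unchanged for positive inputs. For negative inputs `relu_grad` is 0, which is the "dead unit" caveat.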