Deep Learning Flashcards
1
Q
Why do we need non-linear activation functions?
A
- Otherwise the whole network collapses into a single linear model, no matter how many layers it has (see the sketch below).
- Non-linear activations let the network capture non-linear relationships in the data.
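A minimal NumPy sketch (layer sizes are illustrative) showing that two stacked linear layers with no activation in between are equivalent to one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of 4 inputs, 8 features each

# Two linear layers with no activation in between.
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping as a single linear layer: W = W1 @ W2, b = b1 @ W2 + b2.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True: the stack collapsed into one linear map
```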
2
Q
Is ReLU a better activation function than sigmoid?
A
The sigmoid curve flattens out for large |x|, so its gradient approaches zero and causes vanishing gradients. ReLU has a gradient of 1 for positive inputs, which avoids this (see the sketch below). It is also cheaper to compute.
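A small sketch of the gradients (input values are illustrative): the sigmoid derivative peaks at 0.25 and shrinks toward zero as |x| grows, while the ReLU derivative stays at 1 for positive inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25 (at x = 0), vanishes for large |x|

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 for positive inputs

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.0f}")
# sigmoid' drops from 0.25 to ~4.5e-5 as x grows; relu' stays at 1.
```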
3
Q
What are the main problems with gradient descent?
A
- local optima: the algorithm can get stuck in a minimum that is not the global one
- saddle points: the gradient is zero even though the point is not a local optimum
- efficiency: computing the gradient over the full dataset is expensive; mitigated by stochastic (mini-batch) gradient descent, sketched below
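A minimal sketch of stochastic (mini-batch) gradient descent on a toy least-squares problem; each step uses only a small random batch, so the gradient is cheap to compute, and its noise can also help escape saddle points (data, step size, and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # toy dataset
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)     # random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size       # gradient on the batch only
    w -= lr * grad

print(np.linalg.norm(w - w_true))   # small: SGD converged without any full-dataset gradient
```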
4
Q
What are the main differences between CNNs and fully connected MLPs?
A
CNNs:
1. sparse connections: each output is connected only to inputs within its receptive field, so there are fewer weights and less overfitting
2. weight sharing: the same filter weights are reused at every location, which further reduces parameters and acts as regularization
3. translation (location) invariance: the same feature is detected regardless of where it appears in the input

(parameter counts compared in the sketch below)
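A quick parameter-count comparison illustrating sparse connections and weight sharing (the layer shapes are illustrative):

```python
# Mapping a 32x32x3 image to 16 feature maps of the same spatial size:

# Fully connected: every output unit sees every input pixel.
fc_params = (32 * 32 * 3) * (32 * 32 * 16)      # weights only
# Convolutional: each output sees a 3x3 receptive field, and the same
# 3x3x3 kernel is reused at every spatial location (weight sharing).
conv_params = (3 * 3 * 3) * 16

print(fc_params)     # 50,331,648
print(conv_params)   # 432
```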