Deep Learning Flashcards

1
Q

Why do we need non-linear activation functions?

A
  1. Otherwise the whole network collapses into a linear model, because a composition of linear layers is itself a single linear map.
  2. They let the network capture non-linear relationships in the data.
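Here is a minimal plain-Python sketch of point 1. The layer sizes and weights are made up for illustration. Two linear layers stacked with no activation give the same outputs as one linear layer with W = W2·W1 and b = W2·b1 + b2. Putting a ReLU between them breaks that equivalence.

```python
def matvec(W, x):
    # matrix-vector product W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # (A @ B)[i][j] = sum_k A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0.0, a) for a in v]

# Hypothetical weights: layer 1 maps 2 -> 3, layer 2 maps 3 -> 1.
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b1 = [0.1, 0.2, 0.3]
W2 = [[1.0, -1.0, 2.0]]
b2 = [0.5]

def two_layer(x):
    # layer 2 applied directly to layer 1, no non-linearity in between
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The equivalent single linear layer: W = W2 W1, b = W2 b1 + b2
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_layer(x):
    return add(matvec(W, x), b)

def two_layer_relu(x):
    # same weights, but with a ReLU between the layers
    return add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

x = [0.7, -1.3]
print("two linear layers:", two_layer(x))
print("one linear layer: ", one_layer(x))
print("with ReLU:        ", two_layer_relu(x))
```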
2
Q

Is ReLU a better activation function than sigmoid?

A

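Usually, yes. The sigmoid curve flattens out (saturates) for inputs of large magnitude. Its gradient therefore approaches zero, and gradients vanish as they are backpropagated through many layers. ReLU largely avoids this because its gradient is exactly 1 for every positive input, though it is 0 for negative inputs.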
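The small Python sketch below compares the two derivatives at a few inputs. The sample points and the depth of 20 are arbitrary choices. The sketch ignores the weight matrices and just multiplies one local derivative per layer, which simplifies real backpropagation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = s * (1 - s); its maximum is 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU(x) = max(0, x): gradient 1 for x > 0, 0 otherwise (0 chosen at x = 0)
    return 1.0 if x > 0 else 0.0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.2e}  relu'={relu_grad(x):.0f}")

# Backprop multiplies one local derivative per layer. Even in sigmoid's best
# case (0.25 per layer) the product shrinks geometrically with depth, while
# active ReLU units contribute a factor of exactly 1.
depth = 20
print("sigmoid, best case:", 0.25 ** depth)
print("relu, active units:", 1.0 ** depth)
```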

3
Q

What are the three problems with gradient descent?

A
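  1. local optima: it can get stuck in a local minimum instead of reaching the global one.
  2. saddle points: the gradient is zero, but the point is not a local optimum.
  3. efficiency: computing the gradient over the full dataset at every step is expensive. Stochastic (mini-batch) gradient descent addresses this.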
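This toy Python sketch covers points 2 and 3, using assumed data and hyperparameters. For the saddle point, f(x, y) = x^2 - y^2 has zero gradient at (0, 0), yet that point is a minimum along x and a maximum along y. For efficiency, mini-batch SGD fits y = w * x by estimating each gradient from 32 random examples rather than all 10,000.

```python
import random

def f_saddle(x, y):
    return x * x - y * y

def grad_saddle(x, y):
    # gradient of x^2 - y^2; it is (0, 0) at the origin
    return (2 * x, -2 * y)

def make_data(n, true_w=3.0, seed=0):
    # toy 1-D regression data: y = true_w * x + small Gaussian noise
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0, 0.1)) for x in xs]

def grad_mse(w, batch):
    # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def sgd(data, lr=0.1, batch_size=32, steps=500, seed=1):
    # each step uses a random mini-batch, not the full dataset
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # sampled without replacement
        w -= lr * grad_mse(w, batch)
    return w

data = make_data(10_000)
print("gradient at (0, 0):", grad_saddle(0.0, 0.0))
print("f rises along x, falls along y:", f_saddle(0.1, 0.0), f_saddle(0.0, 0.1))
print("SGD estimate of w:", sgd(data))
```

Each SGD step here costs 32 gradient terms instead of 10,000, at the price of a noisier update.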
4
Q

What are the main differences between CNNs and fully connected MLPs?

A

CNNs:
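1. sparse connections:
each output is connected only to the inputs within its receptive field, so there are
fewer weights and less overfitting
2. weight sharing:
the same kernel weights are reused at every position, which further reduces the parameter count and acts as a regularizer (less overfitting)
3. translation invariance:
a pattern is detected wherever it appears in the input (strictly, convolution is translation-equivariant; pooling adds approximate invariance)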
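The Python sketch below illustrates these points. It assumes a single-channel 28x28 input and a 3x3 kernel, sizes chosen only for the example. It first counts the parameters of a dense layer versus a conv layer with the same output size. It then uses a hypothetical 1-D edge kernel to show that shifting the input shifts the convolution's response by the same amount.

```python
# Parameter counts for one layer mapping a 28x28 single-channel image to a
# 28x28 output ("same" padding assumed for the conv layer).
H = W = 28
k = 3
fc_params = (H * W) * (H * W) + H * W   # dense weights + one bias per output
conv_params = k * k + 1                 # one shared 3x3 kernel + one bias
print("fully connected:", fc_params)
print("3x3 conv:", conv_params)

def conv1d(signal, kernel):
    # "valid" 1-D cross-correlation (what DL libraries call convolution).
    # Every output uses the SAME kernel (weight sharing) and only
    # len(kernel) neighbouring inputs (sparse connections / receptive field).
    n, m = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(m)) for i in range(n - m + 1)]

edge = [1.0, 0.0, -1.0]                      # hypothetical edge-detector kernel
x = [0, 0, 5, 5, 5, 0, 0, 0, 0, 0]
x_shifted = [0, 0, 0, 0, 5, 5, 5, 0, 0, 0]   # same pattern, moved 2 steps right

print(conv1d(x, edge))
print(conv1d(x_shifted, edge))  # same response, also moved 2 steps right
```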
