Deep Learning Flashcards
1
Q
Why do we need non-linear activation functions?
A
- Otherwise the whole network collapses into a single linear model, no matter how many layers it has (see the sketch below).
- Non-linear activations let the network capture non-linear relationships in the data.
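A minimal NumPy sketch (layer sizes are illustrative) showing that two stacked linear layers with no activation in between are equivalent to one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of 4 inputs, 8 features each

# Two linear layers with no activation in between.
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping as a single linear layer: W = W1 @ W2, b = b1 @ W2 + b2.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True: the stack collapsed into one linear map
```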
2
Q
Is ReLU a better activation function than sigmoid?
A
The sigmoid curve flattens out for large |x|, so its gradient approaches zero and causes vanishing gradients. ReLU has a gradient of 1 for positive inputs, which avoids this (see the sketch below). It is also cheaper to compute.
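A small sketch of the gradients (input values are illustrative): the sigmoid derivative peaks at 0.25 and shrinks toward zero as |x| grows, while the ReLU derivative stays at 1 for positive inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25 (at x = 0), vanishes for large |x|

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 for positive inputs

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.0f}")
# sigmoid' drops from 0.25 to ~4.5e-5 as x grows; relu' stays at 1.
```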
3
Q
What are the main problems with gradient descent?
A
- local optima: the algorithm can get stuck in a minimum that is not the global one
- saddle points: the gradient is zero even though the point is not a local optimum
- efficiency: computing the gradient over the full dataset is expensive; mitigated by stochastic (mini-batch) gradient descent, sketched below
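A minimal sketch of stochastic (mini-batch) gradient descent on a toy least-squares problem; each step uses only a small random batch, so the gradient is cheap to compute, and its noise can also help escape saddle points (data, step size, and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # toy dataset
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)     # random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size       # gradient on the batch only
    w -= lr * grad

print(np.linalg.norm(w - w_true))   # small: SGD converged without any full-dataset gradient
```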
4
Q
What are the main differences between CNNs and fully connected MLPs?
A
CNNs:
1. sparse connections: each output is connected only to inputs within its receptive field, so there are fewer weights and less overfitting
2. weight sharing: the same filter weights are reused at every location, which further reduces parameters and acts as regularization
3. translation (location) invariance: the same feature is detected regardless of where it appears in the input

(parameter counts compared in the sketch below)
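A quick parameter-count comparison illustrating sparse connections and weight sharing (the layer shapes are illustrative):

```python
# Mapping a 32x32x3 image to 16 feature maps of the same spatial size:

# Fully connected: every output unit sees every input pixel.
fc_params = (32 * 32 * 3) * (32 * 32 * 16)      # weights only
# Convolutional: each output sees a 3x3 receptive field, and the same
# 3x3x3 kernel is reused at every spatial location (weight sharing).
conv_params = (3 * 3 * 3) * 16

print(fc_params)     # 50,331,648
print(conv_params)   # 432
```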