DL-01 - Introduction (+ impl) Flashcards
DL-01a - Introduction
Who was behind AlexNet? (1 + 2)
Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton.
DL-01a - Introduction
What’s the formula for the sigmoid function?
(See image)
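For reference, the standard form:
\sigma(x) = \frac{1}{1 + e^{-x}}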
DL-01a - Introduction
What formula is this? (See image)
Sigmoid
DL-01a - Introduction
Why is the sigmoid activation function less used these days? (2)
- Vanishing gradient problem
- Non-zero centered output (range 0-1)
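A minimal sketch, assuming plain numpy, illustrating both issues:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # all outputs lie in (0, 1): never zero-centered
print(sigmoid_grad(x))  # peaks at 0.25 for x = 0, ~4.5e-05 at |x| = 10: gradients vanish in the tails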
DL-01a - Introduction
Why is the non-zero centered output of a sigmoid function a problem?
Because all outputs are positive, the gradients on a neuron's incoming weights all share the same sign, which causes zig-zagging dynamics during optimization.
DL-01a - Introduction
What is one cause of vanishing gradients with the sigmoid function?
Saturation at either tail: for large |x| the gradient is close to zero, so almost no signal flows backward.
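The derivative makes this concrete:
\sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \le \tfrac{1}{4}, \qquad \sigma'(x) \to 0 \ \text{as}\ |x| \to \infty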
DL-01a - Introduction
What formula is this? (See image)
Tanh.
DL-01a - Introduction
What’s the formula for tanh?
(See image)
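For reference, the standard form:
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}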
DL-01a - Introduction
How is tanh related to sigmoid?
Tanh is a rescaled, shifted sigmoid: tanh(x) = 2σ(2x) − 1.
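A short derivation of that identity:
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{1 - e^{-2x}}{1 + e^{-2x}} = \frac{2}{1 + e^{-2x}} - 1 = 2\,\sigma(2x) - 1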
DL-01a - Introduction
How does tanh and sigmoid compare? (3)
- Tanh is zero-centered (output range −1 to 1), fixing sigmoid's non-zero-centered output.
- Both still saturate in the tails, so gradients can still vanish.
- Tanh is generally preferred over sigmoid.
DL-01a - Introduction
What are the pros of the ReLU function? (4)
- No gradient saturation in the positive region (no vanishing gradient for x > 0)
- Fast convergence
- Simple implementation
- Better convergence performance than sigmoid
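For reference, ReLU's standard form and gradient:
f(x) = \max(0, x), \qquad f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}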
DL-01a - Introduction
What are the cons of the ReLU function? (1)
“Dying ReLU” problem: a large gradient flow can push a neuron's weights so far that its pre-activation is always negative; the neuron then outputs zero, receives zero gradient, and stops learning.
DL-01a - Introduction
What is parametric/leaky ReLU?
ReLU with a small non-zero slope in the negative region. (See image)
- Parametric (PReLU): the slope is a learnable parameter.
- Leaky: the slope is a pre-defined (fixed) parameter.
DL-01a - Introduction
What’s the formula for PReLU? Is alpha a param or a hyperparam?
Alpha is a param and is learned during training.
(See image)
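For reference, the usual form, with a learnable \alpha:
f(x) = \begin{cases} x & x > 0 \\ \alpha x & x \le 0 \end{cases}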
DL-01a - Introduction
What’s the formula for leaky ReLU? Is alpha a param or a hyperparam?
Alpha is a hyperparam, typically fixed at 0.01.
(See image)
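Leaky ReLU has the same piecewise form with \alpha fixed at a small constant (typically 0.01):
f(x) = \begin{cases} x & x > 0 \\ 0.01\,x & x \le 0 \end{cases}

A minimal sketch of the three variants, assuming plain numpy; the function names are illustrative, not a specific framework's API:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is a fixed hyperparameter
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # alpha is a learnable parameter, updated during training
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))               # [0.    0.    0.    1.5 ]
print(leaky_relu(x))         # [-0.02  -0.005  0.    1.5 ]
print(prelu(x, alpha=0.25))  # [-0.5   -0.125  0.    1.5 ]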