DL-01 - Introduction (+ impl) Flashcards

1
Q

DL-01a - Introduction

Who was behind AlexNet? (1 + 2)

A

Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton.

2
Q

DL-01a - Introduction

What’s the formula for the sigmoid function?

A

σ(x) = 1 / (1 + e^(-x))

(See image)
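
A minimal sketch in plain Python/NumPy (my own illustration, not part of the card):

    import numpy as np

    def sigmoid(x):
        # σ(x) = 1 / (1 + e^(-x)); squashes any real input into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ≈ [0.0067, 0.5, 0.9933]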

3
Q

DL-01a - Introduction

What formula is this? (See image)

A

Sigmoid

4
Q

DL-01a - Introduction

Why is the sigmoid activation function less used these days? (2)

A
  • Vanishing gradient problem
  • Non-zero centered output (range 0-1)
5
Q

DL-01a - Introduction

Why is the non-zero centered output of a sigmoid function a problem?

A

Because sigmoid outputs are always positive, the gradients on the next layer’s weights all share the same sign, which causes zig-zagging dynamics during optimization.

6
Q

DL-01a - Introduction

What is one cause of vanishing gradients with the sigmoid function?

A

Saturation at either end of the tails: for large |x| the derivative σ'(x) = σ(x)(1 - σ(x)) is nearly zero, so almost no gradient flows backwards.
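
A quick numeric check of the tails (my own sketch, not part of the card):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        # σ'(x) = σ(x) * (1 - σ(x)); at most 0.25 (at x = 0), nearly 0 in the tails
        s = sigmoid(x)
        return s * (1.0 - s)

    print(sigmoid_grad(np.array([-10.0, 0.0, 10.0])))  # ≈ [4.5e-05, 0.25, 4.5e-05]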

7
Q

DL-01a - Introduction

What formula is this? (See image)

A

Tanh.

8
Q

DL-01a - Introduction

What’s the formula for tanh?

A

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

(See image)

9
Q

DL-01a - Introduction

How is tanh related to sigmoid?

A

Tanh is a rescaled and shifted sigmoid: tanh(x) = 2σ(2x) - 1.
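
A small numeric check of that identity (my own sketch):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-4.0, 4.0, 9)
    # tanh(x) = 2 * sigmoid(2x) - 1
    print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
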
10
Q

DL-01a - Introduction

How do tanh and sigmoid compare? (3)

A
  • Tanh solves the “zero centered” problem.
  • Both gradients still saturate.
  • Tanh is generally preferred over sigmoid.
11
Q

DL-01a - Introduction

What are the pros of the ReLU function? (4)

A
  • No vanishing gradient for positive inputs (no saturation)
  • Fast convergence
  • Simple implementation
  • Better convergence performance than sigmoid
12
Q

DL-01a - Introduction

What are the cons of the ReLU function? (1)

A

The “dying ReLU” problem: a large gradient update can push a unit’s pre-activation permanently into the negative region, where the gradient is zero, so the unit stops updating.
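
A tiny illustration of why such a unit stops updating (plain NumPy, my own sketch):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def relu_grad(x):
        # gradient of ReLU: 1 for x > 0, exactly 0 for x <= 0
        return (x > 0).astype(float)

    pre_act = np.array([-3.0, -0.5, 2.0])
    print(relu(pre_act))       # [0. 0. 2.]
    print(relu_grad(pre_act))  # [0. 0. 1.] -> the two “dead” units receive no gradient at all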

13
Q

DL-01a - Introduction

What is parametric/leaky ReLU?

A

ReLU with a small non-zero slope in the negative region. (See image)

  • Parametric (PReLU): the slope is a learnable param.
  • Leaky: the slope is a pre-defined param.
14
Q

DL-01a - Introduction

What’s the formula for PReLU? Is alpha a param or a hyperparam?

A

f(x) = x if x > 0, else αx

Alpha is a param and is learned during training.

(See image)

15
Q

DL-01a - Introduction

What’s the formula for leaky ReLU? Is alpha a param or a hyperparam?

A

f(x) = x if x > 0, else αx

Alpha is a hyperparam, typically fixed at 0.01.

(See image)
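
A minimal sketch covering both variants (plain NumPy; treating α as a fixed constant for leaky ReLU and as a value that would be learned for PReLU):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # f(x) = x if x > 0, else alpha * x
        return np.where(x > 0, x, alpha * x)

    x = np.array([-2.0, -0.1, 3.0])
    print(leaky_relu(x))             # leaky: alpha fixed at 0.01
    print(leaky_relu(x, alpha=0.2))  # PReLU-style: alpha would be a learned parameter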

16
Q

DL-01a - Introduction

What activation function is this? (See image)

A

Softmax

17
Q

DL-01a - Introduction

What’s the formula for softmax?

A

softmax(z)_i = e^(z_i) / Σ_j e^(z_j)

(See image)
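
A minimal sketch (my own; subtracting the max is a standard numerical-stability trick, not part of the formula itself):

    import numpy as np

    def softmax(z):
        # softmax(z)_i = e^(z_i) / Σ_j e^(z_j)
        z = z - np.max(z)   # stability only; the result is unchanged
        e = np.exp(z)
        return e / np.sum(e)

    print(softmax(np.array([1.0, 2.0, 3.0])))  # ≈ [0.09, 0.24, 0.67], sums to 1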

18
Q

DL-01a - Introduction

How is momentum used in gradient descent?

A

Momentum is used in gradient descent to accelerate convergence by adding a fraction of the previous update to the current update.
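
A sketch of one common formulation (the names v, beta and lr are my own, as is the toy example):

    import numpy as np

    def momentum_step(w, grad, v, lr=0.1, beta=0.9):
        # velocity = a fraction of the previous update plus the new (scaled) gradient
        v = beta * v - lr * grad
        return w + v, v

    # toy quadratic loss L(w) = 0.5 * w^2, so grad = w
    w, v = np.array([5.0]), np.array([0.0])
    for _ in range(50):
        w, v = momentum_step(w, w, v)
    print(w)  # magnitude has shrunk toward the minimum at 0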

19
Q

DL-01a - Introduction

What does momentum help with?

A

Overcoming local minima and oscillations.

20
Q

DL-01a - Introduction

How are momentum and Nesterov different?

A
  • Momentum speeds up gradient descent.
  • Nesterov momentum adds a corrective factor by evaluating the gradient at the look-ahead point (after the momentum step), giving a better approximation (see the sketch below).
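
A sketch of the difference (grad_fn, lr and beta are my own names; both functions assume a gradient function is available):

    def momentum_step(w, v, grad_fn, lr=0.1, beta=0.9):
        # classical momentum: gradient evaluated at the current point w
        v = beta * v - lr * grad_fn(w)
        return w + v, v

    def nesterov_step(w, v, grad_fn, lr=0.1, beta=0.9):
        # Nesterov: gradient evaluated at the look-ahead point w + beta * v,
        # which corrects the step before it is applied
        v = beta * v - lr * grad_fn(w + beta * v)
        return w + v, v

    w, v = 5.0, 0.0
    for _ in range(50):
        w, v = nesterov_step(w, v, lambda p: p)  # gradient of 0.5 * p^2 is p
    print(w)  # much closer to the minimum at 0 than the starting point 5.0
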
21
Q

DL-01a - Introduction

What is RMSprop?

A

RMSprop is an adaptive learning rate optimization algorithm for gradient descent in deep learning models.

22
Q

DL-01a - Introduction

How does RMSprop work?

A

RMSprop works by adapting the learning rate for each weight based on the magnitudes of their gradients, using a running average of squared gradients.
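
A sketch of the per-weight update (decay, lr and eps are conventional names, chosen by me; the toy loop is mine as well):

    import numpy as np

    def rmsprop_step(w, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
        # running average of squared gradients, one value per weight
        avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
        # weights with large recent gradients get proportionally smaller steps
        w = w - lr * grad / (np.sqrt(avg_sq) + eps)
        return w, avg_sq

    w, avg_sq = np.array([5.0]), np.array([0.0])
    for _ in range(2000):
        w, avg_sq = rmsprop_step(w, w, avg_sq, lr=0.01)  # gradient of 0.5 * w^2 is w
    print(w)  # hovers near the minimum at 0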

23
Q

DL-01a - Introduction

How does ADAM work?

A

The ADAM optimizer works by adaptively adjusting the learning rate for each parameter using the first and second moments of the gradients.
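
A sketch of the update rule (beta1, beta2 and eps follow the usual defaults; the function and toy loop are my own illustration):

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # first moment: running mean of gradients (momentum-like)
        m = beta1 * m + (1.0 - beta1) * grad
        # second moment: running mean of squared gradients (RMSprop-like)
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # bias correction for the zero-initialised moments
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # per-parameter adaptive step
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

    w, m, v = np.array([5.0]), np.array([0.0]), np.array([0.0])
    for t in range(1, 3001):
        w, m, v = adam_step(w, w, m, v, t, lr=0.01)  # gradient of 0.5 * w^2 is w
    print(w)  # has moved from 5.0 to near the minimum at 0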