Lecture 6 Flashcards

1
Q

When is a model not a deep learning one?

A

When the input unit is directly connected to the output, without any layers (of transformation) between them.

2
Q

What are the four common activation functions?

A
  1. Sigmoid
  2. ReLU
  3. Softplus
  4. Tanh
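
Below is a minimal Python sketch of these four functions, using only the standard math module; the formulas are the standard textbook definitions, and the function names are mine for illustration.

    import math

    def sigmoid(x):
        # Squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def relu(x):
        # Zeroes out negative inputs; positive inputs pass through unchanged.
        return max(0.0, x)

    def softplus(x):
        # Smooth, continuous cousin of ReLU: ln(1 + e^x).
        return math.log(1.0 + math.exp(x))

    def tanh(x):
        # Squashes any real input into the range (-1, 1).
        return math.tanh(x)

    # Evaluate each function on a few sample inputs.
    for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
        print(x, sigmoid(x), relu(x), softplus(x), tanh(x))
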
3
Q

What does the Sigmoid activation function do?

A

Ensures that the output is bound between 0 and 1.

4
Q

What does the ReLU activation function do?

A

Ensures the output is bounded below at 0 (it outputs max(0, x)), so there is no negative output.

5
Q

What are the problems with the ReLU function?

A

Its derivative is not continuous (ReLU is not differentiable at 0), which makes calculating the derivative at that point challenging.

6
Q

What does the Softplus activation function do?

A

It is a smooth, continuous version of ReLU.

As negative inputs approach zero, the output begins to rise smoothly instead of staying flat at zero.
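
A small numeric sketch of that behaviour, assuming the standard softplus formula ln(1 + e^x) (the card does not state it explicitly): ReLU stays flat at 0 for negative inputs and kinks at zero, while softplus rises smoothly through the same region.

    import math

    # Compare ReLU and softplus just below and above zero.
    for x in (-2.0, -1.0, -0.1, 0.0, 0.1, 1.0, 2.0):
        relu = max(0.0, x)
        softplus = math.log(1.0 + math.exp(x))
        print(f"x={x:+.1f}  relu={relu:.4f}  softplus={softplus:.4f}")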

7
Q

What does the Tanh activation function do?

A

Ensures the output is bound between -1 and 1.

8
Q

When is Tanh (activation function) useful?

A

When you want your network to be insensitive to numbers farther from zero, and more sensitive to numbers closer to zero.

Its output changes most sharply near zero, so it gives the model a strong signal about how close an input is to 0.
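
A quick numeric sketch of that sensitivity, using the standard identity tanh'(x) = 1 - tanh(x)^2 (not stated on the card): the slope is largest at 0 and nearly flat far from it.

    import math

    # The slope of tanh is steepest at 0 and almost zero far from 0,
    # so the output reacts strongly to inputs near zero and barely otherwise.
    for x in (0.0, 0.5, 1.0, 2.0, 5.0):
        slope = 1.0 - math.tanh(x) ** 2   # derivative of tanh
        print(f"x={x:.1f}  tanh(x)={math.tanh(x):+.4f}  slope={slope:.4f}")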

9
Q

Hidden layer

A

Any layer in a neural network whose units are not directly connected to the outputs.

10
Q

Uses of Gradient Descent

A
  • Learning the weights in computation graphs
  • Calculating the weights leading into units in the output layer
  • Calculating the weights leading into units in the hidden layers
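
A minimal gradient-descent sketch for a single linear unit, using the squared-error loss from the next card; the learning rate and training example are arbitrary illustrative values.

    # One linear unit y_hat = w * x, trained by gradient descent on (y - y_hat)^2.
    x, y = 2.0, 6.0          # a single training example (illustrative values)
    w = 0.0                  # initial weight
    learning_rate = 0.05

    for step in range(50):
        y_hat = w * x
        grad = -2.0 * (y - y_hat) * x   # d/dw of (y - y_hat)^2
        w -= learning_rate * grad       # step the weight against the gradient

    print(w)   # approaches 3.0, since y = 3 * x for this example
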
11
Q

What is the Loss function?

A

(y - ŷ)²
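
A tiny worked example of this loss and its derivative with respect to the prediction; the derivative -2(y - ŷ) follows from basic calculus and the numbers are illustrative.

    y, y_hat = 1.0, 0.7                   # target and prediction (illustrative)
    loss = (y - y_hat) ** 2               # squared-error loss: 0.09
    d_loss_d_y_hat = -2.0 * (y - y_hat)   # gradient fed backward: -0.6
    print(loss, d_loss_d_y_hat)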

12
Q

Vanishing Gradient

A

When the “error” signals are extinguished altogether as they are back-propagated through a deep network with many layers.

13
Q

How does a Vanishing Gradient occur?

A
  • The gradient is very close to zero or exactly zero, so changing the weights leading into unit j has a negligible effect on its output.
  • Floating point precision limitations of the hardware
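
A rough numeric sketch of the effect: back-propagation multiplies the error signal by one local derivative per layer, and for sigmoid units that derivative is at most 0.25, so across many layers the product collapses toward zero. The layer count here is arbitrary.

    # Multiply the error signal by the (maximum) sigmoid derivative once per layer.
    max_sigmoid_derivative = 0.25
    gradient = 1.0
    for layer in range(50):
        gradient *= max_sigmoid_derivative
    print(gradient)   # ~7.9e-31: effectively zero after 50 layers
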
14
Q

How do you solve the Vanishing Gradient problem?

A

By renormalizing the vector of gradients as you propagate it back through the network, so you ‘boost’ the gradients.
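
A minimal sketch of that renormalization idea for one layer's gradient vector (rescaling it to a target norm so it keeps a usable magnitude); this illustrates the idea only and is not a specific library's method.

    import math

    def renormalize(gradients, target_norm=1.0):
        # Rescale a tiny gradient vector so its overall magnitude is target_norm,
        # 'boosting' it before it is propagated further back through the network.
        norm = math.sqrt(sum(g * g for g in gradients))
        if norm == 0.0:
            return gradients   # nothing to boost if the gradient is exactly zero
        return [g * target_norm / norm for g in gradients]

    print(renormalize([1e-8, -2e-8, 2e-8]))   # rescaled to unit length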

15
Q

What are the implications of a Vanishing Gradient?

A

The network will eventually freeze and no longer improve, even if there are more layers that need to have the loss back-propagated.

16
Q

Gradient

A

The portion of the loss that each node receives and feeds backward during back-propagation.

17
Q

Softmax

A

The Softmax function takes a vector of values as input and outputs a vector of non-negative numbers that sum to 1.
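
A minimal softmax sketch in plain Python; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something stated on the card.

    import math

    def softmax(values):
        # Shifting by the max leaves the result unchanged but avoids overflow in exp().
        shifted = [v - max(values) for v in values]
        exps = [math.exp(v) for v in shifted]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax([2.0, 1.0, 0.1])
    print(probs, sum(probs))   # non-negative values that sum to 1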