Lecture 6 Flashcards

1
Q

When is a model not a deep learning one?

A

When the input unit is directly connected to the output, without any layers (of transformation) between them.

2
Q

What are the four common activation functions?

A
  1. Sigmoid
  2. ReLU
  3. Softplus
  4. Tanh
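
Below is a minimal Python sketch of these four functions, using only the standard math module; the formulas are the standard textbook definitions, and the function names are mine for illustration.

    import math

    def sigmoid(x):
        # Squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def relu(x):
        # Zeroes out negative inputs; positive inputs pass through unchanged.
        return max(0.0, x)

    def softplus(x):
        # Smooth, continuous cousin of ReLU: ln(1 + e^x).
        return math.log(1.0 + math.exp(x))

    def tanh(x):
        # Squashes any real input into the range (-1, 1).
        return math.tanh(x)

    # Evaluate each function on a few sample inputs.
    for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
        print(x, sigmoid(x), relu(x), softplus(x), tanh(x))
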
3
Q

What does the Sigmoid activation function do?

A

Ensures that the output is bound between 0 and 1.

4
Q

What does the ReLU activation function do?

A

Ensures the output is bounded below at 0 (it outputs max(0, x)), so there is no negative output.

5
Q

What are the problems with the ReLU function?

A

Its derivative is not continuous (ReLU is not differentiable at 0), which makes calculating the derivative at that point challenging.

6
Q

What does the Softplus activation function do?

A

It is a smooth, continuous version of ReLU.

As negative inputs approach zero, the output begins to rise smoothly instead of staying flat at zero.
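
A small numeric sketch of that behaviour, assuming the standard softplus formula ln(1 + e^x) (the card does not state it explicitly): ReLU stays flat at 0 for negative inputs and kinks at zero, while softplus rises smoothly through the same region.

    import math

    # Compare ReLU and softplus just below and above zero.
    for x in (-2.0, -1.0, -0.1, 0.0, 0.1, 1.0, 2.0):
        relu = max(0.0, x)
        softplus = math.log(1.0 + math.exp(x))
        print(f"x={x:+.1f}  relu={relu:.4f}  softplus={softplus:.4f}")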

7
Q

What does the Tanh activation function do?

A

Ensures the output is bound between -1 and 1.

8
Q

When is Tanh (activation function) useful?

A

When you want your network to be insensitive to numbers farther from zero, and more sensitive to numbers closer to zero.

Its output changes most sharply near zero, so it gives the model a strong signal about how close an input is to 0.
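
A quick numeric sketch of that sensitivity, using the standard identity tanh'(x) = 1 - tanh(x)^2 (not stated on the card): the slope is largest at 0 and nearly flat far from it.

    import math

    # The slope of tanh is steepest at 0 and almost zero far from 0,
    # so the output reacts strongly to inputs near zero and barely otherwise.
    for x in (0.0, 0.5, 1.0, 2.0, 5.0):
        slope = 1.0 - math.tanh(x) ** 2   # derivative of tanh
        print(f"x={x:.1f}  tanh(x)={math.tanh(x):+.4f}  slope={slope:.4f}")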

9
Q

Hidden layer

A

Any layer in a neural network whose units are not directly connected to the outputs.

10
Q

Uses of Gradient Descent

A
  • Learning the weights in computation graphs
  • Calculating the weights leading into units in the output layer
  • Calculating the weights leading into units in the hidden layers
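
A minimal gradient-descent sketch for a single linear unit, using the squared-error loss from the next card; the learning rate and training example are arbitrary illustrative values.

    # One linear unit y_hat = w * x, trained by gradient descent on (y - y_hat)^2.
    x, y = 2.0, 6.0          # a single training example (illustrative values)
    w = 0.0                  # initial weight
    learning_rate = 0.05

    for step in range(50):
        y_hat = w * x
        grad = -2.0 * (y - y_hat) * x   # d/dw of (y - y_hat)^2
        w -= learning_rate * grad       # step the weight against the gradient

    print(w)   # approaches 3.0, since y = 3 * x for this example
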
11
Q

What is the Loss function?

A

(y - ŷ)²
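
A tiny worked example of this loss and its derivative with respect to the prediction; the derivative -2(y - ŷ) follows from basic calculus and the numbers are illustrative.

    y, y_hat = 1.0, 0.7                   # target and prediction (illustrative)
    loss = (y - y_hat) ** 2               # squared-error loss: 0.09
    d_loss_d_y_hat = -2.0 * (y - y_hat)   # gradient fed backward: -0.6
    print(loss, d_loss_d_y_hat)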

12
Q

Vanishing Gradient

A

When the “error” signals are extinguished altogether as they are back-propagated through a deep network with many layers.

13
Q

How does a Vanishing Gradient occur?

A
  • The gradient is very close to zero or exactly zero, so changing the weights leading into unit j has a negligible effect on its output.
  • Floating point precision limitations of the hardware
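
A rough numeric sketch of the effect: back-propagation multiplies the error signal by one local derivative per layer, and for sigmoid units that derivative is at most 0.25, so across many layers the product collapses toward zero. The layer count here is arbitrary.

    # Multiply the error signal by the (maximum) sigmoid derivative once per layer.
    max_sigmoid_derivative = 0.25
    gradient = 1.0
    for layer in range(50):
        gradient *= max_sigmoid_derivative
    print(gradient)   # ~7.9e-31: effectively zero after 50 layers
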
14
Q

How do you solve the Vanishing Gradient problem?

A

By renormalizing the vector of gradients as you propagate it back through the network, so you ‘boost’ the gradients.
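
A minimal sketch of that renormalization idea for one layer's gradient vector (rescaling it to a target norm so it keeps a usable magnitude); this illustrates the idea only and is not a specific library's method.

    import math

    def renormalize(gradients, target_norm=1.0):
        # Rescale a tiny gradient vector so its overall magnitude is target_norm,
        # 'boosting' it before it is propagated further back through the network.
        norm = math.sqrt(sum(g * g for g in gradients))
        if norm == 0.0:
            return gradients   # nothing to boost if the gradient is exactly zero
        return [g * target_norm / norm for g in gradients]

    print(renormalize([1e-8, -2e-8, 2e-8]))   # rescaled to unit length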

15
Q

What are the implications of a Vanishing Gradient?

A

The network will eventually freeze and no longer improve, even if there are more layers that need to have the loss back-propagated.

16
Q

Gradient

A

The portion of the loss that each node receives and feeds backward during back-propagation.

17
Q

Softmax

A

The Softmax function takes a vector of values as input and outputs a vector of non-negative numbers that sum to 1.
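
A minimal softmax sketch in plain Python; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something stated on the card.

    import math

    def softmax(values):
        # Shifting by the max leaves the result unchanged but avoids overflow in exp().
        shifted = [v - max(values) for v in values]
        exps = [math.exp(v) for v in shifted]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax([2.0, 1.0, 0.1])
    print(probs, sum(probs))   # non-negative values that sum to 1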