Lecture 2 Flashcards

1
Q
In the AND problem, what is the value of y?
x1 = 0, x2 = 0
x1 = 1, x2 = 0
x1 = 0, x2 = 1
x1 = 1, x2 = 1
A
y = 0
y = 0
y = 0
y = 1
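A minimal sketch of a single perceptron that reproduces this table; the weights and bias (w1 = w2 = 1, b = -1.5) are hand-picked for illustration, not taken from the card:

  def perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
      # step activation: output 1 only if the weighted sum is positive
      return int(w1 * x1 + w2 * x2 + b > 0)

  for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
      print(x1, x2, perceptron(x1, x2))   # prints y = 0, 0, 0, 1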
2
Q
In the XOR problem, what is the value of y?
x1 = 0, x2 = 0
x1 = 1, x2 = 0
x1 = 0, x2 = 1
x1 = 1, x2 = 1
A
y = 0
y = 1
y = 1
y = 0
3
Q

Can the perceptron solve the XOR problem?

A

No, a single perceptron cannot, because XOR is not linearly separable. Two perceptrons combined can.
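A minimal sketch of the combined solution; the hidden units compute OR and AND with hand-picked weights (an assumption for illustration), and an output unit combines them:

  def step(z):
      return int(z > 0)

  def xor_net(x1, x2):
      h_or  = step(x1 + x2 - 0.5)      # hidden perceptron 1: OR
      h_and = step(x1 + x2 - 1.5)      # hidden perceptron 2: AND
      return step(h_or - h_and - 0.5)  # output: OR and not AND

  for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
      print(x1, x2, xor_net(x1, x2))   # prints y = 0, 1, 1, 0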

4
Q

Define: dense / fully-connected layer

A

A linear operation in which every input is connected to every output by a weight
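As a sketch (the shapes are chosen arbitrarily for illustration), a dense layer from 3 inputs to 2 outputs is a matrix-vector product plus a bias:

  import numpy as np

  W = np.random.randn(2, 3)       # one weight for every (output, input) pair
  b = np.zeros(2)                 # one bias per output
  x = np.array([0.5, -1.0, 2.0])
  y = W @ x + b                   # every input contributes to every output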

5
Q

Define: loss function

A

Also called error or cost. It calculates the ‘cost’ or distance between the network’s output and the expected output.

6
Q

Name examples of loss functions

A
  • MSE or L2
  • MAE
  • cross-entropy
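Minimal NumPy sketches of these losses (the epsilon in the cross-entropy is an assumption added for numerical safety):

  import numpy as np

  def mse(y_true, y_pred):              # MSE / L2 loss
      return np.mean((y_true - y_pred) ** 2)

  def mae(y_true, y_pred):              # MAE
      return np.mean(np.abs(y_true - y_pred))

  def cross_entropy(y_true, y_pred, eps=1e-12):
      # y_true: one-hot target, y_pred: predicted class probabilities
      return -np.sum(y_true * np.log(y_pred + eps))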
7
Q

What does back propagation do?

A
  • It back-propagates the prediction error to update the parameters.
  • The goal is to know how each parameter contributes to the error, so the derivative of the error with respect to each parameter is needed.
  • The chain rule tells us how to find the derivative of a composite function, in this case: the loss function with respect to the parameters in a specific layer of the network.
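A minimal sketch of the chain rule on a single sigmoid unit with an MSE-style loss; the input, target, initial parameters and learning rate are made-up values for illustration:

  import numpy as np

  x, target = 1.5, 0.0
  w, b = 0.8, 0.1

  # forward pass
  z = w * x + b
  a = 1 / (1 + np.exp(-z))           # sigmoid activation
  loss = (a - target) ** 2

  # backward pass: one chain-rule factor per forward step
  dloss_da = 2 * (a - target)
  da_dz = a * (1 - a)
  dloss_dw = dloss_da * da_dz * x    # dz/dw = x
  dloss_db = dloss_da * da_dz * 1.0  # dz/db = 1

  # gradient-descent update of the parameters
  lr = 0.1
  w -= lr * dloss_dw
  b -= lr * dloss_db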
8
Q

Name examples of activation functions

A
  • linear
  • sigmoid
  • hyperbolic tangent
  • ReLU
  • softmax
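Minimal NumPy versions of these activations (the max-shift in softmax is a standard numerical-stability trick, not part of the definition):

  import numpy as np

  def linear(z):
      return z

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  def tanh(z):
      return np.tanh(z)

  def relu(z):
      return np.maximum(z, 0)

  def softmax(z):
      e = np.exp(z - np.max(z))   # subtract the max to avoid overflow
      return e / e.sum()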
9
Q

What’s the formula for the linear activation function?

A

sigma(z) = z

10
Q

What’s the formula for the sigmoid activation function?

A

sigma(z) = 1 / (1+e^-z)

11
Q

What’s the formula for the tanh activation function?

A

tanh(z) = (e^(2z) - 1) / (e^(2z) + 1)

12
Q

What’s the formula for the ReLU activation function?

A

ReLU(z) = max(z,0)

13
Q

What kind of activation function should you use for regression and why?

A

A linear activation function, since the output should be able to take any real value.

14
Q

What kind of activation function should you use for binary classification and why?

A

Sigmoid or tanh, since the output should separate two classes (sigmoid produces outputs between 0 and 1, tanh between -1 and 1).
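As a sketch with a made-up score: the sigmoid squashes the last layer’s output into (0, 1), and a 0.5 threshold picks the class:

  import numpy as np

  logit = 0.8                      # raw output of the last layer (made-up value)
  p = 1 / (1 + np.exp(-logit))     # ~0.69, interpreted as P(class 1)
  label = int(p > 0.5)             # 1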

15
Q

What kind of activation function should you use for multiclass classification and why?

A

Softmax: the output should be a probability distribution over the classes, which is compared against a one-hot encoding of the correct class.
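A sketch with made-up scores for three classes: softmax turns them into probabilities that sum to 1, and the largest one indicates the predicted class:

  import numpy as np

  logits = np.array([2.0, 0.5, -1.0])      # raw class scores (made-up values)
  e = np.exp(logits - logits.max())
  probs = e / e.sum()                       # ~[0.79, 0.18, 0.04], sums to 1
  predicted_class = int(np.argmax(probs))   # 0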

16
Q

Using sigmoids in the hidden units, what happens to the derivatives?
And what is the solution?

A

The derivative of the sigmoid makes the gradients small as the activations get close to 0 or 1. Due to the chain rule, the gradients become smaller layer after layer (the vanishing gradient problem).
Solution: an activation function with a constant or linear derivative, such as ReLU (Rectified Linear Unit).
ReLU is non-linear but has a simple derivative that avoids vanishing gradients.
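A tiny numeric illustration of why: the chain rule multiplies one local derivative per layer, so sigmoid factors shrink the gradient while ReLU’s stay at 1 (the activation value and depth are made-up):

  a = 0.95                   # a sigmoid activation close to 1
  sig_grad = a * (1 - a)     # local sigmoid derivative: 0.0475
  relu_grad = 1.0            # ReLU derivative for any positive input

  print(sig_grad ** 10)      # ~6e-14: vanished after 10 layers
  print(relu_grad ** 10)     # 1.0: the gradient keeps its size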

17
Q

What is dying ReLU?

A

During training, a ReLU unit can fall into a state where its output is 0 for any input. It is very difficult to recover from such a state, as the gradient will also be 0.

18
Q

What is the leaky ReLU?

A

A variant of ReLU: for negative inputs, a small slope (<1) is used.
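A minimal sketch (the slope value 0.01 is an assumed typical choice); the non-zero slope keeps the gradient alive for negative inputs, which is how it counters dying ReLU:

  import numpy as np

  def leaky_relu(z, alpha=0.01):
      # small slope alpha for negative inputs, identity for positive ones
      return np.where(z > 0, z, alpha * z)

  print(leaky_relu(np.array([-2.0, 3.0])))   # [-0.02  3.  ]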