Lecture 2 Flashcards

1
Q
In the AND problem, what is the value of y?
x1 = 0, x2 = 0
x1 = 1, x2 = 0
x1 = 0, x2 = 1
x1 = 1, x2 = 1
A
y = 0
y = 0
y = 0
y = 1
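A minimal sketch of a single perceptron that reproduces this table; the weights and bias (w1 = w2 = 1, b = -1.5) are hand-picked for illustration, not taken from the card:

  def perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
      # step activation: output 1 only if the weighted sum is positive
      return int(w1 * x1 + w2 * x2 + b > 0)

  for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
      print(x1, x2, perceptron(x1, x2))   # prints y = 0, 0, 0, 1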
2
Q
In the XOR problem, what is the value of y?
x1 = 0, x2 = 0
x1 = 1, x2 = 0
x1 = 0, x2 = 1
x1 = 1, x2 = 1
A
y = 0
y = 1
y = 1
y = 0
3
Q

Can the perceptron solve the XOR problem?

A

No, a single perceptron cannot, because XOR is not linearly separable. Two perceptrons combined can.
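A minimal sketch of the combined solution; the hidden units compute OR and AND with hand-picked weights (an assumption for illustration), and an output unit combines them:

  def step(z):
      return int(z > 0)

  def xor_net(x1, x2):
      h_or  = step(x1 + x2 - 0.5)      # hidden perceptron 1: OR
      h_and = step(x1 + x2 - 1.5)      # hidden perceptron 2: AND
      return step(h_or - h_and - 0.5)  # output: OR and not AND

  for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
      print(x1, x2, xor_net(x1, x2))   # prints y = 0, 1, 1, 0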

4
Q

Define: dense / fully-connected layer

A

A linear operation in which every input is connected to every output by a weight
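As a sketch (the shapes are chosen arbitrarily for illustration), a dense layer from 3 inputs to 2 outputs is a matrix-vector product plus a bias:

  import numpy as np

  W = np.random.randn(2, 3)       # one weight for every (output, input) pair
  b = np.zeros(2)                 # one bias per output
  x = np.array([0.5, -1.0, 2.0])
  y = W @ x + b                   # every input contributes to every output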

5
Q

Define: loss function

A

Also called error or cost. It calculates the ‘cost’ or distance between the network’s output and the expected output.

6
Q

Name examples of loss functions

A
  • MSE or L2
  • MAE
  • cross-entropy
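Minimal NumPy sketches of these losses (the epsilon in the cross-entropy is an assumption added for numerical safety):

  import numpy as np

  def mse(y_true, y_pred):              # MSE / L2 loss
      return np.mean((y_true - y_pred) ** 2)

  def mae(y_true, y_pred):              # MAE
      return np.mean(np.abs(y_true - y_pred))

  def cross_entropy(y_true, y_pred, eps=1e-12):
      # y_true: one-hot target, y_pred: predicted class probabilities
      return -np.sum(y_true * np.log(y_pred + eps))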
7
Q

What does back propagation do?

A
  • It back-propagates the prediction error to update the parameters.
  • The goal is to know how each parameter contributes to the error, so the derivative of the error with respect to each parameter is needed.
  • The chain rule tells us how to find the derivative of a composite function, in this case: the loss function with respect to the parameters in a specific layer of the network.
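A minimal sketch of the chain rule on a single sigmoid unit with an MSE-style loss; the input, target, initial parameters and learning rate are made-up values for illustration:

  import numpy as np

  x, target = 1.5, 0.0
  w, b = 0.8, 0.1

  # forward pass
  z = w * x + b
  a = 1 / (1 + np.exp(-z))           # sigmoid activation
  loss = (a - target) ** 2

  # backward pass: one chain-rule factor per forward step
  dloss_da = 2 * (a - target)
  da_dz = a * (1 - a)
  dloss_dw = dloss_da * da_dz * x    # dz/dw = x
  dloss_db = dloss_da * da_dz * 1.0  # dz/db = 1

  # gradient-descent update of the parameters
  lr = 0.1
  w -= lr * dloss_dw
  b -= lr * dloss_db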
8
Q

Name examples of activation functions

A
  • linear
  • sigmoid
  • hyperbolic tangent
  • ReLU
  • softmax
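Minimal NumPy versions of these activations (the max-shift in softmax is a standard numerical-stability trick, not part of the definition):

  import numpy as np

  def linear(z):
      return z

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  def tanh(z):
      return np.tanh(z)

  def relu(z):
      return np.maximum(z, 0)

  def softmax(z):
      e = np.exp(z - np.max(z))   # subtract the max to avoid overflow
      return e / e.sum()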
9
Q

What’s the formula for the linear activation function?

A

sigma(z) = z

10
Q

What’s the formula for the sigmoid activation function?

A

sigma(z) = 1 / (1+e^-z)

11
Q

What’s the formula for the tanh activation function?

A

tanh(z) = (e^(2z) - 1) / (e^(2z) + 1)

12
Q

What’s the formula for the ReLU activation function?

A

ReLU(z) = max(z,0)

13
Q

What kind of activation function should you use for regression and why?

A

A linear activation function, since the output should be able to take any real value.

14
Q

What kind of activation function should you use for binary classification and why?

A

Sigmoid or tanh, since the output should separate two classes (sigmoid produces outputs between 0 and 1, tanh between -1 and 1).
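As a sketch with a made-up score: the sigmoid squashes the last layer’s output into (0, 1), and a 0.5 threshold picks the class:

  import numpy as np

  logit = 0.8                      # raw output of the last layer (made-up value)
  p = 1 / (1 + np.exp(-logit))     # ~0.69, interpreted as P(class 1)
  label = int(p > 0.5)             # 1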

15
Q

What kind of activation function should you use for multiclass classification and why?

A

Softmax: the output should be a probability distribution over the classes, which is compared against a one-hot encoding of the correct class.
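A sketch with made-up scores for three classes: softmax turns them into probabilities that sum to 1, and the largest one indicates the predicted class:

  import numpy as np

  logits = np.array([2.0, 0.5, -1.0])      # raw class scores (made-up values)
  e = np.exp(logits - logits.max())
  probs = e / e.sum()                       # ~[0.79, 0.18, 0.04], sums to 1
  predicted_class = int(np.argmax(probs))   # 0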

16
Q

Using sigmoids in the hidden units, what happens to the derivatives?
And what is the solution?

A

The derivative of the sigmoid makes the gradients small as the activations get close to 0 or 1. Due to the chain rule, the gradients become smaller layer after layer (the vanishing gradient problem).
Solution: an activation function with a constant or linear derivative, such as ReLU (Rectified Linear Unit).
ReLU is non-linear but has a simple derivative that avoids vanishing gradients.
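A tiny numeric illustration of why: the chain rule multiplies one local derivative per layer, so sigmoid factors shrink the gradient while ReLU’s stay at 1 (the activation value and depth are made-up):

  a = 0.95                   # a sigmoid activation close to 1
  sig_grad = a * (1 - a)     # local sigmoid derivative: 0.0475
  relu_grad = 1.0            # ReLU derivative for any positive input

  print(sig_grad ** 10)      # ~6e-14: vanished after 10 layers
  print(relu_grad ** 10)     # 1.0: the gradient keeps its size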

17
Q

What is dying ReLU?

A

During training, a ReLU unit can fall into a state where its output is 0 for any input. It is very difficult to recover from such a state, as the gradient will also be 0.

18
Q

What is the leaky ReLU?

A

A variant of ReLU: for negative inputs, a small slope (<1) is used.
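A minimal sketch (the slope value 0.01 is an assumed typical choice); the non-zero slope keeps the gradient alive for negative inputs, which is how it counters dying ReLU:

  import numpy as np

  def leaky_relu(z, alpha=0.01):
      # small slope alpha for negative inputs, identity for positive ones
      return np.where(z > 0, z, alpha * z)

  print(leaky_relu(np.array([-2.0, 3.0])))   # [-0.02  3.  ]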