College 2 Flashcards
In the AND problem what is the value of Y?
x1 = 0, x2 = 0 → y = 0
x1 = 1, x2 = 0 → y = 0
x1 = 0, x2 = 1 → y = 0
x1 = 1, x2 = 1 → y = 1
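The AND truth table above can be computed by a single perceptron, y = step(w1·x1 + w2·x2 + b). The weights below are one possible hypothetical choice, not taken from the cards:

```python
def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def and_perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    # With bias -1.5, the weighted sum only reaches 0 when both inputs are 1.
    return step(w1 * x1 + w2 * x2 + b)

# Reproduces the truth table from the card:
for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, "->", and_perceptron(x1, x2))
```

AND works with one perceptron because its truth table is linearly separable: a single line can split the (1, 1) point from the other three.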
In the XOR problem what is the value of Y?
x1 = 0, x2 = 0 → y = 0
x1 = 1, x2 = 0 → y = 1
x1 = 0, x2 = 1 → y = 1
x1 = 1, x2 = 1 → y = 0
Can the perceptron solve the XOR problem?
No, a single perceptron cannot, because XOR is not linearly separable. Two perceptrons combined (a hidden layer plus an output perceptron) can.
Define: dense / fully-connected layer
A linear operation in which every input is connected to every output by a weight
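A minimal sketch of that linear operation, with weight matrix entries and biases chosen arbitrarily for illustration:

```python
def dense(x, W, b):
    """Dense / fully-connected layer: every input is connected to every
    output by a weight. W[j][i] is the weight from input i to output j."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

# 3 inputs -> 2 outputs: 3 * 2 weights plus 2 biases.
x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
b = [0.0, 1.0]
print(dense(x, W, b))
```

In matrix notation this is simply y = Wx + b; deep-learning libraries implement it as one matrix multiplication.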
Define: loss function
Also called Error or Cost. Calculates the ‘cost’ or distance between the network’s output and an expected one.
Name examples of the loss function
- MSE (mean squared error), also called L2
- MAE (mean absolute error), also called L1
- Cross-entropy
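The three losses above as small Python functions, assuming the cross-entropy is computed for one-hot targets against predicted probabilities:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error (L2 loss)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error (L1 loss)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Cross-entropy: -sum(t * log(p)) for one-hot targets t
    and predicted probabilities p."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
```

MSE/MAE fit regression, where outputs are real values; cross-entropy fits classification, where outputs are probabilities.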
What does back propagation do?
- It back-propagates the prediction error to update the parameters.
- The goal is to know how each parameter contributes to the errors. Thus the derivative of the error with respect to each parameter is needed
- The chain rule tells us how to find the derivative of a composite function, in this case: the loss function with respect to the parameters in a specific layer of the network.
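The chain rule can be made concrete on the smallest possible network, a single sigmoid neuron with squared-error loss. The learning rate and starting parameters below are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, w, b, lr=0.1):
    """One gradient step on y_hat = sigmoid(w*x + b), L = (y_hat - y)^2.
    Chain rule: dL/dw = dL/dy_hat * dy_hat/dz * dz/dw."""
    z = w * x + b
    y_hat = sigmoid(z)
    dL_dyhat = 2.0 * (y_hat - y)       # derivative of the squared error
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0              # derivatives of the linear part
    grad_w = dL_dyhat * dyhat_dz * dz_dw
    grad_b = dL_dyhat * dyhat_dz * dz_db
    # Update each parameter in proportion to its contribution to the error.
    return w - lr * grad_w, b - lr * grad_b
```

In a deeper network the same multiplication of local derivatives is repeated layer by layer, from the loss back to the inputs.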
Name examples of the activation function
- linear
- sigmoid
- hyperbolic tangent
- ReLU
- softmax
What’s the formula for the linear activation function?
sigma(z) = z
What’s the formula for the sigmoid activation function?
sigma(z) = 1 / (1+e^-z)
What’s the formula for the tanh activation function?
tanh(z) = (e^2z - 1) / (e^2z + 1)
What’s the formula for the ReLU activation function?
ReLU(z) = max(z,0)
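The four formulas from the cards above, written out as Python functions (softmax operates on a vector, so it is shown separately under multiclass classification):

```python
import math

def linear(z):
    """sigma(z) = z"""
    return z

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^-z)"""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_(z):
    """tanh(z) = (e^2z - 1) / (e^2z + 1)"""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def relu(z):
    """ReLU(z) = max(z, 0)"""
    return max(z, 0)
```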
What kind of activation function should you use for regression and why?
A linear activation function, since the output should be able to produce all possible values.
What kind of activation function should you use for binary classification and why?
Sigmoid or tanh, since the output should separate two classes (sigmoid produces outputs between 0 and 1, tanh between -1 and 1).
What kind of activation function should you use for multiclass classification and why?
Softmax, since the activation should produce a probability distribution over the classes that approximates a one-hot encoding; the predicted class is the one with the highest probability.
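A minimal softmax sketch, with the scores below chosen arbitrarily for illustration:

```python
import math

def softmax(zs):
    """Softmax: exponentiate each score and normalize so the outputs
    form a probability distribution (non-negative, summing to 1)."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)  # probabilities sum to 1; the largest score gets the largest share
```

The output is not literally one-hot, but the argmax of the probabilities gives the predicted class.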