3 - Neural Net for modern AI Flashcards
Action Potential
A spike of voltage change across a neuron's membrane.
Biological brains produce these, but current AI systems do not.
Perceptron
Inputs are combined with weights and thresholded to produce an output of 1 or 0.
x0 is an extra input called the bias and is fixed to 1.
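A minimal sketch of a perceptron's forward pass in Python/NumPy; the function name and values are illustrative, not from the notes:

import numpy as np

def perceptron_output(x, w):
    # Step-function perceptron: weighted sum of inputs, thresholded at 0.
    # x and w both include the bias term (x[0] = 1).
    return 1 if np.dot(w, x) >= 0 else 0

x = np.array([1.0, 0.5, -0.2])   # x[0] = 1 is the bias input
w = np.array([0.1, 0.4, 0.3])
print(perceptron_output(x, w))   # prints 1 or 0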
Perceptron Learning rule
Compute the network's output, compare it with the true value, and adjust the weights accordingly.
The new weight is the old weight plus the learning rate times the error (true value minus network output) times the input.
newW = oldW + alpha * (real - out) * input
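A minimal sketch of one update step under this rule, assuming the step-function perceptron above; names and values are illustrative:

import numpy as np

def perceptron_update(w, x, target, alpha=0.1):
    # Perceptron learning rule: newW = oldW + alpha * (real - out) * input
    out = 1 if np.dot(w, x) >= 0 else 0
    return w + alpha * (target - out) * x

w = np.array([0.0, 0.0, 0.0])
x = np.array([1.0, 1.0, 0.0])          # x[0] = 1 is the bias input
w = perceptron_update(w, x, target=1)  # one update step
print(w)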
Issue with one perceptron
One perceptron cannot solve the XOR problem
(Imagine the . and | points arranged as below; no single straight line can separate the two classes)
. |
| .
Limitations of the perceptron (other than the XOR issue)
Multi-layer perceptrons can solve more complex problems, but there is no method to train the weights w of stacked step-function perceptrons.
How can you train a network with multiple layers (of perceptron-like units)?
Avoid the step function and use an activation function that has a computable derivative (sigmoid, tanh, etc.).
This is so that the error can be propagated back through the layers; see the sketch below.
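A minimal, illustrative sketch (not from the notes) of training a tiny two-layer network with sigmoid units on XOR by backpropagation; every name and hyperparameter is an assumption for demonstration only:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # output layer
alpha = 1.0                                          # learning rate

for _ in range(20000):
    # Forward pass through both layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the sigmoid derivative s * (1 - s) lets the error flow back
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= alpha * h.T @ d_out
    b2 -= alpha * d_out.sum(axis=0, keepdims=True)
    W1 -= alpha * X.T @ d_h
    b1 -= alpha * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # should approach [0, 1, 1, 0]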
Why use the sigmoid function?
Its derivative has a simple closed form, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), which can be used to backpropagate the error.
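A quick numerical check of that derivative property (illustrative only):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, eps = 0.7, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # finite-difference estimate
analytic = sigmoid(x) * (1 - sigmoid(x))                     # closed-form derivative
print(numeric, analytic)                                     # the two agree closely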
Simple linear classifier example
f(x, W) = Wx + b
Matrix-multiply W and x, then add the bias vector b.
Image = 32 * 32 * 3 = 3072 values
W = 10 * 3072 (rows * columns)
x = 3072 * 1 (column vector)
f = 10 * 1 (one score per class)
b = 10 * 1 (bias values)
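A shape-checking sketch of f(x, W) = Wx + b with these dimensions (random values, names illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(32 * 32 * 3)    # flattened image: 3072 values
W = rng.random((10, 3072))     # one row of weights per class
b = rng.random(10)             # one bias per class

scores = W @ x + b             # f(x, W) = Wx + b
print(scores.shape)            # (10,) -- one score per class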
Loss function
Measures whether the system is doing well or not.
For example, the multiclass hinge (SVM) loss:
Li = sum over j != yi of max(0, sj - syi + 1)
The +1 is a margin: the true class score needs to be at least 1 higher than every other class score.
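A sketch of that loss for a single example, assuming scores holds one score per class (values illustrative):

import numpy as np

def hinge_loss(scores, yi, margin=1.0):
    # Multiclass hinge loss: sum over j != yi of max(0, sj - s_yi + margin)
    margins = np.maximum(0, scores - scores[yi] + margin)
    margins[yi] = 0          # skip the true class (j != yi)
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])  # scores for three classes
print(hinge_loss(scores, yi=0))      # true class is index 0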
Limitation of hinge loss
The raw scores have no meaning other than as a comparison measure, so use probabilities instead.
Softmax
s = f(xi, W) gives the vector of class scores.
Exponentiate the score for each class and divide by the sum of the exponentiated scores over all classes:
P(Y = k | X = xi) = e^(sk) / sum over j of e^(sj)
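A sketch of softmax over a score vector; the max subtraction is a standard numerical-stability trick, not something stated in these notes:

import numpy as np

def softmax(scores):
    # Convert class scores into probabilities: e^sk / sum_j e^sj
    exps = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exps / exps.sum()

probs = softmax(np.array([3.2, 5.1, -1.7]))
print(probs, probs.sum())  # probabilities, summing to 1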
Softmax why e^x
All values become positive.
After dividing by the sum, the values add up to 1, so they can be interpreted as probabilities.
Softmax log is computed as
Li = -log(P(Y = yi | X = xi))
The loss is zero when the predicted probability of the true class is 1.
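Continuing the softmax sketch, the per-example loss with illustrative probability values:

import numpy as np

probs = np.array([0.129, 0.870, 0.001])  # softmax output (example values)
yi = 1                                    # index of the true class
loss = -np.log(probs[yi])                 # Li = -log(P(Y = yi | X = xi))
print(loss)                               # 0 when probs[yi] == 1, large when it is near 0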
Cross-Entropy Loss
L = -(1/N) * sum(i = 1 to N) sum(k = 1 to K) yik * log(pik)
Where N is the number of observations,
yik is a binary indicator of whether observation i belongs to class k,
and pik is the predicted probability that observation i belongs to class k.
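A batch-level sketch, assuming probs are softmax outputs and labels are one-hot targets (both names are assumptions):

import numpy as np

def cross_entropy(labels, probs, eps=1e-12):
    # L = -(1/N) * sum_i sum_k y_ik * log(p_ik); eps avoids log(0)
    N = labels.shape[0]
    return -np.sum(labels * np.log(probs + eps)) / N

labels = np.array([[1, 0, 0],
                   [0, 1, 0]])        # one-hot targets for N = 2 observations
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])   # predicted probabilities per class
print(cross_entropy(labels, probs))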