Lecture 4 - Training CNNs Flashcards
Regression
A set of methods for modelling the relationship between a continuous output and input features
Classification
Problem of assigning an input to one of a fixed set of discrete classes
How to measure regression performance
With a loss function such as Mean Absolute Error (MAE) or Mean Squared Error (MSE)
Loss Function: value size when an input is well classified vs not
Well classified = small loss value
Misclassified = large loss value
Negative Log Likelihood Loss
L_p = -log(p), where p is the probability the model assigns to the correct class
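A quick numeric sanity check of this behaviour (plain Python; the probability values are my own illustration):

```python
import math

# NLL is -log of the probability assigned to the correct class.
confident = -math.log(0.9)  # correct class gets high probability -> small loss
uncertain = -math.log(0.1)  # correct class gets low probability  -> large loss
```

The loss grows without bound as the probability assigned to the correct class approaches zero.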
Softmax
Normalises the network output to a probability distribution over the predicted output classes:
softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
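A minimal numpy sketch of that formula; subtracting the max before exponentiating is a standard numerical-stability trick not stated in the card:

```python
import numpy as np

def softmax(x):
    # exp(x_i) / sum_j exp(x_j); shifting by max(x) avoids overflow
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))  # sums to 1; largest logit gets largest probability
```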
Cross entropy loss
-log(softmax(x)_c), i.e. the negative log likelihood of the softmax probability assigned to the correct class c
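Combining the two previous cards, a sketch of cross-entropy computed directly from raw logits (using the log-sum-exp form rather than calling softmax, for numerical stability):

```python
import numpy as np

def cross_entropy(logits, target):
    # -log(softmax(logits)[target]) without forming the probabilities explicitly
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

loss = cross_entropy(np.array([2.0, 1.0, 0.1]), target=0)
```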
Optimisation
Process of finding the best weights to minimise the loss function
Gradient Descent
Follow the gradient direction with a chosen step size: “walk” towards a minimum.
Learning Rate
Step size of each gradient descent update.
Too low = very slow progress
Too high = instability; may never converge
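A tiny illustration of both failure regimes on the toy loss L(w) = w², whose gradient is 2w (my own example, not from the slides):

```python
def descend(lr, steps=50, w=1.0):
    # repeated gradient descent steps on L(w) = w^2
    for _ in range(steps):
        w = w - lr * (2 * w)  # gradient of w^2 is 2w
    return w

good = descend(lr=0.1)  # |w| shrinks towards the minimum at 0
bad = descend(lr=1.1)   # step overshoots: |w| grows every iteration
```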
Stochastic Gradient Descent
w ← w − α · ∇_w L(x, w)
α here is alpha (the learning rate); the gradient is evaluated on a randomly chosen example or mini-batch
see slides for full equation
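A sketch of SGD fitting y = w·x by least squares, drawing one random sample per update (the data and step count are my own illustration):

```python
import random

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2x, so the true w is 2
w, alpha = 0.0, 0.05
random.seed(0)
for _ in range(100):
    x, y = random.choice(data)    # "stochastic": one random sample per update
    grad = 2 * (w * x - y) * x    # d/dw of the per-sample loss (w*x - y)^2
    w = w - alpha * grad          # w <- w - alpha * gradient
```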
SGD Weight Decay
Used to prevent the weights from growing too large
w ← w − α(∇_w L(x, w) + γw) + ρ · (last update)
α is alpha (learning rate), γ is gamma; γw is the weight decay regularisation term
SGD Momentum
The ρ · (last update) term
ρ is a coefficient less than 1 that scales the previous update, so each step blends in a decaying memory of earlier steps
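Putting the last two cards together, a sketch of one SGD step with weight decay and momentum (the names sgd_step and vel are my own):

```python
def sgd_step(w, grad, vel, alpha=0.1, gamma=0.01, rho=0.9):
    update = grad + gamma * w   # gradient plus the weight-decay term gamma*w
    vel = rho * vel + update    # blend in rho times the previous update (momentum)
    return w - alpha * vel, vel

# usage: minimise L(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, vel = 0.0, 0.0
for _ in range(200):
    w, vel = sgd_step(w, 2 * (w - 3), vel)
```

Weight decay pulls the answer slightly below the unregularised minimum at 3, since it also penalises large w.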
“Gradient Value” equation: the loss gradient summed over the training examples,
∇_w L = Σ_i ∇_w L(x_i, w)
i.e. the derivative of the loss with respect to the weights, evaluated at each input x_i
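A concrete instance of that sum for the per-sample loss L(x_i, w) = (w·x_i − y_i)² (the data values are my own):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])     # generated by y = 2x
w = 0.5
per_sample = 2 * (w * x - y) * x  # dL(x_i, w)/dw for each example
total_grad = per_sample.sum()     # sum_i dL(x_i, w)/dw
```

At the optimum w = 2 every per-sample gradient, and hence the sum, is zero.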