Chapter 6: Optimisation Flashcards
What do we want to optimise?
The loss (objective) function.
What does optimisation theory tell us?
At a minimum, the derivative of the objective function with respect to each input variable is 0:
∂O(w)/∂w = 0   for each weight w
How do we optimise the function O(w) to find w? Give the steps.
1. Differentiate O(w) with respect to w.
2. Set the derivative to 0.
3. Rearrange to solve for w (see the worked sketch below).
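A minimal sketch of these three steps using SymPy on a toy one-dimensional least squares objective (the data values here are made up purely for illustration):

```python
import sympy as sp

# Toy 1-D data (illustrative only).
w = sp.symbols('w')
xs = [1, 2, 3]
ys = [2, 4, 7]

# O(w) = 1/2 * sum_n (w*x_n - y_n)^2
O = sp.Rational(1, 2) * sum((w * x - y) ** 2 for x, y in zip(xs, ys))

dO = sp.diff(O, w)           # step 1: differentiate O(w) with respect to w
solution = sp.solve(dO, w)   # steps 2-3: set the derivative to 0 and solve for w
print(dO, solution)          # 14*w - 31, [31/14]
```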
Give the model equation for MAIP (Y).
Y = w^T x̃, where x̃ is the input x augmented with a constant-1 bias entry.
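A small NumPy sketch of this prediction, assuming x̃ means the input with a constant-1 bias entry appended (the data and weight values below are illustrative):

```python
import numpy as np

# Toy data: 4 samples, 2 features (illustrative values).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0]])

# Augment with a column of ones so the bias is absorbed into the weights.
X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])

# Example weight vector (last entry acts as the bias term).
w = np.array([0.5, -1.0, 2.0])

# Prediction Y_n = w^T x̃_n for each sample, computed for all rows at once.
Y = X_tilde @ w
print(Y)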
Give the least squares equation for O(w).
O(w) = 1/2 Σ_n (Y_n − y_n)^2, where Y_n is the model prediction and y_n is the target.
Give ∇O(w) for least squares.
∇O(w) = X̃^T (X̃w − y); setting it to 0 gives the normal equation X̃^T X̃ w = X̃^T y.
Give the weights from the normal equation for least squares.
w = X̃^+ y = (X̃^T X̃)^-1 X̃^T y, where X̃^+ is the Moore–Penrose pseudo-inverse of X̃.
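A hedged NumPy sketch of this solution on synthetic data: np.linalg.pinv gives the Moore–Penrose pseudo-inverse, and the gradient from the previous card should be (numerically) zero at the recovered weights. All values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: targets from known weights plus a little noise.
X = rng.normal(size=(50, 3))
X_tilde = np.hstack([X, np.ones((50, 1))])           # add bias column
w_true = np.array([1.5, -2.0, 0.5, 3.0])
y = X_tilde @ w_true + 0.1 * rng.normal(size=50)

# Least squares objective O(w) = 1/2 * sum_n (X̃ w - y)_n^2
def objective(w):
    residual = X_tilde @ w - y
    return 0.5 * residual @ residual

# Normal-equation solution w = X̃^+ y
w_hat = np.linalg.pinv(X_tilde) @ y

# Gradient X̃^T (X̃ w - y) should be near zero at the solution.
grad = X_tilde.T @ (X_tilde @ w_hat - y)
print(w_hat, objective(w_hat), np.abs(grad).max())
```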
Give the L2-regularised least squares solution for the weights.
w = (X̃^T X̃ + λI)^-1 X̃^T y
What do we add to regularise the normal equation for the least squares model?
λI (lambda times the identity matrix), added to X̃^T X̃ before inverting.
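A minimal sketch of the regularised solution, assuming an arbitrary illustrative λ; np.linalg.solve is used instead of an explicit matrix inverse, which is the numerically safer way to evaluate the same formula:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative), with a bias column appended to X.
X = rng.normal(size=(30, 4))
X_tilde = np.hstack([X, np.ones((30, 1))])
y = rng.normal(size=30)

lam = 0.1                      # regularisation strength (arbitrary illustrative choice)
d = X_tilde.shape[1]

# w = (X̃^T X̃ + λI)^-1 X̃^T y
# (In practice the bias weight is often excluded from the penalty; it is
#  included here only to keep the sketch short.)
w_ridge = np.linalg.solve(X_tilde.T @ X_tilde + lam * np.eye(d), X_tilde.T @ y)
print(w_ridge)
```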
What is gradient descent?
An iterative method: repeatedly change the weights in the direction of the negative gradient so as to minimise the objective.
What do we use to optimise non-linear models?
Gradient descent.
How is the 'change' defined?
w(t+1) = w(t) + Δw(t)
Δw(t) = −η ∇O(w(t)), where η is the learning rate
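A short sketch of this update rule applied to the least squares objective from the earlier cards; the learning rate and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Least squares setup: X with a bias column, noiseless targets from known weights.
X = rng.normal(size=(40, 2))
X_tilde = np.hstack([X, np.ones((40, 1))])
w_true = np.array([2.0, -1.0, 0.5])
y = X_tilde @ w_true

w = np.zeros(3)       # initial weights
eta = 0.01            # learning rate (illustrative)

for t in range(500):
    grad = X_tilde.T @ (X_tilde @ w - y)   # ∇O(w) for least squares
    w = w - eta * grad                     # w(t+1) = w(t) - η ∇O(w(t))

print(w)   # should approach w_true
```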
What is the learning rate?
η determines how large each weight update is relative to the gradient.
It therefore determines how many iterations are needed before the gradient reaches 0 (the minimum).
What happens if the learning rate is too low or too high?
Too low: it takes an unnecessarily large number of iterations to converge.
Too high: the updates overshoot the minimum and can oscillate or diverge.
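A tiny illustration of both failure modes on the quadratic O(w) = 1/2 (w − 3)^2, whose minimum is at w = 3 (all learning-rate values are made up for illustration):

```python
# Gradient of O(w) = 1/2 * (w - 3)^2 is (w - 3).
def run(eta, steps=50, w0=0.0):
    w = w0
    for _ in range(steps):
        w = w - eta * (w - 3.0)
    return w

print("too low :", run(0.01))   # still far from 3 after 50 steps
print("good    :", run(0.5))    # close to 3
print("too high:", run(2.5))    # overshoots and diverges
```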