lecture 8: optimisation and gradient descent Flashcards
in general, an objective function can be described by
cost function = loss function (evaluated on the learning model) + regularisation
this objective as a whole, plus the optimisation routine used to minimise it, makes up the building blocks of ML algorithms
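as a rough sketch (my own example, not from the lecture), a ridge-regression style objective combines a squared-error loss on the learning model's predictions with an L2 regularisation term on w:

import numpy as np

def cost(w, X, y, lam=0.1):
    # cost = loss(learning model on the data) + regularisation
    residuals = X @ w - y            # learning model: linear predictions Xw
    loss = np.mean(residuals ** 2)   # squared-error loss
    reg = lam * np.sum(w ** 2)       # L2 regularisation, weight lam is an assumed value
    return loss + reg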
what is gradient descent
iteratively moving the weights w in the direction of the negative gradient (i.e. the direction in which the cost decreases), with step size set by the learning rate eta (𝜂), until some convergence criterion is reached, giving an estimate of a local minimum
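a minimal sketch of the update rule w ← w - 𝜂∇C(w), assuming the gradient function grad_C is known and the number of steps k is fixed:

import numpy as np

def gradient_descent(w0, grad_C, eta=0.1, k=100):
    # repeatedly step against the gradient: w <- w - eta * grad_C(w)
    w = np.asarray(w0, dtype=float)
    for _ in range(k):
        w = w - eta * grad_C(w)
    return w

# example: minimise C(w) = (w - 3)^2, whose gradient is 2(w - 3)
w_star = gradient_descent(w0=[0.0], grad_C=lambda w: 2 * (w - 3))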
what are some possible convergence criteria
- set a maximum number of iterations k
- check whether the percentage/absolute change in C is below a threshold
- check whether the percentage/absolute change in w is below a threshold
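a hedged sketch of how these checks might look in code (the thresholds tol_C and tol_w are assumed names, not from the lecture):

import numpy as np

def gradient_descent_with_stopping(w0, C, grad_C, eta=0.1, k=1000, tol_C=1e-6, tol_w=1e-6):
    # stop on max iterations, or when the change in C or in w becomes tiny
    w = np.asarray(w0, dtype=float)
    for _ in range(k):                          # criterion 1: maximum iterations k
        w_new = w - eta * grad_C(w)
        if abs(C(w_new) - C(w)) < tol_C:        # criterion 2: change in C below threshold
            return w_new
        if np.linalg.norm(w_new - w) < tol_w:   # criterion 3: change in w below threshold
            return w_new
        w = w_new
    return w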
why can gradient descent only find a local minimum?
because the gradient is 0 at a local minimum, so the update step vanishes and w cannot change after that; nothing pushes w back out towards a possibly better (global) minimum
what does increasing the learning rate do
it increases the rate at which w converges; however, too high an eta can cause the updates to repeatedly overshoot the local minimum
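a small illustration (my own example, not from the lecture): on C(w) = w² the gradient step is w ← w - 𝜂·2w, so any 𝜂 > 1 makes |w| grow and each step overshoots further:

def step(w, eta):
    return w - eta * 2 * w   # gradient of C(w) = w^2 is 2w

w_small, w_large = 5.0, 5.0
for _ in range(5):
    w_small = step(w_small, eta=0.1)   # shrinks towards the minimum at 0
    w_large = step(w_large, eta=1.1)   # overshoots and diverges
print(w_small, w_large)                # roughly 1.64 vs -12.44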
what is the purpose of loss functions
different loss functions encode the penalty for a prediction f(xᵢ, w) when the true value is yᵢ
what is binary loss function
it is based on the sign of the product of f(xᵢ, w) and yᵢ:
if the sign is positive, the classification is correct and the loss is 0; if the sign is negative, the classification is wrong and the loss is 1
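a minimal sketch of the binary (0/1) loss as described above:

def binary_loss(prediction, y):
    # 0/1 loss: 0 if the sign of f(x, w) matches y, else 1
    return 0.0 if prediction * y > 0 else 1.0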
binary loss is not differentiable, what are 2 other possible loss functions to use?
hinge loss and exponential loss
hinge loss = max(0, 1 - yᵢ f(xᵢ, w))
exponential loss = exp(-yᵢ f(xᵢ, w))
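a minimal sketch of both, written as functions of the same product (margin) m = yᵢ f(xᵢ, w) used in the binary loss:

import numpy as np

def hinge_loss(margin):
    return np.maximum(0.0, 1.0 - margin)   # max(0, 1 - y*f(x, w))

def exponential_loss(margin):
    return np.exp(-margin)                 # exp(-y*f(x, w))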
what does the sigmoid function do
used for classification; it maps pᵢᵀw (which can be anywhere between -∞ and ∞) to a value between 0 and 1
f(x,w) = σ(pᵢᵀw)
σ(a) = 1/(1 + e⁻ᵃ)
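a minimal sketch of the sigmoid and the resulting classifier output, assuming p_i holds the features for example i:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))   # maps (-inf, inf) to (0, 1)

def f(p_i, w):
    return sigmoid(p_i @ w)           # f(x, w) = sigma(p_i^T w)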