Topic 8: Intro to Gradient Descent Flashcards
What can we say about ε in GD
“target accuracy”
The number of gradient descent steps scales logarithmically with 1/ε (the inverse of the target accuracy)
O(log(1/ε)) steps, under assumptions such as strong convexity
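Worked illustration (a standard example with assumed details, not taken from the card: F(x) = x²/2 and a fixed step size η ∈ (0,1)): the update gives xt+1 = xt − η⋅xt = (1 − η)⋅xt, so xt = (1 − η)^t ⋅ x0. To reach |xt| ≤ ε we need t ≥ log(|x0|/ε) / log(1/(1 − η)), i.e. O(log(1/ε)) steps.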
Why is x0 ≠ 0 in GD
The update is xt+1 = xt − ηt ⋅ F′(xt), and in the worked example F′(x) is proportional to x, so every iterate is a constant multiple of x0.
If x0 = 0, every xt from then on is also 0 and the algorithm never moves.
Furthermore, if the algorithm starts at a point where F′(x0) = 0
(ie x0 is a critical point) then it would never move
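A tiny numerical sketch of this point (the choice F(x) = x²/2, so F′(x) = x, and the fixed step η = 0.5 are illustrative assumptions):

# Starting at x0 = 0: F'(0) = 0, so gradient descent never moves.
x = 0.0
for t in range(10):
    x = x - 0.5 * x   # xt+1 = xt - eta * F'(xt), with F'(x) = x
print(x)              # still 0.0

# Starting at x0 = 8.0: the iterates shrink geometrically towards the minimum at 0.
x = 8.0
for t in range(10):
    x = x - 0.5 * x
print(x)              # 8 * 0.5**10, close to 0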
Where do we want xt to tend
xt → x* (the minimiser) as t → ∞; in the worked example the minimiser is 0, so xt → 0
What is an objective function
Also called the loss function in machine learning
The function you want to minimise (or maximise)
Describe the GD algorithm steps on a differentiable function F
Initialise w1 and choose T > 0 (the number of steps)
set step sizes ηt > 0
for t = 1 to T:
wt+1 = wt − ηt ⋅ ∇F(wt)
output wT+1 (the final iterate)
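A minimal Python sketch of these steps (the quadratic example objective and the fixed step size eta are my illustrative choices, not part of the card):

# Gradient descent on a differentiable F, given only its gradient grad_F.
def gradient_descent(grad_F, w1, T, eta=0.1):
    w = w1
    for t in range(T):              # for t = 1 to T
        w = w - eta * grad_F(w)     # wt+1 = wt - eta_t * gradient of F at wt
    return w                        # output the final iterate

# Example: F(w) = w**2 has gradient 2*w and minimiser w* = 0.
print(gradient_descent(lambda w: 2 * w, w1=5.0, T=100))  # prints a value close to 0.0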
what is a differentiable function
A function that has a well-defined derivative at every point in its domain
What properties does a function need to have for gd
Differentiable
convexity and Lipschitzness are ideal
existence of at least one minimum
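For example, F(x) = x² satisfies all of these: it is differentiable everywhere, convex, its derivative F′(x) = 2x is Lipschitz continuous, and it attains a minimum at x = 0. By contrast, F(x) = |x| fails the first requirement because it is not differentiable at x = 0.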
why does gradient descent converge faster on strongly convex functions compared to non-convex ones
strong convexity:
provides a unique global minimum
provides a convergence-rate guarantee (a geometric rate) for gradient descent
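For example (a standard result, under the assumptions that F is m-strongly convex and L-smooth and the step size is fixed at ηt = 1/L): F(wt+1) − F(w*) ≤ (1 − m/L)^t ⋅ (F(w1) − F(w*)), a geometric (linear) rate, which is what gives the O(log(1/ε)) step count. Without convexity, gradient descent is in general only guaranteed to approach a stationary point, and typically at a slower rate.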
what is k in gd
k is a constant parameter chosen from the interval (0,1)
It’s a tuning parameter that lets you control the step length
it is not present in every formulation of gradient descent
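For instance (an illustrative convention only, since the exact role of k depends on the formulation being used): if ηmax is the largest 'safe' step size for the problem, one might set ηt = k ⋅ ηmax, so choosing k closer to 1 takes larger steps and choosing it closer to 0 takes more cautious ones.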
will gd find local minima for non-convex, non-Lipschitz functions
Yes
as is often the case in real-world problems, functions are neither convex nor Lipschitz
a (local) minimum can still be found through careful choice of the step size and other algorithm parameters
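A sketch of this in Python (the non-convex function F(w) = w⁴ − 3w² + w, the starting points, and the step size are my illustrative choices): with a small enough step size, gradient descent settles into whichever local minimum is nearest the starting point.

# F(w) = w**4 - 3*w**2 + w is non-convex and has two local minima.
def grad_F(w):
    return 4 * w**3 - 6 * w + 1

def gd(w0, eta=0.01, T=500):
    w = w0
    for t in range(T):
        w = w - eta * grad_F(w)
    return w

print(gd(2.0))    # settles in the local minimum near w ≈ 1.1
print(gd(-2.0))   # a different start lands in the other basin, near w ≈ -1.3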