gradient descent and iterative methods Flashcards
iterative method (basic):
x(k+1) = xk + ak pk, starting from an educated initial guess x0; each step should satisfy f(x(k+1)) <= f(xk), with the aim of converging to a minimiser x*
step length:
ak > 0 in the iterative method; it can be held constant, but in practice it is usually chosen afresh at each iteration, e.g. by a line search
search direction:
pk in the iterative method; any descent direction, i.e. one with ⟨∇f(xk),pk⟩ < 0, works, and for gradient descent it is the negative gradient:
pk = -∇f(xk)
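A minimal sketch tying the last three cards together: the generic iteration x(k+1) = xk + ak pk with the gradient-descent choice pk = -∇f(xk) and a constant step length. The quadratic test function and the step size 0.1 are illustrative assumptions, not from the cards.

```python
import numpy as np

def gradient_descent(grad_f, x0, step=0.1, n_iters=100):
    """Iterate x_{k+1} = x_k + a_k p_k with p_k = -grad f(x_k) and constant a_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        p = -grad_f(x)      # search direction: negative gradient
        x = x + step * p    # fixed step length a_k = step
    return x

# illustrative example: f(x, y) = x^2 + 2 y^2, so grad f = (2x, 4y), minimiser x* = 0
x_star = gradient_descent(lambda x: np.array([2 * x[0], 4 * x[1]]), x0=[3.0, -2.0])
print(x_star)  # close to [0, 0]
```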
how to choose step length optimal:
a minimiser of the one-dimensional function a -> φ(a) := f(xk + a pk) (exact line search: set φ'(a) = 0 and solve for a)
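For a quadratic f(x) = ½ xᵀAx - bᵀx the optimal step has a closed form: with g = ∇f(xk) = Axk - b and pk = -g, solving φ'(a) = 0 gives a = (gᵀg)/(gᵀAg). A small sketch; the matrix A and vector b here are assumed for illustration.

```python
import numpy as np

def exact_step(A, b, x):
    """Optimal step length for f(x) = 0.5 x^T A x - b^T x along p = -grad f(x)."""
    g = A @ x - b                 # gradient at x
    return (g @ g) / (g @ A @ g)  # minimiser of phi(a) = f(x - a g)

# illustrative data (assumed, not from the cards)
A = np.array([[3.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
b = np.array([1.0, -2.0])
x = np.array([0.0, 0.0])
a = exact_step(A, b, x)
g = A @ x - b
print(a, np.allclose((A @ (x - a * g) - b) @ g, 0.0))  # phi'(a) = 0 holds
```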
sufficient decrease condition:
f(xk + a pk) <= f(xk) + c·a·⟨∇f(xk),pk⟩ =: l(a), with c in (0,1)
aka the Armijo condition; it is always imposed, sometimes in combination with the other conditions below
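A one-line check of the sufficient decrease condition. The test function f(x) = ||x||² and the constant c = 1e-4 (a common textbook default) are illustrative assumptions.

```python
import numpy as np

def armijo_ok(f, grad_f, x, p, a, c=1e-4):
    """True if f(x + a p) <= f(x) + c a <grad f(x), p> (sufficient decrease)."""
    return f(x + a * p) <= f(x) + c * a * np.dot(grad_f(x), p)

# illustrative use: f(x) = ||x||^2, steepest descent direction
f = lambda x: np.dot(x, x)
grad_f = lambda x: 2 * x
x = np.array([1.0, 2.0])
p = -grad_f(x)
print(armijo_ok(f, grad_f, x, p, a=0.1), armijo_ok(f, grad_f, x, p, a=1.0))  # True False
```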
wolfe condition:
φ'(a) >= d·φ'(0), with d in (c,1); here φ'(0) = ⟨∇f(xk),pk⟩ < 0, so this curvature condition rules out steps that are too short
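The curvature condition in code, using φ'(a) = ⟨∇f(xk + a pk), pk⟩ and φ'(0) = ⟨∇f(xk), pk⟩. The test function f(x) = ||x||² and the constant d = 0.9 are assumed for illustration; note how the short step fails while the longer one passes.

```python
import numpy as np

def curvature_ok(grad_f, x, p, a, d=0.9):
    """True if phi'(a) >= d * phi'(0), where phi(a) = f(x + a p)."""
    phi_prime_a = np.dot(grad_f(x + a * p), p)  # phi'(a)
    phi_prime_0 = np.dot(grad_f(x), p)          # phi'(0) < 0 for a descent direction
    return phi_prime_a >= d * phi_prime_0

grad_f = lambda x: 2 * x                 # gradient of f(x) = ||x||^2
x = np.array([1.0, 2.0])
p = -grad_f(x)
print(curvature_ok(grad_f, x, p, a=0.01), curvature_ok(grad_f, x, p, a=0.4))  # False True
```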
armijo-goldstein condition:
ak should satisfy f(xk) + (1-c)·ak·⟨∇f(xk),pk⟩ <= f(xk + ak pk); this lower bound pairs with the sufficient decrease condition to bracket f(xk + ak pk)
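The lower bound in code, again on the illustrative f(x) = ||x||²; c = 0.25 is an assumed choice. A step that is too short violates the bound.

```python
import numpy as np

def goldstein_ok(f, grad_f, x, p, a, c=0.25):
    """True if f(x) + (1 - c) a <grad f(x), p> <= f(x + a p) (lower bound)."""
    return f(x) + (1 - c) * a * np.dot(grad_f(x), p) <= f(x + a * p)

f = lambda x: np.dot(x, x)
grad_f = lambda x: 2 * x
x = np.array([1.0, 2.0])
p = -grad_f(x)
print(goldstein_ok(f, grad_f, x, p, a=0.3), goldstein_ok(f, grad_f, x, p, a=0.01))  # True False
```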
backtracking:
pick a large initial a (e.g. a = 1) and repeatedly shrink it (e.g. a <- ρa with ρ in (0,1)) until it satisfies the required conditions
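A minimal backtracking sketch using the Armijo condition as the stopping test; the shrink factor rho = 0.5 and c = 1e-4 are common assumed defaults, and the test function is again illustrative.

```python
import numpy as np

def backtracking(f, grad_f, x, p, a0=1.0, rho=0.5, c=1e-4):
    """Shrink a until f(x + a p) <= f(x) + c a <grad f(x), p> (Armijo)."""
    a = a0
    fx, slope = f(x), np.dot(grad_f(x), p)   # slope = phi'(0)
    while f(x + a * p) > fx + c * a * slope:
        a *= rho                              # backtrack: a <- rho * a
    return a

# illustrative use on f(x) = ||x||^2 with steepest descent direction
f = lambda x: np.dot(x, x)
grad_f = lambda x: 2 * x
x = np.array([1.0, 2.0])
a = backtracking(f, grad_f, x, -grad_f(x))
print(a)  # 0.5: a0 = 1 fails Armijo here, one halving succeeds
```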
vector convergence:
for all ε > 0 there exists some N such that for all n >= N, ||xn - x*|| < ε (for an infinite sequence of vectors {xn} with limit x*)
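A concrete instance of the definition: for xn = x* + r^n·d with r in (0,1), ||xn - x*|| = r^n·||d||, so any N > log(ε/||d||)/log(r) works. Sketch with assumed values for x*, d, r, and ε:

```python
import numpy as np

# x_n = x* + r^n d converges to x*: ||x_n - x*|| = r^n ||d|| < eps for n >= N
x_star = np.array([1.0, -1.0])   # assumed limit
d = np.array([3.0, 4.0])         # ||d|| = 5
r, eps = 0.5, 1e-6
N = int(np.ceil(np.log(eps / np.linalg.norm(d)) / np.log(r)))
x_N = x_star + r**N * d
print(N, np.linalg.norm(x_N - x_star) < eps)  # True: definition satisfied at N
```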
linear convergence:
assume a sequence of vectors {xk}, k>=0, converges to x* - it's linear if there exists an r in (0,1) such that for sufficiently large k, ||x(k+1) - x*|| <= r||xk - x*|| - r is called the rate
superlinear convergence:
assume a sequence of vectors {xk}, k>=0, converges to x* - it's superlinear if lim(k->∞) ||x(k+1) - x*||/||xk - x*|| = 0
convergence with order p:
assume a sequence of vectors {xk}, k>=0, converges to x* - it converges with order p > 1 if there is a constant M > 0 such that for sufficiently large k, ||x(k+1) - x*|| <= M||xk - x*||^p - p = 2 is called quadratic convergence
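A quick way to see the difference between the last three cards: construct a synthetic linearly convergent error sequence (e(k+1) = r·ek) and a quadratically convergent one (e(k+1) = ek²) and watch the ratios from the definitions. All values here are illustrative.

```python
# errors e_k = ||x_k - x*|| for two synthetic sequences converging to x* (e_k -> 0)
e_lin, e_quad = 0.5, 0.5
for k in range(5):
    e_lin_next, e_quad_next = 0.5 * e_lin, e_quad**2
    print(k,
          e_lin_next / e_lin,        # linear: ratio stays at the rate r = 0.5
          e_quad_next / e_quad,      # superlinear: ratio -> 0
          e_quad_next / e_quad**2)   # order p = 2: bounded by M = 1
    e_lin, e_quad = e_lin_next, e_quad_next
```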