we are so close let's go
Flashcards

1
Q

norm stuff:

A

||·|| is a norm if: ||x|| >= 0 and ||x|| = 0 iff x = 0
||λx|| = |λ|·||x||
||x+y|| <= ||x|| + ||y|| (triangle inequality)
||x||1 -> sum of |xi|
||x||2 -> root of sum of squares of xi (square is in the sum, root is outside)
||x||∞ -> max of |xi|
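a quick numpy sanity check of the three norms (my own sketch, the vector is made up):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

print(np.sum(np.abs(x)))       # ||x||1 = 8.0, sum of |xi|
print(np.sqrt(np.sum(x**2)))   # ||x||2 = sqrt(26), square inside, root outside
print(np.max(np.abs(x)))       # ||x||∞ = 4.0, max of |xi|

# np.linalg.norm agrees
assert np.isclose(np.linalg.norm(x, 1), 8.0)
assert np.isclose(np.linalg.norm(x, 2), np.sqrt(26.0))
assert np.isclose(np.linalg.norm(x, np.inf), 4.0)
```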

2
Q

gradient descent:

A

x(k+1) = xk + ak·pk, with pk = -f'(xk)
any norms in here refer to 2-norms (||x||2)
if it asks for an improvement: stochastic gradient descent, since you don't compute every gradient (you use a random subset instead)
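a minimal sketch of the iteration (the quadratic f, step size, and tolerance are all made up for illustration):

```python
import numpy as np

# f(x) = 1/2 x^T A x - b^T x, so f'(x) = Ax - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad_f = lambda x: A @ x - b

x = np.zeros(2)
a = 0.1                             # fixed step size ak
for k in range(1000):
    p = -grad_f(x)                  # pk = -f'(xk)
    if np.linalg.norm(p) < 1e-8:    # 2-norm, as the card says
        break
    x = x + a * p                   # x(k+1) = xk + ak·pk

print(x, np.linalg.solve(A, b))     # the two should agree
```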

3
Q

descent direction:

A

a direction pk is a descent direction at xk if ⟨pk, f'(xk)⟩ < 0
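tiny check of the inner-product test (made-up gradient):

```python
import numpy as np

grad = np.array([2.0, -1.0])   # pretend this is f'(xk)
p = -grad                      # steepest-descent direction

print(np.dot(p, grad))         # -5.0 < 0, so p is a descent direction
```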

4
Q

optimal step a for descent algo:

A

the a that minimises φ(a) = f(xk + a·pk) (exact line search)
we don't often use this; we find a step satisfying some sufficient decrease condition instead
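a sketch of the exact line search with scipy (toy quadratic, my own data):

```python
import numpy as np
from scipy.optimize import minimize_scalar

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

xk = np.zeros(2)
pk = b - A @ xk                    # steepest descent direction -f'(xk)

phi = lambda a: f(xk + a * pk)     # one-dimensional function of the step
res = minimize_scalar(phi)         # minimises f(xk + a·pk) over a
print(res.x)                       # the exact-line-search step
```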

5
Q

armijo goldstein conditions:

A

f(xk + a·pk) <= f(xk) + c·a·⟨f'(xk), pk⟩, with c in (0,1) (sufficient decrease)
f(xk + a·pk) >= f(xk) + (1-c)·a·⟨f'(xk), pk⟩ (step not too small)
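a backtracking line search sketch that enforces the first (sufficient decrease) condition; starting from a = 1 and shrinking also keeps the step from being absurdly small, which is the spirit of the second condition (c, rho, and the demo function are conventional guesses of mine):

```python
import numpy as np

def backtracking(f, grad, xk, pk, c=1e-4, rho=0.5, a=1.0):
    # shrink a until f(xk + a·pk) <= f(xk) + c·a·⟨f'(xk), pk⟩
    slope = np.dot(grad(xk), pk)   # ⟨f'(xk), pk⟩, negative for a descent direction
    while f(xk + a * pk) > f(xk) + c * a * slope:
        a *= rho
    return a

f = lambda x: np.dot(x, x)         # f(x) = ||x||2^2
grad = lambda x: 2 * x
xk = np.array([1.0, 1.0])
print(backtracking(f, grad, xk, -grad(xk)))   # 0.5 for this example
```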

6
Q

convergence:

A

linear -> ||x(k+1)-x*|| <= r·||xk-x*|| for some r in (0,1)
superlinear -> ||x(k+1)-x*||/||xk-x*|| -> 0 as k -> infinity
order p -> ||x(k+1)-x*|| <= M·||xk-x*||^p; when p=2 it's quadratic
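a numerical way to see the order p (my own demo: Newton's method on f(x) = x^2 - 2, whose root is x* = sqrt(2)):

```python
import numpy as np

xs = [1.0]
for _ in range(5):
    x = xs[-1]
    xs.append(x - (x**2 - 2) / (2 * x))   # Newton step

e = [abs(x - np.sqrt(2)) for x in xs]     # errors ||xk - x*||
# if e(k+1) ≈ M·e(k)^p, then log e(k+1) / log e(k) -> p
for k in range(1, 4):
    print(np.log(e[k + 1]) / np.log(e[k]))   # tends towards 2 (quadratic)
```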

7
Q

newton’s method:

A

x(k+1) = xk - f''(xk)^(-1)·f'(xk)
can converge to a max or saddle point as well as a min; to check, we want f''(x*) positive definite (f''(x*) > 0)
downside: computing Hessians is expensive
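a sketch of the iteration on a toy function (mine, not from the card); note it solves a linear system instead of forming the inverse Hessian:

```python
import numpy as np

# f(x) = x1^4 + x2^2, minimised at (0, 0)
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])

x = np.array([1.0, 1.0])
for _ in range(20):
    d = np.linalg.solve(hess(x), grad(x))   # f''(xk)^(-1)·f'(xk)
    x = x - d

print(x)   # close to (0, 0)
```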

8
Q

supervised learning:

A

goal is to learn a function h: X -> Y, where X is the input space and Y is the output space (both written in fancy/calligraphic font), given a set of input-output pairs (xi, yi)
pose it as: minimise (1/N)·sum(i=1 to N) l(h(xi), yi) over h in a class of functions F, given a loss function l: Y×Y -> R+
some uses: handwriting recognition, linear regression, text classification
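a minimal instance of this (my own made-up data): F = linear functions h(x) = ⟨w,x⟩ with squared loss, which makes the minimisation ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # N=100 inputs xi
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)    # noisy outputs yi

# minimise (1/N)·sum (⟨w,xi⟩ - yi)^2 over w
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)   # close to w_true
```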

9
Q

set is convex when:

A

λx + (1-λ)y in C for all x, y in C and all λ in [0,1]
if C, D convex: C∩D and C+D are convex, and AC is convex where A is some real matrix

10
Q

function is convex when:

A

f(λx + (1-λ)y) <= λ·f(x) + (1-λ)·f(y) for all x, y and λ in [0,1]
equivalently (if f is twice differentiable): f''(x) is positive semidefinite (v^(T)f''(x)v >= 0 for all v)
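checking the second condition numerically (my own example, using the Hessian's eigenvalues, which are all >= 0 iff it's positive semidefinite):

```python
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])      # a (constant) Hessian
print(np.linalg.eigvalsh(H))                 # [~1.38, ~3.62]
print(np.all(np.linalg.eigvalsh(H) >= 0))    # True -> convex
```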

11
Q

linear programming format:

A

maximise ⟨c,x⟩ subject to Ax <= b
can replace with: subject to Ax = b, x >= 0 (standard form, using slack variables)
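a sketch with scipy (the data is made up; note linprog minimises, so maximise ⟨c,x⟩ by negating c):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 3.0, 3.0])

# bounds=(None, None) leaves x free, matching "subject to Ax <= b" only
res = linprog(-c, A_ub=A, b_ub=b, bounds=(None, None))
print(res.x, -res.fun)   # optimal x = (1, 3), optimal value ⟨c,x⟩ = 7
```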

12
Q

feasible set:

A

the set of all points satisfying the constraints of a lin prog or convex opt problem (i.e. all candidate solutions)

13
Q

farkas’ lemma:

A

there exists x such that Ax = b, x >= 0 iff there does Not exist y such that A^(T)y >= 0 and ⟨y,b⟩ < 0
also works for Ax <= b, with the alternative being: A^(T)y = 0, y >= 0, and ⟨y,b⟩ < 0
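the first alternative can be checked numerically as a feasibility problem with a zero objective (data made up):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([3.0, 4.0])

# does some x >= 0 satisfy Ax = b?
res = linprog(np.zeros(2), A_eq=A, b_eq=b, bounds=(0, None))
print(res.status == 0)   # True here (x = (1, 1) works), so no Farkas certificate y exists
```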

14
Q

linear program dual:

A

primal is max ⟨c,x⟩ subject to Ax <= b
dual is min ⟨b,y⟩ subject to A^(T)y = c, y >= 0
the optimal values (the two inner products) are the same when both problems are feasible (strong duality)
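verifying on the same toy LP as above (my own data):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 3.0, 3.0])

primal = linprog(-c, A_ub=A, b_ub=b, bounds=(None, None))   # max ⟨c,x⟩ s.t. Ax <= b
dual = linprog(b, A_eq=A.T, b_eq=c, bounds=(0, None))       # min ⟨b,y⟩ s.t. A^T y = c, y >= 0
print(-primal.fun, dual.fun)   # both 7.0
```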

15
Q

lagrange dual:

A

φ(λ,μ) = inf over x of L(x,λ,μ), where L(x,λ,μ) = f(x) + λ^(T)g(x) + μ^(T)h(x) (f is the objective being minimised, g is the <=0 constraint, and h is the =0 constraint)
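a worked toy example (mine): minimise f(x) = x² subject to g(x) = 1-x <= 0
L(x,λ) = x² + λ(1-x); the inf over x is at x = λ/2, giving φ(λ) = λ - λ²/4
maximising φ over λ >= 0 gives λ = 2 and φ(2) = 1, matching the primal optimum f(1) = 1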

16
Q

strong duality:

A

the primal and dual optimal values are the same (zero duality gap)

17
Q

KKT conditions:

A

g(x*) <= 0
h(x*) = 0
λ* >= 0
λi*·gi(x*) = 0 (complementary slackness)
f'(x*) + sum(λi*·gi'(x*)) + sum(μi*·hi'(x*)) = 0 (stationarity)
if these are satisfied (for a convex problem) there's 0 duality gap (so strong duality)
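checking them on the toy problem from the lagrange dual card (min x² s.t. 1-x <= 0): at x* = 1, λ* = 2 we get g(x*) = 0 <= 0, λ* = 2 >= 0, λ*·g(x*) = 0, and f'(x*) + λ*·g'(x*) = 2 + 2·(-1) = 0, so every condition holds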

18
Q

central path:

A

the set of solutions when the optimality conditions switch to XSe = τe and x,s > 0 (instead of = 0 and >= 0 respectively); as τ -> 0 any solution of this approaches the regular solution
a set of solutions x(t) for t > 0 that solve: minimise t·f(x) + Φ(x) subject to Ax = b, where Φ is the log-barrier -sum(log(-gi(x)))
a point x* is equal to a point x(t) on the central path iff there exist dual multipliers λ*, μ* such that -
g(x*) <= 0
Ax* = b
λ* >= 0
-λi*·gi(x*) = 1/t
f'(x*) + sum(λi*·gi'(x*)) + A^(T)μ* = 0
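a tiny worked example (mine): minimise f(x) = x subject to -x <= 0; the barrier problem is minimise t·x - log(x), solved by x(t) = 1/t, and as t -> infinity x(t) -> 0, the true optimum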

19
Q

optimality conditions:

A

using min ⟨c,x⟩ subject to Ax = b, x >= 0 as the example, the conditions are -
Ax - b = 0 (primal feasibility)
A^(T)y + s - c = 0 (dual feasibility; s is a slack variable cause normally this'd be a <= not a =)
XSe = 0 (X = diag(x), S = diag(s), e the all-ones vector, so xi·si = 0 for each i: complementary slackness)
x >= 0
s >= 0
everything with an s involved is done componentwise for s1,…,sn
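a numerical check of the XSe = 0 condition (my own toy problem; the dual is solved as a second LP):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))       # min ⟨c,x⟩ s.t. Ax = b, x >= 0
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None))   # max ⟨b,y⟩ s.t. A^T y <= c
s = c - A.T @ dual.x                                        # dual slack s >= 0
print(primal.x * s)   # elementwise ~0: xi·si = 0, i.e. XSe = 0
```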

20
Q

derivatives by vectors:

A

x^(T)B -> B
x^(T)b -> b (same as b^(T)x)
x^(T)x -> 2x
x^(T)Bx -> 2Bx when B is symmetric ((B+B^(T))x in general)
basically drop the x^(T), and if there's also an x, replace it with a 2
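a finite-difference check of the last rule, with a deliberately non-symmetric B (made up):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3))   # not symmetric
x = rng.normal(size=3)

f = lambda x: x @ B @ x
eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.allclose(num_grad, (B + B.T) @ x, atol=1e-5))   # True
```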