GD, SGD, and SVM Flashcards
Gradient Descent
General approach for minimizing a differentiable convex function f(w).
It measures the local gradient of the error function with respect to the parameter vector w and takes a step in the direction of the descending gradient.
GD algorithm
w(0) <– 0
for t <– 0 to T-1 do
- w(t+1) <– w(t) - eta * grad f(w(t))
return barw = (1/T) SUM_{t=0}^{T-1} w(t)
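A minimal NumPy sketch of this loop (names such as grad and dim are illustrative, not from the cards):

```python
import numpy as np

def gradient_descent(grad, dim, eta, T):
    """Run T GD steps from w = 0 and return the averaged iterate barw."""
    w = np.zeros(dim)
    iterates = []
    for t in range(T):
        iterates.append(w)
        w = w - eta * grad(w)          # w(t+1) <- w(t) - eta * grad f(w(t))
    return np.mean(iterates, axis=0)   # barw = (1/T) SUM_t w(t)

# Example: f(w) = ||w - 1||^2 has gradient 2(w - 1).
w_bar = gradient_descent(lambda w: 2 * (w - np.ones(3)), dim=3, eta=0.1, T=1000)
```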
Stochastic Gradient Descent
Instead of using the exact gradient, we take a (random) vector whose expected value equals the gradient direction.
SGD algorithm
w(0) <– 0
for t <– 0 to T-1 do
- choose vt at random from a distribution such that E[vt | w(t)] = grad f(w(t))
/// vt has expected value equal to the gradient of f(w(t))
- w(t+1) <– w(t) - eta * vt
return barw = (1/T) SUM_{t=0}^{T-1} w(t)
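A sketch for the common finite-sum case f(w) = (1/m) SUM_i fi(w): sampling i uniformly and taking grad fi(w) gives an unbiased gradient estimate (the interface grad_i(w, i) is an assumption for illustration):

```python
import numpy as np

def sgd(grad_i, m, dim, eta, T, seed=0):
    """SGD with vt = grad f_i(w(t)) for a uniformly random i, so E[vt | w(t)] = grad f(w(t))."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    iterates = []
    for t in range(T):
        iterates.append(w)
        i = rng.integers(m)            # sample one index uniformly at random
        w = w - eta * grad_i(w, i)     # w(t+1) <- w(t) - eta * vt
    return np.mean(iterates, axis=0)   # barw = (1/T) SUM_t w(t)
```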
Hard-SVM optimization problem
Learning rule that returns an ERM hyperplane separating the training set with the largest possible margin (only applicable to linearly separable data).
Computational problem:
argmax_{(w,b): ||w|| = 1} min_{i} |<w,xi> + b|
Hard-SVM: equivalent formulation, and quadratic formulation
Due to the separability assumption:
argmax_{(w,b): ||w|| = 1} min_{i} yi(<w,xi> + b)
Quadratic formulation:
(w0, b0) = argmin_{(w,b)} ||w||^2 subject to: for all i, yi(<w,xi> + b) >= 1
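A sketch of the quadratic formulation as a convex program; it assumes the cvxpy library is available and only succeeds if the data are linearly separable:

```python
import cvxpy as cp

def hard_svm(X, y):
    """min ||w||^2  s.t.  yi(<w,xi> + b) >= 1 for all i  (X: m x d array, y: +/-1 labels)."""
    m, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]   # yi(<w,xi> + b) >= 1
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value, b.value
```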
Definition of support vectors for Hard-SVM
Support vectors = the examples at minimum distance from the separating hyperplane defined by w0.
They are the only examples that matter for defining w0.
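Given a solution (w0, b0), the support vectors can be spotted (up to numerical tolerance) as the points whose margin equals exactly 1; a small illustrative check:

```python
import numpy as np

def support_vector_indices(X, y, w0, b0, tol=1e-6):
    margins = y * (X @ w0 + b0)            # equals 1 on support vectors, > 1 elsewhere
    return np.where(margins <= 1 + tol)[0]
```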
Soft-SVM optimization problem, hinge loss
Used when the training set is not linearly separable: it allows some points to lie inside the margin or to be misclassified, i.e. to violate the hard constraint
yi(<w,xi> + b) >= 1
Soft-SVM constraints:
- slack variables: etai
- for each i: yi(<w,xi> + b) >= 1 - etai
- etai measures how much the constraint yi(<w,xi> + b) >= 1 is violated
Soft-SVM minimizes a combination of
- norm of w (margin)
- average of etai (violation of the constraints)
Soft-SVM optimization problem
input (x1,y1)…(xm,ym), parameter lambda > 0
solve:
min_{w,b,eta} ( lambda ||w||^2 + (1/m) SUM_i etai )
subject to: for all i, yi(<w,xi> + b) >= 1 - etai and etai >= 0
output: w,b
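A sketch of this constrained problem, again assuming cvxpy; the variable slack plays the role of the etai:

```python
import cvxpy as cp

def soft_svm(X, y, lam):
    """min lambda*||w||^2 + (1/m) SUM etai  s.t.  yi(<w,xi> + b) >= 1 - etai, etai >= 0."""
    m, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    slack = cp.Variable(m, nonneg=True)                  # slack variables etai >= 0
    objective = cp.Minimize(lam * cp.sum_squares(w) + cp.sum(slack) / m)
    constraints = [cp.multiply(y, X @ w + b) >= 1 - slack]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```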
Hinge loss: the equivalent unconstrained formulation.
Lshinge(w,b) = (1/m) SUM_i lhinge((w,b),(xi,yi)), where lhinge((w,b),(x,y)) = max{0, 1 - y(<w,x> + b)}
Hinge formulation
min_{w,b} ( lambda ||w||^2 + Lshinge(w,b) )
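The unconstrained objective can be evaluated directly; a small NumPy sketch (illustrative names):

```python
import numpy as np

def hinge_objective(w, b, X, y, lam):
    """lambda*||w||^2 + (1/m) SUM max{0, 1 - yi(<w,xi> + b)}."""
    losses = np.maximum(0.0, 1.0 - y * (X @ w + b))   # per-example hinge loss
    return lam * np.dot(w, w) + losses.mean()
```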
SGD for solving soft-SVM
We want to solve
min_w ( (lambda/2) ||w||^2 + (1/m) SUM_i max{0, 1 - yi <w,xi>} )
Note: it is standard to add the factor 1/2 to the regularization term to simplify some computations.
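A sketch of SGD on this objective with a fixed step size eta (homogeneous case, no bias term, as in the formula above; a decreasing Pegasos-style step size eta_t = 1/(lambda*t) is also common):

```python
import numpy as np

def sgd_soft_svm(X, y, lam, eta, T, seed=0):
    """SGD on (lambda/2)||w||^2 + (1/m) SUM max{0, 1 - yi<w,xi>}."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(T):
        w_sum += w
        i = rng.integers(m)                 # sample one example
        v = lam * w                         # gradient of (lambda/2)||w||^2
        if y[i] * np.dot(w, X[i]) < 1:      # hinge term is active
            v -= y[i] * X[i]                # subgradient of max{0, 1 - yi<w,xi>}
        w = w - eta * v
    return w_sum / T                        # averaged iterate barw
```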
Hard-SVM dual formulation