Logistic regression Flashcards
Data-set
{(x1,y1),…,(xN,yN)}
generated independently according to
P(y|x) = f(x) if y=+1; 1-f(x) if y=-1
Objective
To learn the target function
f(x) = P[y=1 | x]
Note that the target f(x) is a probability, not the output label y.
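As a concrete illustration, here is a minimal sketch of how such a dataset could be generated; the particular target f below is a hypothetical choice, made up only for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical target: f(x) = P[y = +1 | x]; any map into [0, 1] would do
    return 1.0 / (1.0 + np.exp(-(1.5 * x[0] - 0.5 * x[1])))

N = 100
X = rng.normal(size=(N, 2))             # inputs x1, ..., xN
p = np.array([f(x) for x in X])         # P[y = +1 | xn] for each n
y = np.where(rng.random(N) < p, 1, -1)  # yn = +1 with probability f(xn), else -1
```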
Definition: logistic function
θ(s) = e^s / (1 + e^s)
Hypothesis
h(x) = θ(s)
where s is the risk score, modelled as a linear function of the input x
s = w’ x
-> h(x) = θ(w' x)
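A direct translation into code (a sketch; any bias term is assumed to be absorbed into w by prepending a constant 1 to x):

```python
import numpy as np

def theta(s):
    # logistic function θ(s) = e^s / (1 + e^s), written in the
    # numerically equivalent form 1 / (1 + e^{-s})
    return 1.0 / (1.0 + np.exp(-s))

def h(x, w):
    # hypothesis h(x) = θ(w'x); s = w'x is the linear risk score
    return theta(np.dot(w, x))
```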
Performance index
Maximum likelihood inference: maximization wrt w of Π(n=1,N) Pw(yn|xn), where Pw(yn|xn) = θ(yn w' xn); this compact form covers both labels y = ±1, since 1-θ(s) = θ(-s)
Reformulation as a minimization problem:
minimize wrt w -1/N * sum(n=1,N) ln( Pw(yn|xn) )
In general, this is a non-linear minimization problem
In-sample error in logistic regression
Ein(w) = 1/N * sum(n=1,N) ln(1+e^-yn w’ xn)
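The two formulations agree because -ln θ(s) = ln(1+e^-s). A vectorized sketch, assuming X stacks the xn as rows (with any bias column already included) and y holds the ±1 labels:

```python
import numpy as np

def Ein(w, X, y):
    # in-sample error Ein(w) = (1/N) Σ ln(1 + e^{-yn w'xn})
    margins = y * (X @ w)  # yn * w'xn, one entry per data point
    return np.mean(np.log1p(np.exp(-margins)))
```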
Gradient descent algorithm
- initialize the weights at t=0 as w^(0)
- for t=0,1,2,… do
- compute the gradient of the in-sample error using all the data
g(t) = ∇Ein(w^(t))
- set the direction v^(t) = -g(t)
- update the weights as
w^(t+1) = w^(t) + η*v^(t)
- iterate until a stopping criterion is met (see below)
- return the weights that minimize Ein(w)
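A sketch of the full loop; the gradient formula ∇Ein(w) = -(1/N) Σ yn xn / (1 + e^{yn w'xn}) follows by differentiating Ein above, and η and the iteration cap are illustrative values:

```python
import numpy as np

def gradient(w, X, y):
    # ∇Ein(w) = -(1/N) Σ yn xn / (1 + e^{yn w'xn})
    margins = y * (X @ w)
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)

def gradient_descent(X, y, eta=0.1, max_iters=1000):
    w = np.zeros(X.shape[1])   # w^(0)
    for t in range(max_iters):
        g = gradient(w, X, y)  # g(t) = ∇Ein(w^(t)), uses all N points
        w = w - eta * g        # w^(t+1) = w^(t) + η*v^(t), with v^(t) = -g(t)
    return w
```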
Stopping criteria
1) upper bound on number of iterations
2) threshold on the norm of the gradient ||g(t)||
3) combination of 1 and 2
4) combination of the previous plus a threshold on Ein
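Criterion 4, for instance, could be factored out as a helper like this (the thresholds are illustrative, and should_stop is a made-up name):

```python
import numpy as np

def should_stop(t, g, ein_value, max_iters=1000, grad_tol=1e-6, ein_tol=1e-3):
    # criterion 4: iteration cap, plus thresholds on ||g(t)|| and on Ein
    return t >= max_iters or np.linalg.norm(g) < grad_tol or ein_value < ein_tol
```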
Stochastic gradient descent
Gradient descent algorithm uses all the N data to compute ∇Ein
Stochastic GD algorithm, instead:
- pick a data point (xn,yn) at random
- consider the error on that single point: ein(w) = ln(1+e^-yn w' xn)
- compute its gradient: ∇ein(w) = …
- update rule: w^(t+1) = w^(t) - η*∇ein(w)
-> the minimization proceeds on average in the right direction, with some fluctuations, but with the advantage of requiring far fewer computations per iteration
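A sketch of the stochastic variant; the per-point gradient ∇ein(w) = -yn xn / (1 + e^{yn w'xn}) is the single-point analogue of ∇Ein, derived the same way:

```python
import numpy as np

def sgd(X, y, eta=0.1, num_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(num_steps):
        n = rng.integers(N)                        # pick (xn, yn) at random
        m = y[n] * (X[n] @ w)                      # yn * w'xn
        grad = -y[n] * X[n] / (1.0 + np.exp(m))    # ∇ein(w) at that point
        w = w - eta * grad                         # w^(t+1) = w^(t) - η*∇ein(w)
    return w
```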
Selection of the parameter η in the gradient descent algorithm
η is a design parameter:
- it should be small enough that the first-order Taylor approximation behind the descent step remains accurate
- it should be large enough to obtain good convergence performance
- a good choice is a variable η:
η^(t) = η*||∇Ein(w^(t))||
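In code, this variable rate is applied along the normalized direction (a common convention with this rule, assumed here); note that the norms cancel, so the step reduces to the plain fixed-η update:

```python
import numpy as np

def variable_eta_step(w, g, eta=0.1):
    # η^(t) = η * ||∇Ein(w^(t))||, applied along v̂ = -g/||g|| (assumes g ≠ 0)
    eta_t = eta * np.linalg.norm(g)
    v_hat = -g / np.linalg.norm(g)
    return w + eta_t * v_hat  # identical to w - eta * g
```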