Loss_Functions_Preference_Levels (MIT paper) Flashcards
1
Q
types of target labels
A
- discrete, ordered labels (current paper)
- binary labels (classification)
- discrete, unordered labels (multi-class classification)
2
Q
problem definition
A
- regression problem with discrete, ordered labels
- treat it as a generalization of binary regression (e.g. logistic regression), which is the special case with only 2 ordered labels: negative, positive
3
Q
solution layout
A
- learn a real-valued predictor z(x), as in binary linear regression
- minimize a loss function on the target labels: loss(z(x); y)
- define threshold-based and probabilistic generalizations of:
- logistic loss
- hinge loss
4
Q
experiment method
A
- use L2-regularized linear prediction that minimizes the trade-off between the overall training loss and the L2 norm of the weights:
J(w) = Sum_i [ loss(w·x_i + w_0; y_i) ] + (lambda/2) * ||w||^2
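A minimal NumPy sketch of this objective (the function name `objective`, the generic `loss` callable, and the data `X`, `y` are illustrative placeholders, not from the paper):

```python
import numpy as np

def objective(w, w0, X, y, loss, lmbd):
    """L2-regularized training objective J(w) for a linear predictor z = w.x + w_0.
    loss(z, y) is any margin-based loss; lmbd is the trade-off parameter."""
    z = X @ w + w0                         # real-valued predictions for all examples
    data_term = np.sum(loss(z, y))         # overall training loss
    reg_term = 0.5 * lmbd * np.dot(w, w)   # (lambda/2) * ||w||^2
    return data_term + reg_term
```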
5
Q
binary regression - zero-one loss
A
- threshold the real-valued predictor with sign(z(x)), for y in {-1, +1}:
loss(z; y) = 0 if yz > 0 (NO error is made)
loss(z; y) = 1 if yz <= 0 - counts the number of errors
- not convex, not continuous => hard to minimize
- insensitive to the magnitude of z and w => regularization ineffective: shrinking w, w_0 leaves the error unchanged while the regularization term goes to zero
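A short NumPy sketch of the zero-one loss on the margin yz (the name and vectorized form are my own, assuming y in {-1, +1}):

```python
import numpy as np

def zero_one_loss(z, y):
    """Zero-one loss: 0 when sign(z) agrees with y (yz > 0), 1 otherwise.
    Flat almost everywhere, so gradient-based minimization gets no signal."""
    return np.where(y * z > 0, 0.0, 1.0)
```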
6
Q
binary regression - margin loss
A
- addresses the magnitude insensitivity:
loss(z; y) = 0 if yz >= 1 (NO error is made)
loss(z; y) = 1 if yz < 1
- requiring y(w x + w_0) >= 1 (and dividing by |w|) means that minimizing both the loss and the regularization term is equivalent to:
maximizing the margin 1 / |w| while minimizing the number of misclassified points
- still not convex, not continuous => hard to minimize
- insensitive to the ERROR magnitude: every margin violation receives the same penalty
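A corresponding sketch of the margin (zero-one-with-margin) loss, again with an illustrative name and y assumed in {-1, +1}:

```python
import numpy as np

def margin_loss(z, y):
    """Margin loss: an error is counted whenever the margin yz falls below 1,
    regardless of how large the violation is."""
    return np.where(y * z >= 1, 0.0, 1.0)
```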
7
Q
binary regression - hinge loss
A
- convex, continuous alternative to the margin loss
- minimize the hinge function:
loss(z; y) = max(0, 1 - yz) =
= 0 if yz >= 1
= 1 - yz if yz < 1
- appears in SVMs as the constraint y(w x + w_0) >= 1 - eta, with eta the margin violation (slack)
- IMPORTANT: it is an upper bound on the zero-one classification error
- introduces a linear dependency on the ERROR magnitude (unavoidable for convex loss func)
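A minimal sketch of the hinge loss (illustrative name, y assumed in {-1, +1}):

```python
import numpy as np

def hinge_loss(z, y):
    """Hinge loss max(0, 1 - yz): zero for margins >= 1, then grows linearly
    with the margin violation; upper-bounds the zero-one error."""
    return np.maximum(0.0, 1.0 - y * z)
```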
8
Q
binary regression - smoothed hinge loss
A
- ‘smoothed’ loss function that is easier to minimize (continuous derivative):
loss(z; y) = 0 if yz >= 1 (NO error is made)
loss(z; y) = [(1 - yz)^2] / 2 if 0 < yz < 1
loss(z; y) = 0.5 - yz if yz <= 0
- introduces a linear dependency on the ERROR magnitude (unavoidable for convex loss func)
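A sketch of the smoothed hinge following the piecewise definition above (illustrative name, y in {-1, +1}):

```python
import numpy as np

def smoothed_hinge_loss(z, y):
    """Smoothed hinge: zero for yz >= 1, quadratic for 0 < yz < 1,
    linear (0.5 - yz) for yz <= 0; the first derivative is continuous."""
    m = y * z
    return np.where(m >= 1, 0.0,
                    np.where(m > 0, 0.5 * (1.0 - m) ** 2, 0.5 - m))
```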
9
Q
binary regression - logistic loss
A
loss(z; y) = log ( 1 + e ^ (-yz) ) = - log P(y | z)
- conditional log-likelihood loss: - log P(y | z) for the
logistic conditional likelihood model (estimator):
P(y | z) ~ e ^ (yz)
- minimizing Sum(loss(z_i; y_i)) ~ maximizing the conditional likelihood Prod P(y_i | z_i)
- with an L2 regularization term => MAP estimator with a Gaussian prior on w
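A sketch of the logistic loss, written with `np.logaddexp` for numerical stability (illustrative name, y in {-1, +1}):

```python
import numpy as np

def logistic_loss(z, y):
    """Logistic loss log(1 + exp(-yz)), the negative conditional log-likelihood
    of the logistic model; logaddexp(0, -yz) avoids overflow for large -yz."""
    return np.logaddexp(0.0, -y * z)
```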
10
Q
generalization loss function
A
- loss(z; y) is a penalty
- applied to the classification margin yz
- using a specific margin penalty function f(.)
- k ordinal levels separated by thresholds l_0 = -inf, l_1, ..., l_{k-1}, l_k = +inf
- the single threshold 0 is replaced with k-1 finite thresholds
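A small sketch of how the k-1 finite thresholds turn a real-valued prediction into an ordinal level (the function name and 1..k level numbering are my own conventions, not the paper's):

```python
import numpy as np

def predict_level(z, thresholds):
    """Map a real-valued prediction z to one of k ordinal levels using the
    sorted finite thresholds l_1 < ... < l_{k-1}; l_0 = -inf and l_k = +inf
    are implicit. Returns a level in 1..k."""
    return int(np.searchsorted(thresholds, z)) + 1
```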
11
Q
generalization - immediate-threshold
A
- for each labeled example (x, y) there is exactly one correct segment (l_{y-1}, l_y)
- penalty: loss(z; y) = f(z - l_{y-1}) + f(l_y - z), i.e. a penalty for crossing either boundary of the correct segment
- all errors are penalized equally, regardless of how far the predicted ordinal value is from the true one
- f(.) is the margin penalty function
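A sketch of the immediate-threshold construction under these assumptions: `thresholds` holds the k-1 finite values l_1..l_{k-1}, labels y are integers 1..k, and `f` is a one-argument margin penalty such as `lambda m: np.maximum(0.0, 1.0 - m)` (hinge):

```python
import numpy as np

def immediate_threshold_loss(z, y, thresholds, f):
    """Immediate-threshold loss: penalize only crossings of the two thresholds
    bounding the correct segment (l_{y-1}, l_y)."""
    l = np.concatenate(([-np.inf], thresholds, [np.inf]))  # l_0 .. l_k
    lower, upper = l[y - 1], l[y]      # boundaries of the correct segment
    return f(z - lower) + f(upper - z)
```

For hinge or logistic penalties the infinite boundary terms evaluate to 0, so the edge levels y = 1 and y = k are only penalized from one side.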
12
Q
generalization - all-threshold
A
- penalize predictions more the farther (in ordinal value) they are from the true label, using:
s(m; y) = -1 if m < y
s(m; y) = +1 if m >= y
loss(z; y) = Sum_{m:1..k-1} f [ s(m; y) (l_m - z) ]
- f(.) is the margin penalty function
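A sketch of the all-threshold loss under the same assumptions as above (`thresholds` = l_1..l_{k-1}, y in 1..k, `f` a one-argument margin penalty):

```python
import numpy as np

def all_threshold_loss(z, y, thresholds, f):
    """All-threshold loss: sum the margin penalty over every threshold l_m with
    sign s(m; y) = -1 for m < y and +1 for m >= y, so a prediction that lands
    farther from the correct segment crosses more thresholds and pays more."""
    m = np.arange(1, len(thresholds) + 1)   # threshold indices 1 .. k-1
    s = np.where(m < y, -1.0, 1.0)          # s(m; y)
    return float(np.sum(f(s * (np.asarray(thresholds) - z))))
```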