Loss Functions for Preference Levels (MIT paper) Flashcards

1
Q

types of target labels

A
  • discrete, ordered labels (current paper)
  • binary labels (classification)
  • discrete, unordered (multi-class classification)
2
Q

problem definition

A
  • regression problem with discrete, ordered labels
  • treat it as a generalization of binary regression (as in logistic regression, the special case with only 2 ordered labels: positive and negative)
3
Q

solution layout

A
  • learn a real-valued predictor z(x) = w·x + w_0, as in binary linear regression
  • minimize a loss function on the target labels: loss(z(x); y)
  • define generalizations, threshold-based and probabilistic, of:
    • the logistic loss
    • the hinge loss
4
Q

experiment method

A
  • use L2-regularized linear prediction, minimizing the trade-off between the overall training loss and the L2 norm of the weights:
    J(w) = Sum_i[ loss(w·x_i + w_0; y_i) ] + (lambda/2) * ||w||^2
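  A minimal Python sketch of this objective, assuming some per-example loss function loss_fn(z, y); names such as loss_fn and lmbd are illustrative, not from the paper:

    import numpy as np

    def objective(w, w0, X, y, loss_fn, lmbd):
        # J(w): total training loss of the linear predictor z = w.x + w0,
        # plus the L2 penalty (lambda/2) * ||w||^2
        z = X @ w + w0
        data_loss = sum(loss_fn(zi, yi) for zi, yi in zip(z, y))
        return data_loss + 0.5 * lmbd * np.dot(w, w)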
5
Q

binary regression - zero-one loss

A
  • threshold the real-valued predictor using sign(z(x)), with y in {-1, +1}:
    loss(z; y) = 0 if yz > 0 (NO error is made)
    loss(z; y) = 1 if yz <= 0
  • counts the number of errors
  • not convex, not continuous => hard to minimize
  • insensitive to the magnitude of z and w => regularization is ineffective: scaling down w, w_0 leaves the errors unchanged while the regularization term goes to zero
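  A direct Python transcription (assuming labels y in {-1, +1}):

    def zero_one_loss(z, y):
        # 1 when sign(z) disagrees with y (i.e. yz <= 0), 0 otherwise
        return 0.0 if y * z > 0 else 1.0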
6
Q

binary regression - margin loss

A
  • addresses the magnitude insensitivity:
    loss(z; y) = 0 if yz >= 1 (NO error is made)
    loss(z; y) = 1 if yz < 1
  • requiring y(w·x + w_0) >= 1 (and dividing by ||w||) => minimizing the loss together with the regularization term is equivalent to:
    maximizing the margin 1/||w|| while minimizing the number of misclassified points
  • not convex, not continuous => hard to minimize
  • insensitive to the ERROR magnitude: the same penalty is applied to all errors
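  The same idea in Python, shifted to require a margin of 1 (again assuming y in {-1, +1}):

    def margin_loss(z, y):
        # 0 only when the example is on the correct side with margin at least 1
        return 0.0 if y * z >= 1 else 1.0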
7
Q

binary regression - hinge loss

A
  • convex, continuous alternative to the margin loss (which is neither)
  • minimize the hinge function:
    loss(z; y) = max(0, 1 - yz) =
    = 0 if yz >= 1
    = 1 - yz if yz < 1
  • also appears (in SVMs) as the constraint y(w·x + w_0) >= 1 - eta, with eta the margin violation (slack)
  • IMPORTANT: it is an upper bound on the zero-one classification error
  • introduces a linear dependence on the ERROR magnitude (unavoidable for a convex loss function)
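  A one-line Python version of the hinge function:

    def hinge_loss(z, y):
        # max(0, 1 - yz): zero beyond the margin, then grows linearly with the violation
        return max(0.0, 1.0 - y * z)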
8
Q

binary regression - smoothed hinge loss

A
  • a ‘smoothed’ version of the hinge loss that is easier to minimize (continuous derivative):
    loss(z; y) = 0 if yz >= 1 (NO error is made)
    loss(z; y) = [(1 - yz)^2] / 2 if 0 < yz < 1
    loss(z; y) = 0.5 - yz if yz <= 0
  • introduces a linear dependence on the ERROR magnitude (unavoidable for a convex loss function)
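  The three pieces in Python (a sketch; the pieces and their derivatives match at yz = 0 and yz = 1):

    def smoothed_hinge_loss(z, y):
        m = y * z                        # classification margin
        if m >= 1:
            return 0.0                   # no penalty beyond the margin
        if m > 0:
            return (1.0 - m) ** 2 / 2.0  # quadratic piece, smooth at m = 1
        return 0.5 - m                   # linear piece, matches the quadratic at m = 0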
9
Q

binary regression - logistic loss

A

  • loss(z; y) = log(1 + e^(-yz)) = -log P(y | z)
  • the conditional log-loss -log P(y | z) for the logistic conditional likelihood model (estimator):
    P(y | z) = 1 / (1 + e^(-yz))
  • minimizing Sum[ loss(z_i; y_i) ] <=> maximizing the conditional likelihood of the data under this model
  • adding the L2 regularization term => MAP estimator with a Gaussian prior on w
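  A direct Python transcription (naive form; it can overflow for very negative yz):

    import math

    def logistic_loss(z, y):
        # log(1 + exp(-yz)) = -log P(y | z) for P(y | z) = 1 / (1 + exp(-yz))
        return math.log1p(math.exp(-y * z))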

10
Q

generalization loss function

A
  • in the binary case, loss(z; y) is a penalty
    • applied to the classification margin yz
    • via a specific margin penalty function f(.)
  • with k ordinal levels, the thresholds are l_0 = -inf < l_1 < ... < l_{k-1} < l_k = +inf
  • the generalizations replace the single threshold at 0 with the k-1 finite thresholds
11
Q

generalization - immediate-threshold

A
  • for each labeled example (x, y) there is only one correct segment, (l_{y-1}, l_y)
  • penalty for crossing the boundaries of the correct segment: loss(z; y) = f(z - l_{y-1}) + f(l_y - z)
  • only the boundaries of the correct segment matter, so errors are penalized the same regardless of how many ordinal levels off the prediction is
  • f(.) is the margin penalty function
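  A Python sketch, assuming labels y in 1..k, a list thresholds = [l_1, ..., l_{k-1}], and a margin penalty function f of a single margin argument, e.g. f(m) = max(0, 1 - m) for the hinge (these names are illustrative):

    def immediate_threshold_loss(z, y, thresholds, f):
        # penalize crossing the two boundaries of the correct segment (l_{y-1}, l_y);
        # l_0 = -inf and l_k = +inf contribute nothing
        loss = 0.0
        if y > 1:
            loss += f(z - thresholds[y - 2])   # lower boundary l_{y-1}: want z above it
        if y < len(thresholds) + 1:
            loss += f(thresholds[y - 1] - z)   # upper boundary l_y: want z below it
        return loss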
12
Q

generalization - all-threshold

A
  • penalizes predictions more the farther they fall from the true ordinal level, using:
    s(m; y) = -1 if m < y
    s(m; y) = +1 if m >= y
    loss(z; y) = Sum_{m=1..k-1} f[ s(m; y) (l_m - z) ]
  • f(.) is the margin penalty function
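  The same setup as the immediate-threshold sketch, summed over all k-1 thresholds (names again illustrative):

    def all_threshold_loss(z, y, thresholds, f):
        # every threshold the prediction falls on the wrong side of adds a penalty,
        # so the loss grows with the distance from the true ordinal level
        loss = 0.0
        for m, l_m in enumerate(thresholds, start=1):   # m = 1 .. k-1
            s = -1.0 if m < y else 1.0                   # s(m; y)
            loss += f(s * (l_m - z))
        return loss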