Linear Classifier Flashcards

1
Q

What is a linear classifier?

A

A classifier that uses a linear combination of features to determine an object's class. Mathematically, it is expressed as f(x, W) = Wx

2
Q

What is the purpose of the bias?

A

The bias gives an offset to each output: f(x, W) = Wx + b
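
As a concrete illustration, a minimal NumPy sketch of the score function, assuming 3 classes and 4 features (the shapes are illustrative, not from the card):

```python
import numpy as np

W = np.random.randn(3, 4)   # weight matrix: one row of weights per class
b = np.random.randn(3)      # bias vector: one offset per class
x = np.random.randn(4)      # a single data vector

scores = W @ x + b          # f(x, W) = Wx + b gives one score per class
print(scores.shape)         # (3,)
```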

3
Q

What is the bias trick?

A

Append an extra 1 to the data vector; the bias is then absorbed into the last column of the weight matrix. For example, if the bias is [1.1, 3.2, -1.2], it is added as the last column of the weight matrix, and a 1 is added as the last row of the data vector.
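
A short NumPy sketch of the bias trick, reusing the bias from the example above (other shapes are illustrative):

```python
import numpy as np

W = np.random.randn(3, 4)               # 3 classes, 4 features (illustrative)
b = np.array([1.1, 3.2, -1.2])          # the bias from the example above
x = np.random.randn(4)

W_ext = np.hstack([W, b[:, None]])      # bias absorbed as the last column of W
x_ext = np.append(x, 1.0)               # constant 1 appended to the data vector

# The extended product reproduces Wx + b exactly.
assert np.allclose(W_ext @ x_ext, W @ x + b)
```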

4
Q

What are the 3 different viewpoints of a linear classifier?

A

Algebraic, geometric, template

5
Q

What is a loss function?

A

It is used to quantify how good a value of W is

6
Q

What is optimization?

A

It is the process of finding a W that minimizes the loss
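
As a rough sketch of what minimizing a loss over W looks like, here is vanilla gradient descent on a toy quadratic loss (the loss, shapes, and step size are all illustrative, not from the card):

```python
import numpy as np

def loss(W):
    return np.sum(W ** 2)      # toy loss whose minimum is at W = 0

def grad(W):
    return 2 * W               # its analytic gradient

W = np.random.randn(3, 4)      # random starting weights
lr = 0.1                       # step size (a hyperparameter)
for _ in range(100):
    W = W - lr * grad(W)       # step against the gradient
print(loss(W))                 # approaches 0
```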

7
Q

What is multi-class SVM loss?

A

The score of the correct class should be higher than all the other class scores (by at least a margin)

8
Q

What is hinge loss?

A

Hinge loss is a loss function used in SVMs, defined per class as max(0, sj - sy + 1), where sj is the score of an incorrect class and sy is the score of the correct class. For example, if sy = 1 and sj = 0.5, the hinge loss is max(0, 0.5 - 1 + 1) = 0.5. The 1 is called the margin. The loss increases linearly the further an incorrect score rises above the correct class's score.
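
A minimal NumPy sketch of this loss for a single example (the scores are illustrative):

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multi-class SVM loss for one example: scores is the vector of
    class scores, y is the index of the correct class."""
    margins = np.maximum(0, scores - scores[y] + margin)
    margins[y] = 0.0                       # the correct class contributes no loss
    return margins.sum()

scores = np.array([0.5, 1.0, -0.3])        # illustrative scores; class 1 is correct
print(svm_loss(scores, y=1))               # max(0, 0.5-1+1) + max(0, -0.3-1+1) = 0.5
```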

9
Q

What if two functions have the same data loss? What can be done in this case?

A

If such a case happens, we can add a regularization term to express a preference between them and to prevent the model from overfitting the training set. The final loss function consists of the data loss (model predictions should match the training data) plus the regularization term: L(W) = (1/N) Σi Li + λR(W), where λ controls the regularization strength.

10
Q

What are examples of regularization?

A

L1 regularization - sum of absolute values of the weights
L2 regularization - sum of squared weights (Euclidean norm)
Elastic net - L1 + L2

More complex examples of regularization include dropout, batch normalization, cutout, mixup, and stochastic depth
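
A short NumPy sketch of the three penalty terms (the elastic-net mixing weight is an illustrative hyperparameter):

```python
import numpy as np

W = np.random.randn(3, 4)

l1 = np.sum(np.abs(W))        # L1: sum of absolute values of the weights
l2 = np.sum(W ** 2)           # L2: sum of squared weights
beta = 0.5                    # elastic-net mixing weight (illustrative)
elastic = beta * l2 + l1      # elastic net: a weighted combination of L2 and L1
```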

11
Q

What are the purposes of regularization?

A

Express preferences among models beyond “minimize training error”
Avoid overfitting, since simple models generalize better
Improve optimization by adding curvature

12
Q

What will happen if you choose L2 regularization?

A

L2 regularization likes to “spread out” the weight across many features, which may be useful if individual features are noisy

13
Q

What will happen if you choose L1 regularization?

A

L1 regularization tends to concentrate the weight on fewer features (it encourages sparsity)
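
A classic worked example of the contrast between L2 (previous card) and L1 (this card); the vectors are illustrative:

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])        # weight concentrated on one feature
w2 = np.array([0.25, 0.25, 0.25, 0.25])    # weight spread across all features

print(w1 @ x, w2 @ x)                      # 1.0 1.0 -> identical scores
print(np.sum(w1**2), np.sum(w2**2))        # L2: 1.0 vs 0.25 -> prefers spread-out w2
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))  # L1: 1.0 vs 1.0 -> no preference here
```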

14
Q

When do you use cross-entropy loss?

A

Cross-entropy loss is used when you want to interpret raw classifier scores as probabilities

15
Q

How do you compute cross-entropy loss?

A

It takes the raw scores through the following transformation:

  1. Exponentiate each score: e^score
  2. Normalize so that the probabilities sum to 1

Steps (1) and (2) are known as the softmax function.

  3. Take the negative log of the correct class's probability, -log(p), based on maximum likelihood estimation (choose weights to maximize the likelihood of the data)
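
A minimal NumPy sketch of the whole pipeline; the max-subtraction is a standard numerical-stability trick not listed in the steps above, and the scores are illustrative:

```python
import numpy as np

def cross_entropy_loss(scores, y):
    """Softmax followed by cross-entropy for one example;
    y is the index of the correct class."""
    shifted = scores - scores.max()     # subtract max for numerical stability
    exp = np.exp(shifted)               # step 1: exponentiate each score
    probs = exp / exp.sum()             # step 2: normalize to sum to 1
    return -np.log(probs[y])            # step 3: -log of the correct class's probability

scores = np.array([3.2, 5.1, -1.7])     # illustrative raw scores
print(cross_entropy_loss(scores, y=0))  # loss when class 0 is correct
```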

16
Q

What would the loss be if all the scores are random values, so the predicted probabilities are uniformly distributed?

A

In this case, the predicted probabilities are uniform over the C classes, so the loss will be -log(1/C) = log C. For example, with C = 10 classes this is log(10) ≈ 2.3, a common sanity check at initialization.
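
A quick numeric sanity check (the class count is illustrative):

```python
import numpy as np

C = 10                                   # illustrative number of classes
scores = 1e-3 * np.random.randn(C)       # near-zero random scores -> uniform softmax
probs = np.exp(scores) / np.exp(scores).sum()
print(-np.log(probs[0]), np.log(C))      # both are approximately 2.3026
```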