Linear Classifier Flashcards
What is a linear classifier?
A classifier that determines an object's class from a linear combination of its features. Mathematically, it is expressed as f(x,W) = Wx
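A minimal numpy sketch of the score computation (the class and feature counts are assumptions for illustration, e.g. a flattened 32x32x3 image with 10 classes):

```python
import numpy as np

num_classes, num_features = 10, 3072                   # assumed sizes
W = np.random.randn(num_classes, num_features) * 0.01  # weight matrix
x = np.random.randn(num_features)                      # one input vector

scores = W @ x       # f(x,W) = Wx gives one score per class
print(scores.shape)  # (10,)
```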
What is the purpose of the bias?
The bias gives an offset to each output score: f(x,W) = Wx + b
What is the bias trick?
Append a constant 1 to the data vector so that the bias is absorbed into the last column of the weight matrix. For example, if the bias is [1.1, 3.2, -1.2], it is added as the last column of the weight matrix, and a 1 is added as the last row of the data vector
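A quick sketch of the trick, reusing the bias from that example (the other sizes are assumed):

```python
import numpy as np

W = np.random.randn(3, 4)       # 3 classes, 4 features (assumed sizes)
b = np.array([1.1, 3.2, -1.2])  # the bias from the example above
x = np.random.randn(4)

# Bias trick: append b as an extra last column of W and a constant 1 to x.
W_ext = np.hstack([W, b[:, None]])  # shape (3, 5)
x_ext = np.append(x, 1.0)           # shape (5,)

assert np.allclose(W_ext @ x_ext, W @ x + b)  # identical scores either way
```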
What are the 3 different viewpoints of linear classifier?
Algebraic, geometric, template
What is a loss function?
It is used to quantify how good a given value of W is
What is optimization?
It is the process of finding a W that minimizes the loss
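A minimal gradient-descent sketch of what "minimizing loss" looks like in practice (gradient descent itself is not named in these cards; the toy loss ||W||^2 and its gradient 2W are assumptions for illustration):

```python
import numpy as np

def gradient_descent(W, loss_and_grad, lr=0.1, steps=100):
    """Repeatedly step W against the gradient of the loss."""
    for _ in range(steps):
        loss, dW = loss_and_grad(W)  # loss value and gradient w.r.t. W
        W = W - lr * dW              # move downhill
    return W

# Toy check on the assumed loss ||W||^2, whose minimizer is W = 0.
W = gradient_descent(np.ones(3), lambda W: ((W ** 2).sum(), 2 * W))
print(W)  # entries close to 0
```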
What is multi-class SVM loss?
The score of the correct class should be higher than all the other class scores (by some margin)
What is hinge loss?
Hinge loss is the loss function used in the SVM. For each incorrect class j, it is defined as max(0, sj - sy + 1), where sj is the score of class j and sy is the score of the correct class; the 1 is called the margin. For example, if sy = 1 and sj = 0.5, the hinge loss is max(0, 0.5 - 1 + 1) = 0.5. Because the classifier is linear, the loss grows linearly with how far an incorrect score rises past the margin
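A sketch of this loss for one example in numpy (the score values are assumptions chosen to match the example above):

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    # Multi-class SVM loss for one example: sum over incorrect classes j
    # of max(0, s_j - s_y + margin).
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the correct class contributes nothing
    return margins.sum()

scores = np.array([0.5, 1.0, -0.3])  # assumed scores; class 1 is correct
print(svm_loss(scores, y=1))         # max(0, 0.5-1+1) + max(0, -0.3-1+1) = 0.5
```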
What if two different W values have the same loss (data loss)? What can be done in this case?
In that case we can add a regularization term, which expresses a preference between them and prevents the model from overfitting the training set. The final loss function then consists of the data loss (model predictions should match the training data) plus the regularization term
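In standard notation (this exact form is an assumption from common usage; λ is a hyperparameter controlling the regularization strength):

L(W) = (1/N) Σᵢ Lᵢ(f(xᵢ, W), yᵢ) + λR(W)

where the first term is the data loss averaged over the N training examples and R(W) is the regularization term.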
What are examples of regularization?
L1 regularization - sum of the absolute values of the weights
L2 regularization - sum of the squared weights (squared Euclidean norm)
Elastic net - a combination of L1 + L2
More complex regularization includes techniques like dropout, batch normalization, cutout, mixup, and stochastic depth
What are the purposes of regularization?
Express preferences among models beyond "minimize training error"
Avoid overfitting, since simple models generalize better
Improve optimization by adding curvature
What will happen if you choose L2 regularization?
L2 regularization likes to "spread out" the weights across features, which may be useful if individual features are noisy
What will happen if you choose L1 regularization?
L1 regularization tends to concentrate the weight on fewer features, encouraging sparse solutions, as the sketch below illustrates
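A small demonstration of both tendencies; the weight vectors and input are an assumed textbook-style example, not from the cards above:

```python
import numpy as np

# Both weight vectors produce the same score on x = [1, 1, 1, 1] ...
x = np.ones(4)
w1 = np.array([1.0, 0.0, 0.0, 0.0])      # concentrated on one feature
w2 = np.array([0.25, 0.25, 0.25, 0.25])  # spread across all features
assert w1 @ x == w2 @ x                  # same prediction

# ... but the regularization penalties differ.
print(np.abs(w1).sum(), np.abs(w2).sum())  # L1 penalty: 1.0 vs 1.0
print((w1 ** 2).sum(), (w2 ** 2).sum())    # L2 penalty: 1.0 vs 0.25
# L2 strictly prefers the spread-out w2; L1 is indifferent here, and in
# general it pushes weights toward exact zeros (sparse solutions).
```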
When do you use cross-entropy loss?
Cross-entropy loss is used when you want to interpret the raw classifier scores as probabilities
How do you compute cross-entropy loss?
It takes the raw scores and applies the following transformation:
1. Exponentiate each score: e^score
2. Normalize the results so that the probabilities sum to 1
Steps (1) and (2) together are known as the softmax function.
3. Take the negative log of the correct class's probability, -log(p), based on maximum likelihood estimation (choose weights to maximize the likelihood of the data)
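These steps in a numpy sketch for a single example (the raw scores are assumed values for illustration):

```python
import numpy as np

def cross_entropy_loss(scores, y):
    # L = -log(softmax(scores)[y]) for a single example.
    shifted = scores - scores.max()        # subtract max for numerical stability
    exp_scores = np.exp(shifted)           # step 1: exponentiate
    probs = exp_scores / exp_scores.sum()  # step 2: normalize to sum to 1
    return -np.log(probs[y])               # step 3: -log of correct-class prob

scores = np.array([3.2, 5.1, -1.7])  # assumed raw scores; class 0 is correct
print(cross_entropy_loss(scores, y=0))
```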