2 Linear Classifiers Flashcards
What is one-hot encoding?
In classification we use this instead of a scalar target.
K-dim vector per desired output
Vector of 0s with a single 1 at the index corresponding to the class
Ex: three class problem
[1,0,0] [0,1,0][0,0,1]
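A minimal sketch of building these one-hot vectors, assuming NumPy (the label values are illustrative, not from the flashcards):

```python
import numpy as np

# Sketch: one-hot targets for a 3-class problem via an identity matrix.
labels = np.array([0, 2, 1])        # class indices for three examples
one_hot = np.eye(3)[labels]         # each row: a 0-vector with a single 1
# [[1., 0., 0.],
#  [0., 0., 1.],
#  [0., 1., 0.]]
```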
What is a perceptron?
We apply a non-linear activation on top of the linear transform:
f(x) = g(w^Tx)
What does the perceptron define?
A step function
g(a) = { 1 if a >= 0
{ -1 otherwise
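A sketch of the perceptron f(x) = g(w^T x) with this step activation, using made-up w and x:

```python
import numpy as np

def g(a):
    # Step function: +1 if a >= 0, else -1
    return np.where(a >= 0, 1, -1)

def perceptron(w, x):
    # Non-linear activation on top of the linear transform w^T x
    return g(w @ x)

w = np.array([0.5, -1.0])   # illustrative weights
x = np.array([2.0, 0.5])    # illustrative input
print(perceptron(w, x))     # -> 1
```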
What does the linear least squares classifier do?
It separates two or more classes by fitting a linear model to the (one-hot) targets with a least squares fit; the resulting decision boundary is a hyperplane.
Simple case: it defines a line that separates the two classes.
What is the solution to the LS classifier?
W = X^† Y, where X^† is the pseudoinverse of X
We "learn" the predictor by solving for W, which are the weights
Differentiate the loss L with respect to W and set the result to zero
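A minimal sketch of this closed-form solution, assuming NumPy; the data, shapes, and variable names are illustrative:

```python
import numpy as np

# Sketch: solve the least squares classifier in closed form, W = X^+ Y.
# X holds one example per row; Y holds one-hot targets, one column per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # m x d inputs (illustrative)
Y = np.eye(3)[rng.integers(0, 3, size=100)]     # m x K one-hot targets

W = np.linalg.pinv(X) @ Y                       # "learned" weights, one column per class
```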
How does W look in the least squares classifier?
It becomes a matrix.
One column of weights per class (one linear model per possible output), with a weight for every input dimension.
How do you calculate the loss in the least squares classifier?
We need to solve for a matrix of coefficients (one model per column)
L(f(x), y) = 1/2 * sum_{i=1}^{m} (y_i - w^T x_i)^2
           = 1/2 ||y - Xw||^2
Optimizing, we want to minimize loss
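A small sketch of computing this loss, again assuming NumPy and illustrative random data:

```python
import numpy as np

# Sketch: the least squares classification loss 1/2 ||Y - XW||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # m x d inputs
Y = np.eye(3)[rng.integers(0, 3, size=100)]     # m x K one-hot targets
W = np.linalg.pinv(X) @ Y                       # one model per column

loss = 0.5 * np.sum((Y - X @ W) ** 2)           # sum of squared residuals, halved
```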
How do you determine class density in the least squares classifier?
Each class is approximated by its own regression model:
p(C_k|x) ≈ f_k(x) = w_k^T x
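A sketch of using the per-class models to score a new input and pick the most probable class (W and x are made-up values, not from the flashcards):

```python
import numpy as np

# Sketch: each class k has its own linear model f_k(x) = w_k^T x;
# the predicted class is the one with the largest score.
W = np.array([[ 0.8, -0.2, -0.6],
              [-0.1,  0.9, -0.8]])    # d x K, one column w_k per class
x = np.array([1.0, 0.5])              # illustrative input

scores = W.T @ x                      # f_k(x) for every class k
predicted_class = int(np.argmax(scores))
```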
What is the perceptron criterion?
The loss used to train the perceptron:
L(f(x), y) = - sum_{i in M} w^T x_i y_i
where M is the set of misclassified training examples.
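A sketch of evaluating the perceptron criterion over the misclassified examples, with made-up data and labels in {-1, +1}:

```python
import numpy as np

# Sketch: perceptron criterion summed over misclassified examples only.
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]])   # illustrative inputs
y = np.array([1, -1, -1])                                # labels in {-1, +1}
w = np.array([0.5, 0.5])                                 # illustrative weights

scores = X @ w                                  # w^T x_i for every example
predictions = np.where(scores >= 0, 1, -1)      # step activation
misclassified = predictions != y
loss = -np.sum(scores[misclassified] * y[misclassified])
```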
How is learning done in perceptrons?
By stochastic gradient descent
What does "stochastic" mean here?
Select training examples one by one in random order
What is gradient descent?
Use the negative of the gradient to update the weights:
w <- w - ∇L
w <- w + x_i y_i
How do you calculate the gradient of the loss?
∇L = -x_i y_i (for a misclassified example)
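Putting the last few cards together, a sketch of perceptron learning by stochastic gradient descent (the toy dataset and epoch count are illustrative):

```python
import numpy as np

# Sketch: visit examples one by one in random order; whenever example i is
# misclassified, apply the update w <- w + x_i * y_i (a step along -∇L).
rng = np.random.default_rng(0)
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = np.zeros(2)

for epoch in range(10):
    for i in rng.permutation(len(X)):
        prediction = 1 if w @ X[i] >= 0 else -1   # step activation
        if prediction != y[i]:                    # misclassified?
            w = w + X[i] * y[i]                   # gradient step
```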
What is logistic regression?
Used for binary classification tasks.
Uses the sigmoid function to model the probability of an event occurring.
p(C1|x) = sigmoid(w^T x)
p(C2|x) = 1 - p(C1|x)
What is the sigmoid function?
Used for logistic regression
sigmoid (a) = 1/(1+e^-a)
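A sketch of the sigmoid and the two class probabilities, with illustrative w and x:

```python
import numpy as np

def sigmoid(a):
    # sigmoid(a) = 1 / (1 + e^-a)
    return 1.0 / (1.0 + np.exp(-a))

w = np.array([1.5, -0.5])             # illustrative weights
x = np.array([1.0, 2.0])              # illustrative input

p_c1 = sigmoid(w @ x)                 # p(C1|x) = sigmoid(w^T x)
p_c2 = 1.0 - p_c1                     # p(C2|x) = 1 - p(C1|x)
```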