2 Linear Classifiers Flashcards
What is one-hot encoding?
In classification we use this instead of a scalar y.
A K-dimensional vector per desired output.
A vector of 0s with a single 1 at the index corresponding to the class.
Ex: three-class problem
[1,0,0] [0,1,0] [0,0,1]
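A minimal NumPy sketch of one-hot encoding (the helper name and the example labels are illustrative, not from the cards):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Three-class problem: classes 0, 1, 2 map to [1,0,0], [0,1,0], [0,0,1]
print(one_hot([0, 1, 2, 1], num_classes=3))
```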
What is a perceptron?
We apply a non-linear activation on top of the linear transform:
f(x) = g(w^T x)
What does the perceptron define?
A step function
g(a) = { 1 if a >= 0
{ -1 otherwise
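A minimal sketch of the perceptron's forward pass, assuming the bias is folded into w as an extra weight (the function name is illustrative):

```python
import numpy as np

def perceptron_predict(w, x):
    """f(x) = g(w^T x) with the step activation g."""
    a = np.dot(w, x)
    return 1 if a >= 0 else -1
```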
What does the linear least squares classifier do?
It separates two or more classes by fitting a hyperplane that minimizes the squared error between the linear predictions and the target labels.
Simple case: it defines a line that separates the two classes.
What is the solution to the LS classifier?
W = X^+ Y (where X^+ is the pseudoinverse of X)
We "learn" the predictor by solving for W, which contains the weights.
Differentiate the loss L with respect to W and set the gradient to zero.
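A sketch of this closed-form fit with NumPy's pseudoinverse, assuming X is the m-by-d design matrix and Y the m-by-K one-hot target matrix (names are illustrative):

```python
import numpy as np

def fit_least_squares(X, Y):
    """W = X^+ Y: the least squares solution via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ Y

def predict_class(W, x):
    """Pick the class whose regression output wk^T x is largest."""
    return int(np.argmax(x @ W))
```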
How does W look in the least squares classifier?
It becomes a matrix:
one column of weights per class, i.e. a separate linear model for every possible output.
How do you calculate the loss in the least squares classifier?
We need to solve for a matrix of coefficients (one model per column):
L(f(x), y) = 1/2 sum(m,i=1) (yi - w^T xi)^2
           = 1/2 ||y - X w||^2
Optimizing, we want to minimize loss
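A sketch of evaluating this loss in matrix form (assuming X is the m-by-d design matrix and y the target vector):

```python
import numpy as np

def ls_loss(w, X, y):
    """L = 1/2 * ||y - Xw||^2, the sum of squared residuals."""
    residual = y - X @ w
    return 0.5 * float(residual @ residual)
```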
How do you determine the class densities in the least squares classifier?
Each is approximated by its own regression model:
p(Ck|x) ≈ fk(x) = wk^T x
What is the perceptron criterion?
The loss of the perceptron, taken over the misclassified examples:
L(f(x), y) = - sum(m,i=1) w^T xi yi
How is learning done in perceptrons?
By stochastic gradient descent.
What does stochastic mean here?
Select training examples one by one in random order.
What is gradient descent?
Use the negative of the gradient to update the weights:
w <- w - ∇L
which for the perceptron becomes
w <- w + xi yi
How do you calculate the gradient of the loss?
∇L = -xi yi
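A sketch of the full perceptron SGD loop implied by these cards, assuming ±1 labels, a unit learning rate, and the bias folded into w (all names illustrative):

```python
import numpy as np

def train_perceptron(X, y, epochs=10, seed=0):
    """Stochastic gradient descent on the perceptron criterion."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):     # examples one by one, in random order
            if y[i] * np.dot(w, X[i]) <= 0:   # update only on misclassified examples
                w = w + y[i] * X[i]           # w <- w + xi * yi
    return w
```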
What is logistic regression?
Used for binary classification tasks.
Uses the sigmoid function to model the probability of an event occurring.
p(C1|x) = sigmoid(w^T x)
p(C2|x) = 1 - p(C1|x)
What is the sigmoid function?
Used in logistic regression:
sigmoid (a) = 1/(1+e^-a)
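A one-line sketch in NumPy:

```python
import numpy as np

def sigmoid(a):
    """sigmoid(a) = 1 / (1 + e^-a)"""
    return 1.0 / (1.0 + np.exp(-a))
```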
What is the output of logistic regression?
A probability p(C1|x) between 0 and 1; the target y is either 0 or 1, so each outcome is a Bernoulli trial.
How do you find w in logistic regression?
Through maximum likelihood estimation.
What is the log-likelihood for logistic regression?
Most important.
As usual in ML estimation, it is easier to take the logarithm:
log L(w) = sum(m,i=1) [yi log sigmoid(ai) + (1-yi) log(1 - sigmoid(ai))]
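A sketch of evaluating this log-likelihood, assuming y holds 0/1 labels and ai = w^T xi (the clipping is only a numerical safeguard, not part of the card):

```python
import numpy as np

def log_likelihood(w, X, y, eps=1e-12):
    """sum_i [ yi*log(sigmoid(ai)) + (1-yi)*log(1-sigmoid(ai)) ] with ai = w^T xi."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid(ai) for every example
    p = np.clip(p, eps, 1.0 - eps)       # avoid log(0)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
```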
How does the Bernoulli likelihood look?
p(C1|x)^y (1 - p(C1|x))^(1-y)
What is the loss function for linear regression?
The mean squared error (MSE):
L(f(x),y) = 1/2 sum(m,i=1) (yi - w^T xi)^2
What is the equation for linear regression?
y = f(x) + epsilon = w^T x + epsilon, with epsilon ~ N(0, sigma^2)
so that
p(y|x) = N(f(x), sigma^2)
What is the gradient of the log-likelihood? (Logistic regression)
∇ log L(w) = sum(m,i=1) (yi - sigmoid(w^T xi)) xi
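A sketch of maximum-likelihood training with this gradient (plain gradient ascent; the step size and iteration count are illustrative choices, not from the cards):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=1000):
    """Gradient ascent on the logistic regression log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid(w^T xi) for every example
        w = w + lr * (X.T @ (y - p))         # grad log L = sum_i (yi - pi) xi
    return w
```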
K-NN vs LLS classifier
K-NN:
- Best when the data is low-dimensional
- Requires a larger number of training examples
- Better choice when the decision boundary is non-linear
- More computationally expensive, since it compares the input to every training example
- Generalizes better with a large dataset
LLS:
- Best when the data is high-dimensional
- More robust with a small number of training examples
- Better choice when the decision boundary is linear
- Closed-form solution with low complexity
- Generalizes better with a small dataset
How do we do multiclass logistic regression?
- K classes instead of 2
- K weight vectors
- for each class we model the density by the softmax function:
p(Ck|x) = exp(ak) / sum(K,j=1) exp(aj)
where
ak = wk^T x
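A sketch of these softmax class densities, with the usual max-subtraction for numerical stability (an implementation detail, not part of the card; W is assumed to hold one weight vector wk per column):

```python
import numpy as np

def softmax_probs(W, x):
    """p(Ck|x) = exp(ak) / sum_j exp(aj), with ak = wk^T x."""
    a = x @ W               # activations a1..aK
    a = a - np.max(a)       # shift for numerical stability (does not change the result)
    e = np.exp(a)
    return e / np.sum(e)
```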
What is the log-likelihood for MLR?
log L(w1, …, wK) = sum(m,i=1) sum(K,k=1) yi,k log yhat_i,k
We use one-hot encoding.
Our estimated output yhat follows a discrete (categorical) distribution.
What is the gradient with respect to wk for MLR?
∇wk log L(w1, …, wK) = sum(m,i=1) (yi,k - yhat_i,k) xi
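A sketch of one gradient-ascent step for all K weight vectors at once, assuming Y is the m-by-K one-hot target matrix (the learning rate is an illustrative choice):

```python
import numpy as np

def mlr_gradient_step(W, X, Y, lr=0.1):
    """For every class k: wk <- wk + lr * sum_i (yi,k - yhat_i,k) xi."""
    A = X @ W                                      # activations, shape (m, K)
    A = A - A.max(axis=1, keepdims=True)           # numerical stability
    Yhat = np.exp(A)
    Yhat = Yhat / Yhat.sum(axis=1, keepdims=True)  # row-wise softmax
    return W + lr * (X.T @ (Y - Yhat))             # ascent step for all classes at once
```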
What does linear least squares do?
Predict the scalar response y from an input vector x.
Fit a and b in y = ax + b to make the line fit the points.
What is classification?
From the input data, output the label/category which the input belongs to.
What do classification problems look like?
Predict one of K categories (2 or more).
What is regression?
Predict a continuous output from the input.
What is the least squares classifier?
- change y from a continuous value to a discrete value
- instead of a scalar y, we use one-hot encoding
Is KNN linear?
The overall structure is not linear, but the decision boundary is made up of piecewise linear segments.
What are options for encoding?
Integer labels
One-hot encoding