2 Linear Classifiers Flashcards
What is one-hot encoding?
In classification we use this instead of a scalar y.
A K-dimensional vector per desired output.
A vector of 0s with a single 1 at the index corresponding to the class.
Ex: three-class problem
[1,0,0] [0,1,0] [0,0,1]
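A minimal NumPy sketch of one-hot encoding (the helper name and the example labels are illustrative, not from the cards):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Three-class problem: classes 0, 1, 2 map to [1,0,0], [0,1,0], [0,0,1]
print(one_hot([0, 1, 2, 1], num_classes=3))
```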
What is a perceptron?
We apply a non-linear activation on top of the linear transform:
f(x) = g(w^T x)
What does the perceptron define?
A step function
g(a) = { 1 if a >= 0
{ -1 otherwise
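A minimal sketch of the perceptron's forward pass, assuming the bias is folded into w as an extra weight (the function name is illustrative):

```python
import numpy as np

def perceptron_predict(w, x):
    """f(x) = g(w^T x) with the step activation g."""
    a = np.dot(w, x)
    return 1 if a >= 0 else -1
```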
What does the linear least squares classifier do?
It separates two or more classes by fitting a hyperplane that minimizes the squared error between the linear predictions and the target labels.
Simple case: it defines a line that separates the two classes.
What is the solution to the LS classifier?
W = X^+ Y (where X^+ is the pseudoinverse of X)
We "learn" the predictor by solving for W, which contains the weights.
Differentiate the loss L with respect to W and set the gradient to zero.
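A sketch of this closed-form fit with NumPy's pseudoinverse, assuming X is the m-by-d design matrix and Y the m-by-K one-hot target matrix (names are illustrative):

```python
import numpy as np

def fit_least_squares(X, Y):
    """W = X^+ Y: the least squares solution via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ Y

def predict_class(W, x):
    """Pick the class whose regression output wk^T x is largest."""
    return int(np.argmax(x @ W))
```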
How does W look in the least squares classifier?
It becomes a matrix:
one column of weights per class, i.e. a separate linear model for every possible output.
How do you calculate the loss in the least squares classifier?
We need to solve for a matrix of coefficients (one model per column):
L(f(x), y) = 1/2 sum(m,i=1) (yi - w^T xi)^2
           = 1/2 ||y - X w||^2
Optimizing, we want to minimize loss
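A sketch of evaluating this loss in matrix form (assuming X is the m-by-d design matrix and y the target vector):

```python
import numpy as np

def ls_loss(w, X, y):
    """L = 1/2 * ||y - Xw||^2, the sum of squared residuals."""
    residual = y - X @ w
    return 0.5 * float(residual @ residual)
```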
How do you determine the class densities in the least squares classifier?
Each is approximated by its own regression model:
p(Ck|x) ≈ fk(x) = wk^T x
What is the perceptron criterion?
The loss of the perceptron, taken over the misclassified examples:
L(f(x), y) = - sum(m,i=1) w^T xi yi
How is learning done in perceptrons?
By stochastic gradient descent.
What does stochastic mean here?
Select training examples one by one in random order.
What is gradient descent?
Use the negative of the gradient to update the weights:
w <- w - ∇L
which for the perceptron becomes
w <- w + xi yi
How do you calculate the gradient of the loss?
∇L = -xi yi
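A sketch of the full perceptron SGD loop implied by these cards, assuming ±1 labels, a unit learning rate, and the bias folded into w (all names illustrative):

```python
import numpy as np

def train_perceptron(X, y, epochs=10, seed=0):
    """Stochastic gradient descent on the perceptron criterion."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):     # examples one by one, in random order
            if y[i] * np.dot(w, X[i]) <= 0:   # update only on misclassified examples
                w = w + y[i] * X[i]           # w <- w + xi * yi
    return w
```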
What is logistic regression?
Used for binary classification tasks.
Uses the sigmoid function to model the probability of an event occurring.
p(C1|x) = sigmoid(w^T x)
p(C2|x) = 1 - p(C1|x)
What is the sigmoid function?
Used in logistic regression:
sigmoid (a) = 1/(1+e^-a)
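A one-line sketch in NumPy:

```python
import numpy as np

def sigmoid(a):
    """sigmoid(a) = 1 / (1 + e^-a)"""
    return 1.0 / (1.0 + np.exp(-a))
```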
What is the output of logistic regression?
A probability p(C1|x) between 0 and 1; the target y is either 0 or 1, so each outcome is a Bernoulli trial.
How do you find w in logistic regression?
Through maximum likelihood estimation.
What is the log-likelihood for logistic regression?
Most important.
As usual in ML estimation, it is easier to take the logarithm:
log L(w) = sum(m,i=1) [yi log sigmoid(ai) + (1-yi) log(1 - sigmoid(ai))]
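A sketch of evaluating this log-likelihood, assuming y holds 0/1 labels and ai = w^T xi (the clipping is only a numerical safeguard, not part of the card):

```python
import numpy as np

def log_likelihood(w, X, y, eps=1e-12):
    """sum_i [ yi*log(sigmoid(ai)) + (1-yi)*log(1-sigmoid(ai)) ] with ai = w^T xi."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid(ai) for every example
    p = np.clip(p, eps, 1.0 - eps)       # avoid log(0)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
```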
How does the Bernoulli likelihood look?
p(C1|x)^y (1 - p(C1|x))^(1-y)
What is the loss function for linear regression?
The mean squared error (MSE):
L(f(x),y) = 1/2 sum(m,i=1) (yi - w^T xi)^2
What is the equation for linear regression?
y = f(x) + epsilon = w^T x + epsilon, with epsilon ~ N(0, sigma^2)
so that
p(y|x) = N(f(x), sigma^2)
What is the gradient of the log-likelihood? (Logistic regression)
∇ log L(w) = sum(m,i=1) (yi - sigmoid(w^T xi)) xi
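A sketch of maximum-likelihood training with this gradient (plain gradient ascent; the step size and iteration count are illustrative choices, not from the cards):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=1000):
    """Gradient ascent on the logistic regression log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid(w^T xi) for every example
        w = w + lr * (X.T @ (y - p))         # grad log L = sum_i (yi - pi) xi
    return w
```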
K-NN vs LLS classifier
K-NN:
- Best when the data is low-dimensional
- Requires a larger number of training examples
- Better choice when the decision boundary is non-linear
- More computationally expensive, since it compares the input to every training example
- Generalizes better with a large dataset
LLS:
- Best when the data is high-dimensional
- More robust with a small number of training examples
- Better choice when the decision boundary is linear
- Closed-form solution with low complexity
- Generalizes better with a small dataset
How do we do multiclass logistic regression?
- K classes instead of 2
- K weight vectors
- for each class we model the density by the softmax function:
p(Ck|x) = exp(ak) / sum(K,j=1) exp(aj)
where
ak = wk^T x
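A sketch of these softmax class densities, with the usual max-subtraction for numerical stability (an implementation detail, not part of the card; W is assumed to hold one weight vector wk per column):

```python
import numpy as np

def softmax_probs(W, x):
    """p(Ck|x) = exp(ak) / sum_j exp(aj), with ak = wk^T x."""
    a = x @ W               # activations a1..aK
    a = a - np.max(a)       # shift for numerical stability (does not change the result)
    e = np.exp(a)
    return e / np.sum(e)
```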
What is the log-likelihood for MLR?
log L(w1, …, wK) = sum(m,i=1) sum(K,k=1) yi,k log yhat_i,k
We use one-hot encoding.
Our estimated output yhat follows a discrete (categorical) distribution.
What is the gradient with respect to wk for MLR?
∇wk log L(w1, …, wK) = sum(m,i=1) (yi,k - yhat_i,k) xi
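A sketch of one gradient-ascent step for all K weight vectors at once, assuming Y is the m-by-K one-hot target matrix (the learning rate is an illustrative choice):

```python
import numpy as np

def mlr_gradient_step(W, X, Y, lr=0.1):
    """For every class k: wk <- wk + lr * sum_i (yi,k - yhat_i,k) xi."""
    A = X @ W                                      # activations, shape (m, K)
    A = A - A.max(axis=1, keepdims=True)           # numerical stability
    Yhat = np.exp(A)
    Yhat = Yhat / Yhat.sum(axis=1, keepdims=True)  # row-wise softmax
    return W + lr * (X.T @ (Y - Yhat))             # ascent step for all classes at once
```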
What does linear least squares do?
Predict the scalar response y from an input vector x.
Fit a and b in y = ax + b to make the line fit the points.
What is classification?
From the input data, output the label/category which the input belongs to.
What do classification problems look like?
Predict one of K categories (2 or more).
What is regression?
Predict a continuous output from the input.
What is the least squares classifier?
- change y from a continuous value to a discrete value
- instead of a scalar y, we use one-hot encoding
Is KNN linear?
The overall structure is not linear, but the decision boundary is made up of piecewise linear segments.
What are options for encoding?
Integer labels
One-hot encoding