2 Linear Classifiers Flashcards

1
Q

What is one-hot encoding?

A

In classification we use this instead of a scalar.

K-dimensional vector per desired output:
a vector of 0s with a single 1 at the index corresponding to the class.

Ex: three-class problem
[1,0,0] [0,1,0] [0,0,1]
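A minimal NumPy sketch of this encoding (the helper name `one_hot` is my own):

```python
import numpy as np

# One-hot encode integer class labels: each label becomes a K-dim
# vector of zeros with a single 1 at the index of its class.
def one_hot(labels, num_classes):
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

y = one_hot([0, 2, 1], num_classes=3)
# rows: [1,0,0], [0,0,1], [0,1,0]
```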

2
Q

What is a perceptron?

A

We apply a non-linear activation on top of the linear transform:
f(x) = g(w^T x)

3
Q

What does the perceptron define?

A

A step function:

g(a) = {  1 if a >= 0
       { -1 otherwise
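A sketch of this step activation in NumPy (function names are mine, using the +1/-1 convention from the card):

```python
import numpy as np

# The perceptron's step activation: g(a) = 1 if a >= 0, else -1.
def step(a):
    return np.where(a >= 0, 1, -1)

# Perceptron prediction: the activation applied to the linear transform.
def perceptron_predict(w, x):
    return step(w @ x)
```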

4
Q

What does the linear least squares classifier do?

A

It separates two or more classes by finding a separating hyperplane, fit by minimizing the squared error to the class targets. (Maximizing the margin is what an SVM does, not least squares.)

Simple case: it defines a line that separates the two classes.

5
Q

What is the solution to the LS classifier?

A

W = X† y, where X† denotes the pseudoinverse of X.

We "learn" the predictor by solving for W, the weights.

Differentiate the loss L with respect to W and set the gradient to zero.
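The closed-form solution can be sketched with NumPy's pseudoinverse (helper names and the rows-as-examples data layout are my assumptions):

```python
import numpy as np

# Closed-form least squares: W = pinv(X) @ Y, where rows of X are
# inputs and rows of Y are one-hot targets (one model per column of W).
def fit_least_squares(X, Y):
    return np.linalg.pinv(X) @ Y

# Predicted class = index of the largest linear output per row.
def predict(W, X):
    return np.argmax(X @ W, axis=1)
```

A usage sketch: with a bias column in X and one-hot rows in Y, `predict(fit_least_squares(X, Y), X)` returns integer class labels.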

6
Q

How does W look in the least squares classifier?

A

It becomes a matrix:
one column of weights per class, with one entry per input feature.

7
Q

How do you calculate loss in the least squares classifier?

A

We need to solve for a matrix of coefficients (one model per column):

L(f(x), y) = 1/2 sum_{i=1}^{m} ||yi - W^T xi||^2
           = 1/2 ||Y - X W||^2

Optimizing, we want to minimize this loss.

8
Q

How do you determine class density in the least squares classifier?

A

Each class posterior is approximated by its own regression model:

p(Ck | x) ≈ fk(x) = wk^T x

9
Q

What is the perceptron criterion?

A

The loss in the perceptron, summed over the set M of misclassified examples:
L(f(x), y) = - sum_{i in M} w^T xi yi

10
Q

How is learning done in perceptrons?

A

By stochastic gradient descent.

11
Q

What does "stochastic" mean here?

A

Select training examples one by one, in random order.

12
Q

What is gradient descent?

A

Use the negative of the gradient to update the weights:
w <- w - ∇L
For a misclassified example (xi, yi), since ∇L = -xi yi, this gives:
w <- w + xi yi
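This update rule, run as stochastic gradient descent, might look like the following sketch (epoch count, seed, and function name are assumptions):

```python
import numpy as np

# Perceptron learning by SGD: visit examples in random order and apply
# w <- w + x_i * y_i whenever example i is misclassified (labels +1/-1).
def train_perceptron(X, y, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (w @ X[i]) <= 0:   # misclassified (or on the boundary)
                w = w + X[i] * y[i]
    return w
```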

13
Q

How do you calculate the gradient of the loss?

A

For a misclassified example (xi, yi):
∇L = -xi yi

14
Q

What is logistic regression?

A

Used for binary classification tasks.
Uses the sigmoid function to model the probability of an event occurring:

p(C1 | x) = sigmoid(w^T x)
p(C2 | x) = 1 - p(C1 | x)

15
Q

What is the sigmoid function?

A

Used in logistic regression:

sigmoid(a) = 1 / (1 + e^-a)
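A minimal NumPy version (function names are mine):

```python
import numpy as np

# The sigmoid squashes any real number into (0, 1), so its output
# can be read as the probability p(C1 | x).
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Logistic regression prediction: p(C1 | x) = sigmoid(w^T x).
def predict_proba(w, x):
    return sigmoid(w @ x)
```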

16
Q

What is the output of logistic regression?

A

The target y is either 0 or 1, which defines a Bernoulli trial; the model itself outputs a probability in (0, 1).

17
Q

How do you find w in logistic regression?

A

Through maximum likelihood.

18
Q

What is the log likelihood for logistic regression?

Most important

A

As usual in ML estimation, it is easier to take the logarithm:

log L(w) = sum_{i=1}^{m} [yi log sigmoid(ai) + (1 - yi) log(1 - sigmoid(ai))]

where ai = w^T xi.

19
Q

How does the Bernoulli trial look?

A

p(C1 | x)^y (1 - p(C1 | x))^(1-y)

20
Q

What is the loss function for linear regression?
(MSE)

A

L(f(x), y) = 1/2 sum_{i=1}^{m} (yi - w^T xi)^2

21
Q

What is the equation for linear regression?

A

y = f(x) + epsilon = w^T x + epsilon
epsilon ~ N(0, sigma^2)

where
p(y | x) = N(f(x), sigma^2)

22
Q

What is the gradient of the log likelihood? (Logistic regression)

A

∇ log L = sum_{i=1}^{m} (yi - sigmoid(w^T xi)) xi
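Gradient ascent with this gradient can be sketched as follows (learning rate, iteration count, and function names are my assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Fit logistic regression by gradient ascent on the log likelihood,
# using the gradient sum_i (y_i - sigmoid(w^T x_i)) x_i from the card.
def fit_logistic(X, y, lr=0.1, iters=1000):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ w))
        w = w + lr * grad   # ascend: we maximize the log likelihood
    return w
```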

23
Q

K-NN vs LLS classifier

A

K-NN:
- works best when the data is low-dimensional (it suffers from the curse of dimensionality)
- more robust with a small number of training examples
- better choice when the decision boundary is non-linear
- more computationally expensive, since it needs to compare the input to each training example
- generalizes better with a small dataset

LLS:
- scales better to high-dimensional data
- requires a larger number of training examples
- better choice when the decision boundary is linear
- closed form, low complexity
- generalizes better with a large dataset

24
Q

How do we do multiclass logistic regression?

A
  • K classes instead of 2
  • K weight vectors
  • for each class we model the density with the softmax function:

p(Ck | x) = exp(ak) / sum_{j=1}^{K} exp(aj)
where
ak = wk^T x
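A sketch of the softmax in NumPy (subtracting the max is a standard numerical-stability trick, not part of the card's formula; it does not change the result):

```python
import numpy as np

# Softmax turns K linear scores a_k = w_k^T x into a probability
# distribution over the K classes.
def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs sums to 1; the largest score gets the highest probability
```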

25
Q

What is the log-likelihood for MLR?

A

log L(w1, ..., wK) = sum_{i=1}^{m} sum_{k=1}^{K} yi,k log yhati,k

We use one-hot encoding.
Our estimated output yhat follows a discrete distribution.

26
Q

What is the gradient with respect to wk for MLR?

A

∇wk log L(w1, ..., wK) = sum_{i=1}^{m} (yi,k - yhati,k) xi

27
Q

What does linear least squares do?

A

Predict the scalar response y from an input vector x.
Fit a and b in y = ax + b so that the line fits the points.

28
Q

What is classification?

A

From the input data, output the label/category which the input belongs to.

29
Q

What are the classification problems?

A

Predict K categories (2 or more).

30
Q

what is regression?

A

Predict a continuous output from the input.

31
Q

What is the least squares classifier?

A
  • change y from a continuous value to a discrete value
  • instead of a scalar y, we use one-hot encoding.
32
Q

Is KNN linear?

A

The overall structure is not linear, but the decision boundary is made up of
linear pieces (it is piecewise linear).

33
Q

What are options for encoding?

A

Integer labels
One-hot encoding