Linear and Kernel Models Flashcards

1
Q

Write down the general form of a linear predictive model

A

f(x) = <w, x> + b

2
Q

Write down the optimal weight vector of a regularised model with loss L

A

w = argmin_w (1/N) sum_i L(f(x_i), y_i) + lambda * J(w)

3
Q

What is the goal of linear regression?

A

Finding a linear function f(x) = <w, x> + b that best fits a given set of labelled training points.

4
Q

Write down the optimal weight vector of Least Squares Regression

A

w = argmin_w (1/N) sum_i (f(x_i) - y_i)^2 = (X^T X)^(-1) X^T y

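The closed form above can be checked numerically. A minimal numpy sketch (the data and variable names are illustrative; `np.linalg.solve` is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def least_squares(X, y):
    # Closed-form solution w = (X^T X)^{-1} X^T y,
    # computed by solving the normal equations (X^T X) w = X^T y.
    return np.linalg.solve(X.T @ X, X.T @ y)

# On noiseless data generated from known weights, the solution
# recovers those weights exactly (up to floating-point error).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = least_squares(X, y)
```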
5
Q

What is Ridge Regression? What does it solve?

A

w = argmin_w (1/N) sum_i (f(x_i) - y_i)^2 + lambda * ||w||^2 = (X^T X + lambda * I)^(-1) X^T y
It has a closed-form solution, prevents the weights from exploding by shrinking them towards zero, and remains solvable when N < p (where X^T X is singular).

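A numpy sketch of the ridge closed form, showing it stays solvable in the N < p regime where plain least squares breaks down (data and lambda value are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    # w = (X^T X + lambda * I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# With N=10 samples and p=20 features, X^T X is singular,
# but adding lambda * I makes the system well-conditioned.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 20))
y = rng.normal(size=10)
w = ridge(X, y, lam=0.1)
```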
6
Q

Define elastic net regression

A

Regression whose penalty combines the L1 (lasso) and L2 (ridge) norms: J(w) = lambda_1 * ||w||_1 + lambda_2 * ||w||^2

7
Q

Define 0-1 loss

A

An indicator function that returns 1 when the target and output are not equal and 0 otherwise: L(f(x), y) = 1 if f(x) != y, else 0

8
Q

What is logistic regression?

A

A classification model that returns a value between 0 and 1 representing the probability of an event; the log odds of the event are linear in x.
It uses a sigmoid function:
p(y=1|x) = 1 / {1 + exp(-<w,x> - b)}
Or equivalently:
log{p(y=1|x) / p(y=0|x)} = <w, x> + b

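The two forms of logistic regression above (sigmoid probability and linear log odds) can be verified against each other in a few lines of numpy (weights and inputs are illustrative):

```python
import numpy as np

def predict_proba(X, w, b):
    # p(y=1|x) = 1 / (1 + exp(-<w,x> - b))
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[0.0, 0.0], [3.0, 1.0]])
p = predict_proba(X, w, b)

# The log odds log(p / (1-p)) recover the linear score <w, x> + b.
log_odds = np.log(p / (1.0 - p))
```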
9
Q

What happens when you apply high regularisation to regression?

A

It shrinks the weights towards zero, which limits the influence of individual training points and can lead to underfitting.

10
Q

What do SVMs maximise?

A

The margin, i.e. the distance of the closest training points to the separating hyperplane.

11
Q

Write down the formula for the hard and soft margin versions of an SVM

A

Hard margin: min_{w,b} (1/2)||w||^2 subject to y_i(<w, x_i> + b) >= 1 for all i
Soft margin: min_{w,b,xi} (1/2)||w||^2 + C * sum_i xi_i subject to y_i(<w, x_i> + b) >= 1 - xi_i, xi_i >= 0

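The soft-margin objective is straightforward to evaluate directly; a minimal numpy sketch (the toy data, weights, and C value are illustrative, and only the objective is computed, not its minimiser):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    # (1/2)||w||^2 + C * sum_i max(0, 1 - y_i * (<w, x_i> + b))
    margins = y * (X @ w + b)           # signed margins y_i * f(x_i)
    slack = np.maximum(0.0, 1.0 - margins)  # hinge loss per point
    return 0.5 * (w @ w) + C * slack.sum()

# Two well-separated points: both margins are >= 1, so the slack
# terms vanish and only the (1/2)||w||^2 term remains.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0
obj = soft_margin_objective(w, b, X, y, C=1.0)
```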
12
Q

What does increasing C do to an SVM?

A

It increases how much you pay for each point that violates the margin constraint, i.e. it decreases regularisation.

13
Q

What are the advantages of kernel methods?

A

They represent a computational shortcut (the kernel trick avoids computing the feature map explicitly) and permit closed-form solutions when p > N.

14
Q

What is the aim of the kernel function?

A

To embed data into a space where patterns can be discovered as linear relations

15
Q

Write the formula for both the primal and dual solution

A

Primal: w = (X^T X + lambda * I)^(-1) X^T y
Dual: w = X^T alpha, where alpha = (X X^T + lambda * I)^(-1) y

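The primal and dual ridge solutions coincide, which can be checked numerically. A numpy sketch (data and lambda are illustrative); note that the N x N matrix X X^T in the dual is exactly where a kernel matrix K would be substituted:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))   # N=8 samples, p=3 features
y = rng.normal(size=8)
lam = 0.5

# Primal: solve a p x p system, w = (X^T X + lambda * I_p)^{-1} X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Dual: solve an N x N system, alpha = (X X^T + lambda * I_N)^{-1} y,
# then recover w = X^T alpha. X X^T is the Gram matrix of dot products.
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(8), y)
w_dual = X.T @ alpha
```

The two routes differ only in which system is solved (p x p versus N x N), which is why the dual is preferred when p > N.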
16
Q

What is a Kernel function? How is it used?

A

A function k(x, z) = <phi(x), phi(z)> that substitutes for the dot product in an embedding space without computing phi explicitly. In Kernel Regression, for instance,

alpha = (X X^T)^(-1) y

is substituted by

alpha = K^(-1) y

where K is the Gram matrix with entries K_ij = k(x_i, x_j), i.e. a function of the inner products of the data.

17
Q

What is multiple kernel learning?

A

The kernel K is modelled as a linear combination of M basis kernels. Both the dual coefficients (alphas) and the combination weights of the basis kernels are then learned in a single optimisation problem.