Support Vector Machines Flashcards

1
Q

How do you decide which resultant classification is best? Why does this occur?

A

There are multiple possible separating boundaries, each corresponding to a local minimum, but one is often better than the others because it:
> Correctly classifies all points
> Is as far away from all points as possible
[Picture 9]

2
Q

What are the classes that we use in support vector machines?

A

-1 and 1

3
Q

How can we express that a point is correctly classified (using an equation)?

A

t(w^Tx + w_0) ≥ 0
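
A minimal numeric check of this condition (NumPy, with made-up weights and a made-up point):

import numpy as np

w = np.array([1.0, -2.0])   # hypothetical weight vector
w0 = 0.5                    # hypothetical bias
x = np.array([3.0, 1.0])    # a point to classify
t = 1                       # its true class label (-1 or 1)

# t(w^Tx + w_0) >= 0 means the predicted side agrees with the label
print(t * (w @ x + w0) >= 0)   # True: 1 × (3 - 2 + 0.5) = 1.5 ≥ 0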

4
Q

What happens when t(w^Tx + w_0) = 0 ?

A

The point lies exactly on the decision boundary, so the classifier is undecided. This situation should be avoided.

5
Q

How can we express that all points are correctly classified and none are on the decision boundary? how can this equation be transformed?

A

t(w^Tx + w_0) ≥ ε
ε > 0
We can divide both sides by ε, which lets us set the right-hand side to any positive constant we want (e.g. 1):
t(w^Tx + w_0) ≥ 1

6
Q
Why can the equations:
t(w^Tx + w_0) ≥ ε
ε > 0 
become: 
t(w^Tx + w_0) ≥ p
with p being any positive number?
A

Because we can rescale the weights (and w_0) however we like without changing the decision boundary.
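
Worked out (w' and w_0' are just the rescaled weights, introduced here for illustration):
t(w^Tx + w_0) ≥ ε  ⟺  t((w/ε)^Tx + w_0/ε) ≥ 1
Setting w' = w/ε and w_0' = w_0/ε gives t(w'^Tx + w_0') ≥ 1, and (w', w_0') defines exactly the same boundary as (w, w_0). Dividing by ε/p instead gives any positive bound p.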

7
Q

What is the margin?

A

The distance between the closest point to the separation boundary and the boundary itself
[Picture 10]

8
Q

What do we want from the margin?

A

We want to maximise it to get the best classification

9
Q

How do we calculate the distance of points to the hyperplane? How do we apply this to the margin?

A

x = x_p + d × w/||w|| (x_p is the projection of x onto the hyperplane, d the signed distance)
w^Tx + w_0 = d||w|| (since w^Tx_p + w_0 = 0)
t(w^Tx + w_0) / ||w|| ≥ |d| (every point is at least the margin distance |d| from the boundary)
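
A small NumPy sketch of these formulas (weights and points made up):

import numpy as np

w = np.array([2.0, 1.0])              # hypothetical weights
w0 = -1.0                             # hypothetical bias
X = np.array([[2.0, 1.0],
              [0.0, -1.0],
              [3.0, 2.0]])            # made-up points
t = np.array([1, -1, 1])              # their labels

# distance of each (correctly classified) point to the hyperplane
distances = t * (X @ w + w0) / np.linalg.norm(w)
print(distances.min())                # the margin: the closest point's distance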

10
Q

Mathematically, how do we maximise the margin?

A

By minimising ||w||, since the margin is inversely proportional to it:

1/||w|| ∝ |d|
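
To see the proportionality: once the closest point is scaled so that t(w^Tx + w_0) = 1, its distance to the boundary is
|d| = t(w^Tx + w_0) / ||w|| = 1 / ||w||
so the smaller ||w|| is, the larger the margin.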

11
Q

What is the support vector formulation?

A

Minimise: 0.5||w||^2
(making the margin as large as possible)
Subject to the constraints: t_i(w^Tx_i + w_0) ≥ 1
(every point is on the correct side and none is on the hyperplane; this must hold for every single point)
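
A runnable sketch with scikit-learn's SVC (it actually solves the soft-margin problem; a very large C approximates the hard-margin formulation above, and the data are made up):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]], dtype=float)  # two separable clusters
t = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, t)   # huge C ≈ hard margin
w, w0 = clf.coef_[0], clf.intercept_[0]
print(t * (X @ w + w0))   # each value should be ≥ 1 (up to numerical tolerance)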

12
Q

Which points matter for SVM?

A

Only the closest points
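
With scikit-learn this is visible directly: the fitted model keeps only those closest points (same made-up data as the sketch above):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]], dtype=float)
t = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, t)
# only the points nearest the boundary are stored as support vectors;
# here that should be (1, 1) and (3, 3)
print(clf.support_vectors_)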

13
Q

What is a data-set limitation of the support vector formulation? Why is this a problem? What can be done?

A

The optimisation only has a solution if the data set is exactly linearly separable, and noise in the data can easily break that. We can make exceptions (allow some points to violate the constraints), but we want as few of them as possible.

14
Q

What is it called when you allow some data points to be wrongly classified?

A

Slack

15
Q

What is the equation for including slack in SVM? What is the additional constraint?

A

t_i (w^Tx_i + w_0 ) ≥ 1 - ξ_i

ξ_i ≥ 0
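
A minimal NumPy sketch of how the slack values follow from a given (w, w_0) (all values made up):

import numpy as np

w = np.array([1.0, 1.0])     # hypothetical weights
w0 = -3.0                    # hypothetical bias
X = np.array([[2.5, 2.5],    # beyond the margin (correct side)
              [1.9, 1.9],    # inside the margin
              [0.5, 0.5]])   # misclassified
t = np.array([1, 1, 1])

# the smallest slack satisfying both constraints: ξ_i = max(0, 1 - t_i(w^Tx_i + w_0))
xi = np.maximum(0.0, 1.0 - t * (X @ w + w0))
print(xi)   # [0.  0.2  3.]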

16
Q

What happens to the value of ξ_i on different sides of the margin? What is it a measure of?

A

Correct side: ξ_i = 0
Incorrect side: ξ_i > 0
It is a measure of how much we are violating the constraint

17
Q

What is the equation that we want to minimise to reduce the number of misclassifications (including the weight of violations)?

A

Minimise: 0.5||w||^2 + C × ∑ξ_i
C = weight of the violations
> C = 0 means we ignore all misclassified points
> C = ∞ means we have a hard margin and allow no misclassified points
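
A sketch of the trade-off with scikit-learn, whose SVC minimises an objective of this form (the noisy data are made up):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # two overlapping clusters
               rng.normal(2, 1, (50, 2))])
t = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, t)
    # small C tolerates violations (wide, forgiving margin); large C
    # approaches a hard margin and fights over every training point
    print(C, clf.score(X, t), len(clf.support_vectors_))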

18
Q

What happens if the points are not linearly separable in their normal dimensions?

A

We can map the points into a higher-dimensional space in which they are linearly separable.

19
Q

How do we expand the number of dimensions?

A

We choose a basis (a feature map φ) and expand each point x into φ(x). The equations remain the same, but with x replaced by the vector φ(x).
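
A sketch of one possible expansion, a quadratic feature map (made-up data): points labelled by a circle are not linearly separable in 2-D, but become separable after the expansion.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
t = np.where((X ** 2).sum(axis=1) < 0.5, 1, -1)   # inside/outside a circle

# basis expansion: φ(x) = (x1, x2, x1², x2², x1·x2)
Phi = np.column_stack([X, X ** 2, X[:, 0] * X[:, 1]])

print(SVC(kernel="linear").fit(X, t).score(X, t))      # poor: not separable
print(SVC(kernel="linear").fit(Phi, t).score(Phi, t))  # close to 1.0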