Classification: linear models III - support vector machines Flashcards

1
Q

What is a maximum margin hyperplane?

A

See figure!

It is the separating hyperplane with the largest possible margin, i.e., the largest distance from the hyperplane to the nearest training points.

2
Q

Why use an SVM, then?

A

A smart way to find the maximum margin hyperplane.

Instead of maximizing the margin 1/||w||,

we can equivalently minimize ½||w||², which is much easier computationally: it turns the problem into a convex quadratic program with linear constraints.
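As a rough sketch (the standard hard-margin primal formulation with labels yᵢ ∈ {−1, +1}; notation assumed, not taken from the slides):

```latex
% Hard-margin SVM primal problem: minimizing 1/2 ||w||^2 is equivalent to
% maximizing the margin 1/||w||, subject to every point being classified
% correctly with functional margin at least 1.
\min_{\mathbf{w},\, b} \;\; \tfrac{1}{2}\|\mathbf{w}\|^2
\quad \text{s.t.} \quad
y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n
```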

3
Q

What technique does the SVM use for this optimization, leading to the so-called dual optimization problem?

A

It uses Lagrange multipliers, one multiplier αᵢ ≥ 0 per training constraint.
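For reference, the standard dual problem obtained by introducing the multipliers (a textbook sketch, not necessarily the slides' exact notation):

```latex
% Dual of the hard-margin SVM: only dot products x_i^T x_j appear,
% which is what later allows the kernel trick.
\max_{\boldsymbol{\alpha}} \;\;
\sum_{i=1}^{n} \alpha_i
- \tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n}
  \alpha_i \alpha_j\, y_i y_j\, \mathbf{x}_i^\top \mathbf{x}_j
\quad \text{s.t.} \quad
\alpha_i \ge 0, \;\;
\sum_{i=1}^{n} \alpha_i y_i = 0
```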

4
Q

What does the solution of the dual optimization problem give us?

A

The solution of the dual problem gives the solution of the original (primal) problem, i.e., it recovers the optimal separating hyperplane.
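Concretely (standard result, stated here as a sketch): the optimal multipliers αᵢ give the weight vector and the decision function

```latex
% Recovering the primal solution from the dual solution
\mathbf{w} = \sum_{i=1}^{n} \alpha_i\, y_i\, \mathbf{x}_i,
\qquad
f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i=1}^{n} \alpha_i\, y_i\, \mathbf{x}_i^\top \mathbf{x} + b\Big)
```

Only points with αᵢ > 0 contribute; these are the support vectors.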

5
Q

How can we handle data that is not linearly separable in an SVM?

A

Use a data transformation φ to map the data into another feature space, where it may become linearly separable.

Problem: suitable feature spaces may need to be very high-dimensional. That makes computations with transformed vectors φ(x) very expensive.
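A small illustration of such a transformation (my own example, not from the slides): a degree-2 polynomial feature map on ℝ²,

```latex
% Explicit feature map whose dot product equals the squared dot product
\varphi(x_1, x_2) = \big(x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2\big),
\qquad
\varphi(\mathbf{x})^\top \varphi(\mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2
```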

6
Q

How do we handle the expensive computations of mapping to a higher-dimensional feature space?

A

We use kernels, more precisely the kernel trick: instead of having to work in the feature space explicitly, we only compute a measure of similarity between points, which is much less costly!

We can replace the dot products in the SVM with a kernel function k(x, z) = φ(x)ᵀφ(z) (this method is called the kernel trick!)
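A minimal Python sketch of the idea (my own toy example, using the degree-2 polynomial map from the previous card; not the course code):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map on R^2 (illustration only)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Equivalent kernel k(x, z) = (x . z)^2, computed without building phi."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Both lines print 16.0: the kernel gives the same value as the explicit
# feature-space dot product, but never constructs phi(x) or phi(z).
print(np.dot(phi(x), phi(z)))
print(poly_kernel(x, z))
```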

7
Q

Why kernels, as opposed to feature vectors?

A

One big reason is that in many cases computing the kernel is easy, while computing the feature vector corresponding to the kernel is very hard.

The feature vector for even simple kernels can blow up in size, and for kernels like the RBF kernel, k(x, y) = exp(−||x − y||²) (see: radial basis function kernel), the corresponding feature vector is infinite-dimensional. Yet computing the kernel itself is almost trivial.
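A sketch of why the RBF feature space is infinite-dimensional (standard expansion, written with the scale parameter set to 1 for simplicity):

```latex
% Expanding the cross term as a power series exposes infinitely many
% implicit polynomial features, yet k itself is a single exponential.
k(\mathbf{x}, \mathbf{y})
= e^{-\|\mathbf{x}-\mathbf{y}\|^2}
= e^{-\|\mathbf{x}\|^2}\, e^{-\|\mathbf{y}\|^2}\, e^{2\,\mathbf{x}^\top\mathbf{y}}
= e^{-\|\mathbf{x}\|^2}\, e^{-\|\mathbf{y}\|^2}
  \sum_{n=0}^{\infty} \frac{(2\,\mathbf{x}^\top\mathbf{y})^n}{n!}
```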

8
Q

How do we know if we can remove a point from the set of support vectors?

A

If the maximum margin hyperplane does not change when the point is removed from the training set, we do not have to keep it, and it is therefore not a support vector.

9
Q

What are the pros and cons of SVMs and kernels?

A

See figure

10
Q

How can we construct kernels?

A

Using Mercer's theorem.

Kernels satisfying its conditions are called Mercer kernels.

See figure for the definition!
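The condition can be stated roughly as follows (a standard informal version, in case the figure is unavailable):

```latex
% Mercer condition (informal): a symmetric function k is a valid kernel,
% i.e. k(x, z) = <phi(x), phi(z)> for some feature map phi, iff every
% Gram matrix built from it is positive semi-definite.
K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), \qquad
\mathbf{c}^\top K\, \mathbf{c} \ge 0
\quad \text{for all } \mathbf{c} \in \mathbb{R}^n
\text{ and all finite sets } \mathbf{x}_1, \dots, \mathbf{x}_n
```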

11
Q

What are the most commonly used kernels?

A

The polynomial kernel (and the Gaussian/RBF kernel mentioned in the earlier card).
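Their usual forms (standard definitions, with parameters d, c, γ assumed):

```latex
% Polynomial kernel (degree d, offset c >= 0) and Gaussian/RBF kernel
k_{\text{poly}}(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z} + c)^d,
\qquad
k_{\text{RBF}}(\mathbf{x}, \mathbf{z}) = \exp\!\big(-\gamma\, \|\mathbf{x} - \mathbf{z}\|^2\big)
```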

12
Q

What are some kernels we can make for string data?

(structured data)

A

String features:

φ_u(s) := # occurrences of u as a (contiguous) substring of s

φ⁺_u(s) := # occurrences of u as a (possibly non-contiguous) subsequence of s

https://gyazo.com/1db9464264948385c2dc8e6e2d8640d2

p-Spectrum Kernel

Counts the substrings of length p that the two strings have in common.

https://gyazo.com/3ba1c3f20eda4612f00789312a3d6b85

All-subsequences kernel

The all-subsequences kernel builds on the idea of the p-spectrum kernel, but allows gaps within the strings.

https://gyazo.com/37428de39514b5ee44c17d56df0bec1b
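A small Python sketch of the p-spectrum kernel (my own illustration, not the course's reference implementation):

```python
from collections import Counter

def p_spectrum_kernel(s, t, p):
    """p-spectrum kernel: sum over all length-p strings u of
    (#occurrences of u as a substring of s) * (#occurrences of u in t)."""
    def spectrum(x):
        # Count every contiguous substring of length p.
        return Counter(x[i:i + p] for i in range(len(x) - p + 1))

    spec_s, spec_t = spectrum(s), spectrum(t)
    return sum(count * spec_t[u] for u, count in spec_s.items())

# "bar" and "bat" share exactly one 2-gram ("ba"), so the kernel value is 1.
print(p_spectrum_kernel("bar", "bat", 2))
```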

13
Q

For the all-subsequences kernel, we often cannot compute the whole feature vector explicitly. What do we do?

A

Use dynamic programming, evaluating the kernel through a recursion over prefixes of the two strings.

See figure
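A possible dynamic-programming sketch in Python, based on the usual textbook recursion k(sa, t) = k(s, t) + Σ_{j: t_j = a} k(s, t[1..j−1]) (illustration only; the slides' figure may use different notation):

```python
from functools import lru_cache

def all_subsequences_kernel(s, t):
    """Counts common (possibly non-contiguous) subsequences of s and t,
    with multiplicity over index positions, including the empty one."""
    @lru_cache(maxsize=None)
    def k(m, n):
        # k(m, n) = kernel value for the prefixes s[:m] and t[:n].
        if m == 0:
            return 1  # only the empty subsequence matches an empty prefix
        a = s[m - 1]
        total = k(m - 1, n)  # matchings that do not use the last char of s[:m]
        for j in range(n):
            if t[j] == a:
                total += k(m - 1, j)  # match it with t[j]; rest comes from t[:j]
        return total

    return k(len(s), len(t))

# Example: "ab" vs "ab" share the empty string, "a", "b" and "ab" -> prints 4.
print(all_subsequences_kernel("ab", "ab"))
```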
