Classification: linear models III - support vector machines Flashcards
What is a maximum margin hyperplane?
See figure!
It is the separating hyperplane with the largest possible margin, i.e. the largest distance from the hyperplane to the nearest training points.
Why SVM then?
It is a computationally smart way to find the maximum margin hyperplane.
Instead of maximizing the margin 1/||w|| directly,
we can equivalently minimize ½||w||², which is much easier computation-wise: it is a convex quadratic objective with linear constraints, so standard quadratic programming solvers apply.
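In equation form (a standard formulation, consistent with the margin 1/||w|| above), the primal problem is:

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i \,(w \cdot x_i + b) \ \ge\ 1, \qquad i = 1, \dots, n
```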
What technique does the SVM use to set up this so-called dual optimization problem?
It uses Lagrange multipliers
What does the solution of the dual optimization problem give us?
The solution of the dual problem gives the solution of the original (primal) problem, i.e. it yields the optimal separating hyperplane.
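Concretely, in the standard formulation with Lagrange multipliers αᵢ (one per training point), the dual problem is:

```latex
\max_{\alpha} \ \sum_{i} \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \,(x_i \cdot x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i} \alpha_i y_i = 0
```

The optimal weights are recovered as w = Σᵢ αᵢ yᵢ xᵢ; the points with αᵢ > 0 are the support vectors. Note that the data enters only through dot products xᵢ · xⱼ, which is exactly what the kernel trick below exploits.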
How can we handle non-linearly separable data in an SVM?
Use a data transformation φ to map the data into another feature space.
Problem: suitable feature spaces may need to be very high-dimensional. That makes computations with transformed vectors φ(x) very expensive.
How do we handle the expensive computations of mapping to a higher-dimensional feature space?
We use kernels, more precisely the kernel trick, to avoid having to work in the feature space explicitly; we only need a measure of similarity, which is much less costly!
We can replace the dot products in the SVM with a kernel function (this method is called the kernel trick!)
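A minimal sketch of this in Python (illustrative names and values, not any particular library's API):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2): a similarity score computed
    # directly, without ever building the feature vectors phi(x).
    return np.exp(-gamma * np.sum((x - y) ** 2))

def decision_function(x, support_vectors, alphas, labels, b):
    # Dual form of the SVM decision function:
    #   f(x) = sum_i alpha_i * y_i * k(x_i, x) + b
    # The dot product x_i . x has simply been replaced by k(x_i, x).
    return sum(a * y * rbf_kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b
```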
Why kernels, as opposed to feature vectors?
One big reason is that in many cases, computing the kernel is easy, but computing the feature vector corresponding to the kernel is really really hard.
The feature vector for even simple kernels can blow up in size, and for kernels like the RBF kernel, k(x, y) = exp(−‖x − y‖²) (see radial basis function kernel), the corresponding feature vector is infinite-dimensional. Yet computing the kernel is almost trivial.
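A concrete illustration, using the degree-2 homogeneous polynomial kernel k(x, y) = (x · y)² as an example: the kernel is one dot product, while the explicit feature map already has d² entries (dᵖ for degree p).

```python
import numpy as np
from itertools import product

def poly2_kernel(x, y):
    # One dot product, then a square: O(d) work.
    return np.dot(x, y) ** 2

def poly2_features(x):
    # The explicit feature map for the same kernel: all d^2 pairwise
    # products x_i * x_j. For degree p this would be d^p entries.
    return np.array([xi * xj for xi, xj in product(x, x)])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
print(poly2_kernel(x, y))                             # 20.25
print(np.dot(poly2_features(x), poly2_features(y)))   # 20.25, same value
```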
How do we know if we can remove a point from the set of support vectors?
If the maximum margin hyperplane does not change when this vector is removed from the set, we do not have to keep it, and it is therefore not a support vector.
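This is easy to check in practice; a quick sketch with scikit-learn (toy data of my own choosing):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters; only the points nearest the boundary matter.
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C: (near-)hard margin
print(clf.support_vectors_)
# Refitting without any point NOT listed here leaves the hyperplane unchanged.
```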
What are the pros and cons of SVMs and kernels?
See figure
How can we construct kernels?
Using Mercer's theorem.
This gives us what we can call Mercer kernels.
See figure for definition!
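In its usual statement, a symmetric function k is a valid (Mercer) kernel when every kernel matrix it produces is positive semi-definite:

```latex
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i \, c_j \, k(x_i, x_j) \ \ge\ 0
\qquad \text{for all } x_1, \dots, x_n \text{ and all } c_1, \dots, c_n \in \mathbb{R}
```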
What are some often used kernels?
The polynomial kernel (besides the RBF kernel mentioned above).
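The polynomial kernel's standard form, with degree d and an offset c ≥ 0 that weights lower-order terms:

```latex
k(x, y) = (x \cdot y + c)^d
```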
What are some kernels we can make for string data?
(structured data)
String features:
φu(s) := #occurrences of u as a substring of s
φ⁺u(s) := #occurrences of u as a subsequence of s
https://gyazo.com/1db9464264948385c2dc8e6e2d8640d2
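A brute-force sketch of both feature types (helper names are my own; just to pin down the two definitions):

```python
from itertools import combinations

def count_substring(u, s):
    # phi_u(s): occurrences of u as a contiguous substring of s.
    return sum(1 for i in range(len(s) - len(u) + 1) if s[i:i + len(u)] == u)

def count_subsequence(u, s):
    # phi+_u(s): occurrences of u as a (possibly gapped) subsequence of s.
    return sum(1 for idx in combinations(range(len(s)), len(u))
               if "".join(s[i] for i in idx) == u)

print(count_substring("ab", "abab"))    # 2
print(count_subsequence("ab", "abab"))  # 3: index pairs (0,1), (0,3), (2,3)
```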
p-Spectrum Kernel
Compares two strings by counting their common substrings of length p.
https://gyazo.com/3ba1c3f20eda4612f00789312a3d6b85
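A direct implementation sketch (assuming the standard definition: the dot product of length-p substring counts):

```python
from collections import Counter

def p_spectrum_kernel(s, t, p):
    # Count every length-p substring of each string, then take the dot
    # product of the two count vectors; only shared p-grams contribute.
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[u] * ct[u] for u in cs.keys() & ct.keys())

print(p_spectrum_kernel("statistics", "computation", 3))  # 2: "tat" and "ati"
```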
All-subsequences kernel
The all-subsequences kernel builds on the idea of the p-spectrum kernel, but allows gaps within the matched strings.
For the all-subsequences kernel we often cannot compute the whole feature vector. What can we do?
Use dynamic programming, evaluating the kernel through a recursion (see the sketch below).
See figure
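A sketch of that dynamic program, using the standard recursion from the kernel-methods literature: K(sa, t) = K(s, t) + Σ over positions j with t[j] = a of K(s, t[:j]), with base case K(s, "") = K("", t) = 1 (the empty subsequence always matches).

```python
def all_subsequences_kernel(s, t):
    # K[i][j] = all-subsequences kernel of the prefixes s[:i] and t[:j].
    K = [[1] * (len(t) + 1)] + [[0] * (len(t) + 1) for _ in s]
    for i in range(1, len(s) + 1):
        a = s[i - 1]
        for j in range(len(t) + 1):
            # Either the new character a is left unused, or it pairs with
            # some earlier occurrence of a in t[:j].
            K[i][j] = K[i - 1][j] + sum(K[i - 1][k]
                                        for k in range(j) if t[k] == a)
    return K[len(s)][len(t)]

print(all_subsequences_kernel("aa", "a"))  # 3 = 1 (empty) + 2*1 (for "a")
```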