Course Session 7 (Support vector machines) Flashcards
What is a support vector machine?
A linear classifier trained with supervised learning. The key idea is the maximal margin: choose the separating hyperplane so that the two parallel lines bounding the margin are as far apart as possible without either side making a classification error.
What is a support vector?
The support vectors are the data points that lie on the margin edges, i.e., on the two parallel lines bounding the margin of a support vector machine.
In the linearly separable case, what does the support vector algorithm do?
It looks for the separating hyperplane with the largest margin.
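A minimal sketch of this, assuming scikit-learn and a made-up toy dataset: a linear SVC with a very large C approximates the hard-margin solution, and the margin width can be read off as 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (hypothetical toy data).
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin case (no violations tolerated).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w:", w, "b:", b)
print("support vectors:\n", clf.support_vectors_)  # points on the margin edges
print("margin width:", 2.0 / np.linalg.norm(w))    # geometric margin = 2 / ||w||
```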
What are the pros and cons of support vector machines?
Pros:
* Works very well in practice.
* Nearly foolproof: there are only a few kernels to choose from and very few hand-tuned parameters.
Cons:
* Slow training, because of the quadratic programming problem.
* Slow testing, which depends on the number of support vectors (data points on the margin).
* Curse of sample size: the training cost grows rapidly with the number of training samples.
What methods can handle non-linearly-separable data for support vector machines?
Nonlinear classifiers, or explicitly increasing the dimension of the feature space (illustrated in the sketch after this list)
* Potentially complicated
The kernel trick
* Only applicable to algorithms that can be written entirely in terms of inner (dot) products between vectors, which is why it pairs naturally with support vector machines
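A hedged illustration of the dimension-increase idea (the 1-D data below is made up): class +1 sits between the two class -1 groups, so no single threshold on x separates them, but the map x -> (x, x^2) makes the set linearly separable in 2-D.

```python
import numpy as np

# 1-D data: class +1 lies between the two class -1 groups,
# so no single threshold on x can separate the classes.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Map x -> (x, x^2): in 2-D a horizontal line x^2 = const now
# splits the classes, so a linear classifier suffices there.
phi = np.column_stack([x, x ** 2])
print(phi)
```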
What is the kernel trick?
Training vectors x_i are mapped into a higher- (possibly infinite-) dimensional space by a function φ. The SVM then finds a linear separating hyperplane in this higher-dimensional space.
The kernel is
K(x_i, x_j) = φ(x_i)^T φ(x_j)
Substitute K(x_i, x) for x_i^T x in the decision function, so φ never needs to be evaluated explicitly.
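A quick numeric check of the trick (toy vectors; the degree-2 polynomial kernel on R^2 is an assumed example): evaluating K(a, b) = (a^T b)^2 in the original space gives exactly the inner product of the explicit feature map φ(a) = (a1^2, √2·a1·a2, a2^2).

```python
import numpy as np

def phi(a):
    # Explicit feature map for the degree-2 polynomial kernel on R^2.
    return np.array([a[0] ** 2, np.sqrt(2) * a[0] * a[1], a[1] ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

lhs = (a @ b) ** 2      # kernel evaluated in the original 2-D space
rhs = phi(a) @ phi(b)   # inner product in the mapped 3-D space
print(lhs, rhs)         # identical: phi is never built during SVM training
```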
What are the basic kernels for the kernel trick?
Linear:
K(x_i, x_j) = x_i^T x_j
Polynomial:
K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0
Radial basis function:
K(x_i, x_j) = exp(-γ ||x_i - x_j||^2), γ > 0
Sigmoid:
K(x_i, x_j) = tanh(γ x_i^T x_j + r)
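A sketch of these four kernels in NumPy; the values of γ, r, and d below are arbitrary example choices, not recommendations.

```python
import numpy as np

def linear(xi, xj):
    return xi @ xj

def polynomial(xi, xj, gamma=1.0, r=0.0, d=3):
    return (gamma * (xi @ xj) + r) ** d

def rbf(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid(xi, xj, gamma=0.1, r=0.0):
    return np.tanh(gamma * (xi @ xj) + r)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear(xi, xj), polynomial(xi, xj), rbf(xi, xj), sigmoid(xi, xj))
```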
Write out the decision function of the support vector machine and explain it.
It is the function that maps an input vector to a signed score; the sign of the score gives the predicted class, and the zero level set of f is the decision boundary, a hyperplane. Training chooses this hyperplane to maximize the margin between the classes while minimizing classification errors.
f(x) = w^T x + b = \sum_{i=1}^N α_i y_i x_i^T x + b
x: The input feature vector for which the prediction is being made.
Example: A data point from the test set.
w: The weight vector that defines the orientation of the hyperplane in the feature space.
In the dual form, w is expressed as a weighted sum of the support vectors.
b: The bias term (or offset).
It determines the position of the hyperplane relative to the origin.
α_i: The Lagrange multipliers (or dual coefficients).
These are determined during the optimization process in the dual form. Nonzero values of α_i correspond to the support vectors, the data points closest to the hyperplane.
y_i: The label of the i-th training sample.
Typically, y_i ∈ {−1, +1} for binary classification.
x_i^T * x: The dot product between the training sample x_i and the input vector x. In the kernelized version of SVMs, this term is replaced by a kernel function K(x_i,x).
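A hedged sketch, assuming scikit-learn: a fitted SVC stores the products α_i·y_i in dual_coef_ and the corresponding support vectors in support_vectors_, so the dual-form f(x) above can be reproduced by hand and compared against decision_function.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

x_new = X[0]
K = clf.support_vectors_ @ x_new               # x_i^T x for each support vector
f = clf.dual_coef_[0] @ K + clf.intercept_[0]  # sum_i alpha_i y_i x_i^T x + b
print(f, clf.decision_function([x_new])[0])    # the two values agree
```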
What is a soft-margin support vector machine?
If the data is not separable, we introduce slack variables ξ_i (measuring violations of the margin constraints) and a penalty term for them in the objective:
min_{w,b,ξ_1,...,ξ_N} (1/2) w^T w + C \sum_{i=1}^N ξ_i,
such that for all i, y_i(w^T φ(x_i) + b) ≥ 1 - ξ_i and ξ_i ≥ 0.
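A minimal sketch, assuming scikit-learn and synthetic overlapping blobs: the penalty weight C multiplies the slack sum, so a small C tolerates many margin violations while a large C penalizes them heavily.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping clusters, so the data is not linearly separable (toy example).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # More support vectors typically means more tolerated margin violations.
    print(f"C={C}: {len(clf.support_vectors_)} support vectors")
```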