Support Vector Machines Flashcards
What is the length of the projection of x onto w if w is a unit vector?
wTx
What is the margin?
Distance between the decision boundry (hyperplane) and the closest training point
How is the modulus of a vector w written?
|| w ||
What is || w ||?
sqrt(wTw)
If the hyperplane is defined as wx + w0 = 0, what is the distance from the origin to the hyperplane?
b = - w0 / ||w||
What is the perpendicular distance from a point x to the hyperplane wTx + wo = 0?
(1 / ||w||) |wTx + w0|
What is the value of the margin under the constraint mini | wTxi + w0 | = 1
1 / ||w||
What is maximizing 1 / ||w|| the same as?
minimizing ||w||2
What is the SVN optimization problem?
min<strong>w</strong> ||w||2
such that yi(wTx + w0) >= 1 for all i
What are the 2 good properties of the optimal weight parameters (for SVM’s)?
- They are linear function of the input and class labels
- Solution is sparse (optimal hyperplane determined by just a few examples)
What are support vectors?
The few training examples that determine the hyperplane
What is the problem if the data is not linearly seperable for SVM’s?
The optimization problem has no solution
What can we add to solve the problem if the data is not linearly seperable (for SVM’s)?
Slack variables
What is the SVM optimization problem (with slack variables)?
Minimize:

What is k (power of slack variable) usually set to (SVM)?
1
What is C in SVM optimization problem with slack variables?
Trade-off parameter, how important are the slack variables
What does this measure (SVM’s)?

How well we fit the data
Why does adding slack variables increase the number of support vectors?
Every non-zero slack variable adds a support vector (because slack puts the point on the margin)
What is a kernal function?
k(xi, xj) = phi(xi)T phi(xj)
Takes the feature vectors, transforms them into new feature space and takes the dot product
Whats special about kernal functions (compared to basis expansion)?
Can work with infinite dimensions
What is the form of a polynomial kernal function?
k(xi, xj) = (xi xj)d
What is the form a gaussian radial basis function kernal function?
k(xi, xj) = exp(-||xi - xj||2 / c)
What does || w || mean?
The magnitude of vector w
What do slack variables do (SVM)?
Move misclassified points to the margin