Chapter 9 - Max Margin + SVC Flashcards
Hyperplanes and Normal Vectors (4)
- consider a p-dimensional space of predictors
- a hyperplane is a flat affine subspace of dimension p-1 which separates the space into 2 regions
- the normal vector beta = (beta1, …, betap) is a unit vector perpendicular to the hyperplane
- the signed distance between a point x = (x1,..,xp) and the hyperplane is the dot product {x dot beta} + beta0 (the sign of the dot product tells us which side it's on; if the hyperplane goes through the origin, beta0 = 0)
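A minimal numeric sketch of the point above (the vector beta and intercept beta0 are made-up illustrative values, with beta already scaled to unit length):

```python
import numpy as np

# Hypothetical hyperplane in p = 2 dimensions: beta.x + beta0 = 0,
# with beta a unit normal vector (0.6^2 + 0.8^2 = 1).
beta = np.array([0.6, 0.8])
beta0 = -1.0

def signed_distance(x, beta, beta0):
    """Signed distance from point x to the hyperplane beta.x + beta0 = 0.
    Requires ||beta|| = 1; the sign tells us which side x falls on."""
    return np.dot(x, beta) + beta0

x_above = np.array([2.0, 1.0])   # 0.6*2 + 0.8*1 - 1 -> approx +1, positive side
x_below = np.array([0.0, 0.0])   # 0 + 0 - 1 = -1, negative side
print(signed_distance(x_above, beta, beta0))
print(signed_distance(x_below, beta, beta0))
```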
Maximal margin classifier
- suppose we have a classification problem with y = {-1,1}
- if the classes can be perfectly separated, there will be an infinite number of separating hyperplanes
- solution: for every possible hyperplane, draw the largest possible empty margin around it, then choose the hyperplane with the largest margin
Finding maximal margin classifier
- write out the quadratic optimization, include the Lagrange multipliers -
the Lagrange multiplier alpha_i > 0 iff the point falls exactly on the margin, and alpha_i = 0 for every point not on the margin
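As a sketch, the quadratic optimization referred to above (written in standard hard-margin form, with w the normal vector and b the intercept) and its Lagrangian are:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1,\quad i = 1,\dots,n

L(w, b, \alpha) \;=\; \tfrac{1}{2}\|w\|^2
  \;-\; \sum_{i=1}^{n} \alpha_i \big[\, y_i\,(w \cdot x_i + b) - 1 \,\big],
\qquad \alpha_i \ge 0
```

Setting the derivatives with respect to w and b to zero gives w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0; the KKT conditions force alpha_i = 0 unless the constraint is tight, i.e., unless the point lies on the margin.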
Support vectors
the training points that lie on the margin (those with alpha_i > 0) and thereby define it are called support vectors.
plugging our estimate w = sum_i alpha_i y_i x_i back in gives an estimation problem for the alphas alone
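A minimal scikit-learn sketch of the idea above (the toy data points are made up): a linear SVC with a very large C approximates the maximal margin classifier, and the fitted model exposes exactly the margin-defining points and their alpha_i * y_i values.

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Tiny linearly separable toy data set, invented for illustration.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (maximal margin) classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points on the margin get alpha_i > 0; those are the support vectors.
print(clf.support_vectors_)   # the margin-defining points
print(clf.dual_coef_)         # alpha_i * y_i for each support vector
```

Note that `dual_coef_` stores the products alpha_i * y_i, so every entry is nonzero and its sign gives the class of the support vector.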
Overall point of introducing Lagrange multipliers
reduces the problem of finding the vector w to that of finding a set of scalar coefficients alpha_i
Support vector classifier
Problem with the maximal margin classifier: not all data sets can be separated by a hyperplane. So the SVC is a relaxation of the maximal margin classifier: it allows a certain # of points to be on the wrong side of the margin or hyperplane, controlled by a budget C
a lower budget means higher variance and lower bias, i.e., overfitting (adding 1 point can dramatically change the hyperplane)
you choose the budget C by cross-validation
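A sketch of choosing C by cross-validation with scikit-learn (the data and the C grid are illustrative; also note that sklearn's C parameter is the inverse of the budget described above, so small C tolerates many margin violations):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_blobs

# Synthetic two-class data; parameters are made up for illustration.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

# 5-fold CV over a grid of C values; best_params_ holds the winner.
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_["C"], grid.best_score_)
```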
Finding the SVC
- very similar to finding the maximal margin classifier -
once again we can show that the problem of finding w reduces to that of finding the alphas,
and the solution depends on the training inputs only through the inner products xi * xj for every pair i and j
once again, the key fact about SVC
to find the hyperplane, all we need to know about the data matrix X is the dot product between every pair of observations
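This key fact can be checked directly in scikit-learn (a sketch with made-up blob data): fitting on the raw features with a linear kernel and fitting on nothing but the Gram matrix of pairwise dot products give the same classifier.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=60, centers=2, random_state=1)

# Fit once on the raw features, once on only the matrix of dot products.
lin = SVC(kernel="linear", C=1.0).fit(X, y)
gram = X @ X.T                        # all we keep of X: pairwise dot products
pre = SVC(kernel="precomputed", C=1.0).fit(gram, y)

# Predicting with the precomputed model needs dot products of the test points
# against the training points -- still only inner products, never raw X.
X_new = X[:5]
same = np.array_equal(lin.predict(X_new), pre.predict(X_new @ X.T))
print(same)
```

This is also the hook for kernels: replacing the dot products in `gram` with any valid kernel function changes the classifier without ever touching the raw features.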