Section 4: Support Vector Classifiers Flashcards
What is a linear classifier? Give an example.
A linear classifier classifies observations on the basis of a linear combination of the input features and parameters, i.e. a hyperplane. Logistic regression is an example of a linear classifier: the linear combination is mapped to the probability that an instance belongs to a class. Since it defines a linear boundary, it is a linear classifier.
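As a sketch (using generic weights β, which may differ from the course's notation), the decision function is a linear combination of the features, and logistic regression passes it through the sigmoid:

\[
f(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,
\qquad
P(y = 1 \mid x) = \frac{1}{1 + e^{-f(x)}}.
\]

The boundary f(x) = 0 is a hyperplane, which is why the classifier is linear.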
What is the motivation for SVM theory?
The motivation for SVM theory is that logistic regression always fits a linear boundary, even when the classes overlap or the data is clearly not linearly separable. Support vector machines allow for non-linear decision boundaries.
If data is perfectly separable, how can we construct a classifier using a separating hyperplane?
We can construct a classifier using the hyperplane that maximally separates the two classes.
How does the separating hyperplane classify?
The classifier will assign a (new) observation to a class according to which side of the hyperplane it lies on.
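A minimal sketch of the decision rule, assuming the classes are coded as y ∈ {−1, +1}:

\[
\hat{y} = \operatorname{sign}\left(\beta_0 + \beta^\top x\right),
\]

i.e. one class on each side of the hyperplane β₀ + βᵀx = 0.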
What is the margin for data which is separable?
For any separating hyperplane, we can measure the perpendicular distance to the closest point of each class. The smallest such distance is called the margin.
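In one standard formulation (a sketch; the scaling convention may differ from the course's), if the normal vector β has unit length, the signed perpendicular distance of xi to the hyperplane β₀ + βᵀx = 0 is β₀ + βᵀxi, and the margin is

\[
M = \min_i \; y_i\left(\beta_0 + \beta^\top x_i\right), \qquad \|\beta\| = 1.
\]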
Define the maximum separating hyperplane for data which is separable.
The maximum separating hyperplane is the separating hyperplane with the maximum margin between the classes: it ensures the largest possible gap between the two classes.
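Formally (a standard formulation, not necessarily the course's exact notation), the maximum separating hyperplane solves

\[
\max_{\beta_0,\, \beta,\, M} \; M
\quad \text{subject to} \quad
y_i\left(\beta_0 + \beta^\top x_i\right) \ge M \;\; \text{for all } i, \qquad \|\beta\| = 1.
\]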
What are the support vectors for data which is separable?
The lines that define the margin and pass through data points from each class are called the positive and negative support vectors. The maximum separating hyperplane lies halfway between the two support vectors.
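As an illustrative sketch (the toy data and variable names here are made up, not from the course), scikit-learn exposes the support vectors of a fitted linear SVM:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (made up for illustration):
# two well-separated 2-D clusters, labelled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# A linear kernel with a very large C approximates the hard-margin classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# The points lying on the margin lines are the support vectors;
# the maximum separating hyperplane sits halfway between them.
print(clf.support_vectors_)        # the support vector points
print(clf.coef_, clf.intercept_)   # the hyperplane's beta and beta_0
```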
How is the optimisation problem to find the maximum separating hyperplane solved?
The optimisation problem is solved using constrained optimisation with Lagrange multipliers; solving the constrained problem is a complex process.
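In a standard formulation (a sketch; after rescaling so that the margin constraints read yi(β₀ + βᵀxi) ≥ 1, as in the hard-margin card below), the Lagrangian is

\[
L(\beta, \beta_0, \alpha) = \tfrac{1}{2}\|\beta\|^2
- \sum_i \alpha_i \left[ y_i\left(\beta_0 + \beta^\top x_i\right) - 1 \right],
\qquad \alpha_i \ge 0,
\]

minimised over β and β₀ and maximised over the multipliers αi.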
What is meant by a hard margin?
A hard margin is when the support vector classifier, i.e. the hyperplane found by optimisation, is fitted under a constraint which ensures there are no errors: no points on the wrong side of the support vectors. The boundary classifies the training data perfectly.
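With the standard rescaling (sketched here), the hard-margin problem can be written with no slack allowed:

\[
\min_{\beta_0,\, \beta} \; \tfrac{1}{2}\|\beta\|^2
\quad \text{subject to} \quad
y_i\left(\beta_0 + \beta^\top x_i\right) \ge 1 \;\; \text{for all } i.
\]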
Why is the dual formulation key in the optimisation problem to solve for the maximum separating hyperplane?
The dual formulation is key because it expresses the optimisation criterion purely through inner products of the observations xi; this is what later allows non-linear kernels to replace those inner products.
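In its standard form (a sketch, not necessarily the course's exact notation), the dual problem is

\[
\max_{\alpha} \; \sum_i \alpha_i
- \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, y_i y_j \langle x_i, x_j \rangle
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_i \alpha_i y_i = 0,
\]

where the observations enter only through the inner products ⟨xi, xj⟩.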
Explain what is meant by a soft margin
When the support vector classifier, i.e. the hyperplane found by optimisation, is fitted under a constraint which allows the margin to be softer: we maximise the margin but allow some observations on the wrong side of the margin or the separating hyperplane. This is equivalent to constraining the problem using a cost function.
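A standard way to write the soft-margin problem (a sketch; the cost C and the slack variables ξi are defined in the following cards):

\[
\min_{\beta_0,\, \beta,\, \xi} \; \tfrac{1}{2}\|\beta\|^2 + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i\left(\beta_0 + \beta^\top x_i\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0.
\]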
What does the cost C stand for?
The cost is related to the number of observations violating the margin. The larger the cost, the less tolerant we are of violations of the margin, and the more we strive to classify all the (training) data points correctly.
For small C, the classifier will tolerate a certain degree of misclassified observations.
Define the slack variable
A slack variable ξi indicates where an observation is located relative to the hyperplane and relative to the margin.
What values can the slack variables take?
ξi = 0 means the observation is on the correct side of the margin
0 < ξi ≤ 1 means the observation violates the margin but is still on the correct side of the hyperplane
ξi > 1 means the observation is on the wrong side of the hyperplane
How does the cost C control the complexity of the prediction function?
The cost term involving C is a penalty. C is fixed in advance, controlling the penalty paid by the classifier for misclassifying a training point and thus the complexity of the prediction function.
A high cost C will force the classifier to create a prediction function complex enough to misclassify as few training points as possible.
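As a hedged illustration (the data is made up; scikit-learn's SVC uses exactly this C as the penalty on margin violations), a small C tolerates many violations and keeps the prediction function simple, while a large C tries much harder to classify every training point:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping toy data (made up): two noisy clusters that cannot be
# perfectly separated, so some margin violations are unavoidable.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # With a small C the margin is wide and many points may violate it;
    # with a large C the classifier fights harder for each training point.
    print(f"C={C}: {clf.support_vectors_.shape[0]} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```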