Lecture 5 - SVM Flashcards
What are Support Vector Machines (SVMs)?
A Support Vector Machine (SVM) is a versatile Machine Learning model: it provides linear and nonlinear decision boundaries.
- Used for (binary) classification and regression.
- Particularly well suited for classification of complex, small- to medium-sized datasets.
What is Large Margin Classification?
You can think of an SVM classifier as fitting the widest possible street (represented in the plots by parallel dashed lines) between the classes.
Notice that adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined (or “supported”) by the instances located on the edge of the street.
What are Support Vectors?
The instances located on the edge of the street.
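As a minimal sketch (assuming Scikit-Learn's bundled iris dataset and an arbitrary binary split, not the slides' exact example), a fitted SVC exposes its support vectors directly:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Two petal features of the iris dataset, binary target
X, y = load_iris(return_X_y=True)
X = X[:, (2, 3)]          # petal length, petal width
y = (y == 0).astype(int)  # 1 if Iris setosa, else 0

svm_clf = SVC(kernel="linear", C=1.0)
svm_clf.fit(X, y)

# The instances on the edge of the street that "support" the boundary
print(svm_clf.support_vectors_)
```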
Why are SVMs sensitive to feature scales?
SVMs fit the largest possible margin, and that margin is measured in the raw feature space: when one feature has a much larger scale than the others, differences along it dominate the distance computations, so the SVM largely ignores the small-scale features and the street ends up badly oriented. After feature scaling (e.g., using Scikit-Learn's StandardScaler), the decision boundary looks much better (the right plot on the slides).
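A minimal sketch of scaling before fitting, using Scikit-Learn's Pipeline (the feature values here are made up to mimic mismatched scales):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy data with wildly different feature scales (hypothetical values)
X = np.array([[1, 50], [5, 20], [3, 80], [5, 60]], dtype=float)
y = np.array([0, 0, 1, 1])

# Scaling first keeps the large-scale feature from dominating the margin
scaled_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1.0)),
])
scaled_svm_clf.fit(X, y)
```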
What is Hard Margin Classification?
Strictly impose that all instances be off the street and on the correct side.
- Hard margin classification only works if the data is linearly separable.
- Furthermore, it is quite sensitive to outliers.
What is Soft Margin Classification?
The objective of soft margin classification is to find a good balance between keeping
the street as large as possible and limiting the margin violations.
- The hyperparameter C allows us to define the trade-off between the two objectives.
- A small C leads to a wider street but more margin violations.
- A large C leads to a narrower street but fewer margin violations (see the sketch below).
REFER TO SLIDES FOR EXAMPLES
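A minimal sketch of the C trade-off with LinearSVC (the iris-based setup and the specific C values are illustrative assumptions, not the slides' exact example):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X = X[:, (2, 3)]          # petal length, petal width
y = (y == 2).astype(int)  # Iris virginica vs. the rest

# Small C: wide street, tolerates more margin violations
soft_clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01))
# Large C: narrow street, fewer margin violations
strict_clf = make_pipeline(StandardScaler(), LinearSVC(C=100))

soft_clf.fit(X, y)
strict_clf.fit(X, y)
```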
What are Nonlinear Decision Boundaries?
Many datasets are not even close to being linearly separable.
One approach to handling nonlinear datasets is to add more features, such as
polynomial features.
In some cases this can result in a linearly separable dataset.
REFER TO SLIDES FOR EXAMPLE
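A minimal sketch of this approach, assuming Scikit-Learn's make_moons toy dataset and illustrative hyperparameter values:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Adding polynomial features can make the data linearly separable
polynomial_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000),
)
polynomial_svm_clf.fit(X, y)
```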
How else can you tackle nonlinear problems?
Another option to tackle nonlinear problems is to add features computed using a
similarity function that measures how much each instance resembles a particular
landmark.
REFER TO SLIDES FOR EXAMPLE
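A sketch of computing similarity features by hand, using the Gaussian RBF exp(−γ‖x − ℓ‖²) as the similarity function; the landmarks and gamma value below are arbitrary choices for illustration:

```python
import numpy as np

def gaussian_rbf(X, landmark, gamma):
    # Similarity of each instance to the landmark: exp(-gamma * ||x - l||^2)
    return np.exp(-gamma * np.sum((X - landmark) ** 2, axis=1))

X = np.array([[-4.0], [-1.0], [0.0], [2.0]])  # 1D toy instances
gamma = 0.3

# Two arbitrary landmarks; each yields one new feature per instance
f1 = gaussian_rbf(X, landmark=np.array([-2.0]), gamma=gamma)
f2 = gaussian_rbf(X, landmark=np.array([1.0]), gamma=gamma)
X_new = np.column_stack([f1, f2])
print(X_new)
```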
What is the Gaussian RBF Kernel?
The kernel trick makes it possible to obtain a similar result as if you had added many similarity features, without actually having to add them. Using the Gaussian RBF kernel with the SVC class:
SVC(kernel="rbf", gamma=5, C=0.001) -> the kernel trick is applied simply by passing kernel="rbf" to the SVC constructor
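A runnable version of the call above, wrapped in a scaling pipeline; the moons dataset stands in for whatever dataset the slides use:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# No similarity features are added explicitly: the kernel trick
# computes the equivalent result implicitly
rbf_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", gamma=5, C=0.001),
)
rbf_kernel_svm_clf.fit(X, y)
```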
What is the computational complexity of LinearSVC vs SVC?
- LinearSVC implements an optimized algorithm for linear SVMs. It does not support the kernel trick, but it scales almost linearly with the number of training instances and the number of features: its training time complexity is roughly O(m × n).
- SVC implements an algorithm that supports the kernel trick. Unfortunately, its training time complexity is between O(m² × n) and O(m³ × n), so it gets dreadfully slow when the number of training instances gets large.
REFER TO SLIDES
What is SVM Regression?
Instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM Regression tries to fit as many instances as possible on the street while limiting margin violations (i.e., instances off the street). The width of the street is controlled by the hyperparameter ε (epsilon).
REFER TO SLIDES FOR EXAMPLE
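A minimal sketch using Scikit-Learn's LinearSVR; the synthetic linear data and the epsilon value are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVR

# Noisy linear toy data
rng = np.random.default_rng(42)
X = 2 * rng.random((50, 1))
y = (4 + 3 * X + rng.normal(size=(50, 1))).ravel()

# epsilon controls the width of the street
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)
```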
What can you use for nonlinear regression?
For nonlinear regression, use a kernelised SVM model, e.g. SVM Regression on a random quadratic training set using a 2nd-degree polynomial kernel. There is little regularization on the left plot (i.e., a large C value) and much more regularization on the right plot (i.e., a small C value).
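A sketch of the kernelised version with Scikit-Learn's SVR class; the quadratic data is generated here for illustration, and the hyperparameters follow the plot description above:

```python
import numpy as np
from sklearn.svm import SVR

# Noisy quadratic toy data
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1)) - 1
y = (0.2 + 0.1 * X + 0.5 * X ** 2
     + rng.normal(scale=0.1, size=(100, 1))).ravel()

# Large C -> little regularization; small C -> more regularization
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)
```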
What is the Decision Function for SVMs?
REFER TO SLIDES
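The slides' formula isn't reproduced here, but the standard linear SVM decision function is:

\[
\hat{y} =
\begin{cases}
0 & \text{if } \mathbf{w}^\top \mathbf{x} + b < 0 \\
1 & \text{if } \mathbf{w}^\top \mathbf{x} + b \ge 0
\end{cases}
\]

where w is the feature weights vector and b is the bias term. The decision boundary is the set of points where the decision function equals 0, and the dashed margin lines are where it equals ±1.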
What can the hard margin linear SVM classifier be seen as?
The hard margin linear SVM classifier objective can be expressed as a constrained
optimization problem -> REFER TO SLIDES FOR FORMULAS (TRAINING OBJECTIVE SLIDES)
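For reference, the standard hard margin objective (with class targets t⁽ⁱ⁾ = −1 for negative and +1 for positive instances) is:

\[
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}^\top \mathbf{w}
\quad \text{subject to} \quad
t^{(i)}\bigl(\mathbf{w}^\top \mathbf{x}^{(i)} + b\bigr) \ge 1
\quad \text{for } i = 1, \dots, m
\]

Minimizing ½ wᵀw (i.e., minimizing ‖w‖) maximizes the margin, since the street width is 2/‖w‖.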
What is Quadratic Programming?
The hard margin and soft margin problems are both convex quadratic optimization
problems with linear constraints. Such problems are known as Quadratic Programming (QP) problems. REFER TO SLIDES
What is the Dual Problem?
Given a constrained optimization problem, known as the primal problem, it is possible to express a different but closely related problem, called its dual problem.
REFER TO SLIDES FOR FORMULA
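For a linear SVM, the standard dual form is:

\[
\min_{\boldsymbol{\alpha}}\ \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
\alpha^{(i)} \alpha^{(j)}\, t^{(i)} t^{(j)}\, \mathbf{x}^{(i)\top} \mathbf{x}^{(j)}
\;-\; \sum_{i=1}^{m} \alpha^{(i)}
\quad \text{subject to} \quad \alpha^{(i)} \ge 0 \ \text{for } i = 1, \dots, m
\]

Training instances enter only through the dot products x⁽ⁱ⁾ᵀx⁽ʲ⁾, which is exactly what makes the kernel trick possible.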
What is the Kernel Trick (Kernelised SVM)?
REFER TO SLIDES FOR MATH/FORMULAS
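In brief: a kernel K(a, b) computes the dot product φ(a)ᵀφ(b) using only the original vectors, without ever computing the transformation φ. For example, with the 2nd-degree polynomial mapping φ(x) = (x₁², √2·x₁x₂, x₂²) of a 2D vector:

\[
\phi(\mathbf{a})^\top \phi(\mathbf{b})
= a_1^2 b_1^2 + 2\,a_1 a_2\, b_1 b_2 + a_2^2 b_2^2
= \bigl(a_1 b_1 + a_2 b_2\bigr)^2
= \bigl(\mathbf{a}^\top \mathbf{b}\bigr)^2
\]

so the kernel K(a, b) = (aᵀb)² yields the transformed-space dot product directly.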
What are the common kernels used in SVM?
Linear
Polynomial of degree d
Gaussian RBF
Sigmoid
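Their standard formulas (using Scikit-Learn's γ, r and d parameterization) are:

\[
\begin{aligned}
\text{Linear:}\quad & K(\mathbf{a}, \mathbf{b}) = \mathbf{a}^\top \mathbf{b} \\
\text{Polynomial:}\quad & K(\mathbf{a}, \mathbf{b}) = \bigl(\gamma\,\mathbf{a}^\top \mathbf{b} + r\bigr)^d \\
\text{Gaussian RBF:}\quad & K(\mathbf{a}, \mathbf{b}) = \exp\bigl(-\gamma\,\lVert \mathbf{a} - \mathbf{b} \rVert^2\bigr) \\
\text{Sigmoid:}\quad & K(\mathbf{a}, \mathbf{b}) = \tanh\bigl(\gamma\,\mathbf{a}^\top \mathbf{b} + r\bigr)
\end{aligned}
\]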
How can you use Gradient Descent with SVMs?
For linear SVM classifiers, one method is to use Gradient Descent (e.g., using
SGDClassifier) to minimize the Linear SVM classifier cost function below, which is
derived from the primal problem. Unfortunately it converges much more slowly than the methods based on QP.
REFER TO SLIDE FOR FORMULA
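The cost function in question is the standard soft margin objective: a regularization term plus the hinge loss over all training instances:

\[
J(\mathbf{w}, b) = \tfrac{1}{2}\,\mathbf{w}^\top \mathbf{w}
+ C \sum_{i=1}^{m} \max\Bigl(0,\ 1 - t^{(i)}\bigl(\mathbf{w}^\top \mathbf{x}^{(i)} + b\bigr)\Bigr)
\]

The first term pushes toward a large margin; the second penalizes margin violations, weighted by C. In Scikit-Learn, SGDClassifier(loss="hinge") minimizes a cost of this form.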