SVM Flashcards
Idea behind SVM?
Best Hyper-Plane
To maximize the margin, i.e., the gap between the hyperplane and the nearest points of each class, so that the classes are separated as widely as possible.
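As a minimal sketch of the "widest gap" idea, the snippet below measures the margin of two candidate hyperplanes on a made-up 2-D dataset. The data, the helper name `margin`, and both candidate hyperplanes are assumptions chosen for illustration; an SVM would prefer the candidate with the larger margin.

```python
import numpy as np

# Toy 2-D data: two linearly separable classes (values made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])

def margin(w, b, X, y):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    The result is positive only if every point lies on its correct side.
    """
    w = np.asarray(w, dtype=float)
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Two hypothetical separating hyperplanes.
margin_a = margin([1.0, 0.0], -3.0, X, y)  # vertical line x1 = 3
margin_b = margin([1.0, 1.0], -5.5, X, y)  # diagonal line x1 + x2 = 5.5

print(margin_a, margin_b)  # the second candidate leaves a wider gap
```

Both candidates classify every point correctly, but the second one keeps the nearest points further away, which is the property SVM optimizes for.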
Why SVM?
- Linear & Non-Linear
- Classification, Regression & Outlier Detection (One Class SVM)
What is One Class SVM?
Anomaly Detection technique
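Learns a boundary around the majority (normal) class, typically using the kernel trick; points that fall outside this boundary are flagged as the minority (anomalies).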
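A small sketch of One Class SVM for anomaly detection, assuming scikit-learn's `OneClassSVM`. The synthetic training data, the `nu` value, and the two query points are assumptions picked for illustration only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train only on "normal" points: a Gaussian blob around the origin (synthetic).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# One point at the center of the blob, one far away from it.
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
labels = model.predict(X_new)  # +1 means inlier, -1 means outlier
print(labels)
```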
When SVM? (Real world examples)
Text classification, Image classification, Spam Detection, Handwriting identification, Gene expression analysis, Face detection and Anomaly detection
What is Hyperplane?
In a p-dimensional space, a hyperplane is a flat sub-space of p-1 dimensions (a line in 2-D, a plane in 3-D).
In SVM, it is the decision boundary that separates the classes.
What is Support Vector?
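Training points closest to the decision boundary (classifier), i.e., the points lying on the margin boundaries. With a soft margin, points inside the margin or on the wrong side are also support vectors. Only these points determine the position of the hyperplane.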
Maximum Margin Hyperplane/Hard Margin Hyperplane
Hyperplane whose distance from the nearest points on either side is maximum. It allows no margin violations, so it requires perfectly separable data.
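The support vectors and the margin can be inspected directly after fitting. This is a sketch assuming scikit-learn's `SVC` with a linear kernel; the four data points are made up, and a very large `C` is used here only to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (values made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])

# A very large C leaves almost no tolerance for violations (near hard margin).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

support_vectors = clf.support_vectors_          # the points that define the margin
w = clf.coef_[0]                                 # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)           # distance between the two margin lines

print(support_vectors)
print(margin_width)
```

Only the two points nearest to the other class end up as support vectors; moving the remaining points (without crossing the margin) would not change the fitted hyperplane.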
Advantage of SVM
- Fairly robust to outliers (with a soft margin), since only the support vectors determine the boundary.
- Can also be used for non-linear classification (via kernels)
- Can also be used for regression
Why is the constant in the equations of the margin lines on either side of the hyperplane taken as 1?
Just to simplify the calculation. Even if we take the value as K, we can divide the whole equation by K (which only rescales the weights and the intercept) and get the same line with the constant equal to 1.
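The rescaling argument can be written out as follows. The symbols w (weight vector), b (intercept) and K (any positive constant) are notation assumed here for illustration.

```latex
\begin{aligned}
\text{Margin lines with an arbitrary constant } K > 0:\quad & w^\top x + b = \pm K \\
\text{Divide both sides by } K:\quad & \left(\tfrac{w}{K}\right)^\top x + \tfrac{b}{K} = \pm 1 \\
\text{Rename } w' = \tfrac{w}{K},\; b' = \tfrac{b}{K}:\quad & w'^\top x + b' = \pm 1
\end{aligned}
```

The hyperplane itself is unchanged by the rescaling, and the distance between the two margin lines becomes 2 / ||w'||, which is why maximizing the margin amounts to minimizing ||w'||.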
Soft Margin classifier / Support vector classifier
When the data is not perfectly separable, some margin violations and misclassifications are allowed, and a penalty term is added, giving a trade-off between a wider margin and fewer misclassifications.
SVM Error
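Margin Error + Classification Error
i.e., Regularization Term (penalty on large weights, which keeps the margin wide) + Hinge Loss (penalty on misclassified points and margin violations)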
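The two parts of the error can be computed by hand. The sketch below assumes the common soft-margin form 0.5 * ||w||^2 + C * sum(hinge); the function name `svm_objective`, the data, and the chosen w and b are hypothetical.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Return (total error, margin term, per-point hinge losses)."""
    w = np.asarray(w, dtype=float)
    margin_term = 0.5 * np.dot(w, w)                 # regularization: small w = wide margin
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))   # 0 for points safely outside the margin
    return margin_term + C * hinge.sum(), margin_term, hinge

# Toy data (made up); the third point sits on the wrong side of the boundary x1 = 3.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.5, 2.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1])

total, margin_term, hinge = svm_objective([1.0, 0.0], -3.0, X, y, C=1.0)
print(margin_term)  # 0.5
print(hinge)        # only the misclassified point contributes (1.5)
print(total)        # 2.0
```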
What is “C” in Hinge Loss?
C is a hyperparameter that weights the hinge loss against the margin term. It strikes a balance between making the margin as wide as possible and reducing misclassifications.
What is Kernel in SVM?
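A kernel is a mathematical function that measures the similarity (inner product) of two points as if they had been mapped from the lower dimensional feature space into a higher dimensional feature space, without performing that mapping explicitly.
Examples: Linear, Polynomial, Radial Basis Function (RBF) and Sigmoid.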
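To see why the choice of kernel matters, the sketch below fits the four kernels listed above on data that is not linearly separable. It assumes scikit-learn's `SVC` and `make_circles`; the dataset parameters are arbitrary, and the scores are training accuracy, used here only for a rough comparison.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

accuracy = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    accuracy[kernel] = clf.score(X, y)  # training accuracy

print(accuracy)
```

On this kind of data the RBF kernel is expected to do much better than the linear one, because the rings become separable in the implicit higher-dimensional space.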
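What happens when “C” is increased or decreased in SVM?
With C as the penalty weight on the hinge loss (the convention used by scikit-learn), increasing C makes the classifier less tolerant of violations, so the margin narrows; decreasing C tolerates more violations, so the margin widens. Some textbooks instead define C as a budget for violations, in which case the directions are reversed.
C controls the bias-variance trade-off: as a penalty weight, a large C gives low bias and high variance, and a small C gives high bias and low variance.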
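The effect of C on the margin can be checked numerically. This sketch assumes scikit-learn's `SVC`, where C is the penalty weight; the overlapping blobs and the two C values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes, so some margin violations are unavoidable.
X, y = make_blobs(n_samples=100, centers=[[0, 0], [2, 2]],
                  cluster_std=1.0, random_state=0)

margin_width = {}
for C in [0.01, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear SVM the margin width is 2 / ||w||.
    margin_width[C] = 2.0 / np.linalg.norm(clf.coef_[0])

print(margin_width)  # the small C should give the wider margin
```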
What is the advantage of using a kernel rather than simply enlarging the feature space using functions of the original features?
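Computational: a kernel does not form extra columns for the enlarged features. It only needs to be evaluated on pairs of the original observations, so the enlarged feature space (which can be very large, or even infinite-dimensional for RBF) is never built explicitly.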
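A minimal sketch of this point for a degree-2 polynomial kernel on 2-D inputs: the kernel value computed from the original features equals the dot product of explicitly enlarged features. The function names `poly2_kernel` and `phi` and the two sample points are assumptions for illustration.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed from the original 2 features."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit enlarged feature map that corresponds to poly2_kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_implicit = poly2_kernel(x, z)         # one dot product in 2-D, then a square
k_explicit = np.dot(phi(x), phi(z))     # build 3 new columns first, then dot product

print(k_implicit, k_explicit)  # both are 121 up to floating-point rounding
```

The saving is small here, but the explicit map grows quickly with the degree and the number of features, while the kernel evaluation stays a single dot product.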