Support Vector Machine (SVM) Flashcards
Definition of SVM
Support Vector Machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection. They are particularly well suited to the classification of complex, small- to medium-sized datasets.
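As a quick illustration, here is a minimal classification sketch, assuming scikit-learn is available; the Iris dataset and the linear kernel are just illustrative choices.

```python
# Minimal SVM classification sketch (scikit-learn assumed available).
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

clf = SVC(kernel="linear")   # a linear SVM classifier
clf.fit(X, y)                # learn the separating hyperplane(s)

print(clf.predict(X[:3]))    # predicted classes for the first three samples
```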
Concept of Hyperplane
The core idea of SVM is to find the optimal hyperplane that maximally separates the data into different classes. In two dimensions, this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
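A fitted linear SVM exposes this hyperplane as w·x + b = 0. The sketch below, assuming scikit-learn and made-up toy points, prints the learned coefficients.

```python
# Sketch: inspect the learned hyperplane w.x + b = 0 of a linear SVM
# (scikit-learn assumed; the toy 2-D data below are made up for illustration).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset
print("Decision boundary: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
```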
Support Vectors
Support vectors are the data points that lie closest to the decision boundary (or hyperplane). They are the hardest data points to classify and have a direct impact on the optimal location of the decision boundary.
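In scikit-learn (assumed here, with the same kind of toy data as above), the fitted model stores exactly these points, as the sketch below shows.

```python
# Sketch: the fitted model keeps only the support vectors
# (scikit-learn assumed; toy 2-D data for illustration).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)   # the points closest to the decision boundary
print(clf.n_support_)         # number of support vectors per class
```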
Margin
Margin is defined as the distance between the separating hyperplane (decision boundary) and the nearest data point from either class. The goal of an SVM is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
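For a linear SVM whose decision function is w·x + b, the full street between the two margins has width 2 / ||w||, so maximizing the margin amounts to minimizing ||w||. A small sketch, assuming scikit-learn, separable toy data, and a very large C to approximate a hard margin:

```python
# Sketch: for a (roughly) linearly separable problem, the margin width
# of a linear SVM is 2 / ||w||  (scikit-learn assumed; toy data).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [6.0, 6.0], [7.0, 6.5], [6.5, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print("Margin width:", margin_width)
```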
Kernel Trick
When data are not linearly separable, SVM uses a technique called the kernel trick to transform the input space to a higher dimensional space where a hyperplane can be used to separate the data. Common kernels include linear, polynomial, radial basis function (RBF), and sigmoid.
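The sketch below, assuming scikit-learn, contrasts a linear kernel with an RBF kernel on a toy non-linear dataset (make_moons); the gamma value is just an illustrative choice.

```python
# Sketch: an RBF-kernel SVM separating data that is not linearly separable
# (scikit-learn assumed; make_moons generates a toy non-linear dataset).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("Linear kernel accuracy:", linear_clf.score(X, y))
print("RBF kernel accuracy:   ", rbf_clf.score(X, y))
```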
Regularization Parameter (C)
The C parameter trades off correct classification of training examples against maximization of the decision function’s margin. A smaller C creates a wider street but more margin violations. A larger C creates a narrower street but fewer margin violations.
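One rough way to see this effect, assuming scikit-learn and toy overlapping blobs, is to count support vectors at two extreme C values: more support vectors generally indicates more margin violations.

```python
# Sketch: smaller C allows more margin violations (wider street),
# larger C allows fewer (narrower street). scikit-learn assumed; toy data.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # More support vectors usually means more margin violations.
    print("C=%g -> support vectors: %d" % (C, len(clf.support_vectors_)))
```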
Application
SVMs have been used in a variety of applications such as face detection, handwriting recognition, image classification, and bioinformatics.
SVM Strengths and Weaknesses
SVMs are effective in high-dimensional spaces and are versatile, since different kernel functions can be specified for the decision function. However, they do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation. They can also be inefficient, in both time and memory, on larger datasets.
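As a rough illustration of the probability point, assuming scikit-learn: probability estimates are only available when probability=True, which triggers the extra internal cross-validation and makes training noticeably slower.

```python
# Sketch: SVC only yields probability estimates when probability=True,
# which runs extra internal cross-validation (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = SVC(kernel="rbf", probability=True).fit(X, y)   # slower to train
print(clf.predict_proba(X[:2]))   # class-probability estimates
```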