L18 - Support Vector Machine Flashcards
What type of model is an SVM?
A non-probabilistic, supervised classification model.
What is the goal of SVM?
Find the optimal separating hyperplane through a set of data points: the one with the largest margin between the hyperplane and the support vectors.
A larger margin gives greater classification capability on unseen data.
In a feature space, what do the dimensions represent?
Features. I.e. if there are 5 features, there are 5 dimensions in the feature space.
How does SVM classify data?
The feature space is partitioned by a hyperplane; each partition represents a class. The SVM classifies a data point by which side of the hyperplane it falls on.
For example, in 2 dimensions, data would be placed above or below a line for classification.
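A minimal sketch of this side-of-the-plane decision, assuming a hypothetical, already-trained weight vector w and bias b (the values here are illustrative, not learned):

```python
# Classify a 2-D point by which side of the hyperplane w.x + b = 0 it falls on.
# w and b are assumed to come from training; these values are made up.
def classify(x, w=(1.0, -1.0), b=0.0):
    # Decision value: positive -> one class, negative -> the other.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

print(classify((3.0, 1.0)))  # lands on the positive side: 1
print(classify((1.0, 3.0)))  # lands on the negative side: -1
```

Note the output is only a class label (the sign of the score), which is exactly what "non-probabilistic" refers to in the next card.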
What does it mean the SVM is non-probabilistic?
The classification is based purely on the features of the data point; the model returns a class label with no associated probability.
What does the Kernel Trick enable SVM to do?
Classify data that is not linearly separable.
What are Support Vectors?
Data points closest to the separation plane.
They are called this since they support the classification decision boundary.
Define the Maximum Margin Classifier…
When we classify based on the threshold that gives us the largest margin between the plane and the support vectors.
What is an issue with using Max. Margin Classifier?
It is sensitive to outliers: a single extreme data point can drastically shift the decision boundary.
What is the solution to the outlier problem?
Enable the SVM to make misclassifications.
What is the Soft Margin of the SVM?
If misclassifications are allowed, the soft margin is the margin between the support vectors and the separation plane.
The soft margin permits misclassifications, meaning that outliers on the wrong side of the separation plane are tolerated rather than allowed to distort the boundary.
Thus, some misclassification of training data is accepted to enhance the general classification capability.
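The tolerance above is usually measured with the hinge loss: zero for points safely outside the margin, positive ("slack") for margin violations. A minimal sketch, assuming an illustrative, already-trained w and b:

```python
# Hinge loss measures soft-margin violations for a labelled point (x, y),
# y in {+1, -1}. w and b are assumed pre-trained; values here are illustrative.
def hinge_loss(x, y, w, b):
    # 0 if correctly classified outside the margin;
    # positive slack if the point violates the margin or is misclassified.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

w, b = (1.0, 0.0), 0.0
print(hinge_loss((2.0, 0.0), +1, w, b))   # 0.0 -> well outside the margin
print(hinge_loss((-0.5, 0.0), +1, w, b))  # 1.5 -> an outlier on the wrong side, tolerated
```

The outlier contributes a finite penalty instead of forcing the plane to move, which is exactly the trade-off the card describes.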
How do we know if we have non-linear classification?
If the classes overlap in the feature space, so that no hyperplane separates them.
If we have a non-linear problem, what transformation do we perform on the data?
A mapping transformation in which overlapping features are brought into higher dimensions. This means that if the data is not linearly separable in one dimension, it should be separable in a higher-dimensional space.
What is the name of the function that performs higher dimensionality mapping? Give an example of one
Kernel Function
Example: the polynomial kernel K(x, y) = (x · y + 1)^d
What is the issue with non-linear SVM?
Computational cost of operations increases greatly when they have to be performed in higher dimensions.
This is infeasible for large data.
What is the solution to the high computational cost of kernel functions in high dimensional space?
Kernel Trick - Enables the effect of the data transformation to be applied without explicitly moving to the higher-dimensional feature space.
How does the Kernel Trick work?
Create an NxN Kernel (Gram) Matrix M, where N is the number of data points in the feature space.
The Kernel Function fills each cell M_i,j with the dot product of points i and j in the higher-dimensional space, computed directly in the original space.
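A minimal sketch of building that matrix, assuming a polynomial kernel K(x, y) = (x · y + 1)² and a tiny made-up dataset:

```python
import numpy as np

# Polynomial kernel: equals a dot product in a lifted space,
# but is evaluated entirely in the original feature space.
def kernel(x, y, degree=2):
    return (np.dot(x, y) + 1.0) ** degree

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # N = 3 data points
N = len(X)
M = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

print(M.shape)  # (3, 3) -> one cell per pair of data points
```

The matrix is symmetric (K(x, y) = K(y, x)), and the SVM optimisation only ever needs these N² values, never the lifted coordinates themselves.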
What is a Kernel Function?
A function that takes 2 feature vectors from the feature space and returns their dot product in the higher-dimensional space, without computing the mapping explicitly.
What type of value does a Kernel Function return?
A real number.
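That real number agrees with the dot product in the lifted space. A sketch for the 1-D polynomial kernel K(x, y) = (xy + 1)², whose explicit lift works out to φ(x) = (x², √2·x, 1):

```python
import math

# Kernel evaluated directly in the original 1-D space.
def K(x, y):
    return (x * y + 1) ** 2

# The corresponding explicit lift into 3 dimensions.
def phi(x):
    return (x * x, math.sqrt(2) * x, 1.0)

x, y = 2.0, 3.0
lifted_dot = sum(a * b for a, b in zip(phi(x), phi(y)))
print(K(x, y), lifted_dot)  # both 49.0
```

Same real number either way; the kernel just skips the trip through the higher-dimensional space.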
What are some Advantages and Disadvantages of SVM?
Advantages:
- Effective in higher dimensions
- Capability of non-linear classifications
- Memory efficient
Disadvantages:
- If the feature count is greater than the number of data points, the model is prone to overfitting, leading to poor classification capability.