SVM Flashcards
SVM
A discriminative classifier formally defined by a separating hyperplane. Given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples.
Margin
SVM
The distance between the separating line (hyperplane) and the closest data point.
Maximum-Margin Hyperplane
The best or optimal line that can separate two classes is the line with the largest margin.
https://medium.com/@skilltohire/support-vector-machines-4d28a427ebd\
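w^T x + b = -1 (margin boundary on the negative side)
w^T x + b = 0 (separating hyperplane)
w^T x + b = 1 (margin boundary on the positive side)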
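As a rough illustration of the three equations above, the sketch below evaluates w^T x + b for a few points and computes the margin width, which under this scaling is 2/||w||. The weight vector, bias, and points are made-up values chosen so the arithmetic is easy to follow; they do not come from a fitted model.

```python
import math

def decision_value(w, x, b):
    """Return w^T x + b for plain Python lists w and x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the planes w^T x + b = -1 and w^T x + b = +1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane parameters (assumed for illustration only).
w = [3.0, 4.0]
b = -5.0

on_boundary = decision_value(w, [1.0, 0.5], b)   # 3 + 2 - 5 = 0
on_positive = decision_value(w, [2.0, 0.0], b)   # 6 + 0 - 5 = 1
on_negative = decision_value(w, [0.0, 1.0], b)   # 0 + 4 - 5 = -1

print(on_boundary, on_positive, on_negative)
print(margin_width(w))  # ||w|| = 5, so the width is 2/5
```

Points with a value of 1 or more fall on the positive side of the margin, and points with a value of -1 or less fall on the negative side.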
Soft Margin Classifier
- Real data is messy and usually cannot be separated perfectly with a hyperplane.
- Relaxing the requirement that every point lie outside the margin allows some training points to violate it.
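- A tuning parameter C is introduced that defines the magnitude of wiggle across all dimensions (the total amount of margin violation allowed).
- C = 0 means no violation -> Maximal Margin Classifier. (This is the "budget" reading of C. Libraries such as scikit-learn instead treat C as a penalty on violations, so there a larger C allows fewer violations.)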
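One common way to measure the violations described above is a per-point slack value, slack_i = max(0, 1 - y_i (w^T x_i + b)). A slack of 0 means the point is on or outside its margin, a value between 0 and 1 means it is inside the margin but still on the correct side, and a value above 1 means it is misclassified. The sketch below assumes that definition; the hyperplane and the three points are hypothetical.

```python
def slack(w, b, x, y):
    """Slack for one point with label y in {-1, +1}: max(0, 1 - y * (w^T x + b))."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

# Hypothetical hyperplane and labelled points (assumed for illustration only).
w, b = [1.0, 0.0], 0.0
points = [
    ([2.0, 1.0], +1),   # score 2.0: outside the margin
    ([0.5, 3.0], +1),   # score 0.5: inside the margin, correct side
    ([-1.0, 0.0], +1),  # score -1.0: wrong side of the hyperplane
]

slacks = [slack(w, b, x, y) for x, y in points]
total_slack = sum(slacks)

print(slacks)
print(total_slack)
```

Under the budget reading of C used on this card, a hyperplane is acceptable only if the total slack stays within C, so C = 0 forces every slack to 0.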
What are the different kernel types?
5
1) Linear Kernel - inner product + constant -> K(x, xi) = sum(x * xi) + c
2) Polynomial Kernel -> K(x, xi) = (1 + sum(x * xi))^d
3) Radial Basis Function or Gaussian Kernel -> K(x, xi) = exp(-gamma * sum((x - xi)^2)). Note that gamma is often between 0 and 1
4) Sigmoid Kernel -> K(x, xi) = tanh(gamma * sum(x * xi) + c)
5) Chi-Squared Kernel -> based on the chi-squared statistic, chi^2_c = sum((observed values - expected values)^2 / expected values)
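The first four kernels listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a library implementation: the default values for c, d, and gamma are arbitrary choices, and the chi-squared kernel is left out because the card gives only the statistic it is based on.

```python
import math

def dot(x, z):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z, c=0.0):
    return dot(x, z) + c

def polynomial_kernel(x, z, d=2):
    return (1.0 + dot(x, z)) ** d

def rbf_kernel(x, z, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, z, gamma=0.5, c=0.0):
    return math.tanh(gamma * dot(x, z) + c)

# Two made-up sample vectors; their inner product is 1*3 + 2*4 = 11.
x, z = [1.0, 2.0], [3.0, 4.0]

print(linear_kernel(x, z))      # 11 + 0
print(polynomial_kernel(x, z))  # (1 + 11)^2
print(rbf_kernel(x, z))         # exp(-0.5 * 8)
print(sigmoid_kernel(x, z))     # tanh(5.5)
```

The RBF kernel returns exactly 1 when both arguments are the same point and decays toward 0 as the points move apart, which is why gamma controls how local the influence of each training point is.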
How do we evaluate an SVM model?
Confusion Matrix
| | Actual + | Actual - |
| Predicted + | TP | FP |
| Predicted - | FN | TN |
What are the formulas for precision, F1, recall and accuracy?
Precision = TP/(TP+FP)
Recall = TP/(TP + FN)
Accuracy = (TP + TN)/Total
F1 = 2 * (Precision * Recall)/(Precision + Recall)
Calculate Precision, Recall, Accuracy and F1 when the following is true:
TP = 50 ; FP = 10 ; FN = 5 ; TN = 100
Final Results
Precision ≈ 0.8333
Recall ≈ 0.9091
Accuracy ≈ 0.9091
F1 Score ≈ 0.8696
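The arithmetic for this card can be checked with a short script. The function name below is a hypothetical helper, not part of any library, and it assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy, f1) from confusion-matrix counts.

    Assumes every denominator is non-zero; no guarding is done here.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

precision, recall, accuracy, f1 = classification_metrics(tp=50, fp=10, fn=5, tn=100)

print(round(precision, 4))  # 50/60
print(round(recall, 4))     # 50/55
print(round(accuracy, 4))   # 150/165
print(round(f1, 4))         # equivalent to 100/115
```

Precision and recall are both fractions of TP, so F1 simplifies to 2TP/(2TP + FP + FN), which is a convenient cross-check on the rounded values.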
Solve the following multi-class example. Find the true positive, true negative, false positive, and false negative counts.
Also solve for precision, recall and F1.
| Predicted Class \ True Class | Apple | Orange | Mango |
| Apple | 7 | 8 | 9 |
| Orange | 1 | 2 | 3 |
| Mango | 3 | 2 | 1 |
TP = 7
TN = 2 + 3 + 2 + 1 = 8
FP = 8 + 9 = 17
FN = 1 + 3 = 4
Precision (Apple) = 7/(7+17) ≈ 0.29
Recall (Apple) = 7/(7+4) ≈ 0.64
F1-score (Apple) = 2(0.29)(0.64)/(0.29 + 0.64) ≈ 0.40
For class: Apple
TP = 7 (predicted Apple, true Apple)
FP = 8 + 9 = 17 (predicted Apple, but true Orange or Mango)
FN = 1 + 3 = 4 (true Apple, but predicted Orange or Mango)
TN = sum of all other cells = Total - TP - FP - FN
= 36 - 7 - 17 - 4 = 8
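For class: Orange
TP = 2 (predicted Orange, true Orange)
FP = 1 + 3 = 4 (predicted Orange, but true Apple or Mango)
FN = 8 + 2 = 10 (true Orange, but predicted Apple or Mango)
TN = 36 - 2 - 4 - 10 = 20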
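For class: Mango
TP = 1 (predicted Mango, true Mango)
FP = 3 + 2 = 5 (predicted Mango, but true Apple or Orange)
FN = 9 + 3 = 12 (true Mango, but predicted Apple or Orange)
TN = 36 - 1 - 5 - 12 = 18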
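The per-class counts can be derived mechanically from the table: with rows as predicted classes and columns as true classes, FP for a class is the rest of its row and FN is the rest of its column. The sketch below applies that rule to the fruit table; the function name is a hypothetical helper.

```python
labels = ["Apple", "Orange", "Mango"]

# Rows are predicted classes, columns are true classes (same layout as the card).
matrix = [
    [7, 8, 9],  # predicted Apple
    [1, 2, 3],  # predicted Orange
    [3, 2, 1],  # predicted Mango
]

def per_class_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class index k, treating it as one-vs-rest."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                 # predicted k, but truly another class
    fn = sum(row[k] for row in matrix) - tp  # truly k, but predicted another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

counts = {name: per_class_counts(matrix, k) for k, name in enumerate(labels)}

for name, (tp, fp, fn, tn) in counts.items():
    print(name, "TP =", tp, "FP =", fp, "FN =", fn, "TN =", tn)
```

If a table is laid out the other way round (rows as true classes), the row and column roles swap, so FP and FN trade places while TP and TN stay the same.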
What are the advantages and disadvantages of SVM?
Advantages
1) High accuracy
2) Works well when the data is linearly separable
3) Tends to avoid overfitting, since maximizing the margin acts as a form of regularization
Disadvantages
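1) Sensitive to noise (overlapping classes or mislabelled points)
2) Natively handles only two classes; multi-class problems need a scheme such as one-vs-rest or one-vs-one
3) Computationally expensive to train on large datasets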