Support Vector Machines Flashcards

Conceptual questions

1
Q

Explain the basic principles of Support Vector Machines (SVM) and how they are used in machine learning

A

Support Vector Machines build on the maximal margin classifier, which aims to find a hyperplane that maximizes the margin between classes. This classifier, however, is sensitive to outliers, which can shift the decision boundary and cause points to be misclassified. To address this sensitivity, soft-margin classifiers, also known as support vector classifiers, are introduced. These allow some misclassification and are tuned through cross-validation to balance the trade-off between a wide margin and classification errors.

Additionally, when the data exhibit a non-linear relationship, the kernel trick is employed. Rather than computing the transformation explicitly, this technique implicitly maps the data into a higher-dimensional space, enabling a linear decision boundary there even when the relationship is non-linear in the original space.
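As a concrete sketch of these ideas in code, assuming scikit-learn (the toy dataset and parameter values here are illustrative choices, not recommendations):

```python
# Sketch: soft-margin SVM with the RBF kernel on a non-linear toy dataset.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in the original 2D space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C sets the soft margin (how much misclassification is tolerated);
# kernel='rbf' applies the kernel trick for a non-linear boundary.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

In practice, both C and the kernel parameters would be tuned through cross-validation, as noted above.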

2
Q

Discuss the concept of a hyperplane in SVM. How does SVM find the optimal hyperplane for classification, and what is the significance of maximizing the margin?

A

A hyperplane in a p-dimensional space is a flat subspace of dimension p − 1: a line in two dimensions, a plane in three, and a hyperplane (hard to visualize) when the dimension is more than 3. With the kernel trick, the SVM finds such a hyperplane in a higher-dimensional space, which corresponds to a non-linear boundary in the original space.

SVM finds the optimal hyperplane by choosing, among all hyperplanes that separate the classes, the one whose distance to the nearest training points (the support vectors) is largest.

The significance lies in how robust and generalizable the classifier is. Maximizing the margin improves the SVM's ability to generalize: the larger the margin, the more likely the classifier is to capture the underlying patterns in the data rather than the noise.
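For reference, a minimal sketch of the hard-margin formulation, assuming linearly separable data with labels y_i ∈ {−1, +1}; since the margin equals 2/‖w‖, minimizing ‖w‖² maximizes it:

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```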

3
Q

How does the choice of kernel function impact the SVM’s ability to handle non-linearly separable data? Provide examples of different types of kernel functions and their applications.

A

There are several kernels to choose between; here I focus on the RBF and polynomial kernels. The RBF kernel is good at capturing complex, non-linear patterns based on distances between points. In other words, it works well when the decision boundary is irregular and there is no clear separation between classes.

The polynomial kernel is also suitable for handling non-linear data. One must choose the polynomial degree, and too high a degree can lead to overfitting.

The kernel should be chosen based on the data; a comparison sketch follows below. For example, a linear kernel might be better when the pattern in the data is linear.
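A minimal sketch of such a comparison, assuming scikit-learn (the dataset and settings are illustrative):

```python
# Sketch: comparing kernel choices on the same non-linear data.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    # 'degree' applies only to the polynomial kernel and is ignored otherwise.
    clf = SVC(kernel=kernel, degree=3, gamma="scale")
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel:>6}: mean CV accuracy = {scores.mean():.2f}")
```

On curved data like these, the RBF kernel would typically outperform the linear kernel, in line with the guidance above.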

4
Q

Discuss the concept of regularization in SVM. How does it help prevent overfitting, and what are the implications of choosing a higher or lower regularization parameter?

A

Regularization in SVM works to prevent overfitting, just as in other models. The regularization parameter is usually denoted C, and it controls the trade-off between a smooth decision boundary and classifying training points correctly.

This prevents the model from becoming too complex.
A higher C means a stronger penalty for misclassification, giving a narrower margin and a boundary that fits the training data more tightly (risking overfitting). A lower C weakens the penalty, so some misclassification is allowed in exchange for a wider, smoother margin.

C is chosen through cross-validation (CV).
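A minimal sketch of choosing C by cross-validation, assuming scikit-learn (the grid values and synthetic dataset are illustrative). In the soft-margin objective min ½‖w‖² + C Σᵢ ξᵢ, C multiplies the slack penalties, so larger C punishes margin violations harder:

```python
# Sketch: selecting the regularization parameter C by cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Small C -> wide, smooth margin that tolerates misclassification;
# large C -> narrow margin that fits the training data more tightly.
grid = GridSearchCV(SVC(kernel="rbf"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
print("best C:", grid.best_params_["C"])
print(f"best CV accuracy: {grid.best_score_:.2f}")
```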

5
Q

Compare and contrast SVM with other classification algorithms, such as decision trees. Highlight the scenarios where SVM might be more suitable or less suitable.

A

Both SVM and decision trees can handle non-linear relationships. SVM decision boundaries are less intuitive due to the kernel trick, whereas decision trees classify through explicit, readable rules. SVM seeks a maximized margin, while decision trees build a classifier from a sequence of feature-threshold splits. SVM also has a regularization parameter that helps control overfitting.
SVMs are less suitable for large datasets, since training becomes computationally expensive.
SVMs are also less suitable when interpretability is crucial.

6
Q

Explain the concept of multi-class classification using SVM. How can binary SVM be extended to handle multiple classes?

A

SVM is inherently binary, so multiple classes must be compared in a binary way. With OVO (one-vs-one), a classifier is trained for each possible pair of classes, giving K(K − 1)/2 classifiers for K classes, and the final classification is determined by majority vote. This can be computationally expensive.

OVR (one-vs-rest) trains one classifier per class, treating that class as positive and all the remaining classes grouped together as negative, in a binary fashion. This needs only K classifiers and is less computationally expensive.
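As a sketch, both strategies are available as wrappers in scikit-learn (the three-class iris dataset is just an illustrative example); note that scikit-learn's SVC already uses one-vs-one internally for multi-class input:

```python
# Sketch: one-vs-one and one-vs-rest multi-class strategies around a binary SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes

ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale"))  # K*(K-1)/2 classifiers
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))  # K classifiers

for name, clf in (("one-vs-one", ovo), ("one-vs-rest", ovr)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```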
