Topic 5: Support Vector Machines Flashcards
What is the primary goal of a support vector machine?
To find an optimal hyperplane that separates two classes with the largest margin.
What is the role of kernel functions in SVMs?
To transform data into a higher-dimensional space where a linear separator is possible.
What are two strategies for multiclass classification using SVMs?
One-versus-all and one-versus-one.
What is a hyperplane in the context of SVM?
A decision boundary that separates different classes in the feature space.
What is the margin in SVM?
The distance between the hyperplane and the nearest data points from each class, known as support vectors.
What are support vectors in SVM?
The data points that are closest to the hyperplane and influence its position and orientation.
How does SVM handle non-linearly separable data?
By using kernel functions to transform the data into a higher-dimensional space where it becomes linearly separable.
What are common types of kernel functions used in SVM?
Linear Kernel: For linearly separable data.
Polynomial Kernel: For non-linear data with polynomial relationships.
Radial Basis Function (RBF) Kernel: For complex non-linear relationships.
Sigmoid Kernel: Similar to a neural network activation function.
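As a minimal sketch (assuming scikit-learn is installed), each of these kernels is selected via the kernel argument of SVC; the dataset here is a tiny XOR-style example that no linear boundary can separate:

```python
from sklearn.svm import SVC

# XOR-style data: not linearly separable.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Each kernel type is chosen via the `kernel` constructor argument.
models = {
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=3),
    "rbf": SVC(kernel="rbf", gamma="scale"),
    "sigmoid": SVC(kernel="sigmoid"),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, model.score(X, y))  # training accuracy per kernel
```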
What are the advantages of SVM?
Effective in high-dimensional spaces.
Relatively robust to overfitting, thanks to margin maximization and the regularization parameter C.
Can model complex relationships using kernel functions.
What are the disadvantages of SVM?
Computationally intensive for large datasets.
Choosing the correct kernel and hyperparameters can be challenging.
Less interpretable compared to simpler models like decision trees.
What is the duality principle in SVM?
The equivalence between the primal and dual optimization problems; because the dual depends on the data only through inner products, kernel functions can be substituted for them without computing the transformation explicitly.
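In the usual notation (a sketch: α_i are the Lagrange multipliers, C the regularization constant), the soft-margin dual problem is:

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
\;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0
```

Note that the data enter only through K(x_i, x_j), which is what makes the kernel substitution possible.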
What is a kernel function in SVM?
A function that computes the similarity between data points in a transformed feature space without explicitly computing the transformation.
How is the decision boundary determined in SVM?
By solving an optimization problem that maximizes the margin while minimizing classification errors.
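As a sketch in standard notation (w the normal vector, b the bias, ξ_i slack variables, C the margin/error trade-off), the soft-margin primal problem is:

```latex
\min_{w,\, b,\, \xi}\; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0
```

Minimizing ‖w‖² maximizes the margin, while the slack terms ξ_i penalize points that violate it.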
What is the role of the support vectors in the decision function?
Support vectors define the position of the hyperplane, and only they contribute to the decision function.
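In standard notation (α_i the dual coefficients, b the bias), α_i = 0 for every non-support vector, so the decision function reduces to a sum over the support vectors alone:

```latex
f(x) = \operatorname{sign}\!\Big( \sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, K(x_i, x) + b \Big)
```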
How can SVM handle multi-class classification?
One-vs-One (OvO): Trains a classifier for every pair of classes.
One-vs-All (OvA): Trains a classifier for each class against all other classes.
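The two strategies differ in how many binary classifiers they train for K classes, which this small sketch makes concrete:

```python
def n_classifiers_ova(k):
    # One-vs-All: one binary classifier per class.
    return k

def n_classifiers_ovo(k):
    # One-vs-One: one binary classifier per unordered pair of classes.
    return k * (k - 1) // 2

print(n_classifiers_ova(10), n_classifiers_ovo(10))  # 10 45
```

OvO trains more classifiers, but each sees only two classes' worth of data, so the individual problems are smaller.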
What is the kernel trick?
A method to compute the inner product of data in a higher-dimensional space without explicitly transforming the data, enabling efficient computation.
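A pure-Python sketch of this idea for the degree-2 polynomial kernel: the kernel value computed in the original 2-D space equals the inner product of an explicit 3-D feature map, up to floating-point rounding.

```python
import math

def explicit_map(x):
    # Explicit degree-2 feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), living in 3-D space.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    # Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2,
    # computed entirely in the original 2-D space.
    dot = x[0] * z[0] + x[1] * z[1]
    return dot * dot

x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, z)                                             # 2-D computation
rhs = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))  # 3-D inner product
print(lhs, rhs)  # equal up to floating-point rounding
```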
What happens when the data is not scaled in SVM?
Features with larger magnitudes can dominate, leading to poor performance. SVM requires scaled or normalized data for optimal results.
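A minimal sketch (assuming scikit-learn is installed) of the usual fix: put a StandardScaler and the SVM in a Pipeline so the scaler is fit only on the training data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline standardizes features before they reach the SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(score)
```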
How does the RBF kernel parameter γ affect SVM performance?
A high γ: models the data very closely, leading to potential overfitting.
A low γ: produces a smoother decision boundary, possibly underfitting.
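A small sketch of this effect (assuming scikit-learn is installed): on a noisy two-moons dataset, a high-γ model fits the training set more tightly than a low-γ one.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Noisy non-linear data: two interleaving half-circles.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

high = SVC(kernel="rbf", gamma=100.0).fit(X, y)  # very local kernel
low = SVC(kernel="rbf", gamma=0.01).fit(X, y)    # very smooth kernel

# Training accuracy: high gamma hugs the training points more closely.
print(high.score(X, y), low.score(X, y))
```

High training accuracy alone is not a virtue here; checking held-out accuracy would reveal whether the high-γ model has overfit.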
What is a sparse solution in SVM?
Only a subset of data points (support vectors) is used to define the model, making it memory efficient.
How does SVM compare to logistic regression?
SVM: Maximizes margin and can handle non-linear relationships using kernels.
Logistic Regression: Directly models probabilities and is simpler for linear problems.