Kernelization Flashcards

1
Q

Explain feature maps, and why they are not always a good solution.

A

By applying a feature map we add a non-linear transformation of the features in a linear model. For example, we can add powers of certain features as extra features, e.g. add x₁² as an extra term. This provides a way of fitting different curve shapes to the problem.

Adding new features comes at a memory and computational cost: we need to learn more weights, and the risk of overfitting grows. For example, when more features are added to Ridge Regression, computing the closed-form solution becomes quadratically harder in the number of features.
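A minimal sketch of an explicit feature map with scikit-learn (the toy data, degree, and regularization strength below are illustrative assumptions, not part of the card):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Toy 1-D regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

# Explicit feature map: add x^2 and x^3 as extra columns.
phi = PolynomialFeatures(degree=3, include_bias=False)
X_poly = phi.fit_transform(X)  # shape (100, 3): x, x^2, x^3

# A linear model on the expanded features can now fit a curved shape,
# but the number of columns (and weights to learn) grows with the degree.
model = Ridge(alpha=1.0).fit(X_poly, y)
print(X_poly.shape, model.coef_)
```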

2
Q

Explain kernel functions in relation to feature maps.

A

A feature map is a function that takes feature vectors in one space and transforms them into feature vectors in another. A kernel function is a function that computes the inner product of two feature vectors in a transformed space. Kernel functions can be used to implicitly map data into higher-dimensional spaces without explicitly computing the feature maps.

The kernel function computes these inner products (and hence similarities or distances between points) cheaply, without explicitly constructing the high-dimensional space at all.

The dot product is a measure of similarity between two vectors; hence, a kernel can be seen as a similarity measure for high-dimensional spaces.

A loss function can be kernelized if it contains a dot product: the dot product xᵢ ⋅ xⱼ can simply be replaced by k(xᵢ, xⱼ).
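A small numerical check of this equivalence (a sketch; the degree-2 polynomial feature map and kernel below are one standard choice, picked purely for illustration):

```python
import numpy as np

def feature_map(x):
    """Explicit degree-2 polynomial feature map for a 2-D input x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

def poly_kernel(a, b):
    """Degree-2 polynomial kernel: k(a, b) = (a . b + 1)^2."""
    return (np.dot(a, b) + 1.0) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# The kernel gives the inner product in the transformed space
# without ever constructing feature_map(a) or feature_map(b).
print(np.dot(feature_map(a), feature_map(b)))  # 4.0
print(poly_kernel(a, b))                       # 4.0
```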

3
Q

Explain the Polynomial kernel. Provide the most important hyperparameter.

A

The polynomial kernel reproduces the polynomial feature map:

k_poly(x₁, x₂) = (γ (x₁ ⋅ x₂) + c₀)^d

- d is the degree of the polynomial (the key hyperparameter, controlling the order of the feature map).
- γ (gamma) is a scaling parameter (default 1/p, with p the number of features).
- c₀ is a hyperparameter (default 1) to trade off the influence of the higher-order terms.
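A hedged sketch of how these map onto scikit-learn's SVC, where d, γ, and c₀ correspond to the degree, gamma, and coef0 parameters (the dataset and values below are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Illustrative toy dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# k_poly(x1, x2) = (gamma * (x1 . x2) + coef0) ** degree
clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0)
clf.fit(X, y)
print(clf.score(X, y))
```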

4
Q

Explain the Radial Basis Function (RBF) kernel. Provide the most important hyperparameters. How can these hyperparameters lead to overfitting and underfitting?

A

The RBF feature map builds the Taylor series expansion of e^x.

The RBF Kernel looks like:

k_RBF(x₁, x₂) = exp(−γ ‖x₁ − x₂‖²)

  • The RBF kernel does not use a dot product; it only considers the distance between x₁ and x₂.
  • It’s a local kernel: every data point only influences data points nearby, whereas linear and polynomial kernels are global: every point affects the whole space.
  • Similarity depends on the closeness of the points and on the kernel width.

The most important hyperparameters are γ (gamma) and C (the cost of margin violations); their effect is sketched in the example after this list.
- γ: kernel width. High values cause narrow Gaussians, which produce more support vectors and lead to overfitting. Low values cause wide Gaussians, which lead to underfitting.
- C: cost of margin violations. High values punish margin violations heavily, causing narrow margins and overfitting. Low values allow wider margins, resulting in more support vectors and underfitting.
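A sketch of these effects, sweeping γ and C on a small toy problem (the dataset and grid values are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Illustrative dataset and grid; the specific values are assumptions.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

for gamma in (0.01, 1.0, 100.0):   # wide -> narrow Gaussians
    for C in (0.1, 1.0, 100.0):    # soft -> hard margin
        clf = SVC(kernel="rbf", gamma=gamma, C=C)
        score = cross_val_score(clf, X, y, cv=5).mean()
        print(f"gamma={gamma:>6}  C={C:>6}  CV accuracy={score:.3f}")
```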

5
Q

In practice, what are some general guidelines to get good results with kernels in SVMs?

A
  • C and gamma always need to be tuned. Find a good C first, then fine-tune gamma.
  • SVMs expect all features to be approximately on the same scale, so the data needs to be scaled beforehand; a pipeline sketch combining scaling with a grid search over C and gamma follows below.
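A sketch of that workflow with a scikit-learn pipeline (the dataset and parameter grid are illustrative; adapt the ranges to your data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset; SVMs need features on a comparable scale.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling lives inside the pipeline, so it is refit on each CV fold.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, grid, cv=5).fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```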
6
Q

When would an SVM be a suitable choice as a model?

A

SVMs work well on both low- and high-dimensional data, but they are especially good on small, high-dimensional datasets.
