Chapter 9 - Non-linear Boundary, Kernel, SVM Flashcards
How do we use the SVC with non-linear boundaries?
- similar to logistic regression, we can transform the predictors (e.g. add a quadratic predictor)
- expand the set of predictors to a larger space.
e.g. (X1, X2) becomes (X1, X2, X1^2); the fitted function gains a Beta3*X1^2 term, so the boundary is quadratic in X1
Idea: add polynomial terms up to degree d
doesn't make the computation more expensive, because all the algorithm needs are the dot products between pairs of samples (a sketch of the idea follows below)
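a minimal sketch of this idea, assuming scikit-learn (not named in these notes): expand the predictors with polynomial terms, then fit an ordinary linear support vector classifier on the expanded set.

```python
# Minimal sketch (assumes scikit-learn): expand the predictors with polynomial
# terms up to degree d, then fit a plain linear support vector classifier.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # original predictors (X1, X2)
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)  # class depends quadratically on X1

# degree=2 adds X1^2, X2^2 and X1*X2, so a linear boundary in the expanded
# space is quadratic in the original predictors
clf = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                    LinearSVC(C=1.0, max_iter=10_000))
clf.fit(X, y)
print(clf.score(X, y))
```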
Kernels
a kernel is a function K(·,·) for which the matrix with entries K(i,j) = K(zi, zj) is positive definite (symmetric with positive eigenvalues) for any set z1, …, zk of linearly independent vectors
The kernel trick
instead of defining a mapping that expands the set of predictors and then computing the dot product, use the kernel trick.
1) prove that a candidate function is positive definite
2) then use that function to compute the kernel (Gram) matrix K directly, without ever forming the expanded predictors (see the numerical check below)
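a small numerical check of the trick, using the degree-2 polynomial kernel K(x, z) = (1 + x·z)^2 on two predictors: the kernel value equals the dot product of the explicitly expanded predictors, so the expansion never has to be computed.

```python
# Kernel trick check for the degree-2 polynomial kernel K(x, z) = (1 + <x, z>)^2:
# evaluating K directly gives the same number as expanding with phi and taking
# a dot product in the 6-dimensional expanded space.
import numpy as np

def phi(x):
    # explicit degree-2 expansion of x = (x1, x2)
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    return (1.0 + np.dot(x, z)) ** 2

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.3])
print(np.dot(phi(x), phi(z)))  # explicit expansion: 2.89
print(poly_kernel(x, z))       # kernel trick, no expansion: 2.89
```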
Kernel specifics (3)
proving that a kernel is PD is not always easy. However, we can use the rule that sums and products of PD kernels are PD. Intuitively, a kernel defines the similarity between two observations.
common kernels: the polynomial kernel and the radial basis kernel (standard forms below)
kernels are also great at dealing with non-standard data types (strings, graphs, trees, images). It is easier to define a similarity measure which is PD than a set of features that capture the information in each sample
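for reference, the standard forms of these two kernels (d is the polynomial degree, gamma > 0 is a tuning parameter):

```latex
% polynomial kernel of degree d
K(x_i, x_{i'}) = \Bigl(1 + \sum_{j=1}^{p} x_{ij} x_{i'j}\Bigr)^{d}
% radial basis kernel
K(x_i, x_{i'}) = \exp\Bigl(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^{2}\Bigr)
```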
Support Vector Machines
a support vector classifier applied to an expanded set of predictors.
- we expand the set of predictors for each sample x_i then perform the algorithm
- we again only need to know the dot products for every pair of samples (a kernel-based sketch follows below)
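a minimal sketch, again assuming scikit-learn: the same kind of classifier, but the expansion is implicit because we only hand the algorithm a kernel.

```python
# Minimal sketch (assumes scikit-learn): fit SVMs with a polynomial and a
# radial basis kernel; the expanded predictors are never formed explicitly.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # circular class boundary

# gamma=1, coef0=1 makes sklearn's poly kernel equal to (1 + <x, z>)^3
poly_svm = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print(poly_svm.score(X, y), rbf_svm.score(X, y))
```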
The Kernel trick and SVM
often the dot product is a simple function of the original vectors (even if the mapping significantly expands the space of predictors). In some cases, you can use the kernel trick even if you are expanding into an infinite number of transformations (radial kernel)
- this works because we can check that K is a valid (PD) measure of similarity without ever writing down the transformation; the resulting decision function is shown below
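the fitted SVM's decision function can be written entirely in terms of the kernel (S is the set of support vectors and the alpha_i are fitted coefficients):

```latex
f(x) = \beta_0 + \sum_{i \in S} \alpha_i \, K(x, x_i)
```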
Kernel trick beyond SVMs
Kernel PCA - suppose we wanted to do PCA on an expanded set of predictors phi(x_i)
- to make a biplot, all we need to know is the kernel (Gram) matrix
- even if phi maps the p predictors into a very high-dimensional space, we can still do PCA
- the cost is only a function of n (sketch below)
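a sketch of kernel PCA from the Gram matrix alone, assuming a radial basis kernel with gamma = 1; every object involved is n x n, so the cost does not depend on the dimension phi maps into (scikit-learn's KernelPCA packages the same computation).

```python
# Kernel PCA sketch: only the n x n Gram matrix is needed, never phi(x_i) itself.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
n = X.shape[0]

# Gram matrix K_ij = K(x_i, x_j) with a radial basis kernel, gamma = 1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-1.0 * sq_dists)

# double-center the Gram matrix (equivalent to centering the expanded predictors)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# eigendecomposition of the centered Gram matrix gives the principal component
# scores in the expanded space: score_k = sqrt(lambda_k) * v_k
eigvals, eigvecs = np.linalg.eigh(Kc)
order = np.argsort(eigvals)[::-1]
scores = eigvecs[:, order[:2]] * np.sqrt(np.maximum(eigvals[order[:2]], 0.0))
print(scores.shape)  # (n, 2): enough for a 2-D score plot / biplot
```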
extending the SVM to more than 2 classes (2 approaches)
SVMs don’t generalize nicely to cases of more than 2 classes.
1 v 1 - fit K choose 2 SVMs, one for every pair of classes (K = number of classes). assign a sample i to the class that wins the most 1 v 1 challenges
1 v all - construct a classifier beta(k) for every class k against all the others. assign a sample i to the class k for which the distance from x_i to the hyperplane defined by beta(k) is largest (the distance is negative if the sample is misclassified). a sketch of both approaches follows below
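a sketch of both strategies, assuming scikit-learn: SVC fits the K choose 2 one-vs-one classifiers itself on multiclass data, and OneVsRestClassifier wraps a binary SVC as one-vs-all.

```python
# Multiclass SVM sketch (assumes scikit-learn) on three well-separated classes.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = rng.normal(size=(300, 2)) + np.repeat(centers, 100, axis=0)
y = np.repeat([0, 1, 2], 100)

# SVC trains one binary SVM per pair of classes (one vs one)
ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
# OneVsRestClassifier trains one binary SVM per class against the rest (one vs all)
ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)
print(ovo.score(X, y), ova.score(X, y))
```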
relationship to logistic regression
we can formulate the method for finding an SVM as a Loss + Penalty optimization. for each sample we incur the hinge loss max[0, 1 - y_i f(x_i)]. in logistic regression we minimize a similar loss, sum_i log[1 + exp(-y_i f(x_i))] (side-by-side forms below)
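the two loss + penalty problems side by side, assuming f(x) = beta_0 + beta_1 x_1 + … + beta_p x_p and a ridge-type penalty with tuning parameter lambda:

```latex
% SVM: hinge loss + ridge penalty
\min_{\beta_0, \ldots, \beta_p} \; \sum_{i=1}^{n} \max\bigl[0,\; 1 - y_i f(x_i)\bigr] + \lambda \sum_{j=1}^{p} \beta_j^2
% logistic regression: log (binomial deviance) loss + ridge penalty
\min_{\beta_0, \ldots, \beta_p} \; \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i f(x_i)}\bigr) + \lambda \sum_{j=1}^{p} \beta_j^2
```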