B07 Support Vector Machines Flashcards
What are Support Vector Machines?
An approach that represents data as points in a multi-dimensional space, arranged in such a way that points with different labels are divided by a clear gap.
What is a hyperplane?
Given a set of data points that each belong to one of two classes, we can draw a line (a hyperplane) that separates the data into two partitions based on the label. *Applies to linearly separable data
What is the Maximum Margin Hyperplane? (MMH)
With linearly separable data, because there is a non-zero distance between the two closest points of opposite labels, there are an infinite number of potential separating hyperplanes.
-The goal is to identify the hyperplane that creates the greatest separation between the classes: this hyperplane is the MMH.
What are support vectors?
- The points from each class which are closest to the MMH are known as the support vectors.
- Each class must have one or more support vectors.
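A hedged sketch of how support vectors surface in practice (assuming scikit-learn is available; the toy data below is invented for illustration):

import numpy as np
from sklearn.svm import SVC

# Two small linearly separable clusters (arbitrary example data)
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)  # the points from each class closest to the MMH
print(clf.n_support_)        # per-class counts: each class has one or more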
What is Quadratic Optimization?
Linearly Separable Data
-A technique for finding the MMH.
-This approach finds the perpendicular bisector of the shortest line connecting the outer boundaries (convex hulls) of the two classes.
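As a minimal sketch of the result (scikit-learn assumed; the data and the large C value are arbitrary choices approximating a hard margin), the fitted model exposes the MMH's weights, from which the margin can be read off:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]       # MMH: w . x + b = 0
print(w, b)
print(2.0 / np.linalg.norm(w))               # width of the maximum margin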
Other methods for finding the MMH?
-An alternative technique involves a search through the space of
every possible hyperplane in order to identify the MMH.
-Each hyperplane is defined as:
  w⃗ ⋅ x⃗ + b = 0
-The goal is to find a set of weights that specify two hyperplanes:
  w⃗ ⋅ x⃗ + b ≥ +1
  w⃗ ⋅ x⃗ + b ≤ −1
-Using vector geometry, the distance between the two hyperplanes is defined as:
  2 / ∥w⃗∥
-To maximize the distance between the hyperplanes, we need to minimize the value of ∥w⃗∥. This is expressed as minimizing (1/2)∥w⃗∥² subject to yᵢ(w⃗ ⋅ x⃗ᵢ + b) ≥ 1 for every training point (x⃗ᵢ, yᵢ).
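To make the formulation concrete, here is a hedged sketch that hands this constrained minimization to SciPy's general-purpose solver (a toy stand-in for a dedicated QP solver; the data is invented):

import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

# Parameters: params = [w1, w2, b]; minimize (1/2)||w||^2
objective = lambda params: 0.5 * params[:2] @ params[:2]

# One constraint per point: y_i (w . x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda p, i=i: y[i] * (p[:2] @ X[i] + p[2]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b, 2.0 / np.linalg.norm(w))  # hyperplane weights and the resulting margin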
Non-linearly separable data
What is Soft-Margin classification?
-A simple approach to dealing with non-linearly separable data
is to allow a small number of points that are close to the
boundary to be misclassified.
In the context of soft margin classification, what is C?
-The number of allowed misclassifications is governed by a user-defined parameter C, which is called the cost.
-The higher the value of C, the more heavily each misclassification is penalized, and so the less likely it is that the algorithm will misclassify a point.
- With the introduction of a slack variable (ξ, "xi") and a cost (C) to the model, instead of finding the maximum margin we focus on finding the minimum total cost.
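A brief sketch of C's effect (scikit-learn assumed; the overlapping clusters are randomly generated for illustration):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping clusters, so the data is not linearly separable
X = np.vstack([rng.normal(0.0, 1.5, (50, 2)), rng.normal(2.0, 1.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Higher C penalizes slack more, so fewer margin violations are tolerated
    print(C, int(clf.n_support_.sum()), clf.score(X, y))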
What is the Kernel trick?
-Some real-life patterns cannot be dealt with simply by using soft-margin classifiers.
-Patterns which need multiple and/or non-linear boundaries are
dealt with using an approach known as the kernel trick.
What is a Kernel?
-A kernel is a function that computes the dot product between two vectors in a transformed space.
-Given two vectors x⃗ᵢ and x⃗ⱼ, the kernel function
  K(x⃗ᵢ, x⃗ⱼ) = ϕ(x⃗ᵢ) ⋅ ϕ(x⃗ⱼ)
combines them into a single number by computing their dot product.
-Phi represents the mapping of our vectors to a new space.
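A small sketch verifying this equivalence by hand for the degree-2 polynomial kernel, whose mapping ϕ is known in closed form (pure NumPy; the vectors are arbitrary):

import numpy as np

def phi(x):
    # Explicit feature map for the kernel K(x, z) = (x . z)^2 in 2-D
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.5])

k_direct = (xi @ xj) ** 2     # kernel evaluated in the original space
k_mapped = phi(xi) @ phi(xj)  # dot product after mapping to the new space
print(k_direct, k_mapped)     # both print 16.0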
The idea behind the Kernel trick?
-The idea behind the kernel trick is to map the classification
problem to a space in which the problem is rendered
separable via a separation boundary that is simple in the
new space, but complex in the original one.
-The transformed space typically has higher dimensionality,
with each of the dimensions being (possibly complex)
combinations of the original problem variables.
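As an illustration (scikit-learn assumed; the dataset is generated, not from the source): concentric circles have no linear boundary in the original 2-D space, but the RBF kernel's implicit mapping makes them easy to separate:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # no straight line can separate the rings
rbf = SVC(kernel="rbf").fit(X, y)        # simple boundary in the transformed space

print(linear.score(X, y))  # near 0.5, i.e. barely better than guessing
print(rbf.score(X, y))     # close to 1.0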
Choosing the right Kernel
-Choosing the most appropriate kernel depends heavily on the problem at hand.
-Fine-tuning the parameters of a kernel can easily become a tedious and cumbersome task.
-This is often done iteratively (e.g. via a grid search, as sketched below); however, some automated kernel-selection tools also exist.
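One common way to automate the iteration is an exhaustive grid search; a minimal sketch with scikit-learn's GridSearchCV (the dataset and parameter grid are arbitrary assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [0.1, 1, 10],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_)  # the kernel/cost pair that cross-validated best
print(search.best_score_)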
Some common Kernel Functions:
-Linear: K(x⃗ᵢ, x⃗ⱼ) = x⃗ᵢ ⋅ x⃗ⱼ
-Polynomial of degree d: K(x⃗ᵢ, x⃗ⱼ) = (x⃗ᵢ ⋅ x⃗ⱼ + 1)ᵈ
-Radial Basis Function (Gaussian): K(x⃗ᵢ, x⃗ⱼ) = exp(−γ∥x⃗ᵢ − x⃗ⱼ∥²)
-Sigmoid: K(x⃗ᵢ, x⃗ⱼ) = tanh(κ(x⃗ᵢ ⋅ x⃗ⱼ) − δ)