B07 Support Vector Machines Flashcards

1
Q

What are Support Vector Machines?

A

An approach that represents data as points in multi-dimensional space, in such a way that points with different labels are divided by a clear gap.

2
Q

What is a hyperplane?

A

Given a set of data points that each belong to one of two classes, we can draw a line (a hyperplane) that separates the data into two partitions based on the label. *Assumes linearly separable data
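
A minimal sketch of the idea (assuming scikit-learn, which the deck does not mention; the toy data is made up):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters, one per class (linearly separable).
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

# For a linear kernel, the separating hyperplane is w . x + b = 0.
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
```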

3
Q

What is the Maximum Margin Hyperplane? (MMH)

A

With linearly separable data, because there is a non-zero distance between the closest points of the two classes, there are an infinite number of potential separating hyperplanes.

-The goal is to identify the hyperplane that creates the greatest separation between the classes: the Maximum Margin Hyperplane (MMH).

4
Q

What are support vectors?

A
  • The points from each class that are closest to the MMH are known as the support vectors.
  • Each class must have at least one support vector.
5
Q

What is Quadratic Optimization?

Linearly Separable Data

A

-A technique for finding the MMH.

-This approach attempts to find the perpendicular bisector of the shortest line between the outer boundaries of the classes.
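
A rough sketch of the underlying quadratic program, minimizing ∥w⃗∥²/2 subject to yᵢ(w⃗ ⋅ x⃗ᵢ + b) ≥ 1 (assuming scipy as a stand-in solver; a dedicated QP package would be used in practice):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data with labels in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 1.0], [6.0, 6.0], [7.0, 6.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Minimize ||w||^2 / 2 over params = (w1, w2, b) ...
def objective(params):
    w = params[:2]
    return 0.5 * np.dot(w, w)

# ... subject to y_i * (w . x_i + b) >= 1 for every training point.
constraints = [
    {"type": "ineq", "fun": lambda p, i=i: y[i] * (np.dot(p[:2], X[i]) + p[2]) - 1}
    for i in range(len(X))
]

result = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b)  # these define the MMH: w . x + b = 0
```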

6
Q

Other methods for finding the MMH?

A

-An alternative technique involves a search through the space of every possible hyperplane in order to identify the MMH.
-Each hyperplane is defined as:
w⃗ ⋅ x⃗ + b = 0

7
Q

To maximize the distance between the hyperplanes, we need to minimize the value of ∥w⃗∥. How is this expressed?

A
  • The goal is to find a set of weights that specify two hyperplanes:
    w⃗ ⋅ x⃗ + b ≥ +1
    w⃗ ⋅ x⃗ + b ≤ −1
  • Using vector geometry, the distance between the two hyperplanes is defined as:
    2 / ∥w⃗∥
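
A short derivation of the 2/∥w⃗∥ expression, using the standard point-to-hyperplane distance formula:

```latex
% Distance from a point x_0 to the hyperplane w . x + b = 0:
d(\vec{x}_0) = \frac{\lvert \vec{w} \cdot \vec{x}_0 + b \rvert}{\lVert \vec{w} \rVert}
% Each margin hyperplane (w . x + b = +1 and w . x + b = -1) lies at
% distance 1/||w|| from the separating hyperplane, so the total gap is:
\frac{1}{\lVert \vec{w} \rVert} + \frac{1}{\lVert \vec{w} \rVert}
  = \frac{2}{\lVert \vec{w} \rVert}
```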
8
Q

Non-linearly separable data

What is Soft-Margin classification?

A

-A simple approach to dealing with non-linearly separable data is to allow a small number of points that are close to the boundary to be misclassified.

9
Q

In the context of soft margin classification, what is C?

A

-The number of possible misclassifications is governed by a user-defined parameter C, which is called the cost.

-The higher the value of C, the less likely it is that the algorithm will misclassify a point.

  • With the introduction of a slack variable (ξ) and a cost (C) to the model, instead of finding the maximum margin, we focus on finding the minimum total cost.
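
A sketch of how C behaves in practice (assuming scikit-learn; the blob data and the values of C are arbitrary):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs: not perfectly separable, so some slack is needed.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, "support vectors per class:", clf.n_support_,
          "training accuracy:", clf.score(X, y))
```

Small C yields a wide, tolerant margin with many support vectors; large C forces a tighter fit to the training data.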
10
Q

What is the Kernel trick?

A

-Some real-life patterns cannot be dealt with by simply using soft-margin classifiers.
-Patterns which need multiple and/or non-linear boundaries are dealt with using an approach known as the kernel trick.

11
Q

What is a Kernel?

A

-A kernel is a function that computes the dot product between two vectors in a transformed feature space.

-Given two vectors x⃗ᵢ and x⃗ⱼ, the kernel function K(x⃗ᵢ, x⃗ⱼ) = φ(x⃗ᵢ) ⋅ φ(x⃗ⱼ) combines them into a single number by computing their dot product.
-φ (phi) represents the mapping of our vectors to the new space.
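
A concrete check of this identity for the degree-2 polynomial kernel, in plain numpy (the mapping φ shown is the standard one for this kernel on 2-D inputs):

```python
import numpy as np

# Explicit mapping phi for the kernel K(xi, xj) = (xi . xj)^2 on 2-D inputs:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 4.0])

explicit = np.dot(phi(xi), phi(xj))  # dot product in the new space
via_kernel = np.dot(xi, xj) ** 2     # same number, no mapping needed
print(explicit, via_kernel)          # both print 121.0
```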

12
Q

The idea behind the Kernel trick?

A

-The idea behind the kernel trick is to map the classification problem to a space in which the problem is rendered separable via a separation boundary that is simple in the new space, but complex in the original one.

-The transformed space typically has higher dimensionality, with each of the dimensions being (possibly complex) combinations of the original problem variables.
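
A sketch of this effect (assuming scikit-learn; make_circles produces data that no straight line can separate):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: one class inside the other, no linear boundary exists.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

print(SVC(kernel="linear").fit(X, y).score(X, y))  # poor, around 0.5
print(SVC(kernel="rbf").fit(X, y).score(X, y))     # near 1.0
```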

13
Q

Kernel Trick Slides illustrated

A

[Figure: slide illustrations of the kernel trick.]
14
Q

Choosing the right Kernel

A

-Choosing the most appropriate kernel depends highly on the problem at hand.
-Fine-tuning the parameters of a kernel can easily become a tedious and cumbersome task.
-This is often done in an iterative way; however, there are some automated kernel selection tools available.
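
One common iterative approach is a plain grid search over kernels and their parameters; a sketch assuming scikit-learn (the grid values are arbitrary):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# Try each kernel with a few parameter settings, scored by cross-validation.
param_grid = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1, 1.0],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```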

15
Q

Some common Kernel Functions:

A
  • Linear: K(x⃗ᵢ, x⃗ⱼ) = x⃗ᵢ ⋅ x⃗ⱼ
  • Polynomial: K(x⃗ᵢ, x⃗ⱼ) = (x⃗ᵢ ⋅ x⃗ⱼ + c)ᵈ
  • Gaussian RBF: K(x⃗ᵢ, x⃗ⱼ) = exp(−γ ∥x⃗ᵢ − x⃗ⱼ∥²)
  • Sigmoid: K(x⃗ᵢ, x⃗ⱼ) = tanh(κ (x⃗ᵢ ⋅ x⃗ⱼ) + c)
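
A sketch of these four kernels in plain numpy (parameter names and default values are illustrative):

```python
import numpy as np

def linear(xi, xj):
    return np.dot(xi, xj)

def polynomial(xi, xj, degree=3, c=1.0):
    return (np.dot(xi, xj) + c) ** degree

def rbf(xi, xj, gamma=1.0):
    # Gaussian radial basis function: depends only on the distance ||xi - xj||.
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid(xi, xj, kappa=1.0, c=0.0):
    return np.tanh(kappa * np.dot(xi, xj) + c)
```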
16
Q

Advantages of Support Vector Machines?

A
  • Useful for classification and regression.
  • Not overly influenced by noise.
  • Not prone to overfitting.
  • Very effective for high-dimensional datasets.
  • Uses only a subset of the data in the decision process.
  • The ability to apply new kernels provides substantial flexibility.
17
Q

Weaknesses of Support Vector Machines?

A
  • Finding the best model requires some trial and error.
  • Can be rather slow to train.
  • Results in a black box model that is often difficult to interpret.
  • Results are very sensitive to the choice of kernel.
  • No direct probabilistic interpretation for group membership.
18
Q

Applications of SVMs?

A
  • Classification of microarray gene expression to identify cancer or other genetic diseases.
  • Text categorization.
  • Detection of rare but very important events.