Discriminant-based classifiers Flashcards

1
Q

What are the decision boundary (DB) and decision surface (DS) of a perceptron?

A
  • DB: y(x) = w.T * x + w0 = 0 (w is the weight vector and w0 is the bias)
  • DS: orthogonal to w (A and B on DS <=> w.T * (A - B) = 0)
2
Q

What point on a perceptron’s decision surface is closest to the origin? What is its distance to the origin?

A

x’ = α*w with α = - w0 / |w|²
|x’| = - w0 / |w|

3
Q

What is the distance of a point to the decision boundary?

A

x = x⊥ + γ * w / |w| with γ = y(x) / |w|, where x⊥ is the orthogonal projection of x onto the boundary and γ is the signed distance of x to it
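
A minimal NumPy sketch of this decomposition; the weight vector, bias and point below are made-up values for illustration:

import numpy as np

# hypothetical linear discriminant y(x) = w.T @ x + w0
w = np.array([3.0, 4.0])   # weight vector
w0 = -5.0                  # bias

def signed_distance(x):
    # gamma = y(x) / |w|, the signed distance of x to the boundary y(x) = 0
    return (w @ x + w0) / np.linalg.norm(w)

x = np.array([2.0, 1.0])
gamma = signed_distance(x)                   # y(x) = 3*2 + 4*1 - 5 = 5, |w| = 5, so gamma = 1
x_perp = x - gamma * w / np.linalg.norm(w)   # orthogonal projection of x onto the boundary
assert np.isclose(w @ x_perp + w0, 0.0)      # x_perp lies on the decision boundary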

4
Q

What is the perceptron algorithm?

A

-initialize w randomly
-repeat until convergence (i.e. all datapoints are correctly classified):
{
| for each datapoint x(n), t(n):
| {
| | if y(x(n)) == t(n): do nothing
| | else: w = w + x(n)t(n)
| }
}
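
A runnable NumPy version of this loop, as a sketch: it assumes targets t(n) in {-1, +1}, that the prediction is sign(w.T x), and it folds the bias into w; max_epochs is an added safety guard, since the loop only terminates on its own for linearly separable data.

import numpy as np

def perceptron(X, t, max_epochs=1000, seed=0):
    # X has shape (N, D); t holds targets in {-1, +1}
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a 1 so the bias w0 lives inside w
    w = rng.normal(size=Xb.shape[1])                # initialize w randomly
    for _ in range(max_epochs):
        errors = 0
        for x_n, t_n in zip(Xb, t):
            if np.sign(w @ x_n) != t_n:             # misclassified data point
                w = w + t_n * x_n                   # update: w = w + x(n) * t(n)
                errors += 1
        if errors == 0:                             # convergence: all points correctly classified
            break
    return w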

5
Q

What are the issues of the perceptron?

A
  • can’t converge if data is not linearly separable
  • no unique solution (in linearly separable cases)
  • convergence is slow when classes are close to each other
6
Q

How to choose between the primal and the dual formulations of the SVM problem?

A

Primal: optimization over the M model parameters, O(M³)
Dual: optimization over the N training examples, O(N²)
So prefer the primal when M < N, and the dual when N < M or when a (non-linear) kernel is needed.

7
Q

What are Karush-Kuhn-Tucker (KKT) conditions?

A
  • a_n ≥ 0
  • t(n) * y(x(n)) - 1 ≥ 0
  • a_n * (t(n) * y(x(n)) - 1) = 0
    i.e. : for every datapoint n, either a_n = 0 or t(n) * y(x(n)) = 1
8
Q

How is the prediction of an SVM calculated?

A

y(x) = Σ(n in support vector set) a_n * t(n) * K(x(n), x) + b, where K is the kernel and b is the bias
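
A small sketch of this sum, assuming the support vectors x(n), their targets t(n), the multipliers a_n and the bias b are already known from training (all values below are illustrative):

import numpy as np

def svm_predict(x, X_sv, t_sv, a_sv, b, kernel):
    # y(x) = sum over support vectors of a_n * t(n) * K(x(n), x), plus b
    return sum(a_n * t_n * kernel(x_n, x)
               for x_n, t_n, a_n in zip(X_sv, t_sv, a_sv)) + b

linear = lambda u, v: u @ v                   # linear kernel as an example
X_sv = np.array([[1.0, 2.0], [-1.0, -1.0]])   # support vectors
t_sv = np.array([1.0, -1.0])                  # their targets
a_sv = np.array([0.5, 0.5])                   # Lagrange multipliers a_n
y = svm_predict(np.array([0.5, 0.5]), X_sv, t_sv, a_sv, b=0.1, kernel=linear)
# the predicted class is sign(y)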

9
Q

What is the Gaussian kernel formula?

A

K(x, x’) = exp( - |x - x’|² / (2σ²) )
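
A direct NumPy translation; sigma is a free hyperparameter:

import numpy as np

def gaussian_kernel(x, x_prime, sigma=1.0):
    # K(x, x') = exp(-|x - x'|^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - x_prime) ** 2) / (2.0 * sigma ** 2))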

10
Q

What are hard margin SVMs and their pros and cons?

A

They are SVMs with an infinite penalty for misclassified data points (no margin violation is allowed).

Pros:
- the max-margin solution is unique (unlike the perceptron's)

Cons:
- can only be used on linearly separable data

11
Q

What are soft margin SVMs and their pros and cons?

A

They are SVMs that use slack variables.

Pros:
- can be used on non-linearly separable data
- can be adapted using the cost parameter C (hard margin SVMs are soft margin SVMs with an infinite C, i.e. no error is allowed)

12
Q

What are slack variables and the associated error function?

A

It’s a set of variables (ξn). For each example n:
- ξn = 0 if n is correctly classified and on or outside the correct margin boundary
- 0 < ξn ≤ 1 if n is correctly classified but inside the margin boundary
- ξn > 1 if n is misclassified

The new error function is C * Σ ξn + |w|² / 2, where C is the cost parameter, which controls how much misclassified points penalize the model.

13
Q

What is the SVM objective function?

A

Σ E_SV( y(x(n)) * t(n) ) + |w|² / (2C),
where E_SV is the hinge loss: E_SV(z) = max(0, 1 - z)
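
A small NumPy sketch of this objective for given outputs y(x(n)), targets t(n) and weights w; note that E_SV(y(x(n)) * t(n)) is exactly the slack variable ξn of the previous cards:

import numpy as np

def hinge(z):
    # E_SV(z) = max(0, 1 - z), applied elementwise
    return np.maximum(0.0, 1.0 - z)

def svm_objective(y, t, w, C):
    # sum of hinge losses on y(x(n)) * t(n), plus the regularizer |w|^2 / (2C)
    return np.sum(hinge(y * t)) + (w @ w) / (2.0 * C)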

14
Q

What are the differences between parametric and non-parametric methods?

A

parametric vs non-parametric:
- finite vs (potentially) infinite number of parameters
- the number of parameters is independent of vs grows with the dataset size

Examples:
- GDA, logistic regression, primal SVM are parametric methods
- k-NN, dual SVM are non-parametric methods

15
Q

What are precision, recall and F1 score?

A

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2*P*R / (P + R)
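
A direct translation from the TP/FP/FN counts (a minimal sketch with no handling of zero denominators):

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 8 true positives, 2 false positives, 4 false negatives
p, r, f1 = precision_recall_f1(8, 2, 4)   # p = 0.8, r ≈ 0.667, f1 ≈ 0.727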

16
Q

What’s Hunt’s algorithm for decision trees?

A

Apply recursively on each (sub)dataset Dt:
- If all Dt records belong to the same class, then ‘t’ is a leaf node labelled with this class
- If Dt is empty, then ‘t’ is a leaf node labelled with the default class
- If Dt contains records of different classes, use an attribute test to split Dt
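
A compact, self-contained Python sketch of this recursion; the record format, the attribute-value partition and the Gini-based choose_split below are illustrative assumptions, not a fixed specification:

from collections import Counter

def gini(labels):
    # Gini(t) = 1 - sum_j P(j|t)^2 over the class labels at a node
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def choose_split(records, attributes):
    # pick the attribute whose value-partition has the lowest weighted Gini index
    def weighted_gini(attr):
        groups = {}
        for attrs, label in records:
            groups.setdefault(attrs[attr], []).append(label)
        n = len(records)
        return sum(len(g) / n * gini(g) for g in groups.values())
    return min(attributes, key=weighted_gini)

def hunt(records, attributes, default_class=None):
    # records is a list of (attributes_dict, class_label) pairs forming Dt
    if not records:                               # empty Dt -> leaf labelled with the default class
        return default_class
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:   # pure node (or nothing left to split on) -> leaf
        return majority
    attr = choose_split(records, attributes)      # attribute test used to split Dt
    rest = [a for a in attributes if a != attr]
    subsets = {}
    for attrs, label in records:
        subsets.setdefault(attrs[attr], []).append((attrs, label))
    return (attr, {v: hunt(sub, rest, majority) for v, sub in subsets.items()})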

17
Q

How to determine the best split for a Decision Tree?

A

The best split is the one that yields the most homogeneous child nodes (i.e. each child node has a clear majority class), as measured by an impurity measure such as the Gini index or entropy.

18
Q

What is the Gini index?

A

For a node t:
Gini(t) = 1 - Σ_j P(j|t)²
For a split, it’s the weighted average of the Gini index of each child node, i.e. Σ_i (n_i / n) * Gini(i), where n_i is the number of records at child node i and n the total number of records.
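
A short worked example in Python, with made-up class counts:

def gini(counts):
    # Gini = 1 - sum_j P(j)^2, from the class counts at a node
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

parent = gini([6, 4])                        # 1 - (0.6² + 0.4²) = 0.48
children = [[5, 1], [1, 3]]                  # class counts in the two child nodes
split = sum(sum(c) / 10 * gini(c) for c in children)
# = 6/10 * 0.278 + 4/10 * 0.375 ≈ 0.317 : lower than the parent, so the split helps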

19
Q

What is the entropy of a split?

A

For a node t: Entropy(t) = - Σ_j P(j|t) * log[P(j|t)]
For a split, as with the Gini index, take the weighted average over the child nodes: Σ_i (n_i / n) * Entropy(i)
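
A small NumPy sketch computing the entropy of a node and of a split from class counts (the counts are made up):

import numpy as np

def entropy(counts):
    # Entropy = -sum_j P(j) * log2(P(j)), from the class counts at a node
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()                  # drop empty classes to avoid log(0)
    return -np.sum(p * np.log2(p))

node = entropy([5, 5])                                     # 1.0 bit: maximally impure node
split = 6/10 * entropy([5, 1]) + 4/10 * entropy([0, 4])    # weighted entropy of a split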

20
Q

What is the Gain ratio?

A

Gain ratio = Gain / SplitInfo, where SplitInfo = - Σ_i (n_i / n) * log(n_i / n) is the entropy of the split proportions n_i / n (this penalizes splits into many small partitions).
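
A small self-contained sketch with made-up class counts; the entropy helper is the same as on the previous card:

import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

parent, children = [6, 4], [[5, 1], [1, 3]]        # class counts before and after the split
sizes = [sum(c) for c in children]                 # n_i for each child node
n = sum(sizes)
gain = entropy(parent) - sum(s / n * entropy(c) for s, c in zip(sizes, children))
split_info = entropy(sizes)                        # entropy of the proportions n_i / n
gain_ratio = gain / split_info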

21
Q

What is (rule) post-pruning?

A

It’s letting a decision tree overfit and then pruning it (as opposed to stopping the tree’s growth early).
Rule post-pruning converts the decision tree to a set of decision rules before pruning them. After rule post-pruning, the original tree cannot be recovered.