Discriminant-based classifiers Flashcards
What are the decision boundary (DB) and decision surface (DS) of a perceptron?
- DB: y(x) = w.T * x + w0 = 0 (w is the weight vector and w0 is the bias)
- DS: orthogonal to w (if A and B are on the DS, then w.T * (A - B) = 0)
What point on a perceptron’s decision surface is closest to the origin? What is its distance to the origin?
x’ = α * w with α = -w0 / |w|²
|x’| = |w0| / |w| (signed version: -w0 / |w|)
What is the distance of a point to the decision boundary?
x = x⊥ + γ * w / |w| with γ = y(x) / |w|, where x⊥ is the orthogonal projection of x onto the DB and γ is the signed distance
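A minimal numeric sketch of the closest-point formula (previous card) and this signed-distance formula (w, w0 and the test point x are made-up values):

import numpy as np

w = np.array([3.0, 4.0])   # made-up weight vector (|w| = 5)
w0 = -5.0                  # made-up bias

# Closest point on the decision surface to the origin:
alpha = -w0 / np.dot(w, w)            # alpha = -w0 / |w|^2
x_closest = alpha * w
print(np.dot(w, x_closest) + w0)      # 0.0 -> x' lies on the boundary
print(np.linalg.norm(x_closest))      # 1.0 == |w0| / |w|

# Signed distance of an arbitrary point x to the boundary:
x = np.array([2.0, 1.0])
gamma = (np.dot(w, x) + w0) / np.linalg.norm(w)   # gamma = y(x) / |w|
print(gamma)                                      # 1.0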
What is the perceptron algorithm?
- initialize w randomly
- repeat until convergence (i.e. all datapoints are correctly classified):
{
| for each datapoint x(n), t(n):
| {
| | if sign(y(x(n))) == t(n): do nothing (correctly classified)
| | else: w = w + x(n) * t(n)
| }
}
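A runnable sketch of this loop, assuming labels t(n) in {-1, +1} and the bias w0 folded into w by appending a constant-1 feature to each x:

import numpy as np

def perceptron(X, t, max_epochs=1000):
    # X: (N, M) inputs with a constant-1 column for the bias; t: labels in {-1, +1}
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])          # random initialization
    for _ in range(max_epochs):
        mistakes = 0
        for x_n, t_n in zip(X, t):
            if t_n * (w @ x_n) <= 0:         # misclassified (or on the boundary)
                w = w + t_n * x_n            # update: w = w + x(n) * t(n)
                mistakes += 1
        if mistakes == 0:                    # converged: all points correct
            break
    return w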
What are the issues of the perceptron?
- can’t converge if data is not linearly separable
- no unique solution (in linearly separable cases)
- convergence is slow when classes are close to each other
How to choose between the primal and the dual formulations of the SVM problem?
Primal: optimization over the M model parameters (O(M³))
Dual: optimization over the N training examples (O(N³))
What are Karush-Kuhn-Tucker (KKT) conditions?
- a_n ≥ 0
- t(n) * y(x(n)) - 1 ≥ 0
- a_n * (t(n) * y(x(n)) - 1) = 0
i.e.: for every datapoint n, either a_n = 0 (x(n) is not a support vector and plays no role in prediction) or t(n) * y(x(n)) = 1 (x(n) lies exactly on the margin)
How is the prediction of an SVM calculated?
y(x) = Σ(n in support vector set) a_n * t(n) * K(x(n), x) + b, where K is the kernel
What is the Gaussian kernel formula?
K(x, x’) = exp( -|x - x’|² / (2σ²) )
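A sketch combining this kernel with the prediction formula from the previous card (the support vectors, dual coefficients a_n and bias b are made-up stand-ins for the output of the dual QP solver):

import numpy as np

def gaussian_kernel(x, x_prime, sigma=1.0):
    # K(x, x') = exp(-|x - x'|^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - x_prime) ** 2) / (2.0 * sigma ** 2))

def svm_predict(x, sv, a, t, b, sigma=1.0):
    # y(x) = sum over support vectors of a_n * t(n) * K(x(n), x) + b
    return sum(a_n * t_n * gaussian_kernel(x_n, x, sigma)
               for x_n, a_n, t_n in zip(sv, a, t)) + b

sv = np.array([[0.0, 0.0], [2.0, 2.0]])   # made-up support vectors
a = np.array([0.5, 0.5])                  # made-up dual coefficients
t = np.array([1.0, -1.0])                 # their labels
x = np.array([0.3, 0.1])
print(np.sign(svm_predict(x, sv, a, t, b=0.0)))   # predicted class: 1.0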
What are hard margin SVMs and their pros and cons?
These are SVMs with an infinite penalty for misclassified data points.
Pros:
- the max-margin solution is unique (contrary to the perceptron)
Cons:
- can only be used on linearly separable data
What are soft margin SVMs and their pros and cons?
These are SVMs that use slack variables.
Pros:
- can be used on non-linearly separable data
- can be adapted via the cost parameter C (hard margin SVMs are soft margin SVMs with an infinite C, i.e. no error is allowed)
Cons:
- C is an extra hyperparameter that has to be tuned (e.g. by cross-validation)
What are slack variables and the associated error function?
It’s a set of variables (ξn). For each example n:
- ξn = 0 if n is correctly classified and on or outside the correct margin boundary
- 0 < ξn ≤ 1 if n is correctly classified but inside the margin (ξn = 1 exactly on the decision boundary)
- ξn > 1 if n is misclassified
The new error function is C * Σ ξn + |w|² / 2, where C is the cost parameter, which controls how much misclassified points penalize the model.
What is the SVM objective function?
Σ(n) E_SV( y(x(n)) * t(n) ) + |w|² / (2C),
where E_SV is the hinge loss: E_SV(z) = max(0, 1 - z)
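A quick check that this form matches the slack-variable objective from the previous card up to an overall factor of C (model parameters, data and C are made up):

import numpy as np

w, b, C = np.array([1.0, -2.0]), 0.5, 10.0           # made-up model and cost
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])   # made-up inputs
t = np.array([1.0, -1.0, 1.0])                       # made-up labels

y = X @ w + b                                # y(x(n)) for every example
xi = np.maximum(0.0, 1.0 - t * y)            # slack variables = E_SV(y * t)

slack_form = C * xi.sum() + 0.5 * (w @ w)    # C * sum(xi) + |w|^2 / 2
hinge_form = xi.sum() + (w @ w) / (2 * C)    # sum E_SV(y(x(n)) * t(n)) + |w|^2 / (2C)
print(np.isclose(slack_form / C, hinge_form))   # True: same minimizer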
What are the differences between parametric and non-parametric methods?
parametric vs non-parametric:
- finite vs. (effectively) infinite number of parameters
- the number of parameters is independent of vs. dependent on the dataset size
Examples:
- GDA, logistic regression, primal SVM are parametric methods
- k-NN, dual SVM are non-parametric methods
What are precision, recall and F1 score?
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * P * R / (P + R)
(TP: true positives, FP: false positives, FN: false negatives; P: precision, R: recall)
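A tiny sketch computing all three from made-up binary labels:

import numpy as np

t_true = np.array([1, 1, 0, 1, 0, 0, 1])     # made-up ground truth
t_pred = np.array([1, 0, 0, 1, 1, 0, 1])     # made-up predictions

TP = np.sum((t_pred == 1) & (t_true == 1))   # true positives
FP = np.sum((t_pred == 1) & (t_true == 0))   # false positives
FN = np.sum((t_pred == 0) & (t_true == 1))   # false negatives

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)                 # 0.75 0.75 0.75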