Classification Flashcards
What are the two stages of classification?
Inference stage: use training data to learn a model for P(Ck|X).
Decision stage: use the posterior probabilities to make optimal class assignments.
What are three approaches to solving classification problems?
Generative, discriminative and discriminant-based models.
What are generative classification models?
These are approaches that model a distribution over the inputs (the class-conditional density P(X|Ck) together with the prior P(Ck), or the joint distribution P(X, Ck)) and derive the posterior P(Ck|X) from it using Bayes' theorem. The class is then determined using decision theory.
N.B.: It is called “generative” because we can use the learnt distribution (P(X|Ck) or P(X, Ck)) to generate synthetic data in the input space.
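The generative recipe can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the inputs are 1-D, each P(X|Ck) is assumed Gaussian, and the toy data and the names fit_gaussian, posterior, classify and sample are made up for the example.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy training data (made up for illustration): class label -> 1-D inputs
train = {
    "C1": [1.0, 1.2, 0.8, 1.1, 0.9],
    "C2": [3.0, 3.3, 2.7, 3.1, 2.9],
}

# Inference stage: estimate the priors P(Ck) and the densities P(X|Ck)
n_total = sum(len(xs) for xs in train.values())
priors = {c: len(xs) / n_total for c, xs in train.items()}
params = {c: fit_gaussian(xs) for c, xs in train.items()}

def posterior(x):
    """Bayes' theorem: P(Ck|x) = P(x|Ck) P(Ck) / sum_j P(x|Cj) P(Cj)."""
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in train}
    evidence = sum(joint.values())
    return {c: p / evidence for c, p in joint.items()}

def classify(x):
    """Decision stage: pick the class with the largest posterior."""
    post = posterior(x)
    return max(post, key=post.get)

def sample(c, rng):
    """The 'generative' part: draw a synthetic input from the learnt P(X|Ck)."""
    mu, var = params[c]
    return rng.gauss(mu, math.sqrt(var))

print(classify(1.1), classify(2.8))
print(sample("C1", random.Random(0)))
```

Picking the largest posterior minimises the misclassification rate; a different loss would change only the decision stage, not the learnt model.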
What are discriminative classification models?
These are approaches that model the posterior P(Ck|X) directly and then use decision theory to determine the class.
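As an illustration, logistic regression is one common discriminative model (it is chosen here as an example and is not named in these cards). The sketch below assumes 1-D inputs, two classes, made-up toy data, and plain gradient descent with an arbitrary learning rate and step count.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data (made up): inputs and binary targets (1 = class C1, 0 = class C2)
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ts = [0, 0, 0, 1, 1, 1]

# Model P(C1|x) = sigmoid(w * x + b) directly; no density over x is learnt.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    # Gradient of the mean negative log-likelihood with respect to w and b
    errors = [sigmoid(w * x + b) - t for x, t in zip(xs, ts)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def posterior_c1(x):
    """Modelled posterior P(C1|x)."""
    return sigmoid(w * x + b)

def classify(x):
    """Decision stage: choose the class with the larger posterior."""
    return "C1" if posterior_c1(x) >= 0.5 else "C2"

print(classify(0.5), classify(4.0))
```

Unlike the generative approach, nothing here describes how the inputs are distributed, so the model cannot be used to generate synthetic data.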
What are discriminant-based classification models?
These are approaches that find a function f(X) (called a “discriminant function”) which maps each input X directly to a class label, without computing probabilities.
What are linearly separable classification problems?
These are classification problems whose classes can be separated exactly by linear decision surfaces (hyperplanes), i.e. problems that a linear model can solve without error.
What is the 1-of-k coding scheme (a.k.a. “one-hot encoding”)?
It is the encoding of the target variable of a classification problem as a vector whose length equals the number of classes and whose components are all 0 except the one corresponding to the class the datapoint belongs to, which is 1.
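A minimal sketch of this encoding, assuming the classes are given as an ordered list; the helper name one_hot and the example labels are made up for the illustration.

```python
def one_hot(label, classes):
    """Return the 1-of-K vector for `label` given the ordered list `classes`."""
    index = classes.index(label)  # raises ValueError for an unknown label
    return [1 if i == index else 0 for i in range(len(classes))]

classes = ["cat", "dog", "bird"]
targets = [one_hot(label, classes) for label in ["dog", "cat", "bird", "dog"]]
print(targets)
# targets == [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```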
What are discriminant functions and linear discriminants?
Discriminant functions are functions mapping an input vector X to a class Ck.
Linear discriminants are discriminant functions whose decision surfaces are hyperplanes.
What is the geometrical interpretation of y(x) = w.T * x + w0 as a linear discriminant function?
- The decision surface (the set of points where y(x) = 0) is orthogonal to w.
- w0 can be considered as a threshold and controls the position of the decision surface: its signed distance from the origin, measured along w, is equal to -w0/|w|.
- The signed orthogonal distance of a datapoint x to the decision surface is y(x)/|w|.
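The three properties above can be checked numerically. The sketch below assumes a 2-D example with arbitrarily chosen values w = (3, 4) and w0 = -10, so that |w| = 5; all names are illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

w = [3.0, 4.0]   # |w| = 5
w0 = -10.0

def y(x):
    return dot(w, x) + w0

def signed_distance(x):
    """Signed orthogonal distance of x to the decision surface y(x) = 0."""
    return y(x) / norm(w)

# Two points on the decision surface: their difference lies in the surface,
# so it must be orthogonal to w.
a = [2.0, 1.0]
b = [-2.0, 4.0]
diff = [ai - bi for ai, bi in zip(a, b)]
print(y(a), y(b), dot(w, diff))        # all three are zero

# Signed distance of the surface from the origin along w: -w0/|w|
surface_offset = -w0 / norm(w)
print(surface_offset)                  # 2.0

# Moving a point back along the unit normal by its signed distance
# lands on the decision surface.
p = [5.0, 5.0]
d = signed_distance(p)
foot = [pi - d * wi / norm(w) for pi, wi in zip(p, w)]
print(d, y(foot))
```

The sign of y(x)/|w| tells which side of the surface x lies on: positive on the side w points towards, negative on the other.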
What is a One-vs-the-rest classifier?
It’s a multi-class classifier consisting of a set of k-1 classifiers, each of which solves a two-class problem by separating the points in a particular class Ck from the points not in that class.
What is a One-vs-one classifier?
It’s a multi-class classifier consisting of a set of k(k-1)/2 binary discriminant functions, one for every possible pair of classes. Majority voting among them is then used for classification.
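A small sketch of the voting scheme, assuming three classes and using a nearest-centroid rule as a stand-in for each trained pairwise discriminant (its boundary is the perpendicular bisector between the two centroids, hence a hyperplane). The centroids and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical class centroids in 2-D
centroids = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}

def sq_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def make_pair_classifier(ci, cj):
    """Two-class discriminant for the pair (ci, cj): the nearer centroid wins."""
    def classify(x):
        return ci if sq_dist(x, centroids[ci]) <= sq_dist(x, centroids[cj]) else cj
    return classify

# One discriminant per unordered pair of classes: k(k-1)/2 = 3 for k = 3
pair_classifiers = {
    pair: make_pair_classifier(*pair)
    for pair in combinations(sorted(centroids), 2)
}

def one_vs_one(x):
    """Each pairwise discriminant casts one vote; the most voted class wins."""
    votes = Counter(clf(x) for clf in pair_classifiers.values())
    return votes.most_common(1)[0][0]

print(len(pair_classifiers))        # 3
print(one_vs_one((3.5, 0.5)))       # B
print(one_vs_one((0.5, 3.0)))       # C
```

With arbitrary pairwise discriminants the votes can tie, leaving ambiguous regions of input space; here most_common would resolve such a tie by first-seen order, which is an arbitrary choice.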
What is a k-class discriminant?
It’s a single discriminant comprising k linear functions of the form yi(x) = wi.T * x + wi0, one per class.
x is assigned to the class Cj for which yj(x) is the largest of all the yi(x) values.
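The decision rule amounts to an argmax over the k linear functions. The sketch below assumes k = 3 classes in 2-D with made-up weights and biases, and returns a 0-based class index.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical parameters: one weight vector wi and one bias wi0 per class
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.5]

def discriminants(x):
    """Evaluate yi(x) = wi.T * x + wi0 for every class i."""
    return [dot(w, x) + w0 for w, w0 in zip(weights, biases)]

def classify(x):
    """Return the (0-based) index of the class with the largest yi(x)."""
    ys = discriminants(x)
    return max(range(len(ys)), key=ys.__getitem__)

print(classify([2.0, 1.0]))     # 0
print(classify([0.5, 3.0]))     # 1
print(classify([-1.0, -1.0]))   # 2
```

Because every input receives exactly one label from the argmax, this construction avoids the ambiguous regions that can arise when combining separate two-class discriminants.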