lecture 4 - classification Flashcards
How do linear models handle classification tasks?
Linear models for classification take an input vector x and assign it to one of K discrete classes using a separating hyperplane (a linear decision boundary) in the input space.
How do linear models separate classes in a D-dimensional input space?
Classes are separated by (D−1)-dimensional hyperplanes (the decision surfaces).
How are linear models represented in regression?
y(x) = w^T x + w_0
What is the role of the activation function in classification with linear models?
- The activation function f(⋅) maps the output of the linear model to discrete classes, converting the continuous output into a class label.
- This makes the model nonlinear in its outputs, while the underlying model remains linear in the parameters.
How can a step function be used for classification?
A step function can assign
- y(x)>0 to “class 1”
- y(x)≤0 to “class 2”
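A minimal sketch of this step rule, assuming NumPy and hypothetical weights w and w_0 (not taken from the lecture):

```python
import numpy as np

# Hypothetical parameters of a 2-class linear discriminant y(x) = w^T x + w_0
w = np.array([2.0, -1.0])   # weight vector
w0 = 0.5                    # bias

def classify(x):
    """Step rule: class 1 if y(x) > 0, class 2 if y(x) <= 0."""
    y = w @ x + w0
    return "class 1" if y > 0 else "class 2"

print(classify(np.array([1.0, 0.0])))  # y =  2.5 > 0  -> class 1
print(classify(np.array([0.0, 3.0])))  # y = -2.5 <= 0 -> class 2
```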
What is a discriminant function in classification tasks?
A discriminant function is a mathematical function used to separate data points into distinct classes by mapping input features to a decision boundary.
What is the simplest form of a discriminant function for a 2-class classification problem?
y(x) = w^T x + w_0
How is the decision boundary defined in classification?
The decision boundary is the set of all points x that satisfy:
- y(x) = w^T x + w_0 = 0
How are classes assigned based on the discriminant function?
- y(x)>0 to “class 1”
- y(x)<0 to “class 2”
- y(x)=0 is the decision boundary
If points x_a and x_b lie on the decision surface, then:
- y(x_a) = y(x_b) = 0, i.e. w^T x_a + w_0 = w^T x_b + w_0
Subtracting the two equations gives
- w^T (x_a - x_b) = 0 (the dot product of w with the difference vector is zero)
This indicates that w is orthogonal to every vector lying within the decision surface (see the numeric check below).
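A quick numeric check of this orthogonality, using hypothetical w and w_0 and two points chosen to lie on the boundary (NumPy assumed):

```python
import numpy as np

# Hypothetical weights; x_a and x_b are chosen so that w^T x + w_0 = 0
w, w0 = np.array([2.0, -1.0]), 0.5
x_a = np.array([0.0, 0.5])   # w @ x_a + w0 == 0
x_b = np.array([1.0, 2.5])   # w @ x_b + w0 == 0

print(w @ x_a + w0, w @ x_b + w0)   # 0.0 0.0  (both on the boundary)
print(w @ (x_a - x_b))              # 0.0      (w orthogonal to the surface)
```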
How is the decision boundary equation interpreted in terms of projection and bias?
- The left-hand side, w^T x, is the projection of x onto w (scaled by ‖w‖) and determines how far x lies from the boundary; the signed distance of x from the boundary is y(x)/‖w‖
- The bias w_0 sets the displacement of the decision boundary relative to the origin; the boundary's perpendicular distance from the origin is −w_0/‖w‖
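A small sketch of these two quantities with hypothetical parameters (NumPy assumed):

```python
import numpy as np

w, w0 = np.array([2.0, -1.0]), 0.5                # hypothetical parameters
x = np.array([1.0, 0.0])

signed_dist   = (w @ x + w0) / np.linalg.norm(w)  # how far x lies from the boundary
origin_offset = -w0 / np.linalg.norm(w)           # boundary's displacement from the origin

print(signed_dist, origin_offset)
```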
Why is using multiple 2-class classifiers for K-class classification not ideal?
It can lead to ambiguous regions where boundaries overlap, making it unclear which class a point belongs to.
What is the solution for well-defined decision boundaries in K-class classification?
Use a unified K-class classifier in which each class C_k has its own linear discriminant function of the form y_k(x) = w_k^T x + w_k0
What is the decision rule for assigning a point to a class in K-class classification?
A point x belongs to class C_k if y_k(x) > y_j(x) for all j ≠ k
How is the decision boundary between two classes defined in K-class classification?
- The boundary between classes C_k and C_j occurs when their scores are equal
- y_k(x) = y_j(x)
- this results in a (D-1) dimensional hyperplane
What is the general form of the hyperplane between two classes C_k and C_j?
(w_k - w_j)^T x + (w_k0 - w_j0) = 0
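A brief illustration of this pairwise boundary with hypothetical weights (NumPy assumed); the sign of (w_k − w_j)^T x + (w_k0 − w_j0) tells which side of the boundary x falls on:

```python
import numpy as np

# Hypothetical discriminants for classes C_k and C_j
w_k, w_k0 = np.array([1.0, 2.0]), -1.0
w_j, w_j0 = np.array([3.0, 0.5]),  0.5

# Boundary between C_k and C_j: (w_k - w_j)^T x + (w_k0 - w_j0) = 0
w_diff, b_diff = w_k - w_j, w_k0 - w_j0

x = np.array([0.0, 0.0])
print(w_diff @ x + b_diff)  # -1.5 < 0, so y_j(x) > y_k(x) at this point
```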
What is a property of the decision regions in K-class classification?
The linearity of the discriminant functions makes each decision region of a K-class classifier singly connected and convex
What are the steps for assigning a class using a K-class classifier?
- Define the linear discriminants for each class.
- Assign weights and biases for each class.
- Calculate the discriminant scores for a given point.
- Assign the point to the class with the highest score.
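A compact sketch of these four steps, assuming NumPy and hypothetical weights and biases for three classes:

```python
import numpy as np

# Hypothetical weight matrix W (one row per class) and bias vector b
W = np.array([[ 1.0,  2.0],    # w_1
              [ 3.0,  0.5],    # w_2
              [-1.0, -1.0]])   # w_3
b = np.array([-1.0, 0.5, 2.0]) # w_10, w_20, w_30

def assign_class(x):
    scores = W @ x + b                 # discriminant scores y_k(x) for all classes
    return int(np.argmax(scores)) + 1  # class with the highest score (1-indexed)

print(assign_class(np.array([1.0, 1.0])))  # scores: [2.0, 4.0, 0.0] -> class 2
```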
How are weights and biases assigned for each class in K-class classification?
Each class C_k is assigned a weight vector w_k and a bias w_k0, which together define its discriminant function y_k(x) = w_k^T x + w_k0.
What does the discriminant score represent in a K-class classifier?
The discriminant score represents how strongly a data point is associated with a specific class.
What is a perceptron?
- The perceptron was one of the first models that could learn its parameters from data.
- It is a linear model with a step activation function, classifying inputs into two distinct categories.
How does the step activation function in a perceptron work?
- y(x) = f(w^T ϕ(x))
- f(a) is a step function:
- if a is positive or zero, it outputs +1, indicating one class
- if a is negative, it outputs −1, indicating the other class
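A minimal sketch of the perceptron's prediction step, assuming NumPy and hypothetical weights and features (here ϕ(x) is just x with a constant 1 prepended as a bias feature):

```python
import numpy as np

def perceptron_predict(w, phi_x):
    """Perceptron output: f(a) = +1 if a >= 0, else -1, where a = w^T phi(x)."""
    a = w @ phi_x
    return 1 if a >= 0 else -1

# Hypothetical weights and feature vector
w = np.array([0.5, 1.0, -2.0])
phi_x = np.array([1.0, 2.0, 0.5])    # leading 1 acts as the bias feature

print(perceptron_predict(w, phi_x))  # a = 0.5 + 2.0 - 1.0 = 1.5 -> +1
```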
What criterion is used for training a perceptron?
Training is done using the perceptron criterion, which focuses on minimizing the total error function E_p
What does the total error function E_p represent in perceptron training?
- E_p measures how "wrong" the perceptron is on the points it misclassifies
- it sums only over the set M of misclassified points
- E_p(w) = −∑_{n∈M} w^⊤ ϕ_n t_n, where each term is the linear output w^⊤ ϕ_n (the weight vector dotted with the feature vector) multiplied by the target t_n ∈ {−1, +1}; this product is negative for misclassified points, so the minus sign makes E_p non-negative
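A small sketch of this criterion on hypothetical toy data (NumPy assumed); it sums −w^⊤ ϕ_n t_n over only the misclassified points:

```python
import numpy as np

def perceptron_criterion(w, Phi, t):
    """E_p(w) = -sum over misclassified n of (w^T phi_n) * t_n (non-negative)."""
    scores = Phi @ w                       # w^T phi_n for every point
    misclassified = np.sign(scores) != t   # predictions that disagree with targets
    return -np.sum(scores[misclassified] * t[misclassified])

# Hypothetical toy data: rows of Phi are feature vectors phi_n, targets t_n in {-1, +1}
Phi = np.array([[1.0,  2.0],
                [1.0, -1.0],
                [1.0,  0.5]])
t = np.array([1, -1, -1])
w = np.array([0.0, 1.0])

print(perceptron_criterion(w, Phi, t))  # only the third point is misclassified here
```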