05 Classification Flashcards
Clustering
- Unsupervised learning; no labeled training data required
- Discovers groupings among the objects rather than assigning them to predefined classes
Classification
- Supervised learning, requiring labeled training data
- Train a classifier to automatically assign new instances to predefined classes, given some set of examples
Some examples of classification tasks
- Named entity recognition
- Document (topic) classification
- Authorship attribution
- Sentiment analysis
- Spam filtering
Different ways of representing classes
Exemplar-based classification
- No abstraction. Every stored instance of a group can potentially represent the class.
- Used in so-called instance-based or memory-based learning (MBL).
- In its simplest form, the class is simply the collection of stored points.
- Another variant is to use medoids: representing a class by a single member that is considered central, typically the object with maximum average similarity to the other objects in the group.
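A minimal sketch of medoid selection under this definition; cosine similarity is an assumption here, any similarity measure would do:

```python
import numpy as np

def medoid(vectors):
    """Return the member of `vectors` ((n, d) array, one row per object)
    with maximum average similarity to the other members."""
    # Normalize rows so that dot products equal cosine similarities.
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T                      # (n, n) pairwise cosine similarities
    # Exclude each object's self-similarity (the diagonal, 1) from its average.
    avg_sim = (sims.sum(axis=1) - 1.0) / (len(vectors) - 1)
    return vectors[np.argmax(avg_sim)]
```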
Centroid-based representation of classes
- The average, or the center of mass in the region.
- Given a class ci, where each member object oj is represented as a feature vector x⃗j, we can compute the class centroid μ⃗i as

  μ⃗i = (1 / |ci|) · Σ_{oj ∈ ci} x⃗j
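In code, the centroid is just the componentwise mean of the member vectors; a minimal sketch with made-up data:

```python
import numpy as np

# Hypothetical class members: four objects with three features each
members = np.array([[1.0, 0.0, 2.0],
                    [3.0, 1.0, 0.0],
                    [2.0, 2.0, 1.0],
                    [0.0, 1.0, 1.0]])

centroid = members.mean(axis=0)   # mu_i = (1/|ci|) * sum of the x_j
print(centroid)                   # -> [1.5 1.  1. ]
```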
In our vector space model, objects are represented as ___, so a class will correspond to a collection of ____; a region.
In our vector space model, objects are represented as points, so a class will correspond to a collection of points; a region.
Vector space classification is based on the _____ hypothesis.
The contiguity hypothesis: objects in the same class form a contiguous region, and regions of different classes do not overlap.
Classification amounts to computing the _____.
Classification amounts to computing the boundaries in the space that separate the classes; the decision boundaries.
Both centroids and medoids represent a group by a ____.
Both centroids and medoids represent a group by a single prototype.
While a medoid is an actual member of the group, a centroid is an ____.
While a medoid is an actual member of the group, a centroid is an abstract prototype; an average.
Typicality can be defined by a member’s distance to ____.
Typicality can be defined by a member’s distance to the prototype.
The centroid could also be ____:
Let each member’s contribution to the average be determined by its average _____ to the other members of the group.
The centroid could also be distance weighted:
Let each member’s contribution to the average be determined by its average pairwise similarity to the other members of the group.
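A sketch of the distance-weighted variant, again assuming cosine similarity and non-negative feature vectors (so the weights stay non-negative):

```python
import numpy as np

def weighted_centroid(vectors):
    """Distance-weighted centroid: each member contributes to the average
    in proportion to its average pairwise similarity to the other members."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T                      # pairwise cosine similarities
    weights = (sims.sum(axis=1) - 1.0) / (len(vectors) - 1)
    weights = weights / weights.sum()             # normalize weights to sum to 1
    return weights @ vectors                      # weighted average of the members
```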
Hard classes
- Membership considered a Boolean property: a given object is either part of the class or it is not.
- A crisp membership function.
- A variant: disjunctive classes. Objects can be members of more than one class, but the memberships are still crisp.
Soft classes
- Class membership is a graded property.
- Distance weighted.
- Probabilistic variant: The degree of membership for a given object restricted to [0, 1], and the sum across classes must be 1.
- Fuzzy variant: The membership function is still restricted to [0, 1], but without the probabilistic constraint on the sum.
Rocchio classification AKA
Nearest-centroid classification: assign each object to the class of the nearest centroid.
The decision boundary of the Rocchio classifier
The set of points with equal distance to the two class centroids; a hyperplane (a line in two dimensions).
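A minimal nearest-centroid sketch, assuming Euclidean distance (cosine similarity is an equally common choice) and NumPy arrays for the training data:

```python
import numpy as np

class RocchioClassifier:
    """Assign each input to the class whose centroid is closest."""

    def fit(self, X, y):
        # X: (n, d) array of feature vectors; y: sequence of n class labels
        self.classes_ = sorted(set(y))
        self.centroids_ = np.array(
            [X[np.asarray(y) == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, x):
        # Distance from x to every centroid; pick the closest one.
        dists = np.linalg.norm(self.centroids_ - x, axis=1)
        return self.classes_[int(np.argmin(dists))]
```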
Problems with the Rocchio classifier
- The classification decision is based only on the distance to the centroid, ignoring how members are distributed locally within a class.
- Implicitly assumes that classes are spheres with similar radii.
- Does not work well for classes that cannot be accurately represented by a single prototype or center (e.g. disconnected or elongated regions).
- Because the Rocchio classifier defines a linear decision boundary, it is only suitable for problems involving linearly separable classes.
Classes that are not linearly separable in a given feature space may become linearly separable when the features are ____.
Classes that are not linearly separable in a given feature space may become linearly separable when the features are mapped to a higher-dimensional space.
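A classic one-dimensional illustration with made-up data: points inside (-1, 1) vs. outside cannot be split by a single threshold, but adding the feature x² makes a linear separator possible:

```python
import numpy as np

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = (np.abs(x) < 1).astype(int)     # class 1 inside (-1, 1): no single threshold on x separates it

phi = np.column_stack([x, x ** 2])  # map each x to the 2-D feature vector (x, x^2)
pred = (phi[:, 1] < 1).astype(int)  # in the mapped space, the line x2 = 1 separates the classes
assert (pred == y).all()
```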
kNN-classification
k Nearest Neighbor classification
- Assign each object to the majority class among its k closest neighbors
- Non-linear classifier
- The kNN decision boundary is determined locally
- Defined by the Voronoi tessellation
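A minimal majority-vote sketch, assuming Euclidean distance (ties are broken arbitrarily by Counter):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Assign x to the majority class among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance from x to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```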
Voronoi tessellation
- Assuming k = 1: for a given set of objects in the space, let each object define a cell consisting of all points that are closer to that object than to any other object.
- Results in a set of convex polygons; so-called Voronoi cells.
- Decomposing a space into such cells gives us the so-called Voronoi tessellation.
- In the general case of k ≥ 1, the Voronoi cells are given by the regions in the space for which the set of k nearest neighbors is the same.
“Softened” kNN-classification
A probabilistic version
- The probability of membership in a class c is given by the proportion of the k nearest neighbors that belong to c
Distance weighted vote
- Let each of the k nearest neighbors vote with a weight given by its similarity to the test object, e.g.

  score(c, x) = Σ_{x′ ∈ Nk(x)} I(class(x′) = c) · sim(x′, x), where Nk(x) is the set of k nearest neighbors
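Both softened variants in one sketch; inverse-distance weighting is just one common choice of similarity, not prescribed by the source:

```python
import numpy as np

def knn_soft_predict(X_train, y_train, x, k=3):
    """Return class-membership probabilities for x: each of the k nearest
    neighbors votes with a weight; with uniform weights this reduces to the
    plain proportion-based probabilistic variant."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)       # inverse-distance weights (one common choice)
    scores = {}
    for i, w in zip(nearest, weights):
        label = y_train[i]
        scores[label] = scores.get(label, 0.0) + w
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}
```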
For kNN, test time is _____ in the size of the training set, but independent of _____.
For kNN, test time is linear in the size of the training set, but independent of the number of classes.
accuracy =
accuracy = (TP + TN) / N
- The proportion of predictions that are correct
- Not suitable for unbalanced numbers of positive/negative examples
precision =
precision = TP / (TP + FP)
- The proportion of predicted class members that were correct
recall =
recall = TP / (TP + FN)
- The proportion of actual class members that were detected
- Trade-off: predicting positive for all examples would give 100% recall, but (typically) terrible precision.
F-score =
F-score = 2*precision*recall / (precision + recall)
- Balanced measure of precision and recall (harmonic mean)
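The four measures computed from the cells of a confusion matrix; the counts in the example are made up:

```python
def evaluate(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical counts: 80 TP, 20 FP, 40 FN, 860 TN (N = 1000)
print(evaluate(80, 20, 40, 860))   # -> (0.94, 0.8, 0.667, 0.727) (last two rounded)
```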
Evaluating multi-class predictions
- Macro-averaging: compute the score per class, then average across classes (each class counts equally)
- Micro-averaging: pool the counts (TP, FP, FN) across classes, then compute the score (each decision counts equally)
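A sketch contrasting the two; the per-class counts are hypothetical:

```python
def macro_micro_f1(per_class_counts):
    """per_class_counts: list of (tp, fp, fn) tuples, one per class."""
    def f1(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    # Macro: average the per-class F1 scores.
    macro = sum(f1(*c) for c in per_class_counts) / len(per_class_counts)
    # Micro: pool the counts first, then compute a single F1.
    tp, fp, fn = (sum(col) for col in zip(*per_class_counts))
    micro = f1(tp, fp, fn)
    return macro, micro

# A frequent class scores well, a rare class poorly; macro exposes the gap.
print(macro_micro_f1([(90, 10, 10), (1, 9, 9)]))   # -> (0.5, ~0.827)
```

The micro-average is dominated by the large classes, while the macro-average gives small classes equal weight.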