05 Classification Flashcards
Clustering
Classification
- Supervised learning, requiring labeled training data
- Train a classifier to automatically assign new instances to predefined classes, given a set of labeled examples
Some examples of classification tasks
- Named entity recognition
- Document (topic) classification
- Authorship attribution
- Sentiment analysis
- Spam filtering
Different ways of representing classes
Exemplar-based classification
- No abstraction. Every stored instance of a group can potentially represent the class.
- Used in so-called instance-based or memory-based learning (MBL).
- In its simplest form, the class is simply the collection of its points.
- Another variant is to use medoids: a class is represented by a single member that is considered central, typically the object with maximum average similarity to the other objects in the group.
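A minimal sketch of exemplar-based classification, assuming feature vectors are plain Python lists and that similarity is cosine similarity (the cards do not name a measure). The helper names `cosine`, `medoid` and `nearest_exemplar` are hypothetical. `nearest_exemplar` keeps every stored instance and labels a new object with the label of its single most similar exemplar (1-nearest-neighbour, the simplest memory-based learner), while `medoid` picks the member with maximum average similarity to the others.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def medoid(members: List[Vector]) -> Vector:
    """Return the member with the highest average similarity to the other members."""
    if len(members) == 1:
        return members[0]

    def avg_similarity(i: int) -> float:
        sims = [cosine(members[i], m) for j, m in enumerate(members) if j != i]
        return sum(sims) / len(sims)

    return members[max(range(len(members)), key=avg_similarity)]


def nearest_exemplar(x: Vector, training: List[Tuple[Vector, str]]) -> str:
    """Memory-based (1-NN) classification: copy the label of the most similar stored instance."""
    _, label = max(training, key=lambda pair: cosine(x, pair[0]))
    return label


group = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(medoid(group))  # [1.0, 1.0]

training = [([1.0, 0.1], "sports"), ([0.1, 1.0], "politics")]
print(nearest_exemplar([0.9, 0.2], training))  # sports
```

No abstraction happens here: classifying a new object means comparing it against the stored instances themselves.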
Centroid-based representation of classes
- The average, or the center of mass in the region.
- Given a class $c_i$, where each member object $o_j$ is represented as a feature vector $\vec{x}_j$, we can compute the class centroid $\vec{\mu}_i$ as
$$\vec{\mu}_i = \frac{1}{|c_i|} \sum_{\vec{x}_j \in c_i} \vec{x}_j$$
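The centroid formula translates directly into code. A minimal sketch, assuming each member is a plain list of floats of equal length; `centroid` is a hypothetical helper name. The example result does not coincide with any member, which is what makes a centroid an abstract prototype rather than an actual instance.

```python
from typing import List, Sequence


def centroid(members: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the members' feature vectors: (1/|c_i|) * sum of x_j."""
    if not members:
        raise ValueError("a class needs at least one member to have a centroid")
    n = len(members)
    dim = len(members[0])
    return [sum(x[d] for x in members) / n for d in range(dim)]


print(centroid([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]))  # [1.0, 1.0]
```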
In our vector space model, objects are represented as ___, so a class will correspond to a collection of ____; a region.
Vector space classification is based on the _____ hypothesis.
The contiguity hypothesis
Classification amounts to computing the _____.
Classification amounts to computing the boundaries in the space that separate the classes; the decision boundaries.
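One common way to obtain such boundaries, assumed here rather than stated on the cards, is nearest-centroid (Rocchio-style) classification with Euclidean distance: a new object goes to the class whose centroid is closest. For two classes the decision boundary is then the hyperplane of points equidistant from both centroids, $\vec{w} \cdot \vec{x} = b$ with $\vec{w} = \vec{\mu}_a - \vec{\mu}_b$ and $b = (|\vec{\mu}_a|^2 - |\vec{\mu}_b|^2)/2$. All function names below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(members: List[Vector]) -> List[float]:
    """Component-wise mean of the members' feature vectors."""
    n = len(members)
    return [sum(x[d] for x in members) / n for d in range(len(members[0]))]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(x: Vector, centroids: Dict[str, Vector]) -> str:
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def boundary(mu_a: Vector, mu_b: Vector) -> Tuple[List[float], float]:
    """Hyperplane w·x = offset of points equidistant from mu_a and mu_b.

    Points with w·x > offset are closer to mu_a; points with w·x < offset are closer to mu_b.
    """
    w = [p - q for p, q in zip(mu_a, mu_b)]
    offset = (sum(p * p for p in mu_a) - sum(q * q for q in mu_b)) / 2
    return w, offset


train = {"pos": [[2.0, 0.0], [4.0, 0.0]], "neg": [[-2.0, 0.0], [-4.0, 0.0]]}
centroids = {label: centroid(members) for label, members in train.items()}
print(centroids)  # {'pos': [3.0, 0.0], 'neg': [-3.0, 0.0]}

w, offset = boundary(centroids["pos"], centroids["neg"])
print(w, offset)  # [6.0, 0.0] 0.0
print(classify([1.0, 5.0], centroids))  # pos
```

With more than two classes, each class region is the intersection of such half-spaces, one per competing centroid.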
Both centroids and medoids represent a group by a ____.
Both centroids and medoids represent a group by a single prototype.
While a medoid is an actual member of the group, a centroid is an ____.
While a medoid is an actual member of the group, a centroid is an abstract prototype; an average.
Typicality can be defined by a member’s distance to ____.
Typicality can be defined by a member’s distance to the prototype.
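A minimal sketch of typicality as distance to the prototype, assuming the prototype is the plain centroid and distance is Euclidean (`math.dist`, available from Python 3.8). The helper names are hypothetical. Members are ranked from most to least typical, that is, from closest to farthest.

```python
import math
from typing import List, Sequence, Tuple


def centroid(members: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the members' feature vectors."""
    n = len(members)
    return [sum(x[d] for x in members) / n for d in range(len(members[0]))]


def rank_by_typicality(members: List[Sequence[float]]) -> List[Tuple[Sequence[float], float]]:
    """Pair each member with its distance to the centroid, most typical (closest) first."""
    mu = centroid(members)
    scored = [(m, math.dist(m, mu)) for m in members]
    return sorted(scored, key=lambda pair: pair[1])


for member, distance in rank_by_typicality([[1.0, 1.0], [0.0, 0.0], [3.0, 3.0]]):
    print(member, round(distance, 3))
# [1.0, 1.0] 0.471
# [0.0, 0.0] 1.886
# [3.0, 3.0] 2.357
```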
The centroid could also be ____:
Let each member’s contribution to the average be determined by its average _____ to the other members of the group.
The centroid could also be distance weighted:
Let each member’s contribution to the average be determined by its average pairwise similarity to the other members of the group.
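A minimal sketch of this weighted centroid, assuming cosine similarity as the pairwise similarity, non-negative feature values (so every weight is at least zero), and normalization by the sum of the weights so the result stays an average. The names are hypothetical. Members that are similar to the rest of the group pull the prototype toward themselves, while outliers contribute less.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weighted_centroid(members: List[Vector]) -> List[float]:
    """Average of the members, each weighted by its mean similarity to the other members."""
    if not members:
        raise ValueError("a class needs at least one member")
    if len(members) == 1:
        return list(members[0])
    weights = []
    for i, xi in enumerate(members):
        sims = [cosine(xi, xj) for j, xj in enumerate(members) if j != i]
        weights.append(sum(sims) / len(sims))
    total = sum(weights)
    if total == 0:
        raise ValueError("all pairwise similarities are zero; weights are undefined")
    dim = len(members[0])
    return [sum(w * x[d] for w, x in zip(weights, members)) / total for d in range(dim)]


group = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print([round(v, 3) for v in weighted_centroid(group)])  # [0.75, 0.75]
```

Here the middle member [1.0, 1.0] gets the largest weight, so the result lies closer to it than the unweighted centroid (2/3, 2/3) does.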
Hard classes
- Membership is considered a Boolean property: a given object is either part of the class or it is not.
- A crisp membership function.
- A variant: disjunctive classes. Objects can be members of more than one class, but the memberships are still crisp.
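A minimal sketch of crisp membership, assuming class assignments are stored as a mapping from object IDs to sets of class labels (the IDs and labels are made up for illustration). The membership function always returns a plain `bool`; letting a set hold more than one label gives disjunctive classes without making any membership graded.

```python
from typing import Dict, Set

# Hypothetical assignments: doc2 belongs to two classes (disjunctive), but each membership is yes/no.
assignments: Dict[str, Set[str]] = {
    "doc1": {"sports"},
    "doc2": {"sports", "politics"},
    "doc3": set(),
}


def is_member(obj: str, label: str, classes: Dict[str, Set[str]]) -> bool:
    """Crisp membership function: True or False, never a degree in between."""
    return label in classes.get(obj, set())


print(is_member("doc1", "sports", assignments))    # True
print(is_member("doc1", "politics", assignments))  # False
print(is_member("doc2", "politics", assignments))  # True
```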