Chapter 6: Classification - Alternative Techniques Flashcards
Types of Classifiers
5 Distinguishing Factors
- Binary vs Multiclass
- Deterministic vs Probabilistic
- Linear vs Nonlinear
- Global vs Local
- Generative vs Discriminative
Types of Classifiers
Binary vs Multiclass
Binary classifiers assign each data instance to one of two possible labels, typically denoted as +1 and -1.
If there are more than two possible labels available, then the technique is known as a multiclass classifier.
Types of Classifiers
Deterministic vs Probabilistic
A deterministic classifier assigns a discrete-valued label to each data instance it classifies,
whereas a probabilistic classifier assigns a continuous score between 0 and 1 to indicate how likely it is that an instance belongs to a particular class, where the probability scores for all classes sum up to 1.
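The distinction can be sketched in a few lines; this is a toy illustration, and the attribute name, threshold, and scores below are invented for the example, not taken from the text:

```python
# Hypothetical illustration of the two output styles; the attribute name,
# threshold, and probability scores are made up for this sketch.
def deterministic_predict(x):
    # Deterministic: returns a single discrete label (+1 or -1).
    return +1 if x["income"] > 50000 else -1

def probabilistic_predict(x):
    # Probabilistic: returns a score per class; the scores sum to 1.
    p_pos = 0.8 if x["income"] > 50000 else 0.3
    return {+1: p_pos, -1: 1.0 - p_pos}

instance = {"income": 60000}
print(deterministic_predict(instance))           # 1
scores = probabilistic_predict(instance)
assert abs(sum(scores.values()) - 1.0) < 1e-9    # scores form a distribution
```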
Types of Classifiers
Linear vs Nonlinear
A linear classifier uses a linear separating hyperplane to discriminate instances from different classes whereas a nonlinear classifier enables the construction of more complex, nonlinear decision surfaces.
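As a toy sketch, a linear classifier labels an instance by which side of the hyperplane w·x + b = 0 it falls on; the weight vector and bias below are arbitrary, not learned:

```python
# Minimal sketch of a linear decision rule: sign(w . x + b).
# The weights w and bias b here are invented, not fit to any data.
def linear_classify(x, w=(1.0, -2.0), b=0.5):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score > 0 else -1

print(linear_classify((3.0, 1.0)))  # score = 3.0 - 2.0 + 0.5 = 1.5, label 1
print(linear_classify((0.0, 1.0)))  # score = -2.0 + 0.5 = -1.5, label -1
```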
Types of Classifiers
Global vs Local
A global classifier fits a single model to the entire dataset.
Unless the model is highly nonlinear, this one-size-fits-all strategy may not be effective when the relationship between the attributes and the class labels varies over the input space.
In contrast, a local classifier partitions the input space into smaller regions and fits a distinct model to training instances in each region.
While local classifiers are more flexible in terms of fitting complex decision boundaries, they are also more susceptible to the model overfitting problem.
Types of Classifiers
Generative vs Discriminative
Given a data instance x, the primary objective of any classifier is to predict the class label, y, of the data instance.
However, apart from predicting the class label, we may also be interested in describing the underlying mechanism that generates the instances belonging to each class.
Classifiers that learn a generative model of every class in the process of predicting class labels are known as generative classifiers.
In contrast, discriminative classifiers directly predict the class labels without explicitly describing the distribution of every class label.
Rule-based classifier
A rule-based classifier uses a collection of “if … then …” rules (a rule set) to classify data instances.
Each classification rule in the rule set can be expressed in the following way:
rᵢ: (Conditionᵢ) → yᵢ
Rule-based classifiers
Classification rule
*rᵢ*: (*Conditionᵢ*) → *yᵢ*
The left-hand side of the rule is called the rule antecedent or precondition.
It contains a conjunction of attribute test conditions:
*Conditionᵢ* = (A₁ op *v*₁) ∧ (A₂ op *v*₂) ∧ ⋯ ∧ (Aₖ op *v*ₖ)
Where (Aⱼ, *v*ⱼ) is an attribute-value pair and op is a comparison operator chosen from the set {=, ≠, >, <, ≥, ≤}.
The right-hand side of the rule is called the rule consequent, which contains the predicted class yᵢ.
A rule r covers a data instance x if the precondition of r matches the attributes of x.
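A minimal sketch of a rule's antecedent and the covers relation, assuming each instance is a dict of attribute values; the vertebrate-style rule below is a made-up example:

```python
# Sketch: a rule antecedent is a list of (attribute, op, value) test
# conditions; a rule covers x if every condition holds for x.
import operator

OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}

def covers(antecedent, x):
    """True if the precondition (a conjunction of tests) matches x."""
    return all(OPS[op](x[attr], v) for attr, op, v in antecedent)

# Hypothetical rule r: (body_temp = warm-blooded) ∧ (gives_birth = yes) -> Mammal
r = ([("body_temp", "=", "warm-blooded"), ("gives_birth", "=", "yes")], "Mammal")

x = {"body_temp": "warm-blooded", "gives_birth": "yes"}
print(covers(r[0], x))  # True: r covers x
```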
Rule-based classifiers
Measuring rule quality
The quality of a rule can be evaluated using measures such as coverage and accuracy.
Rule-based classifiers
Rule coverage
(to measure rule quality)
Given a data set D and a classification rule r: A →y,
the coverage of the rule is the fraction of instances in D that trigger the rule r.
Coverage(r) = |A| / |D|
Where |A| is the number of instances that satisfy the rule antecedent and |D| is the total number of instances.
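The formula Coverage(r) = |A| / |D| can be sketched as follows; for simplicity the antecedent is limited to equality tests, and the small dataset is invented:

```python
# Sketch of Coverage(r) = |A| / |D|, where A is the set of instances
# satisfying the antecedent (equality tests only, for brevity).
def coverage(antecedent, D):
    A = [x for x in D if all(x[a] == v for a, v in antecedent)]
    return len(A) / len(D)

# Made-up toy dataset of four instances.
D = [{"gives_birth": "yes", "class": "Mammal"},
     {"gives_birth": "yes", "class": "Mammal"},
     {"gives_birth": "no",  "class": "Reptile"},
     {"gives_birth": "no",  "class": "Bird"}]

print(coverage([("gives_birth", "yes")], D))  # 0.5  (|A| = 2, |D| = 4)
```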
Rule-based classifiers
Rule Accuracy
(to measure rule quality)
The accuracy or confidence factor of a rule r: A → y is the fraction of instances triggered by r whose class labels are equal to y.
Accuracy(r) = |A ∩ y| / |A|
Where |A| is the number of instances that satisfy the rule antecedent and |A ∩ y| is the number of instances that satisfy both the antecedent and the consequent.
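Accuracy(r) = |A ∩ y| / |A| admits the same kind of sketch; again the antecedent is restricted to equality tests and the dataset is made up:

```python
# Sketch of Accuracy(r) = |A ∩ y| / |A|: among instances triggering the
# rule, the fraction whose class label equals the consequent y.
def accuracy(antecedent, y, D):
    A = [x for x in D if all(x[a] == v for a, v in antecedent)]
    correct = [x for x in A if x["class"] == y]
    return len(correct) / len(A)

# Made-up toy dataset; the third row is a deliberate exception to the rule.
D = [{"gives_birth": "yes", "class": "Mammal"},
     {"gives_birth": "yes", "class": "Mammal"},
     {"gives_birth": "yes", "class": "Fish"},
     {"gives_birth": "no",  "class": "Bird"}]

print(accuracy([("gives_birth", "yes")], "Mammal", D))  # 2/3
```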
Definition 6.1
Mutually Exclusive Rule Set
The rules in a rule set R are mutually exclusive if no two rules in R are triggered by the same instance.
This ensures that every instance is covered by at most one rule in R.
Definition 6.2
Exhaustive Rule Set
A rule set R has exhaustive coverage if there is a rule for each combination of attribute values.
This ensures that every instance is covered by at least one rule in R.
Definition 6.3
Ordered Rule Set
The rules in an ordered rule set R are ranked in decreasing order of their priority.
An ordered rule set is also known as a decision list.
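Classifying with a decision list can be sketched as: scan the rules in priority order and return the class of the first rule triggered; the rules and the default class below are made up:

```python
# Sketch of decision-list classification: the first triggered rule wins;
# a default class handles instances no rule covers (all values invented).
def classify(decision_list, x, default="Unknown"):
    for antecedent, label in decision_list:
        if all(x[a] == v for a, v in antecedent):
            return label  # highest-priority matching rule decides
    return default        # default rule when nothing is triggered

R = [([("gives_birth", "yes")], "Mammal"),   # rank 1
     ([("aerial", "yes")], "Bird")]          # rank 2

print(classify(R, {"gives_birth": "no", "aerial": "yes"}))   # Bird
print(classify(R, {"gives_birth": "yes", "aerial": "yes"}))  # Mammal
```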
RIPPER
Sequential Covering Algorithm
The algorithm starts with an empty decision list, R, and extracts rules for each class based on the ordering specified by the class prevalence.
It iteratively extracts the rules for a given class y using the Learn-One-Rule function.
Once such a rule is found, all the training instances covered by the rule are eliminated.
The new rule is added to the bottom of the decision list R.
This procedure is repeated until the stopping criterion is met.
The algorithm then proceeds to generate rules for the next class.
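A simplified sketch of the loop described above, not RIPPER itself: Learn-One-Rule is replaced by a hypothetical greedy stand-in that picks the single best equality test, and the stopping criterion is simply "no instances of class y remain":

```python
# Sketch of sequential covering. learn_one_rule is a greedy stand-in for
# Learn-One-Rule: it returns the one-condition rule with best accuracy for y.
def learn_one_rule(D, y):
    best, best_acc = None, -1.0
    attrs = {a for x in D for a in x if a != "class"}
    for a in sorted(attrs):
        for v in sorted({x[a] for x in D}):
            A = [x for x in D if x[a] == v]
            acc = sum(x["class"] == y for x in A) / len(A)
            if acc > best_acc:
                best, best_acc = [(a, v)], acc
    return best

def sequential_covering(D, classes):
    R = []                     # the decision list, initially empty
    for y in classes:          # classes assumed ordered by prevalence
        while any(x["class"] == y for x in D):   # simplified stopping criterion
            rule = learn_one_rule(D, y)
            R.append((rule, y))                  # add to bottom of the list
            # eliminate all training instances covered by the new rule
            D = [x for x in D if not all(x[a] == v for a, v in rule)]
    return R
```
Each pass removes at least one instance of class y (the best one-condition rule always covers some y instance), so the inner loop terminates.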