Chapter 6: Classification - Alternative Techniques Flashcards
Types of Classifiers
5 Distinguishing Factors
- Binary vs Multiclass
- Deterministic vs Probabilistic
- Linear vs Nonlinear
- Global vs Local
- Generative vs Discriminative
Types of Classifiers
Binary vs Multiclass
Binary classifiers assign each data instance to one of two possible labels, typically denoted as +1 and -1.
If there are more than two possible labels available, then the technique is known as a multiclass classifier.
Types of Classifiers
Deterministic vs Probabilistic
A deterministic classifier produces a discrete-valued label for each data instance it classifies,
whereas a probabilistic classifier assigns a continuous score between 0 and 1 to indicate how likely it is that an instance belongs to a particular class, where the probability scores for all classes sum up to 1.
Types of Classifiers
Linear vs Nonlinear
A linear classifier uses a linear separating hyperplane to discriminate instances from different classes whereas a nonlinear classifier enables the construction of more complex, nonlinear decision surfaces.
Types of Classifiers
Global vs Local
A global classifier fits a single model to the entire dataset.
Unless the model is highly nonlinear, this one-size-fits-all strategy may not be effective when the relationship between the attributes and the class labels varies over the input space.
In contrast, a local classifier partitions the input space into smaller regions and fits a distinct model to training instances in each region.
While local classifiers are more flexible in terms of fitting complex decision boundaries, they are also more susceptible to the model overfitting problem.
Types of Classifiers
Generative vs Discriminative
Given a data instance x, the primary objective of any classifier is to predict the class label, y, of the data instance.
However, apart from predicting the class label, we may also be interested in describing the underlying mechanism that generates the instances belonging to every class label.
Classifiers that learn a generative model of every class in the process of predicting class labels are known as generative classifiers.
In contrast, discriminative classifiers directly predict the class labels without explicitly describing the distribution of every class label.
Rule-based classifier
A rule-based classifier uses a collection of “if … then …” rules (a rule set) to classify data instances.
Each classification rule in the rule set can be expressed in the following way:
rᵢ: (Conditionᵢ) → yᵢ
Rule-based classifiers
Classification rule
*rᵢ*: (*Conditionᵢ*) → *yᵢ*
The left-hand side of the rule is called the rule antecedent or precondition.
It contains a conjunction of attribute test conditions:
*Conditionᵢ* = (A₁ op *v*₁) ∧ (A₂ op *v*₂) ∧ … ∧ (Aₖ op *v*ₖ)
where (Aⱼ, *v*ⱼ) is an attribute-value pair and op is a comparison operator chosen from the set {=, ≠, >, <, ≥, ≤}.
The right-hand side of the rule is called the rule consequent, which contains the predicted class yᵢ.
A rule r covers a data instance x if the precondition of r matches the attributes of x.
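As a rough illustration, a rule can be represented as a list of (attribute, operator, value) tests plus a predicted class, and the covers check simply evaluates every test in the antecedent. The rule and attribute names below are hypothetical, and this is only a minimal sketch.

```python
import operator

# Hypothetical rule: (Give Birth = no) AND (Can Fly = yes) -> Birds
OPS = {"=": operator.eq, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}

rule = {"antecedent": [("Give Birth", "=", "no"), ("Can Fly", "=", "yes")],
        "consequent": "Birds"}

def covers(rule, instance):
    """A rule covers an instance if every attribute test in its antecedent is satisfied."""
    return all(OPS[op](instance[attr], value)
               for attr, op, value in rule["antecedent"])

instance = {"Give Birth": "no", "Can Fly": "yes", "Live in Water": "no"}
print(covers(rule, instance))  # True, so the rule would predict class "Birds"
```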
Rule-based classifiers
Measuring rule quality
The quality of a rule can be evaluated using measures such as coverage and accuracy.
Rule-based classifiers
Rule coverage
(to measure rule quality)
Given a data set D and a classification rule r: A → y,
the coverage of the rule is the fraction of instances in D that trigger the rule r.
Coverage(r) = |A| / |D|
where |A| is the number of instances that satisfy the rule antecedent and |D| is the total number of instances.
Rule-based classifiers
Rule Accuracy
(to measure rule quality)
The accuracy or confidence factor of a rule r is the fraction of instances triggered by r whose class labels are equal to y.
Accuracy(r) = |A ∩ y| / |A|
where |A| is the number of instances that satisfy the rule antecedent and |A ∩ y| is the number of instances that satisfy both the antecedent and the consequent.
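A minimal sketch of how coverage and accuracy might be computed, assuming a rule is given as a predicate over instances; the toy data set and rule below are made up for illustration.

```python
def coverage_and_accuracy(rule_covers, predicted_class, dataset):
    """Coverage(r) = |A| / |D|; Accuracy(r) = |A ∩ y| / |A|.

    rule_covers: function returning True if the rule antecedent matches an instance
    dataset: list of (instance, class_label) pairs
    """
    covered = [(x, y) for x, y in dataset if rule_covers(x)]
    if not covered:
        return 0.0, 0.0
    coverage = len(covered) / len(dataset)
    accuracy = sum(1 for _, y in covered if y == predicted_class) / len(covered)
    return coverage, accuracy

# Toy data set: one attribute "Home Owner" and a class label
data = [({"Home Owner": "yes"}, "No"), ({"Home Owner": "no"}, "Yes"),
        ({"Home Owner": "no"}, "No"), ({"Home Owner": "yes"}, "No")]
rule_covers = lambda x: x["Home Owner"] == "no"   # antecedent: (Home Owner = no)
print(coverage_and_accuracy(rule_covers, "Yes", data))  # (0.5, 0.5)
```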
Definition 6.1
Mutually Exclusive Rule Set
The rules in a rule set R are mutually exclusive if no two rules in R are triggered by the same instance.
This ensures that every instance is covered by at most one rule in R.
Definition 6.2
Exhaustive Rule Set
A rule set R has exhaustive coverage if there is a rule for each combination of attribute values.
This ensures that every instance is covered by at least one rule in R.
Definition 6.3
Ordered Rule Set
The rules in an ordered rule set R are ranked in decreasing order of their priority.
An ordered rule set is also known as a decision list.
RIPPER
Sequential Covering Algorithm
The algorithm starts with an empty decision list, R, and extracts rules for each class based on the ordering specified by the class prevalence.
It iteratively extracts the rules for a given class y using the Learn-One-Rule function.
Once such a rule is found, all the training instances covered by the rule are eliminated.
The new rule is added to the bottom of the decision list R.
This procedure is repeated until the stopping criterion is met.
The algorithm then proceeds to generate rules for the next class.
Algorithm 6.1
Sequential Covering Algorithm
- Let E be the training instances and A be the set of attribute-value pairs, {(Aⱼ, vⱼ)}.
- Let Y₀ be an ordered set of classes {y₁, y₂, …, yₖ}.
- Let R = {} be the initial rule list.
- for each class y ∈ Y₀ - {yₖ} do:
- . while stopping condition is not met do
- . . r ← Learn-One-Rule (E, A, y)
- . . Remove training instances from E that are covered by r.
- . . Add r to the bottom of the rule list: R ← R ∨ r
- . end while
- end for
- Insert the default rule, {} → yₖ, to the bottom of the rule list R.
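The sketch below mirrors the control flow of Algorithm 6.1 in Python; the Learn-One-Rule step and the stopping condition are passed in as stubs, so this is an outline of the procedure rather than a full RIPPER implementation.

```python
def sequential_covering(E, A, classes, learn_one_rule, stop):
    """Sketch of the sequential covering algorithm (Algorithm 6.1).

    E: list of (instance, label) training pairs
    A: candidate attribute-value pairs for rule antecedents
    classes: ordered list of class labels [y1, ..., yk]
    learn_one_rule(E, A, y): returns a rule as (covers_fn, y) for class y
    stop(E, y): stopping condition, e.g. no positive examples of y remain
    """
    R = []                                   # ordered rule list (decision list)
    for y in classes[:-1]:                   # every class except the default class yk
        while not stop(E, y):
            rule = learn_one_rule(E, A, y)
            covers_fn, _ = rule
            # remove training instances covered by the new rule
            E = [(x, lbl) for x, lbl in E if not covers_fn(x)]
            R.append(rule)                   # add rule to the bottom of the list
    R.append((lambda x: True, classes[-1]))  # default rule {} -> yk
    return R
```

To classify a test instance, we would walk down the returned decision list and return the consequent of the first rule whose antecedent covers the instance.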
RIPPER
Rule Pruning
The rules generated by the Learn-One-Rule function can be pruned to improve their generalization errors.
RIPPER prunes the rules based on their performance on the validation set.
The following metric is computed to determine whether pruning is needed:
(p − n) / (p + n)
This metric is monotonically related to the rule’s accuracy on the validation set. If the metric improves after pruning, then the conjunct is removed.
where p is the number of positive examples and n is the number of negative examples covered by the rule in the validation set.
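A hedged sketch of the pruning check: compute (p − n)/(p + n) on the validation set with and without the last conjunct and keep the shorter rule if the metric does not decrease. The helper names and the rule representation (a list of test predicates) are assumptions made for illustration.

```python
def rule_value_metric(rule_covers, target_class, validation_set):
    """(p - n) / (p + n), where p and n count the positive and negative
    validation examples covered by the rule."""
    p = sum(1 for x, y in validation_set if rule_covers(x) and y == target_class)
    n = sum(1 for x, y in validation_set if rule_covers(x) and y != target_class)
    return (p - n) / (p + n) if (p + n) > 0 else float("-inf")

def prune_last_conjunct(antecedent, target_class, validation_set):
    """Drop the final conjunct if doing so does not decrease the metric.

    antecedent: list of predicates, each taking an instance and returning a bool
    """
    full = lambda x: all(test(x) for test in antecedent)
    pruned = lambda x: all(test(x) for test in antecedent[:-1])
    if (rule_value_metric(pruned, target_class, validation_set)
            >= rule_value_metric(full, target_class, validation_set)):
        return antecedent[:-1]
    return antecedent
```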
Eager vs lazy learners
Decision tree and rule-based classifiers are examples of eager learners because they are designed to learn a model that maps the input attributes to the class labels as soon as the training data becomes available.
An opposite strategy would be to delay the process of modelling the training data until it is needed to classify the test instances.
Techniques that employ this strategy are known as lazy learners.
Nearest-neighbour classifier
A nearest neighbour classifier represents each example as a datapoint in a d-dimensional space, where d is the number of attributes.
Given a test instance, we compute its proximity to the training instances.
The k-nearest neighbours of a given test instance z refer to the k training examples that are closest to z.
The instance is classified based on the class labels of its neighbours.
If the neighbours do not all have the same class label, the test instance is assigned to the majority class of its nearest neighbours.
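A minimal k-nearest-neighbour sketch assuming numeric attributes, Euclidean distance, and simple majority voting; the training points below are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, z, k=3):
    """Classify test instance z by majority vote among its k nearest training examples.

    train: list of (point, label) pairs, each point a tuple of d numeric attributes
    z: test point (tuple of d numeric attributes)
    """
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbours = sorted(train, key=lambda pair: dist(pair[0], z))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((5.0, 5.0), "-"), ((5.5, 4.5), "-")]
print(knn_predict(train, (1.1, 1.0), k=3))  # "+": two of its three nearest neighbours are positive
```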
8 Characteristics of Nearest Neighbour Classifiers
- They are an example of instance-based learning.
- They are lazy learners.
- Predictions are based on local information.
- Arbitrarily shaped decision boundaries.
- They have difficulty handling missing values.
- They can handle the presence of interacting attributes.
- Irrelevant attributes can distort commonly-used proximity measures.
- They require an appropriate proximity measure and data preprocessing steps.
8 Characteristics of Nearest Neighbour Classifiers
Instance-based learning
Instance-based learning does not build a global model, but rather uses the training examples to make predictions for a test instance.
Such algorithms require a proximity measure to determine the similarity or distance between instances and a classification function that returns the predicted class of a test instance based on its proximity to other instances.
8 Characteristics of Nearest Neighbour Classifiers
Lazy learners
Although they do not require model building, classifying a test instance can be quite expensive, because we need to compute the proximity values individually between the test and training examples.
In contrast, eager learners often spend the bulk of their computing resources for model building. Once a model has been built, classifying a test instance is extremely fast.
8 Characteristics of Nearest Neighbour Classifiers
Local predictions
Nearest neighbour classifiers make their predictions based on local information.
By contrast, decision tree and rule-based classifiers attempt to find a global model that fits the entire input space.
Because the classification decisions are made locally, nearest neighbour classifiers (with small values of k) are quite susceptible to noise.
8 Characteristics of Nearest Neighbour Classifiers
Arbitrarily-shaped decision boundaries
Nearest neighbour classifiers can produce decision boundaries of arbitrary shape.
Such boundaries provide a more flexible model representation compared to decision tree and rule-based classifiers that are often constrained to rectilinear decision boundaries.
The decision boundaries of nearest neighbour classifiers also have high variability because they depend on the composition of training examples in the local neighbourhood.
Increasing the number of nearest neighbours may reduce such variability.