Chapter 3: Classification Flashcards
Classification Task
The data for a classification task consists of a collection of instances (records).
Each such instance is characterized by a tuple (x, y), where x is the set of attribute values that describe the instance and y is the class label of the instance.
The attribute set x can contain attributes of any type, while the class label y must be categorical.
A classification model is an abstract representation of the relationship between the attribute set and the class label.
2 important roles of a classification model in data mining
- Used as a predictive model to classify unlabeled instances.
- Serves as a descriptive model to identify the characteristics that distinguish instances from different classes.
Accuracy
Number of correct predictions / Total number of predictions
Error rate
Number of wrong predictions / Total number of predictions
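A minimal sketch of the two formulas above in Python; the function names and the example labels are purely illustrative, not from the text.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true class labels.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def error_rate(y_true, y_pred):
    # Error rate is simply the complement of accuracy.
    return 1.0 - accuracy(y_true, y_pred)

# Example: 4 correct out of 5 predictions -> accuracy 0.8, error rate 0.2
y_true = ["yes", "no", "no", "yes", "yes"]
y_pred = ["yes", "no", "yes", "yes", "yes"]
print(accuracy(y_true, y_pred), error_rate(y_true, y_pred))
```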
3 Types of DT nodes
- A root node with no incoming links and zero or more outgoing links.
- Internal nodes, each of which has exactly one incoming link and two or more outgoing links.
- Leaf or terminal nodes, each of which has exactly one incoming link and no outgoing links.
Hunt’s Algorithm
A decision tree is grown in a recursive fashion.
The tree initially contains a single root node that is associated with all the training instances.
If a node is associated with instances from more than one class, it is expanded using an attribute test condition that is determined using a splitting criterion.
A child leaf node is created for each outcome of the attribute test condition, and the instances associated with the parent node are distributed to the children based on the test outcomes.
This node expansion step can then be applied recursively to each child node, as long as the child contains instances from more than one class.
If all the instances associated with a leaf node have identical class labels, then the node is not expanded any further.
Each leaf node is assigned a class label that occurs most frequently in the training instances associated with the node.
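A hedged sketch of Hunt's algorithm for categorical attributes, assuming a Gini-based splitting criterion and multiway splits (one child per attribute value); real implementations such as CART or C4.5 differ in their splitting criteria, handling of continuous attributes, and pruning, and all names here are illustrative.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def hunts(instances, labels, attributes):
    # Stop if the node is pure or no attributes remain; the leaf is
    # labelled with the majority class of its training instances.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Splitting criterion: pick the attribute with the lowest weighted
    # Gini impurity over the resulting children (one child per value).
    def weighted_gini(attr):
        total = len(labels)
        score = 0.0
        for value in set(x[attr] for x in instances):
            child = [y for x, y in zip(instances, labels) if x[attr] == value]
            score += len(child) / total * gini(child)
        return score

    best = min(attributes, key=weighted_gini)

    # Create one child per outcome of the attribute test and recurse.
    tree = {"attr": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    for value in set(x[best] for x in instances):
        subset = [(x, y) for x, y in zip(instances, labels) if x[best] == value]
        xs, ys = zip(*subset)
        tree["children"][value] = hunts(list(xs), list(ys), remaining)
    return tree
```

The recursion terminates because the set of candidate attributes shrinks at every level; a production implementation would also apply the stopping and early-termination criteria discussed below.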
2 Key design issues to be addressed with Hunt’s algorithm
1) What is the splitting criterion?
At each recursive step, an attribute must be selected to partition the training instances associated with a node into smaller subsets associated with its child nodes.
The splitting criterion determines which attribute is chosen.
2) What is the stopping criterion?
The basic algorithm stops expanding a node only when all the training instances have the same class labels or identical attribute values. However, early termination can have advantages.
Impurity of a node
The impurity of a node measures how dissimilar the class labels are for the data instances belonging to a common node.
E.g.
- Entropy
- Gini index
- Classification error
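A minimal sketch of the three impurity measures listed above, computed from the class distribution at a single node; function names are illustrative.

```python
from collections import Counter
from math import log2

def class_probs(labels):
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def entropy(labels):
    return -sum(p * log2(p) for p in class_probs(labels) if p > 0)

def gini_index(labels):
    return 1.0 - sum(p ** 2 for p in class_probs(labels))

def classification_error(labels):
    return 1.0 - max(class_probs(labels))

# A perfectly pure node scores 0 under all three measures; a maximally
# impure two-class node scores entropy 1.0, Gini 0.5, error 0.5.
print(entropy(["a", "b"]), gini_index(["a", "b"]), classification_error(["a", "b"]))
```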
Gain in purity
The difference between the degree of impurity of the parent node and the weighted degree of impurity of the child nodes:
Δ = I(parent) − Σ_j [N(v_j)/N] · I(v_j),
where N is the number of instances at the parent node, N(v_j) is the number of instances at child node v_j, and I(·) is the chosen impurity measure.
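A short sketch of the gain computation, using Gini impurity as the measure; any of the impurity measures above could be substituted, and the names are illustrative.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(parent_labels, children_labels):
    # Parent impurity minus the weighted impurity of the children.
    n = len(parent_labels)
    weighted = sum(len(child) / n * gini(child) for child in children_labels)
    return gini(parent_labels) - weighted

# Splitting a mixed parent into two pure children yields the maximum gain.
parent = ["yes", "yes", "no", "no"]
print(gain(parent, [["yes", "yes"], ["no", "no"]]))  # 0.5
```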
Characteristics of Decision Tree Classifiers
Applicability
Decision trees are a nonparametric approach for building classification models.
It does not require any prior assumptions about the probability distribution governing the class and the attributes of the data.
It is applicable to both categorical and continuous data without requiring attributes to be transformed into a common representation via binarization, normalization or standardization.
Unlike some binary classifiers, it can also deal with multiclass problems without the need to decompose them into multiple binary classification tasks.
Trees are relatively easy to interpret.
The accuracy of trees is quite comparable to that of other classification techniques for many simple datasets.
Characteristics of Decision Tree Classifiers
Expressiveness
A decision tree provides a universal representation for discrete-valued functions. I.e. it can encode any function of discrete-valued attributes. This is because every discrete-valued function can be represented as an assignment table, where every unique combination of discrete attributes is assigned a class label.
Characteristics of Decision Tree Classifiers
Computational Efficiency
Since the number of possible decision trees can be very large, many decision tree algorithms employ a heuristic-based approach to guide their search in the vast hypothesis space.
For many datasets, such techniques quickly construct a reasonably good decision tree even when the training set size is very large.
Classifying a test record is extremely fast, with a worst-case complexity of O(w), where w is the maximum depth of the tree.
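An illustrative sketch of classifying one test record, assuming the nested-dictionary tree produced by the Hunt's-algorithm sketch earlier; only one root-to-leaf path is followed, so at most w attribute tests are applied.

```python
def classify(tree, record):
    # Leaves are stored as plain class labels; internal nodes as dicts.
    # Attribute values unseen during training are not handled in this sketch.
    while isinstance(tree, dict):
        tree = tree["children"][record[tree["attr"]]]
    return tree
```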
Characteristics of Decision Tree Classifiers
3 Approaches to Handling missing values
- Probabilistic split method
- Surrogate split method
- Separate class method
DT handling missing values
Probabilistic split method
Distributes the data instance to every child of the splitting node according to the probability that the missing attribute has a particular value.
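A hedged sketch of the probabilistic split idea: the instance is sent to every child with a fractional weight equal to the fraction of training instances (with known values) routed to that child. The names are illustrative, not from a specific library.

```python
from collections import Counter

def child_weights(training_values):
    # training_values: the splitting attribute's values at this node,
    # with None marking missing values, which are ignored here.
    counts = Counter(v for v in training_values if v is not None)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# If 7 of 10 known values are "yes" and 3 are "no", an instance with a
# missing value goes to the "yes" child with weight 0.7 and to the
# "no" child with weight 0.3.
print(child_weights(["yes"] * 7 + ["no"] * 3 + [None]))
```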
DT handling missing values
Surrogate split method
The instance whose splitting attribute value is missing is assigned to one of the child nodes based on the value of another, non-missing surrogate attribute whose split most closely resembles the partition made by the attribute with the missing value.
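A rough sketch of how a surrogate attribute might be chosen: among the other attributes, pick the one whose split agrees most often with the primary split on training instances where both routing decisions are known. This is purely illustrative; names and data are made up.

```python
def agreement(primary_routes, candidate_routes):
    # Fraction of instances sent to the same child by both splits,
    # counted only where both routing decisions are defined.
    pairs = [(p, c) for p, c in zip(primary_routes, candidate_routes)
             if p is not None and c is not None]
    if not pairs:
        return 0.0
    return sum(p == c for p, c in pairs) / len(pairs)

def best_surrogate(primary_routes, candidates):
    # candidates: mapping from attribute name to its routing decisions.
    return max(candidates, key=lambda a: agreement(primary_routes, candidates[a]))

# The chosen surrogate is then used to route instances whose primary
# splitting attribute is missing.
primary = ["left", "left", "right", "right"]
candidates = {"income": ["left", "right", "right", "right"],
              "age": ["right", "left", "left", "left"]}
print(best_surrogate(primary, candidates))  # "income" agrees on 3 of 4
```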