Classification Flashcards
Classification
The process of categorising data into predefined classes based on their attributes, characteristics, or features.
2 Approaches to Building a Model
Induction - Learn model
Deduction - Apply model
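The two phases can be illustrated with a minimal sketch. It assumes a toy model that is just one threshold on a single numeric attribute; the names induce, deduce, and training are hypothetical and chosen only for this illustration.

```python
def induce(records):
    """Induction: learn a threshold from labelled (value, label) records."""
    best_threshold, best_errors = None, len(records) + 1
    # Try each observed value as a candidate threshold and keep the one
    # that misclassifies the fewest training records.
    for threshold, _ in records:
        errors = sum((x > threshold) != label for x, label in records)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def deduce(threshold, x):
    """Deduction: apply the learned model to an unseen record."""
    return x > threshold

# Hypothetical training data: (attribute value, class label)
training = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]
model = induce(training)                                # learn model
predictions = [deduce(model, x) for x in (0.5, 3.5)]    # apply model
```

Here the learned model is the threshold 2.0, and the two unseen values are classified as False and True respectively.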
4 Types of Classification
Binary Classifier
Multiclass Classifier
Multilabel Classifier
Multioutput Classifier (Multioutput–multiclass classification)
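One way to tell the four types apart is by the shape of the target for a single record. The values below are made-up examples, not taken from any dataset.

```python
# Hypothetical targets for one record under each problem type.
targets = {
    "binary": 1,                      # one label, two possible classes (0 or 1)
    "multiclass": "cat",              # one label, more than two possible classes
    "multilabel": [1, 0, 1],          # several labels, each binary
    "multioutput": ["red", "large"],  # several labels, each with more than two classes
}

# Single-label problems have a scalar target; the other two have a list.
single_label = [name for name, t in targets.items() if not isinstance(t, list)]
```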
Class vs Label
Class: A group or category of data points with similar characteristics or attributes.
Label: A name or identifier assigned to a particular data point that indicates the class to which it belongs.
Batch Learning
Training the model on the entire dataset at once; the model is not updated incrementally as new data arrives
2 Design Issues of Decision Tree Induction
Splitting criterion: Attribute test condition to divide records into smaller subsets
Stopping criterion: Stop expanding a node when all of its records belong to the same class; early termination may stop the expansion sooner
1 Approach Used in Splitting Criterion
Greedy approach - Nodes with purer class distribution are preferred
3 Steps to Find the Best Split
Compute impurity measure (P) before splitting
Compute impurity measure (M) after splitting
Choose the attribute test condition that produces the highest gain (Gain = P - M), i.e. the lowest impurity M after splitting.
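The three steps can be sketched as follows. This is a minimal illustration that assumes the Gini index as the impurity measure and takes M to be the size-weighted average impurity of the child nodes; the splits split_a and split_b are made-up data.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(parent, children):
    """Gain = P - M, with M the size-weighted impurity of the child nodes."""
    n = len(parent)
    p_before = gini(parent)                                   # step 1
    m_after = sum(len(ch) / n * gini(ch) for ch in children)  # step 2
    return p_before - m_after

# Hypothetical node with 5 "yes" and 5 "no" records, and two candidate splits.
parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]

gains = {"A": gain(parent, split_a), "B": gain(parent, split_b)}
best = max(gains, key=gains.get)                              # step 3
```

Split A yields purer children (gain of about 0.18 versus about 0.02 for split B), so it is the one chosen.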
4 Advantages of Decision Tree Based Classification
Relatively inexpensive to construct
Easy to interpret for small-sized trees
Fast at classifying unknown records
Versatile
2 Disadvantages of Decision Tree Based Classification
Each decision boundary involves only a single attribute
Instability: Small changes in the training data can produce a very different tree
3 Measures of Node Impurity
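Gini Index - min. = 0; max. = 0.5 for two classes (1 - 1/k for k classes)
Entropy - min. = 0; max. = 1 for two classes (log2 k for k classes)
Misclassification error - min. = 0; max. = 0.5 for two classes (1 - 1/k for k classes)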
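The minimum and maximum values can be checked with a short sketch. It assumes each measure is computed from a list of class probabilities at a node, and that entropy uses log base 2; the function names are chosen for this illustration only.

```python
import math

def gini(p):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(q * q for q in p)

def entropy(p):
    """Entropy in bits; zero-probability classes contribute nothing."""
    return sum(-q * math.log2(q) for q in p if q > 0)

def misclassification_error(p):
    """Error made by always predicting the majority class."""
    return 1.0 - max(p)

pure = [1.0, 0.0]   # all records in one class: every measure is at its minimum
even = [0.5, 0.5]   # two equally likely classes: every measure is at its maximum

minimums = [f(pure) for f in (gini, entropy, misclassification_error)]
maximums = [f(even) for f in (gini, entropy, misclassification_error)]
```

For the pure node all three measures evaluate to 0; for the evenly split two-class node they evaluate to 0.5, 1, and 0.5. With more than two classes the maxima are higher: for example, gini([1/3, 1/3, 1/3]) is about 0.667.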