Basic Concept of Classification Flashcards
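Given a collection of records, each record is characterized by a tuple (x, y), where x is the attribute set and y is the class label. The task is to learn a model that maps each attribute set x to one of the predefined class labels y.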
Classification
6 CLASSIFICATION TECHNIQUES
- Decision Tree Based Methods
- Rule-based Methods
- Memory Based Reasoning
- Neural Network / Deep Learning
- Naive Bayes and Bayesian Belief Network
- Support Vector Machines
DECISION TREE INDUCTION
A learning algorithm is applied to the training set to learn a model (induction),
the learned model takes the form of a decision tree,
and the model is then applied to the test set to predict class labels (deduction).
is an algorithm that recursively splits the records on attribute tests until each partition contains records of only a single class.
Hunt’s Algorithm
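The recursion can be sketched in a few lines of Python. This is only a minimal illustration, not a full implementation: the names `hunt` and `classify`, the record layout (a dict of attribute values paired with a label), and the choice to split on the first remaining attribute are all assumptions made for the example. A real tree builder would choose the attribute with the best impurity gain.

```python
from collections import Counter

def hunt(records, attributes):
    """Recursively build a tree from (attribute_dict, label) records."""
    labels = [y for _, y in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the node is pure, or there is no attribute left to split on.
    if len(set(labels)) == 1 or not attributes:
        return majority
    # Assumption: split on the first remaining attribute (multi-way split).
    attr, rest = attributes[0], attributes[1:]
    partitions = {}
    for x, y in records:
        partitions.setdefault(x[attr], []).append((x, y))
    return {
        "attr": attr,
        "default": majority,  # fallback for attribute values never seen
        "children": {v: hunt(part, rest) for v, part in partitions.items()},
    }

def classify(tree, x):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["attr"]], tree["default"])
    return tree

# Hypothetical toy data, made up for this sketch.
records = [
    ({"refund": "yes", "married": "no"}, "no"),
    ({"refund": "no", "married": "yes"}, "no"),
    ({"refund": "no", "married": "no"}, "yes"),
]
tree = hunt(records, ["refund", "married"])
print(classify(tree, {"refund": "no", "married": "no"}))
```

Each recursive call works on a smaller partition, so the recursion ends once every partition is pure or the attribute list is exhausted.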
4 TYPES OF ATTRIBUTES
- Binary
- Nominal
- Ordinal
- Continuous
2 TEST CONDITIONS FOR NOMINAL ATTRIBUTES
- Multi-way Split – use as many partitions as distinct values.
- Binary Split – divides values into two subsets.
2 TEST CONDITIONS FOR ORDINAL ATTRIBUTES
- Multi-way Split – use as many partitions as distinct values.
- Binary Split – divides values into two subsets while preserving the order among attribute values.
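To make the difference between the nominal and ordinal binary splits concrete, the sketch below enumerates the candidate splits for each. The function names and the example values are hypothetical, chosen only for illustration. For k distinct nominal values there are 2^(k-1) - 1 binary splits, while an ordinal attribute allows only the k - 1 cuts between adjacent values.

```python
from itertools import combinations

def nominal_binary_splits(values):
    """All ways to divide distinct nominal values into two non-empty subsets."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keep `first` on the left side so each partition is listed only once.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(values) - left
            splits.append((left, right))
    return splits

def ordinal_binary_splits(ordered_values):
    """Only cuts between adjacent values keep the order property intact."""
    return [(ordered_values[:i], ordered_values[i:])
            for i in range(1, len(ordered_values))]

# Hypothetical attribute values.
print(len(nominal_binary_splits(["family", "luxury", "sports"])))   # 3
print(ordinal_binary_splits(["small", "medium", "large"]))
```

Note that a grouping such as {small, large} versus {medium} never appears in the ordinal output, because it would break the ordering.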
is an approach to choosing the best split in which nodes with a more homogeneous (purer) class distribution are preferred.
Greedy Approach
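General framework for finding the best split.
M0 is the impurity of the parent node before splitting.
M12 is the weighted impurity of child nodes 1 and 2 (first candidate split).
M34 is the weighted impurity of child nodes 3 and 4 (second candidate split).
Gain = M0 - M12 vs. M0 - M34; choose the split with the higher gain.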
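The comparison can be worked through numerically. The sketch below assumes the Gini index as the impurity measure and weights each child by its share of the parent's records. The function names and the class counts are made up for the example.

```python
def gini(counts):
    """Gini index of a node given its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_impurity(children, impurity=gini):
    """Impurity of the children, each weighted by its share of the records."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * impurity(c) for c in children)

def gain(parent, children, impurity=gini):
    """Gain = impurity of the parent minus weighted impurity of the children."""
    return impurity(parent) - weighted_impurity(children, impurity)

# Hypothetical class counts: the parent holds 6 records of each class.
parent = [6, 6]
split_12 = [[4, 2], [2, 4]]   # children 1 and 2
split_34 = [[5, 1], [1, 5]]   # children 3 and 4

gain_12 = gain(parent, split_12)   # M0 - M12
gain_34 = gain(parent, split_34)   # M0 - M34
print("better split:", "3/4" if gain_34 > gain_12 else "1/2")
```

Here the second split produces purer children, so its gain is larger and it is the one a greedy tree builder would pick.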
3 WAYS TO MEASURE NODE IMPURITY
- Gini Index
- Entropy
- Classification Error
is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset.
Gini Impurity / Index
measures the homogeneity of a node; in information theory, it is the uncertainty of a random variable or the information content of a message.
Entropy
measures the misclassification error made by a node.
Classification Error
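The three measures can be compared side by side using their standard definitions, where p_i is the fraction of records at the node that belong to class i: Gini = 1 - sum(p_i^2), Entropy = -sum(p_i * log2(p_i)), and Classification Error = 1 - max(p_i). The function names below are illustrative only.

```python
from math import log2

def proportions(counts):
    """Turn per-class record counts into class fractions p_i."""
    n = sum(counts)
    return [c / n for c in counts]

def gini(counts):
    return 1.0 - sum(p ** 2 for p in proportions(counts))

def entropy(counts):
    # Classes with p = 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(-p * log2(p) for p in proportions(counts) if p > 0)

def classification_error(counts):
    return 1.0 - max(proportions(counts))

# A pure node scores 0 on all three; an evenly mixed two-class node
# gives the maximum of each measure.
for counts in ([6, 0], [3, 3]):
    print(counts, gini(counts), entropy(counts), classification_error(counts))
```

All three measures agree on the extremes (pure versus evenly mixed) and differ only in how they score the cases in between.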
3 STOPPING CRITERIA FOR TREE INDUCTION
- Stop expanding a node when all the records belong to the same class.
- Stop expanding a node when all the records in the node have the same attribute values.
- Early Termination Criteria.
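The three criteria can be combined into a single check. This is a sketch under stated assumptions: the name `should_stop`, the record layout (a dict of attribute values paired with a label), and the `min_records` threshold are hypothetical. The threshold stands in for one possible early termination rule; the source does not say which rule is meant.

```python
def should_stop(records, min_records=5):
    """Return True if a node holding these (attribute_dict, label) records
    should become a leaf instead of being expanded further."""
    # 1. All records belong to the same class.
    if len({y for _, y in records}) <= 1:
        return True
    # 2. All records have identical attribute values, so no test can separate them.
    if len({tuple(sorted(x.items())) for x, _ in records}) == 1:
        return True
    # 3. Early termination (assumed rule): too few records to split reliably.
    return len(records) < min_records

# Hypothetical node: same attribute values but different labels.
node = [({"refund": "no"}, "yes"), ({"refund": "no"}, "no")]
print(should_stop(node))  # True
```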
4 ADVANTAGES OF DECISION TREE BASED CLASSIFICATION
- Inexpensive to Construct
- Extremely Fast at Classifying Unknown Records
- Easy to Interpret for Small-Sized Trees
- Accuracy is Comparable to other classification techniques.