Week 10: Classification Flashcards
How do we carry out classification and what is its goal?
Depending on the business need, we train a model on a labelled training dataset so that it separates observations into different classes. The goal is for the model to assign unseen observations to the correct class as accurately as possible.
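As a concrete illustration, here is a minimal sketch of that workflow, assuming scikit-learn and its bundled iris dataset; the classifier choice and the 70/30 split are illustrative, not prescribed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out unseen data to check how well the model generalises
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)      # learn class boundaries from the training data
y_pred = model.predict(X_test)   # classify the unseen observations
print("Test accuracy:", accuracy_score(y_test, y_pred))
```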
How do you use the greedy approach to train a decision tree?
At each node, the algorithm chooses the split that looks best at that point in time (the largest reduction in impurity), without looking ahead, even if that choice is not optimal in the long term. In other words: take the biggest impurity-reducing steps first.
What are the stopping conditions for the split?
- All data for a given node belong to the same class
- There are no remaining attributes for further splitting
- Number of observations per node is small (e.g. less than 10)
What are the ways we can split attributes?
1) Multi-way
2) Binary
How do you determine the best way to split attributes?
Based on a measure of node impurity, such as entropy. We want the resulting nodes to be as homogeneous (pure) as possible, so we choose the split that reduces impurity the most.
What is entropy?
A measure of the degree of impurity of a node, i.e. how mixed the classes in the node are. The higher the entropy, the more evenly mixed the classes.
What does it mean when the entropy value is close to 0?
It means that the node is less impure, or more homogeneous
How do you calculate entropy?
For a node t with classes j:

Entropy(t) = −Σ_j p(j|t) · log2 p(j|t)

where p(j|t) is the proportion of observations in node t that belong to class j. A pure node has entropy 0; a two-class node split 50/50 has entropy 1, the maximum for two classes.
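A minimal, self-contained Python sketch of this calculation (the function name `entropy` is my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (log base 2) of a collection of class labels."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n)
               for count in Counter(labels).values())

print(entropy(["yes"] * 10))              # 0.0   -> pure node
print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0   -> maximally impure (2 classes)
print(entropy(["yes"] * 9 + ["no"]))      # ~0.47 -> mostly homogeneous
```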
What is information gain?
Information gain is the decrease in entropy (uncertainty) achieved by a split: the entropy of the parent node minus the weighted average entropy of its child nodes. The greater the information gain, the greater the reduction in uncertainty, so at each node we choose the split with the highest gain (see the sketch below).
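A small Python sketch of the gain computation, reusing the entropy definition from the previous card; the data and function names are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (log base 2), as in the previous card."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5                  # entropy = 1.0
# A good split separates the classes almost perfectly
good = [["yes"] * 5 + ["no"], ["no"] * 4]
# A poor split leaves both children as mixed as the parent
poor = [["yes", "yes", "no", "no"], ["yes", "yes", "yes", "no", "no", "no"]]
print(information_gain(parent, good))  # ~0.61 -> big drop in entropy
print(information_gain(parent, poor))  # 0.0   -> split tells us nothing
```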
What are the conditions for stopping the split?
- Inherent features:
  - All observations in the node belong to the same class
  - There are no remaining attributes for splitting
- Parameter settings:
  - The number of observations per node is small enough (e.g. a minimum of 10 observations per node)
  - The tree is deep enough (e.g. if max depth = 3, the tree has at most 3 levels of splitting)
  - The improvement in class impurity is less than a specified minimum
- When a stopping condition is met, the node becomes a leaf
- It is possible to keep splitting until there is one leaf per observation (100% training accuracy), but this causes overfitting. To avoid it, set constraints on the tree size or prune the tree, as in the sketch below
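A sketch of both remedies in scikit-learn (an assumed library choice; all parameter values are illustrative): size constraints via max_depth, min_samples_leaf and min_impurity_decrease, and cost-complexity pruning via ccp_alpha.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Constrained tree: stop splitting early instead of growing
# one leaf per observation
constrained = DecisionTreeClassifier(
    max_depth=3,                  # at most 3 levels of splitting
    min_samples_leaf=10,          # every leaf keeps at least 10 observations
    min_impurity_decrease=0.01,   # skip splits with too little improvement
).fit(X, y)

# Pruned tree: grow fully, then cut back branches whose added
# complexity is not worth their impurity reduction
pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print("constrained leaves:", constrained.get_n_leaves())
print("pruned leaves:", pruned.get_n_leaves())
```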
What are some pros of decision trees?
- Easy to understand: DT output is very easy to understand, even for people from non-analytical backgrounds
- Useful in data exploration: DTs easily identify the most significant variables and the relationships between two or more variables
- Handles outliers: not heavily influenced by outliers or missing values
- Data type is not a constraint: can handle both numerical and categorical variables
- Non-parametric method: decision trees make no assumptions about the space distribution or the classifier structure
What are some cons of decision trees?
- Overfitting: DTs can easily overfit; to avoid this, set constraints on the model parameters and prune the tree
- Limitation for continuous variables: continuous numerical variables need to be discretized into categories, which loses some information