Week 10: Classification Flashcards
How do we carry out classification and what is its goal?
Depending on the business need, we train a model on a labelled training dataset so that it separates observations into different classes. The goal is for the model to assign unseen observations to the correct class as accurately as possible.
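As a concrete illustration, here is a minimal sketch of that workflow, assuming scikit-learn and its bundled iris dataset; the classifier choice and the 70/30 split are illustrative, not prescribed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out unseen data to check how well the model generalises
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)      # learn class boundaries from the training data
y_pred = model.predict(X_test)   # classify the unseen observations
print("Test accuracy:", accuracy_score(y_test, y_pred))
```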
How do you use the greedy approach to train a decision tree?
At each node, the algorithm chooses the split that looks best at that point in time (the largest reduction in impurity), without looking ahead, even if that choice is not optimal in the long term. In other words: take the biggest impurity-reducing steps first.
What are the stopping conditions for the split?
- All data for a given node belong to the same class
- There are no remaining attributes for further splitting
- Number of observations per node is small (e.g. less than 10)
What are the ways we can split attributes?
1) Multi-way
2) Binary
How do you determine the best way to split attributes?
Based on a measure of node impurity, such as entropy. We want the resulting nodes to be as homogeneous (pure) as possible, so we choose the split that reduces impurity the most.
What is entropy?
A measure of the degree of impurity of a node, i.e. how mixed the classes in the node are. The higher the entropy, the more evenly mixed the classes.
What does it mean when the entropy value is close to 0?
It means that the node is less impure, or more homogeneous
How do you calculate entropy?
For a node t with classes j:

Entropy(t) = −Σ_j p(j|t) · log2 p(j|t)

where p(j|t) is the proportion of observations in node t that belong to class j. A pure node has entropy 0; a two-class node split 50/50 has entropy 1, the maximum for two classes.
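A minimal, self-contained Python sketch of this calculation (the function name `entropy` is my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (log base 2) of a collection of class labels."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n)
               for count in Counter(labels).values())

print(entropy(["yes"] * 10))              # 0.0   -> pure node
print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0   -> maximally impure (2 classes)
print(entropy(["yes"] * 9 + ["no"]))      # ~0.47 -> mostly homogeneous
```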
What is information gain?
Information gain is the decrease in entropy (uncertainty) achieved by a split: the entropy of the parent node minus the weighted average entropy of its child nodes. The greater the information gain, the greater the reduction in uncertainty, so at each node we choose the split with the highest gain (see the sketch below).
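A small Python sketch of the gain computation, reusing the entropy definition from the previous card; the data and function names are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (log base 2), as in the previous card."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5                  # entropy = 1.0
# A good split separates the classes almost perfectly
good = [["yes"] * 5 + ["no"], ["no"] * 4]
# A poor split leaves both children as mixed as the parent
poor = [["yes", "yes", "no", "no"], ["yes", "yes", "yes", "no", "no", "no"]]
print(information_gain(parent, good))  # ~0.61 -> big drop in entropy
print(information_gain(parent, poor))  # 0.0   -> split tells us nothing
```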
What are the conditions for stopping the split?
- Inherent features:
  - All observations in the node belong to the same class
  - There are no remaining attributes for splitting
- Parameter settings:
  - The number of observations per node is small enough (e.g. a minimum of 10 observations per node)
  - The tree is deep enough (e.g. if max depth = 3, the tree has at most 3 levels of splitting)
  - The improvement in class impurity is less than a specified minimum
- When a stopping condition is met, the node becomes a leaf
- It is possible to keep splitting until there is one leaf per observation (100% training accuracy), but this causes overfitting. To avoid it, set constraints on the tree size or prune the tree, as in the sketch below
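A sketch of both remedies in scikit-learn (an assumed library choice; all parameter values are illustrative): size constraints via max_depth, min_samples_leaf and min_impurity_decrease, and cost-complexity pruning via ccp_alpha.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Constrained tree: stop splitting early instead of growing
# one leaf per observation
constrained = DecisionTreeClassifier(
    max_depth=3,                  # at most 3 levels of splitting
    min_samples_leaf=10,          # every leaf keeps at least 10 observations
    min_impurity_decrease=0.01,   # skip splits with too little improvement
).fit(X, y)

# Pruned tree: grow fully, then cut back branches whose added
# complexity is not worth their impurity reduction
pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print("constrained leaves:", constrained.get_n_leaves())
print("pruned leaves:", pruned.get_n_leaves())
```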
What are some pros of decision trees?
- Easy to understand: DT output is very easy to understand, even for people from non-analytical backgrounds
- Useful in data exploration: DTs easily identify the most significant variables and the relationships between two or more variables
- Handles outliers: not heavily influenced by outliers or missing values
- Data type is not a constraint: can handle both numerical and categorical variables
- Non-parametric method: decision trees make no assumptions about the space distribution or the classifier structure
What are some cons of decision trees?
- Overfitting: DTs can easily overfit; to avoid this, set constraints on the model parameters and prune the tree
- Limitation for continuous variables: continuous numerical variables need to be discretized into categories, which loses some information