Week 10: Classification Flashcards

1
Q

How do we carry out classification and what is its goal?

A

Depending on the business need, we build a model that separates observations into different classes using a training dataset. The goal is for the model to assign unseen observations to classes as accurately as possible (see the sketch below).
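
A minimal sketch of this workflow, assuming scikit-learn (the iris data and the decision tree model are illustrative choices, not prescribed by the notes):

    # Fit a classifier on a training set, then classify unseen data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    # Hold out part of the data to stand in for unseen values
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)        # learn class boundaries from training data
    y_pred = model.predict(X_test)     # assign unseen observations to classes
    print("accuracy on unseen data:", accuracy_score(y_test, y_pred))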

2
Q

How do you use the greedy approach to train a decision tree?

A

At each node, the algorithm chooses the split that looks best at that point in time (the largest immediate improvement), even if it is not optimal in the long run. In other words: take the biggest steps first (see the sketch below).
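
A minimal sketch of one greedy step (the toy data and helper names are my own assumptions, not from the notes): among the candidate splits, compute the impurity of each and take the locally best one; a real learner then recurses on the resulting child nodes.

    # One greedy step: pick the split whose children are purest right now.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # Shannon entropy of a list of class labels
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def weighted_child_entropy(groups):
        # Average child entropy, weighted by child size
        n = sum(len(g) for g in groups)
        return sum(len(g) / n * entropy(g) for g in groups)

    # Two candidate splits of the same 14 observations (toy partitions)
    candidates = {
        "attribute_A": [["yes"] * 2 + ["no"] * 3, ["yes"] * 4,
                        ["yes"] * 3 + ["no"] * 2],
        "attribute_B": [["yes"] * 5 + ["no"] * 2, ["yes"] * 4 + ["no"] * 3],
    }

    # Greedy choice: lowest weighted child entropy at this node,
    # with no look-ahead to later levels of the tree
    best = min(candidates, key=lambda a: weighted_child_entropy(candidates[a]))
    print(best)   # attribute_A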

3
Q

What are the stopping conditions for the split?

A
  • All data for a given node belong to the same class
  • There are no remaining attributes for further splitting
  • Number of observations per node is small (e.g. less than 10)
4
Q

What are the ways we can split attributes?

A

1) Multi-way: one branch for each distinct value of the attribute
2) Binary: the attribute's values are partitioned into two groups (e.g. {A} vs. {B, C})

5
Q

How do you determine the best way to split attributes?

A

Based on a measure of node impurity (e.g. entropy): choose the split that makes the resulting child nodes as homogeneous as possible.

6
Q

What is entropy?

A

A measure of the degree of impurity of a node: the more mixed the class labels in the node, the higher the entropy.

7
Q

What does it mean when the entropy value is close to 0?

A

It means that the node is less impure, i.e. more homogeneous; at an entropy of exactly 0, all observations in the node belong to a single class.

8
Q

How do you calculate entropy?

A

Entropy = -Σ p_i log2(p_i), where p_i is the proportion of observations in the node that belong to class i. For example, a node with 9 positive and 5 negative observations has entropy -(9/14)log2(9/14) - (5/14)log2(5/14) ≈ 0.940 (see the sketch below).
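
A minimal sketch of the calculation (the 9/5 class counts are an assumed toy example):

    # Entropy = -sum(p_i * log2(p_i)) over the classes present in the node
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # 0.94: impure node
    print(entropy(["yes"] * 5))   # -0.0: a pure node has zero entropy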

9
Q

What is information gain?

A

Information gain is the reduction in entropy achieved by a split: the parent node's entropy minus the weighted average entropy of its child nodes. The greater the information gain, the greater the decrease in entropy, i.e. in uncertainty (see the sketch below).
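
A minimal sketch of the computation (the partition of 14 observations into three child nodes is an assumed toy example):

    # Information gain = parent entropy - weighted average child entropy
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    parent = ["yes"] * 9 + ["no"] * 5          # entropy ~ 0.940
    children = [["yes"] * 2 + ["no"] * 3,      # entropy ~ 0.971
                ["yes"] * 4,                   # entropy 0 (pure)
                ["yes"] * 3 + ["no"] * 2]      # entropy ~ 0.971

    n = len(parent)
    gain = entropy(parent) - sum(len(c) / n * entropy(c) for c in children)
    print(round(gain, 3))   # 0.247: entropy drops from 0.940 to 0.694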

10
Q

What are the conditions for stopping the split?

A
  • Inherent features:
      • All observations in the node belong to the same class
      • There are no remaining attributes for further splitting
  • Parameter settings:
      • The number of observations per node is small enough (e.g. a minimum of 10 observations per node)
      • The tree is deep enough (e.g. if max depth = 3, the tree has at most 3 levels of splitting)
      • The improvement in class impurity is less than a specified minimum
  • Splitting stops as soon as any of these conditions is met. It is possible to split until there is one leaf per observation (100% accuracy on the training data), but this causes overfitting; to avoid it, set constraints on tree size or prune the tree (see the sketch after this list)
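
A minimal sketch of how these stopping conditions map onto scikit-learn's decision tree (the parameter values are illustrative assumptions, not prescribed settings):

    from sklearn.tree import DecisionTreeClassifier

    model = DecisionTreeClassifier(
        criterion="entropy",         # impurity measure used to evaluate splits
        max_depth=3,                 # at most 3 levels of splitting
        min_samples_leaf=10,         # at least 10 observations per leaf
        min_impurity_decrease=0.01,  # skip splits whose impurity improvement is too small
        ccp_alpha=0.01,              # cost-complexity pruning strength
    )
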
11
Q

What are some pros of decision trees?

A

Easy to understand: DT output is very easy to understand, even for people from a non-analytical background

Useful in data exploration: DTs easily identify the most significant variables and the relations between two or more variables

Handle outliers: DTs are not heavily influenced by outliers and missing values

Data type is not a constraint: DTs can handle both numerical and categorical variables

Non-parametric method: decision trees are considered non-parametric, i.e. they make no assumptions about the space distribution or the classifier structure

12
Q

What are some cons of decision trees?

A

Overfitting: DTs can easily overfit (to avoid overfitting, put constraints on the model parameters and prune the tree)

Limitation for continuous variables: continuous numerical variables need to be discretized into categories, which loses some information
