2. Decision Tree Flashcards

Question 1

Q

What are the two different types of decision tree?

Answer

A

Classification and regression trees

Question 2

Q

Which type of learning can be used with these methods?

Answer

A

Only supervised learning, as both methods need the target values (the answers) to determine the correct splits

Question 3

Q

When do you use classification trees?

Answer

A

When the target variable is categorical. The data is split into groups with similar class labels, and the final prediction is a class label
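A minimal sketch of how a classification leaf turns its training points into a class label, assuming the common majority-vote rule. The function name `predict_leaf_class` is hypothetical.

```python
from collections import Counter

def predict_leaf_class(labels):
    """Hypothetical helper: return the majority class among the training labels in a leaf."""
    # most_common(1) returns a list holding one (label, count) pair
    return Counter(labels).most_common(1)[0][0]

print(predict_leaf_class(["spam", "ham", "spam"]))  # spam
```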

Question 4

Q

How do you evaluate a classification tree?

Answer

A

Classification trees are evaluated through purity and impurity measures

Question 5

Q

When do you use a regression tree?

Answer

A

A regression tree is used when the target variable is continuous. The data is split into segments with similar target values, and the final prediction is a numerical value
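A minimal sketch of the numerical prediction made by a regression leaf, assuming the squared-error (L2) criterion, under which the best constant prediction is the mean of the targets in the leaf (with an L1 criterion it would be the median instead). The function name `predict_leaf_value` is hypothetical.

```python
def predict_leaf_value(targets):
    """Hypothetical helper: return the mean target value of the training points in a leaf."""
    # The mean is the constant that minimizes the squared error within the leaf
    return sum(targets) / len(targets)

print(predict_leaf_value([2.0, 3.0, 7.0]))  # 4.0
```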

Question 6

Q

How do you evaluate a regression model?

Answer

A

A regression model is evaluated by computing a loss function, either the L1 loss (MAE) or the L2 loss (MSE)
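A small sketch of the average loss per observation, where the `kind` parameter switches between the L1 loss (giving MAE) and the L2 loss (giving MSE). The function `mean_loss` and its parameters are hypothetical names chosen for illustration.

```python
def mean_loss(y_true, y_pred, kind="L2"):
    """Hypothetical helper: average loss per observation. L1 gives MAE, L2 gives MSE."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if kind == "L1":
        losses = [abs(t - p) for t, p in zip(y_true, y_pred)]
    elif kind == "L2":
        losses = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    else:
        raise ValueError("kind must be 'L1' or 'L2'")
    return sum(losses) / len(losses)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mean_loss(y_true, y_pred, "L1"))  # (1 + 0 + 2) / 3 = 1.0
print(mean_loss(y_true, y_pred, "L2"))  # (1 + 0 + 4) / 3, about 1.667
```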

Question 7

Q

What output does a classification tree produce?

Answer

A

A classification tree produces a discrete output: it learns a function that maps a data object to a discrete class

Question 8

Q

What does a classification tree model?

Answer

A

The relationship between the attributes and the class label

Question 9

Q

What can a classifier help predict?

Answer

A

It can help predict the class of a new data object

Question 10

Q

How is a classification tree represented?

Answer

A

It is represented by a rooted tree, where each node represents a partition of the input space

Question 11

Q

What do branches and leaves represent in a classification tree?

Answer

A

Branches = outcomes of the tests on attributes (attribute values)
Leaves = decisions (class labels)

Question 12

Q

Which sentence summarizes how a classification tree works?

Answer

A

Ask a series of questions until a conclusion is reached
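A minimal sketch of "asking a series of questions": each internal node asks whether one attribute is at most a threshold, the answer picks a branch, and a leaf holds the final class. The `Node` class, the `predict` function and the weather-style tree are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal node asks: "is x[feature] <= threshold?"; a leaf only holds a label
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # branch taken when the answer is yes
    right: Optional["Node"] = None  # branch taken when the answer is no
    label: Optional[str] = None     # set only on leaves

def predict(node, x):
    """Walk from the root, answering one question per node, until a leaf is reached."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

# Hypothetical tree over x = [temperature, humidity]
tree = Node(feature=0, threshold=25.0,
            left=Node(label="stay in"),
            right=Node(feature=1, threshold=70.0,
                       left=Node(label="go out"),
                       right=Node(label="stay in")))

print(predict(tree, [30.0, 50.0]))  # go out
```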

Question 13

Q

What is the purpose of the impurity measure I(r) or I(v_k)?

Answer

A

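The impurity measure quantifies how impure (mixed) a set of data points is with respect to their class labels. It is used to find the best split at each node: I(r) is the impurity of a node r before the split, and I(v_k) is the impurity of its k-th child node v_k after the split.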

Question 14

Q

What is a good impurity score for a classification problem?

Answer

A

Zero: all data points belong to a single class (a pure node), which is ideal
High values: the classes are evenly mixed, which is bad
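A quick check of the two extremes, using the Gini index as the example impurity measure (entropy and classification error behave the same way at these extremes). The function name `gini` is a hypothetical helper that takes the class counts of a node.

```python
def gini(counts):
    """Gini impurity from class counts: 1 - sum of squared class proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([10, 0]))  # 0.0: pure node, all points in one class
print(gini([5, 5]))   # 0.5: evenly mixed, the maximum for two classes
```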

Question 15

Q

In classification, what are the three different impurity measures?

Answer

A

Gini index, entropy and misclassification error
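The three measures can be sketched from the class counts of a node as follows. The helper names are hypothetical, and entropy is assumed to use log base 2 (so it is measured in bits).

```python
import math

def proportions(counts):
    n = sum(counts)
    return [c / n for c in counts]

def gini(counts):
    # 1 - sum_c p(c)^2
    return 1.0 - sum(p ** 2 for p in proportions(counts))

def entropy(counts):
    # -sum_c p(c) log2 p(c), skipping empty classes (0 * log 0 is taken as 0)
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)

def class_error(counts):
    # 1 - max_c p(c): fraction of points not in the majority class
    return 1.0 - max(proportions(counts))

counts = [6, 2]  # a node with 6 points of class A and 2 of class B
print(gini(counts), entropy(counts), class_error(counts))
```

For the counts [6, 2] this gives a Gini index of 0.375, an entropy of about 0.811 bits and a classification error of 0.25.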

Question 16

Q

In classification, when do you use each of the three different measures?

Answer

A

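Gini index - measures how often a randomly chosen point would be misclassified; cheap to compute (no logarithm), so it is often the default choice
Entropy - information-theoretic measure, used to compute the information gain of a split
ClassError - the fraction of misclassified instances (points not in the majority class); simple, but less sensitive to changes in the class probabilities than Gini or entropy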

Question 17

Q

In classification, which measure is used to see how good a split is?

Answer

A

Purity gain

Question 18

Q

In classification, how does the purity gain work?

Answer

A

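Also called impurity reduction, it evaluates how well a particular feature or attribute splits the data in a decision tree. It is the impurity of the parent node minus the size-weighted average impurity of the child nodes: Δ = I(r) − Σ_k N(v_k)/N(r) · I(v_k), where r is the parent node, v_k are its child nodes and N(·) is the number of data points in a node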
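A minimal sketch of the purity gain Δ = I(r) − Σ_k N(v_k)/N(r) · I(v_k), with the impurity measure passed in as a function. The helper names and the class counts are made up for illustration; the counts are chosen so that two splits tie under the classification error but not under the Gini index.

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def class_error(counts):
    n = sum(counts)
    return 1.0 - max(counts) / n

def purity_gain(parent, children, impurity):
    """I(r) minus the size-weighted average impurity of the child nodes v_k."""
    n_parent = sum(parent)
    weighted = sum(sum(child) / n_parent * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = [400, 400]                 # class counts in the parent node r
split_a = [[300, 100], [100, 300]]  # two mixed children
split_b = [[200, 400], [200, 0]]    # one mixed child, one pure child

for name, split in [("A", split_a), ("B", split_b)]:
    print(name, purity_gain(parent, split, class_error), purity_gain(parent, split, gini))
```

Both splits give a classification-error gain of 0.25 (up to floating-point rounding), but the Gini gain is 0.125 for A and about 0.167 for B, so Gini prefers the split that produces a pure child.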

Question 19

Q

In regression, which method is used to evaluate the model?

Answer

A

The average loss per observation, for example the mean absolute error

Question 20

Q

In regression, what are the two different error calculations?

Answer

A

Mean absolute error (MAE)
Mean squared error (MSE)

Question 21

Q

In regression, when should you use MAE and when MSE?

Answer

A

MAE - penalizes all errors linearly and is more robust to outliers; use it when the data may contain outliers that should not dominate the loss
MSE - penalizes larger errors more heavily; use it when large deviations are especially undesirable
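A small illustration of the difference in outlier sensitivity, using made-up residuals (y − ŷ) where a single residual is changed into an outlier. The helpers `mae` and `mse` are hypothetical names that take the residuals directly.

```python
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e ** 2 for e in errors) / len(errors)

typical = [1.0, -1.0, 1.0, -1.0]       # all residuals small
with_outlier = [1.0, -1.0, 1.0, -9.0]  # same, but one large residual

print(mae(typical), mae(with_outlier))  # 1.0 3.0
print(mse(typical), mse(with_outlier))  # 1.0 21.0
```

The single outlier triples the MAE but multiplies the MSE by 21, because squaring makes the large residual dominate the average.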

Question 22

Q

What different norm-based losses (L1, L2, L∞) are there and where are they used?

Answer

A

L1 - MAE, less sensitive to outliers
L2 - MSE, the average of the squared differences between the predicted and actual values
L∞ norm - measures the maximum absolute error, focusing on the worst-case error
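The three losses can be computed side by side from the residual vector. Note that MAE and MSE are the L1 norm and the squared L2 norm of the residuals divided by the number of observations, while the L∞ loss is simply the largest absolute residual. The function name `lp_losses` and the data are hypothetical.

```python
def lp_losses(y_true, y_pred):
    """Hypothetical helper: return (MAE, MSE, max absolute error) for two equal-length lists."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    l1 = sum(abs(r) for r in residuals) / n  # MAE
    l2 = sum(r ** 2 for r in residuals) / n  # MSE
    linf = max(abs(r) for r in residuals)    # worst-case error
    return l1, l2, linf

print(lp_losses([3.0, 5.0, 2.0, 8.0], [2.5, 5.0, 4.0, 7.0]))  # (0.875, 1.3125, 2.0)
```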