Lecture 3 Notes Flashcards
What game is a Decision Tree similar to?
20 questions
What type of problem does a Decision Tree algorithm solve?
Classification
What does each non-leaf node in a Decision Tree represent?
A query about some feature
What happens when a leaf node is reached in a Decision Tree?
The sample is assigned the label stored at that leaf
What is the average number of questions in a Decision Tree?
∑ depth(leaf) × P(leaf), i.e., the number of questions needed to reach each leaf, weighted by the probability of reaching that leaf
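The weighted sum above can be sketched in a few lines. The leaf depths and probabilities here are made-up illustration values, not from the lecture:

```python
# Expected number of questions: sum of each leaf's depth (questions asked
# to reach it) weighted by the probability of reaching that leaf.
def expected_questions(leaves):
    """leaves: list of (depth, probability) pairs; probabilities sum to 1."""
    return sum(depth * p for depth, p in leaves)

# Hypothetical tree: one question resolves half the cases immediately,
# the rest need a second question.
leaves = [(1, 0.5), (2, 0.25), (2, 0.25)]
print(expected_questions(leaves))  # 1.5
```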
What is a common issue with large Decision Trees?
They tend to overfit; smaller trees generalize better
What should be done if all remaining samples at a node in a Decision Tree have the same label?
Make a leaf node with that label
What to do if no samples answer a question or features are exhausted in a Decision Tree?
Use default case / most common label
What is the Smart Choose function in Decision Trees?
Look for a question whose answers split a large proportion of the samples into homogeneous groups
How is impurity measured in a Decision Tree?
With n different kinds of labels and a sample set X, partition X by label into X1 … Xn; impurity is computed from the proportions |Xk| / |X|
What is the probability of sampling for a label in Decision Trees?
P(k) = |Xk| / |X| for each label k
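A minimal sketch of P(k) = |Xk| / |X| computed from a list of sample labels (the labels are illustrative):

```python
# Probability of sampling each label: count occurrences and divide by |X|.
from collections import Counter

def label_probabilities(labels):
    """labels: list of sample labels; returns {label: P(label)}."""
    counts = Counter(labels)
    n = len(labels)
    return {k: c / n for k, c in counts.items()}

probs = label_probabilities(["a", "a", "b", "b", "b", "c"])
print(probs["b"])  # 0.5  (3 of 6 samples)
```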
What is the Remainder in the context of Decision Trees?
The amount of uncertainty left in the data set after splitting: Remainder = ∑ (|Xi| / |X|) × Impurity(Xi), summed over the groups Xi produced by the split
What can metric attributes in Decision Trees ask about?
Whether the value is higher or lower than a threshold, splitting the samples into L (lower) and G (greater); the remainder is then
(|L|/|X|) Impurity(L) + (|G|/|X|) Impurity(G)
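A sketch of that remainder for a threshold test on a metric attribute, using Gini as the impurity function; the threshold and data are hypothetical:

```python
# Remainder of a threshold split: samples go to L (value <= threshold) or
# G (value > threshold); the remainder is the size-weighted impurity.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels; 0 for an empty or pure group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def remainder(samples, threshold):
    """samples: list of (value, label) pairs."""
    L = [lab for v, lab in samples if v <= threshold]
    G = [lab for v, lab in samples if v > threshold]
    n = len(samples)
    return (len(L) / n) * gini(L) + (len(G) / n) * gini(G)

data = [(1.0, "red"), (2.0, "red"), (3.0, "blue"), (4.0, "blue")]
print(remainder(data, 2.5))  # 0.0: both sides are pure
```

A perfect threshold drives the remainder to zero because each side becomes a homogeneous group.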
What is the bias in Decision Trees regarding space?
A bias towards axis-aligned rectangular decision regions, since each question tests a single feature
What are the Entropy and Gini impurity functions?
Information Entropy: − ∑ ( P(i) lg P(i) )
Gini: 1 - ∑ P(i)^2
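The two impurity formulas above, sketched directly from a list of label probabilities (entropy uses lg, i.e., log base 2):

```python
# Entropy: -sum of P(i) * lg P(i); Gini: 1 - sum of P(i)^2.
import math

def entropy(probs):
    """Information entropy in bits; terms with P(i) = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini_impurity(probs):
    return 1.0 - sum(p * p for p in probs)

print(entropy([0.5, 0.5]))        # 1.0  (one full bit of uncertainty)
print(gini_impurity([0.5, 0.5]))  # 0.5
print(entropy([1.0]))             # 0.0  (a pure group: no uncertainty)
```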
What is Information Gain, and what is its formula?
Information Gain = Entropy(whole set) − Remainder; it measures how good the split was. Higher is better.
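A sketch of information gain computed from labels, assuming an entropy-based impurity; the two example splits are hypothetical:

```python
# Information gain: entropy of the whole set minus the remainder after
# splitting, where the remainder is the size-weighted entropy of the groups.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(whole, groups):
    """groups: the partition of `whole` produced by the split."""
    n = len(whole)
    rem = sum((len(g) / n) * entropy(g) for g in groups)
    return entropy(whole) - rem

whole = ["yes", "yes", "no", "no"]
perfect = [["yes", "yes"], ["no", "no"]]   # each group is homogeneous
useless = [["yes", "no"], ["yes", "no"]]   # groups mirror the whole set
print(info_gain(whole, perfect))  # 1.0: split removes all uncertainty
print(info_gain(whole, useless))  # 0.0: split tells us nothing
```

This is what the Smart Choose step optimizes: among the candidate questions, pick the one whose split yields the highest gain.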