Decision Trees Flashcards
What is a decision tree?
It is a tree-structured classifier that sorts examples by testing a sequence of parameters (features); the choice of parameters and the order in which they are tested define the tree.
What is the idea of Entropy of information?
The information content of a message is inversely related to its probability: the less likely a message is, the more information it carries.
What are the two equations for the entropy of information?
Information: I(x) = −log₂ P(x)
Entropy: H = E[I] = −∑ᵢ pᵢ log₂ pᵢ
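A minimal Python sketch of both equations (the helper function and the example probabilities are illustrative, not from the source):

```python
import math

def entropy(probs):
    # H = E[I] = -sum_i p_i * log2(p_i); terms with p = 0 contribute 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

# I(x) = -log2 P(x): the rarer the message, the more bits it carries
print(-math.log2(0.5))        # 1.0 bit for a fair coin flip
print(entropy([0.5, 0.5]))    # 1.0   -> maximal uncertainty
print(entropy([0.9, 0.1]))    # ~0.47 -> a biased source is more predictable
```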
What is the equation for total entropy?
It is the sum of the entropy contributions of each class: H = −∑ᵢ pᵢ log₂ pᵢ, where the sum runs over all classes i.
How can the entropy of information be applied to compression algorithms?
Entropy gives the theoretical lower bound on the average number of bits per symbol that a lossless compression scheme can achieve. In the binary case (two classes), the probabilities of 1 and 0 are related by p0 + p1 = 1, so the bound depends on a single probability.
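A sketch of that link (the sample message is made up for illustration):

```python
import math

def bound_bits_per_symbol(p0):
    # Entropy of a binary source with P(0) = p0 and P(1) = 1 - p0:
    # the lower bound any lossless code can reach, in bits/symbol.
    return -sum(p * math.log2(p) for p in (p0, 1 - p0) if p > 0)

msg = "0001000100000001000010000000"   # a made-up, zero-heavy message
p0 = msg.count("0") / len(msg)
print(f"p0 = {p0:.2f}, bound = {bound_bits_per_symbol(p0):.3f} bits/symbol")
# Well under 1 bit/symbol, so the message is highly compressible.
```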
How do you decide what parameters to use and in what order?
We pick the parameter whose test produces the greatest change (reduction) in entropy, and test it first.
What is the equation for information gain?
G(S,F) = H(S) - ∑f∈values(F) |Sf| / |S| × H(Sf)
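A small sketch of G(S, F) in Python; the (features, label) pair representation and the toy weather data are assumptions for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    # G(S, F) = H(S) - sum over f in values(F) of |Sf|/|S| * H(Sf)
    S = [label for _, label in examples]
    g = entropy(S)
    for f in {ex[feature] for ex, _ in examples}:
        Sf = [label for ex, label in examples if ex[feature] == f]
        g -= len(Sf) / len(S) * entropy(Sf)
    return g

# Toy data: (features, label) pairs -- an assumed representation.
data = [({"wind": "weak"}, "yes"), ({"wind": "strong"}, "no"),
        ({"wind": "weak"}, "yes"), ({"wind": "strong"}, "yes")]
print(information_gain(data, "wind"))   # ~0.311
```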
What is the ID3 philosophy?
Choose the split that yields the greatest information gain.
What is the ID3 algorithm?
IF: all examples have the same label => return a leaf with that label
ELSE IF: there are no features left to test => return a leaf with the most common label
ELSE: Choose the feature F that maximises the information gain on S to be the next node. Add a branch to the node for each possible value f in F.
For each branch:
> Calculate Sf, the subset of examples for which F takes the value f, and remove F from the set of features
> Recursively call the algorithm on Sf, so gains are computed relative to the current subset of examples
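Putting the three cases together, here is a hedged sketch of ID3 in Python (the (feature_dict, label) representation is an assumption, not part of the source):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, feature):
    S = [label for _, label in examples]
    g = entropy(S)
    for f in {ex[feature] for ex, _ in examples}:
        Sf = [label for ex, label in examples if ex[feature] == f]
        g -= len(Sf) / len(S) * entropy(Sf)
    return g

def id3(examples, features):
    """examples: list of (feature_dict, label); features: set of names."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                  # all examples share a label
        return labels[0]
    if not features:                           # no features left to test
        return Counter(labels).most_common(1)[0][0]
    F = max(features, key=lambda feat: gain(examples, feat))
    tree = {F: {}}
    for f in {ex[F] for ex, _ in examples}:    # one branch per value f of F
        Sf = [(ex, label) for ex, label in examples if ex[F] == f]
        tree[F][f] = id3(Sf, features - {F})   # recurse without F
    return tree

data = [({"outlook": "sunny", "wind": "weak"}, "no"),
        ({"outlook": "sunny", "wind": "strong"}, "no"),
        ({"outlook": "rain", "wind": "weak"}, "yes")]
print(id3(data, {"outlook", "wind"}))
# e.g. {'outlook': {'sunny': 'no', 'rain': 'yes'}}
```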
What are the characteristics of the ID3 Algorithm?
> Greedy with respect to G: it always takes the split with the greatest information gain, which can leave it stuck at a locally optimal tree
> Deals with noisy data by assigning a leaf the label of its most common class
> Uses all available features, so it is prone to overfitting
- Uses pruning to reduce overfitting
- Can be extended to handle continuous variables
- Can deal with missing attributes
What is the CART algorithm?
It follows the same procedure as ID3, but uses the Gini impurity instead of entropy to choose splits.
What is the equation for the Gini impurity?
G(S) = ∑ᵢ pᵢ(1 − pᵢ), where the sum runs over the C classes
What is the equation for the Gini split?
G(S,F) = G(S) - ∑f∈values(F) |Sf| / |S| × G(Sf)
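Both Gini equations in one short sketch, mirroring the information-gain helper above (same assumed data representation):

```python
from collections import Counter

def gini(labels):
    # G(S) = sum_i p_i * (1 - p_i)  (equivalently 1 - sum_i p_i^2)
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def gini_split(examples, feature):
    # G(S, F) = G(S) - sum over f in values(F) of |Sf|/|S| * G(Sf)
    S = [label for _, label in examples]
    drop = gini(S)
    for f in {ex[feature] for ex, _ in examples}:
        Sf = [label for ex, label in examples if ex[feature] == f]
        drop -= len(Sf) / len(S) * gini(Sf)
    return drop

data = [({"wind": "weak"}, "yes"), ({"wind": "strong"}, "no"),
        ({"wind": "weak"}, "yes"), ({"wind": "strong"}, "yes")]
print(gini_split(data, "wind"))   # ~0.125
```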
What is the issue with ID3 and Gini?
Both algorithms are greedy, so they can get stuck at a locally optimal tree. It is therefore important to produce several different decision trees and combine them.
What is a random forest?
It is an ensemble of a large number of trees (often thousands), each grown on a different random subset of the data; the collection is called a forest.
Each tree uses a random fraction of the features and a random fraction of the training data points, and the trees' predictions are combined (e.g. by majority vote).
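A sketch using scikit-learn's RandomForestClassifier (assuming scikit-learn is available; the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample of the rows and considers
# a random subset of the features (sqrt of the total here) at each split.
forest = RandomForestClassifier(n_estimators=1000, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the majority vote
```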