Decision trees Flashcards
explain underfitting
describes the inability of the model to capture the complexity of the input-output relation
possible cause: the model is too simple
explain overfitting
the model is over-specialized on the given dataset; possible causes: the model is too complex or has too many features to optimize
what might be the cause of bad predictions
noisy data and lack of training data
what is the bayesian classifier
minimizes the classification error probability
the idea consists of computing the posterior probability of class wj knowing x, hence
C(x) = argmaxj [P(wj) P(x | wj)]
We can either directly compute the posterior probability P(wj | x)
or indirectly estimate the class-conditional density P(x | wj)
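As a minimal sketch of the rule C(x) = argmaxj [P(wj) P(x | wj)], assuming two classes with univariate Gaussian class-conditional densities (the priors and Gaussian parameters below are made up for illustration):

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, used here as P(x | wj)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

priors = [0.5, 0.5]                 # P(w0), P(w1) (assumed)
params = [(0.0, 1.0), (3.0, 1.0)]   # (mu, sigma) per class (assumed)

def bayes_classify(x):
    """C(x) = argmaxj P(wj) * P(x | wj)."""
    scores = [p * gaussian_pdf(x, mu, s) for p, (mu, s) in zip(priors, params)]
    return scores.index(max(scores))

# Points near mu=0 go to class 0, points near mu=3 go to class 1.
```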
provide some approach to compute densities
logistic regression: a direct, semi-parametric method; we need to estimate the wj parameters
the indirect methods rely on estimating the unknown density function :
parametric: assume a multivariate Gaussian distribution, with µk and Sk estimated from the training data
non-parametric: rely on a hypercube H(x) with volume V centred on x; the issue lies in setting the size of H
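A minimal 1-D sketch of the non-parametric hypercube estimate, where the "cube" is an interval of side h centred on x (the sample values and bandwidth h are made up; choosing h is exactly the issue mentioned above):

```python
def parzen_density(x, samples, h):
    """Estimate p(x): fraction of samples inside a cube of side h
    centred on x, divided by the cube's volume (1-D case)."""
    inside = sum(1 for s in samples if abs(s - x) <= h / 2)
    return inside / (len(samples) * h)

samples = [0.0, 0.1, 0.2, 0.9, 1.0]  # toy training data
# Larger h smooths the estimate; too small an h makes it spiky.
est = parzen_density(0.1, samples, h=0.5)
```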
what are decision trees
they rely on successive splitting of the dataset, such that the classification rules form a tree whose extremities (leaves) indicate the class
what are the types of feature we may find in decision trees
Quantitative feature: numbers
Qualitative feature: categories such as blue, brown, etc.
Ordinal feature: ordered categories such as small, medium, large
What measures can we use to quantify class heterogeneity
Entropy:
Gini impurity index
misclassification index
Explain briefly the entropy
Measures the uniformity of the class histogram in a node
explain Briefly the misclassification index
The classification error probability obtained when predicting the majority class observed in node N
compare the Gini to misclassification error and entropy
Gini is simply a smooth approximation of the misclassification error
entropy is a further refinement
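The three impurity measures above can be sketched as functions of the class proportions pj observed in a node (a minimal version, for illustration):

```python
from math import log2

def entropy(p):
    """Entropy of a class-proportion vector; 0 for a pure node."""
    return -sum(pj * log2(pj) for pj in p if pj > 0)

def gini(p):
    """Gini impurity index: 1 - sum of squared proportions."""
    return 1 - sum(pj ** 2 for pj in p)

def misclassification(p):
    """Error probability when predicting the majority class."""
    return 1 - max(p)

pure = [1.0, 0.0]    # all samples in one class
mixed = [0.5, 0.5]   # maximally heterogeneous two-class node
```

All three measures are zero on the pure node and maximal on the balanced one.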
what is the homogeneity gain
the idea consists of selecting the test that minimizes the impurity of the resulting subnodes
what is the homogeneity gain and its bias
Given a test T providing m possible alternatives which split node N (of size n)
into m subsets / subnodes Nj,
the gain selects the test that minimizes the weighted impurity of the subnodes
the issue with the gain is that it favors tests with a large number of alternatives
hence, to overcome that, we can use the gain ratio or binary tests
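A minimal sketch of the gain (using entropy as the impurity) and of the gain ratio that corrects its bias toward many-alternative tests; the toy labels are made up:

```python
from math import log2

def entropy(p):
    return -sum(pj * log2(pj) for pj in p if pj > 0)

def proportions(labels):
    """Class proportions observed in a node."""
    return [labels.count(c) / len(labels) for c in set(labels)]

def gain(parent, subsets):
    """Homogeneity gain: parent impurity minus the size-weighted
    impurity of the subnodes Nj."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(proportions(s)) for s in subsets)
    return entropy(proportions(parent)) - weighted

def gain_ratio(parent, subsets):
    """Gain divided by the split information, penalizing tests
    that split N into many alternatives."""
    n = len(parent)
    split_info = entropy([len(s) / n for s in subsets])
    return gain(parent, subsets) / split_info if split_info else 0.0

parent = ['a', 'a', 'b', 'b']
split = [['a', 'a'], ['b', 'b']]   # a perfect binary split
```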
what is the issue with the gain ratio
favors imbalanced partitions between the different subnodes (Nj)