Decision trees Flashcards

1
Q

Explain underfitting

A

Describes the inability of the model to capture the complexity of the input-output relation.
Possible cause: the model is too simple.

2
Q

Explain overfitting

A

The model is over-specialized on the given dataset.
Possible causes: the model is too complex, or there are too many features/parameters to optimize.

3
Q

What might be the causes of bad predictions?

A

Noisy data and a lack of training data.

4
Q

What is the Bayesian classifier?

A

It minimizes the classification error probability.
The idea consists of computing the posterior probability of class wj given x; hence
C(x) = argmax_j [P(wj) P(x | wj)]
We can either directly compute the posterior probability P(wj | x),
or indirectly compute the class-conditional density P(x | wj).
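The decision rule above can be sketched in a few lines of Python. This is a minimal 1-D illustration, not from the course: the two classes, their priors, and their Gaussian parameters are made-up values.

```python
import math

# Hypothetical example: two classes with known priors and 1-D Gaussian
# class-conditional densities p(x | wj), given as (mean, std) pairs.
priors = {"w1": 0.6, "w2": 0.4}
params = {"w1": (0.0, 1.0), "w2": (3.0, 1.0)}

def gaussian(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def classify(x):
    """Bayes rule: C(x) = argmax_j P(wj) * p(x | wj)."""
    return max(priors, key=lambda w: priors[w] * gaussian(x, *params[w]))

print(classify(0.0))  # near the w1 mean
print(classify(3.0))  # near the w2 mean
```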

5
Q

Provide some approaches to compute densities

A

Logistic regression: a direct, semi-parametric method; we need to estimate the wj parameters.
Indirect methods rely on estimating the unknown density function:
Parametric: a multivariate Gaussian distribution, with µk and Σk estimated from the training data.
Non-parametric: rely on the hypercube H(x) with volume V centred on x; the issue lies in setting the size of H.
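The non-parametric idea can be sketched in 1-D, where the "hypercube" H(x) is just an interval of width h centred on x. This is an illustrative sketch, not the course's implementation; the sample values and the width h are made up.

```python
def parzen_density(x, samples, h):
    """Hypercube (Parzen-window) estimate: p(x) ≈ k / (n * V),
    where k counts samples falling inside H(x) and V = h in 1-D."""
    inside = sum(1 for s in samples if abs(s - x) <= h / 2)
    return inside / (len(samples) * h)

# Made-up data: three samples cluster near 0, one outlier at 1.0.
samples = [0.0, 0.1, 0.2, 1.0]
print(parzen_density(0.1, samples, 0.5))  # high density inside the cluster
print(parzen_density(2.0, samples, 0.5))  # zero density far from all samples
```

The choice of h is exactly the issue the card mentions: too small and the estimate is spiky, too large and it is over-smoothed.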

6
Q

What are decision trees?

A

They rely on successive splitting of the dataset, so that the classification rules are organized as a tree whose extremities (leaves) indicate the class.
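A toy hand-built tree makes the "successive tests ending in a class label" structure concrete. This sketch is purely illustrative: the features, thresholds, and class names are made up.

```python
# A tree node is (feature_index, threshold, left_subtree, right_subtree);
# leaves (the extremities) are plain class-label strings.
tree = (0, 2.5,                        # test: is feature 0 <= 2.5 ?
        "small",                       # leaf reached when the test succeeds
        (1, 1.0, "medium", "large"))   # otherwise apply a second test on feature 1

def predict(node, x):
    """Follow successive tests from the root until a leaf (class) is reached."""
    while not isinstance(node, str):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

print(predict(tree, [1.0, 0.0]))
print(predict(tree, [3.0, 2.0]))
```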

7
Q

What types of feature may we find in decision trees?

A

Quantitative feature: numbers
Qualitative feature: blue, brown, etc.
Ordinal feature: small, medium, large

8
Q

What measures can we use to quantify class heterogeneity?

A

Entropy
Gini impurity index
Misclassification index
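The three measures can be computed directly from a node's class labels. A minimal sketch, using the standard two-class/multi-class formulas (entropy with log base 2); the function names are my own.

```python
import math
from collections import Counter

def class_probs(labels):
    """Class histogram of a node, normalized to probabilities p_k."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def entropy(labels):
    """H = -sum_k p_k log2(p_k): 0 for a pure node, maximal for a uniform histogram."""
    return -sum(p * math.log2(p) for p in class_probs(labels) if p > 0)

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    return 1 - sum(p * p for p in class_probs(labels))

def misclassification(labels):
    """Error rate of predicting the majority class: 1 - max_k p_k."""
    return 1 - max(class_probs(labels))

node = ["a", "a", "b", "b"]
print(entropy(node), gini(node), misclassification(node))
```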

9
Q

Explain briefly the entropy

A

Measures the uniformity of the class histogram.

10
Q

Explain briefly the misclassification index

A

The classification error probability when predicting the majority class observed in node N.

11
Q

Compare the Gini index to the misclassification error and entropy

A

Gini is simply a smooth evaluation of the misclassification error;
entropy is a further enhancement.
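For a binary node with class probabilities (p, 1-p), tabulating the three measures shows the relationship: Gini tracks the misclassification error closely but varies smoothly (no kink at p = 0.5). A small sketch, with the standard closed forms for the two-class case:

```python
import math

def entropy(p):
    """Binary entropy, 0 at a pure node, 1 bit at p = 0.5."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini(p):
    """Binary Gini impurity: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    return 2 * p * (1 - p)

def misclass(p):
    """Binary misclassification error: min(p, 1-p)."""
    return min(p, 1 - p)

for p in (0.1, 0.3, 0.5):
    print(p, round(entropy(p), 3), round(gini(p), 3), round(misclass(p), 3))
```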

12
Q

What is the homogeneity gain?

A

The idea consists of selecting the test that leads to minimizing the impurity of the resulting subnodes.

13
Q

What is the homogeneity gain and its bias?

A

Given a test T providing m possible alternatives, which split node N (of size n)
into m subsets/subnodes Nj,
the idea consists of selecting the test that leads to minimizing the impurity of the subnodes.
The issue with the gain is that it favors tests with a large number of alternatives;
to overcome that, we can use the gain ratio or binary tests.
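The gain and gain ratio described above can be sketched with entropy as the impurity measure. A minimal illustration (function names are my own); the split information in the denominator is what penalizes tests with many alternatives.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_k p_k log2(p_k) over the node's class histogram."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(parent, subnodes):
    """Gain(T) = I(N) - sum_j (n_j / n) I(N_j), with entropy as impurity I."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subnodes)

def gain_ratio(parent, subnodes):
    """Gain divided by the split information -sum_j (n_j/n) log2(n_j/n),
    which grows with the number of alternatives m and so penalizes wide splits."""
    n = len(parent)
    split_info = -sum((len(s) / n) * math.log2(len(s) / n) for s in subnodes if s)
    return gain(parent, subnodes) / split_info if split_info else 0.0

# Made-up node of size n = 4, split by a binary test into two pure subnodes.
parent = ["a", "a", "b", "b"]
subnodes = [["a", "a"], ["b", "b"]]
print(gain(parent, subnodes), gain_ratio(parent, subnodes))
```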

14
Q

What is the issue with the gain ratio?

A

It favors imbalanced partitions between the different subnodes (Nj).
