Decision Trees Flashcards

1
Q

Do decision trees belong to supervised or unsupervised learning?

A

To supervised learning

2
Q

What are the advantages of decision trees?

A
3
Q

What are the disadvantages of decision trees?

A
4
Q

What does a lower entropy mean?

A

Less uncertainty

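A small numeric illustration (my own example, not part of the original card): a balanced binary class distribution has maximal entropy, a skewed one has lower entropy.

Entropy(0.5, 0.5) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 bit
Entropy(0.9, 0.1) = -(0.9 \log_2 0.9 + 0.1 \log_2 0.1) \approx 0.47 bits
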
5
Q

How do you compute the entropy of a dataset?

A
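
The answer is blank on the card; the usual textbook formulation, with p_i the proportion of tuples in D belonging to class C_i (m classes in total), is:

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
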
6
Q

How do you determine the information (entropy) needed to classify a tuple in D?

A
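
Blank on the card; this is the same quantity Info(D) as in the previous card. A worked example with assumed numbers (the classic 14-tuple training set with 9 "yes" and 5 "no" labels):

Info(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940 bits
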
7
Q

How do you determine the average information needed after using attribute A to split D into k partitions?

A
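
Blank on the card; the standard weighted average over the k partitions D_1, ..., D_k is:

Info_A(D) = \sum_{j=1}^{k} \frac{|D_j|}{|D|} \cdot Info(D_j)
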
8
Q

How do you compute the information gain?
(Information gained by branching on attribute age)

A
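
Blank on the card; the information gain is the entropy reduction achieved by the split. The numbers below assume the classic textbook "age" example (9 "yes" / 5 "no" tuples), which may not be exactly the data used in the lecture:

Gain(A) = Info(D) - Info_A(D)
Gain(age) = 0.940 - 0.694 = 0.246 bits
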
9
Q

How do you determine the Gini impurity?

A
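
Blank on the card; for a dataset D with m classes and class proportions p_i, the standard definition is:

Gini(D) = 1 - \sum_{i=1}^{m} p_i^2
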
10
Q

How do you assess the quality of a split using the Gini impurity?

A
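
Blank on the card; for a binary split of D into D_1 and D_2, the weighted impurity and the resulting reduction are commonly written as:

Gini_A(D) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)
\Delta Gini(A) = Gini(D) - Gini_A(D)

The split with the smallest Gini_A(D), i.e. the largest reduction, is chosen.
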
11
Q

How do you compute the GainRatio?

A
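
Blank on the card; the C4.5-style definition normalises the gain by the split information of attribute A with k partitions:

SplitInfo_A(D) = -\sum_{j=1}^{k} \frac{|D_j|}{|D|} \log_2\frac{|D_j|}{|D|}
GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)}
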
12
Q

Why is the GainRatio needed?

A

Because information gain (and likewise the Gini impurity) is biased towards splits on attributes with many distinct values over which the data are split; the GainRatio normalises the gain to compensate for this

13
Q

What would the gain criterion look like if an attribute has missing values?

A
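
Blank on the card; one common convention (the C4.5 approach, stated here as an assumption about what the lecture intended) weights the gain by the fraction F of tuples whose value of A is known, computing the entropies on those known-value tuples only:

Gain(A) = F \cdot (Info(D) - Info_A(D))
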
14
Q

What is pruning?

A

Removing subtrees and replacing them with leaves after the building phase, in order to avoid overfitting

15
Q

What is the problem with prepruning?

A

It is difficult to define a threshold in advance

16
Q

How do you determine the prediction error in order to decide about pruning?

A
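
Blank on the card; a sketch of one common approach (reduced-error pruning on a separate pruning set, which may differ from the method used in the lecture): estimate the misclassification error of the subtree T_t rooted at node t on the pruning set and compare it with the error obtained when T_t is replaced by a single majority-class leaf; prune if the leaf is not worse:

prune t  if  err_val(leaf(t)) \le err_val(T_t)
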
17
Q

Name limitations of decision trees

A
• Construction / search process depends only on local information
• Greedy: current best split does not consider future splits
18
Q

How can you do regression with decision trees?

A
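
Blank on the card. A minimal sketch (my own illustration using scikit-learn, not from the deck): a regression tree chooses splits that minimise the weighted squared error / variance of the targets, and each leaf predicts the mean target value of the training samples that reach it.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D data: y is a noisy step function of x.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 80)).reshape(-1, 1)
y = np.where(X.ravel() < 5, 1.0, 3.0) + rng.normal(0, 0.1, 80)

# A shallow tree keeps the example readable; splits minimise squared error.
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, y)

# Each prediction is the mean target of the training samples in the matching leaf.
print(reg.predict([[2.0], [8.0]]))  # roughly [1.0, 3.0]
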
19
Q

Are the Gini coefficient and the Gini impurity the same thing?

A

No

20
Q

Entropy is small if the distribution is heavily skewed.
Is that correct?

A

Yes

21
Q

Gini-impurity and information gain are biased towards attributes with a small number of values.
Is that correct?

A

No

22
Q

With the rule extraction algorithm, it is possible to find rules that perfectly distinguish between different classes for any dataset.
Is that correct?

A

No

23
Q

Which of the following statements are correct?
1. Decision trees can only handle discrete or continuous attributes, but no mixed attributes.
2. Decision trees are especially good at dealing with connected attributes / doing diagonal classification.
3. Greedy decision tree algorithms most often generate globally optimal trees.
4. If any k out of n class conditions are met, the tree needs (n choose k) regions for this class.

A

4.

24
Q

Which of the following statements about information gain are correct?
1. To decide on how to split the data, the information gain has to be computed only for one attribute.
2. Information gain has to be maximized.
3. Information gain Gain(A) is the difference of the entropy without splitting and the average entropy of the partitions created by using attribute A to split the data.
4. Splitting based on information gain favors large inner-partition entropy.

A

2 and 3

25
Q

Which of the following statements about continuous attributes are correct?
1. Any threshold value between two adjacent values of a continuous attribute has the same effect for splitting.
2. It’s possible to handle continuous classes by testing all possible splits of an attribute.
3. Information gain based methods can’t handle continuous attributes.
4. Gini impurity assumes all attributes are continuous.

A

1 and 4

26
Q

Which of the following statements are correct?
1. Gini impurity assumes that all attributes are categorical.
2. Gini impurity has to be maximized.
3. Gini impurity is at its maximum value if all the records are equally distributed among all classes.
4. Gini impurity has to be calculated for every attribute and every possible split of each attribute.

A

3 and 4

27
Q

Which of the following statements are correct?
1. Gini impurity is at its maximum value if all the records are equally distributed among all classes.
2. Gini impurity has to be maximized.
3. Gini impurity assumes that all attributes are categorical.
4. Gini impurity has to be calculated for every attribute and every possible split of each attribute.

A

1 and 4