Decision Trees Flashcards

1
Q

What is a Decision Tree?

A

A Decision Tree is a tree-shaped diagram used to determine a course of action.

2
Q

What does each branch of the Decision Tree typically represent?

A

Each branch of the tree represents a possible decision,
occurrence, or reaction.

3
Q

What type of learning algorithm is a Decision Tree, and for what tasks is it commonly used?

A

A decision tree is a non-parametric
supervised learning algorithm, which is utilized for
both classification and regression tasks.
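
As a quick illustration of both uses, here is a minimal scikit-learn sketch (assuming scikit-learn is available; the toy data below is invented purely for demonstration):

# Minimal sketch: the same algorithm family handles classification and regression.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0], [1, 1], [1, 0], [0, 1]]      # toy feature matrix (hypothetical)
y_class = [0, 1, 1, 0]                    # class labels      -> classification tree
y_value = [1.5, 3.2, 2.8, 1.9]            # continuous target -> regression tree

clf = DecisionTreeClassifier().fit(X, y_class)
reg = DecisionTreeRegressor().fit(X, y_value)

print(clf.predict([[1, 1]]))              # predicted class label
print(reg.predict([[1, 1]]))              # predicted continuous value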

4
Q

How is the training dataset used in building models with the Decision Tree Algorithm?

A

The training dataset is fed into a tree induction algorithm during the learning phase. The learned model is the outcome of the tree induction algorithm processing the training set.

5
Q

In terms of tasks, what are the two main applications of a decision tree?

A

classification and regression tasks.

6
Q

Why is a decision tree considered a popular data mining technique?

A

A decision tree visualization helps outline the decisions in a way that is easy to understand

7
Q

What is the primary goal of creating a model using the Decision Tree Algorithm?

A

The goal is to create a model that predicts the value
of a target variable based on several input variables.

8
Q

How is an internal node represented in a decision tree, and what does it signify?

A

an internal node represents a feature (or attribute)

9
Q

What does each branch in a decision tree represent?

A

represents a decision rule

10
Q

What role does a leaf node play in a decision tree, and what does it represent?

A

Each leaf node represents the outcome.

11
Q

What is the significance of the root node in a decision tree?

A

It learns to partition on the basis of the
attribute value

12
Q

How does the root node contribute to the partitioning of a decision tree?

A

It partitions the tree in a recursive manner called recursive partitioning

13
Q

What is the process known as when the decision tree partitions in a recursive manner?

A

recursive partitioning

14
Q

Why is a decision tree often compared to a flowchart diagram?

A

Its visualization resembles a flowchart diagram, which closely mimics human-level thinking.

15
Q

What are the three main components of a decision tree, and what do they represent?

A
  • Node
    test for the value of a certain attribute
  • Edges
    correspond to the outcome of a test
  • Leaves
    terminal nodes that predict the outcome
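
A minimal data-structure sketch of these three components (the field names are hypothetical, not taken from any particular library):

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class TreeNode:
    attribute: Optional[str] = None                                 # node: tests the value of a certain attribute
    children: Dict[Any, "TreeNode"] = field(default_factory=dict)   # edges: one child per test outcome
    prediction: Optional[str] = None                                # leaf: terminal node that predicts the outcome

def predict(node: TreeNode, record: Dict[str, Any]) -> str:
    # Follow the edge matching the record's attribute value until a leaf is reached.
    while node.prediction is None:
        node = node.children[record[node.attribute]]
    return node.prediction
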
16
Q

In a classification tree, what is the purpose of determining a set of logical if-then conditions?

A

To classify records (assign them to classes).

17
Q

When is a regression tree used, and how does it differ from a classification tree in terms of the target variable?

A

A regression tree is used when the target variable is numerical or continuous. We fit a regression model
to the target variable using each of the independent variables

18
Q

In the context of Decision trees:

Define Gain

A

Gain is a measure of the decrease in entropy after splitting the dataset on an attribute.

19
Q

Decision Trees

How to split the data?

A

We have to frame the conditions that split the data in such a way that the information gain is the highest.

20
Q

How does the decision tree algorithm work?

A
  1. Select the best attribute using an attribute selection measure to split the records
  2. Make that attribute a decision node
  3. Start tree building by recursively repeating this process for each child (see the sketch below)
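
A minimal runnable sketch of these three steps (records are plain dicts; for brevity the attribute selection measure here is a simple misclassification count rather than true information gain):

from collections import Counter

def build_tree(records, attributes, target):
    labels = [r[target] for r in records]
    # Stop when all records share one class or no attributes remain; leaf = majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # 1. Select the "best" attribute (lowest misclassification count across its subsets).
    def misclassified(attr):
        total = 0
        for v in set(r[attr] for r in records):
            subset = [r[target] for r in records if r[attr] == v]
            total += len(subset) - Counter(subset).most_common(1)[0][1]
        return total
    best = min(attributes, key=misclassified)
    # 2. Make that attribute a decision node.
    node = {best: {}}
    # 3. Recursively repeat the process for each child (one per value of the chosen attribute).
    for v in set(r[best] for r in records):
        children = [r for r in records if r[best] == v]
        node[best][v] = build_tree(children, [a for a in attributes if a != best], target)
    return node
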
21
Q

Generally, when does the process of Tree building stop?

A
  • When there are no more attributes
  • There are no more instances
  • All the tuples belong to the same class
22
Q

List three advantages of Decision Trees in machine learning.

A

  • Simple to understand, interpret, and visualize
  • Little data preparation is needed (no scaling, no dummy variables)
  • Can handle both categorical and numerical variables

23
Q

What makes Decision Trees simple to understand and interpret for humans?

A

They look like simple if-else statements, and therefore can
be easily interpreted by humans

24
Q

What are the advantages of Decision Trees regarding data preparation?

A
  • No scaling needed
  • Can work without extensive handling of missing data
  • No need for dummy variables
25
Q

In terms of handling variables, what types of variables can Decision Trees manage?

A

Can handle both categorical and numerical
variables

26
Q

Do nonlinear parameters affect Decision Trees?

A

Nonlinear parameters don’t affect its performance

27
Q

What is a notable characteristic of Decision Trees regarding assumptions compared to statistical models?

A

Do not require the assumptions of statistical models

28
Q

Identify the major disadvantage of Decision Trees in machine learning.

A

overfitting

29
Q

What is overfitting, and how does it impact the performance of a decision tree?

A

Overfitting can lead to wrong decisions. A decision tree will keep generating new nodes to fit the data, which makes it lose its generalization capability.

30
Q

Why does a decision tree lose its generalization capabilities due to overfitting?

A

A decision tree will keep generating new nodes to fit the data

31
Q

What happens to the overall tree when new data points are added?

A

Adding new data points leads to the regeneration of the overall tree, meaning that nodes need to be recalculated.

32
Q

How is noise a factor that affects the stability of a decision tree model?

A

a little bit of noise can make a decision tree model unstable

33
Q

Why are Decision Trees considered unsuitable for large datasets, and what issue does it lead to?

A

A large dataset can cause the tree to grow too large and
complex, which will lead to overfitting.

34
Q

What is a low-biased Tree?

A

It is a highly complex tree with low bias, which makes it hard for the model to generalize to new data.

35
Q

What can a high variance do to a decision tree?

A

The model can get unstable

36
Q

What is the challenge associated with large decision trees

A

They become difficult to interpret

37
Q

How are classification rules extracted from a decision tree?

A

One rule is created for each path from the root to a leaf node. Each splitting criterion along a given path is logically joined by the AND operator to form the “IF” part. The leaf node holds the class prediction, forming the rule’s “THEN” part.

IF age = youth AND student = no THEN buys_computer = no
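
A minimal sketch of this path-to-rule extraction, assuming the tree is stored as nested dicts of the form {attribute: {value: subtree_or_class}} (a hypothetical representation, not a specific library format):

def extract_rules(tree, conditions=()):
    # A leaf (a plain class label) ends the path: emit IF <conditions> THEN <class>.
    if not isinstance(tree, dict):
        yield "IF " + " AND ".join(conditions) + " THEN " + str(tree)
        return
    # An internal node: follow each outgoing edge, ANDing its splitting criterion.
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + (f"{attribute} = {value}",))

# Toy tree mirroring the flashcard example (values invented for illustration).
tree = {"age": {"youth": {"student": {"no": "buys_computer = no",
                                      "yes": "buys_computer = yes"}},
                "middle_aged": "buys_computer = yes"}}
for rule in extract_rules(tree):
    print(rule)
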

38
Q

What is the process of forming the “IF” part of a classification rule from a decision tree path?

A

Each splitting criterion along a given path is logically joined by the AND operator to form the “IF” part.

39
Q

What information does the leaf node hold in the context of forming a classification rule?

A

The leaf node holds the class prediction, forming the rule “THEN” part.

40
Q

What does the root node represent in a decision tree, and what kind of edges are associated with it?

A

It can be considered the starting point of the tree: it has no incoming edges and zero or more outgoing edges. The outgoing edges lead to either an internal node or a leaf node.

The root node is usually an attribute of the decision tree model.

41
Q

How is an internal node defined in a decision tree, and what is its relationship with outgoing edges?

A

Appears after a root node or an internal node and is
followed by either internal nodes or leaf nodes. It has
only one incoming edge and at least two outgoing
edges.

Internal nodes are always attributes of the decision tree model

42
Q

What characterizes leaf nodes in a decision tree, and what information do they typically represent?

A

These are the bottommost elements of the tree and
normally represent classes of the decision tree model.

Depending on whether the subset can be cleanly classified, each leaf node holds either a single class label or a class distribution.

43
Q

How is the class distribution handled in leaf nodes, and what determines the number of outgoing edges from a leaf node?

A

Depending on whether the subset can be cleanly classified, each leaf node holds either a single class label or a class distribution.
Leaf nodes have one incoming edge and no outgoing edges.

44
Q

Name the Following:

A decision tree is created in two phases

A
  1. Recursive partitioning
  2. Pruning the tree
45
Q

What is the idea of Recursive partitioning?

A

Repeatedly split the records into two or more branches, so as to achieve maximum homogeneity/purity within the new parts

46
Q

What is the idea of pruning the tree?

A

Simplify the tree by pruning peripheral branches to avoid overfitting

47
Q

How does the concept of purity relate to the subsets created by a good attribute split?

A

a good attribute splits the examples into subsets
that are (ideally) “all positive” or “all negative”

48
Q

When dealing with numerical variables, how is the splitting process performed in decision trees?

A
  1. Order records according to the numerical variable
  2. Find midpoints between successive non-duplicate values
  3. Divide records into those with x > midpoint and those with x < midpoint

E.g.,
for the three points 14, 14.8, 16, the midpoint between 14.0 and 14.8 is 14.4, and the midpoint between 14.8 and 16 is 15.4.
Records are divided into those with lot_size > 14.4 and those with lot_size < 14.4.
After evaluating that split, try the next split, which is 15.4.
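A small sketch of this midpoint procedure (lot_size follows the card's example; evaluating each candidate split is left as a placeholder):

lot_sizes = [14, 14.8, 16]                    # values of the numerical variable

# 1. Order the values (dropping duplicates).
ordered = sorted(set(lot_sizes))

# 2. Midpoints between successive non-duplicate values.
midpoints = [round((a + b) / 2, 4) for a, b in zip(ordered, ordered[1:])]
print(midpoints)                              # [14.4, 15.4]

# 3. Divide records at each candidate midpoint, then evaluate that split.
for m in midpoints:
    above = [v for v in lot_sizes if v > m]   # e.g. lot_size > 14.4
    below = [v for v in lot_sizes if v < m]   # e.g. lot_size < 14.4
    # ...compute the purity of this split here and keep the best midpoint...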

49
Q

Explain the process of finding midpoints between successive non-duplicate values.

A

taking the average of the two values. For example, for the three points 14, 14.8, 16, the midpoint between 14.0 and 14.8 is 14.4, and the midpoint between 14.8 and 16 is 15.4.

50
Q

How are records divided based on the midpoints and the numerical variable in decision trees?

A

Divide the records into two groups based on whether they are greater than or less than the midpoint. For example, records with lot_size > 14.4 and those with lot_size < 14.4.

51
Q

How do decision trees search for the best division of the input space?

A

Decision Trees greedily search for the best division of the
Input Space into exhaustive, mutually exclusive pure rectangles.

52
Q

When dealing with categorical variables, how are all possible ways of splitting the categories examined?

A

there are 2^(n−1) − 1 possible binary splits.

E.g., categories A, B, C can be split 3 ways
{A} and {B, C}
{B} and {A, C}
{C} and {A, B}
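
A short sketch that enumerates these binary splits and confirms the 2^(n−1) − 1 count (for {A, B, C} it prints exactly the three splits listed above):

from itertools import combinations

def binary_splits(categories):
    cats = list(categories)
    splits = []
    # Only consider subsets containing the first category, so each split
    # such as {A} vs {B, C} is counted once rather than twice.
    rest = cats[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {cats[0], *combo}
            right = set(cats) - left
            if right:                              # skip the split with an empty side
                splits.append((left, right))
    return splits

for left, right in binary_splits(["A", "B", "C"]):
    print(left, "and", right)
print(len(binary_splits(["A", "B", "C"])))         # 3, i.e. 2**(3-1) - 1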

53
Q

When does the number of possible splits become significant?

A

With many categories, number of splits becomes huge

54
Q

What is the formula for determining the number of possible splits when a variable has many categories?

A

2^(n−1) − 1

55
Q

In the recursive partitioning step of decision tree construction, what is the first step regarding predictor variables?

A

Pick one of the predictor variables, xᵢ.

56
Q

How is a value (si) selected for a chosen predictor variable in the recursive partitioning step?

A

A value sᵢ of xᵢ is selected that divides the training data into two (not necessarily equal) portions.

57
Q

Define the concept of “purity” in the context of the recursive partitioning step.

A

containing records of mostly one class

58
Q

What is the objective of the algorithm when trying different values of xi and si in the recursive partitioning step?

A

The algorithm tries different values of xᵢ and sᵢ to maximize purity in the initial split.

59
Q

After obtaining a maximum purity split, what is the next step in the recursive partitioning process?

A

repeat the process for a second split, and so on

60
Q

What are the conditions for stopping the partitioning process in decision tree construction?

A
  • There are no samples left
  • There are no remaining attributes for further partitioning
  • A stopping criterion is satisfied
61
Q

Why is a stopping criterion necessary in decision tree construction, especially with real-world data?

A

Many large sets of real-world data are noisy, making it difficult to obtain pure data sets at leaf nodes.

An example of such a stopping criterion is to require a measure of data purity to fall below a threshold value, e.g., entropy < 0.1.

62
Q

Describe the principle of the decision tree construction algorithm

A

The basic algorithm (adopted by ID3, C4.5, and CART) is a greedy algorithm: the tree is constructed in a top-down, recursive, divide-and-conquer manner.

63
Q

What happens in each iteration of the decision tree construction algorithm, and how are test attributes selected?

A
  • At start, all the training tuples are at the root
  • Tuples are partitioned recursively based on selected attributes
  • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
64
Q

What are the stopping conditions for constructing a decision tree node, and how does the algorithm handle them?

A
  • All samples for a given node belong to the same class
  • There are no remaining attributes for further partitioning (in this case, majority voting is employed to classify the leaf)
  • There are no samples left
65
Q

What is the purpose of rule extraction from a decision tree?

A

Rules are easier to understand than large
trees

66
Q

What forms a conjunction along a path from the root to a leaf in a decision tree?

A

Each attribute-value pair along a path forms a conjunction: the leaf holds the class prediction

67
Q

In the example provided, what is the class prediction in the rule “IF age = young AND student = no THEN buys_computer = no”?

A

buys_computer = no

68
Q

What is the significance of the leaf node in the rule extraction from a decision tree?

A

holds the class prediction

69
Q

What is the key consideration in building a decision tree regarding attribute selection?

A

The key to building a decision tree is deciding which attribute to choose in order to branch.

70
Q

What is the primary objective when choosing an attribute for branching in a decision tree?

A

The objective is to reduce impurity or uncertainty in
data as much as possible

71
Q

What does a measure of impurity prefer in terms of attributes?

A

It prefers attributes whose splits produce subsets with a high degree of purity.

72
Q

Define maximum purity and minimum purity in the context of impurity measures.

A
  • Maximum purity: All examples are of the same class
  • Minimum purity: All classes are equally likely
73
Q

Name three measures used for evaluating impurity in decision trees.

A

Measures of impurity:
  1. Entropy
  2. Information Gain
  3. Gini Index (and other measures)

74
Q

What does entropy measure in a dataset?

A

Entropy measures the degree of randomness or uncertainty in the dataset.

75
Q

How is entropy related to the distribution of class labels in classifications?

A

In classifications, entropy measures randomness based on the distribution of class labels in the dataset.

76
Q

Define the entropy (Hᵢ) for a subset of the dataset with K classes at the ith node.

A

Hᵢ = − Σₖ₌₁ᴷ pᵢ(k) · log₂(pᵢ(k)), where pᵢ(k) is the probability of class k in the subset.

77
Q

When does entropy reach its lowest value, and what does it indicate?

A

Entropy is 0 when the dataset is completely homogeneous, indicating that each instance belongs to the same class.

78
Q

When does entropy reach its maximum value, and what does it indicate?

A

Entropy is at its maximum when the dataset is equally divided between multiple classes, indicating maximum uncertainty in the dataset.

79
Q

How is entropy used to evaluate the quality of a split in a decision tree?

A

Entropy is used to select the attribute that minimizes the entropy of resulting subsets, aiming to create more homogeneous subsets with respect to class labels.

80
Q

What is the goal of entropy in decision tree construction?

A

The goal is to choose the attribute with the highest information gain, i.e., the attribute that minimizes entropy after splitting, and to build a decision tree recursively.

81
Q

Equation for entropy derivation

A

E(S) = − Σᵢ pᵢ · log₂(pᵢ)

82
Q

Equation for entropy derivation for multiple attributes

A

E(T, X) = Σ_c P(c) · E(c), summed over the values c of attribute X
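
A minimal sketch of this weighted entropy together with the single-set entropy E(S) from the previous card (the 9 "yes" / 5 "no" class counts mirror the 14-tuple example used in later cards):

from math import log2

def entropy(labels):
    # E(S) = - sum_i p_i * log2(p_i)
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def split_entropy(records, attribute, target):
    # E(T, X) = sum over the values c of attribute X of P(c) * E(c)
    n = len(records)
    total = 0.0
    for v in set(r[attribute] for r in records):
        subset = [r[target] for r in records if r[attribute] == v]
        total += (len(subset) / n) * entropy(subset)
    return total

print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # ≈ 0.94 bits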

83
Q

What is Information Gain, and how is it used in decision tree algorithms?

A

Information Gain is a measure based on Claude Shannon’s information theory, assessing the reduction in entropy or variance resulting from splitting a dataset. In decision trees, it guides attribute selection by favoring the attribute that maximizes Information Gain, indicating its usefulness in creating homogeneous subsets with respect to class labels or target variables. Higher Information Gain signifies greater predictive value.

The attribute age has the highest information gain and therefore becomes the splitting attribute at the root node of the decision tree. Branches are growing through each outcome of age. The tuples are shown partitioned accordingly.

84
Q

How is Information Gain computed for the attribute “age” in a decision tree?

A

Information Gain for “age” is calculated by evaluating the expected information requirement. This involves examining the distribution of “yes” and “no” tuples for each age category. The formula includes the entropy calculation for each category and yields the Information Gain. In the provided example, the Information Gain for “age” is determined as 0.246 bits.

entropy calculation for each category:
\( Info_{age}(D) = \frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) = 0.694 \text{ bits} \)
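
The figures above can be checked with a few lines of arithmetic; the entropy of the full set, Info(D) ≈ 0.940 bits for 9 "yes" and 5 "no" tuples, is implied by Gain(age) = 0.940 − 0.694 = 0.246:

from math import log2

def H(*probs):
    # entropy in bits; terms with probability 0 contribute nothing
    return -sum(p * log2(p) for p in probs if p > 0)

info_D = H(9/14, 5/14)                                       # entropy of the full set
info_age = 5/14 * H(2/5, 3/5) + 4/14 * H(4/4) + 5/14 * H(3/5, 2/5)

print(round(info_D, 3))      # 0.94  -> Info(D) = 0.940 bits
print(round(info_age, 3))    # 0.694 -> Info_age(D) = 0.694 bits
# Gain(age) = Info(D) - Info_age(D) ≈ 0.940 - 0.694 = 0.246 bits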

85
Q

How are Information Gains computed for attributes “income,” “student,” and “credit rating” in a decision tree?

A

Information Gains for “income,” “student,” and “credit rating” are computed using the same process as for “age.” The gains are determined by evaluating the expected information requirement for each attribute. In this case, the computed gains are 0.029 bits for “income,” 0.151 bits for “student,” and 0.048 bits for “credit rating.” Because “age” has the highest gain among the attributes, it is selected as the splitting attribute for Node N in the decision tree.

86
Q

What is the formula for the Gini index, and how is it used in the context of a decision tree?

A

The Gini index is given by the formula \( Gini(D) = 1 - \sum_{i=1}^{m} p_i^2 \), where \( p_i \) is the probability that a tuple in \( D \) belongs to class \( C_i \). The index measures the impurity of a data partition or set of training tuples. When considering a binary split for each attribute, the Gini index for a partitioning, \( Gini_A(D) \), is calculated as a weighted sum of the impurity of each resulting partition. For a discrete-valued attribute, the subset that gives the minimum Gini index is selected as its splitting subset.
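
A one-function sketch of this formula (the 9 "yes" / 5 "no" counts match the running example; 0.459 is the Gini index of the full data set cited in a later card):

def gini(labels):
    # Gini(D) = 1 - sum_i p_i^2
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(round(gini(["yes"] * 9 + ["no"] * 5), 3))      # 0.459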

87
Q

How is the Gini index used to induce a decision tree, and what is the process of finding the splitting criterion for the tuples in \( D \)?

A

To induce a decision tree, the Gini index is computed for each attribute. The process involves considering each possible binary split for a discrete-valued attribute. The splitting criterion for the tuples in \( D \) is determined by selecting the subset that gives the minimum Gini index for that attribute. The weighted sum of the impurity of each resulting partition is used to evaluate the Gini index for the binary split on an attribute.

88
Q

In the given example (Example 8.3), how is the Gini index computed for the attribute “income,” considering the subset {low, medium}?

A

The Gini index for the subset {low, medium} is computed using the formula \( Gini_{income \in \{low, medium\}}(D) = \frac{10}{14}\,Gini(D_1) + \frac{4}{14}\,Gini(D_2) \), where \( D_1 \) and \( D_2 \) are the partitions resulting from the binary split on the condition “income ∈ {low, medium}.” The Gini index values for \( D_1 \) and \( D_2 \) are calculated, and the weighted sum is used to determine the Gini index for the binary split on the “income” attribute. In this example, the resulting Gini index is 0.443.
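
The 0.443 figure can be reproduced with a few lines of arithmetic, assuming the class counts used in the referenced textbook example (not stated on this card): D1 holds the 10 tuples with income in {low, medium} (7 "yes", 3 "no") and D2 the 4 tuples with income = high (2 "yes", 2 "no"):

gini_D1 = 1 - (7/10) ** 2 - (3/10) ** 2       # Gini(D1) ≈ 0.42
gini_D2 = 1 - (2/4) ** 2 - (2/4) ** 2         # Gini(D2) = 0.50

gini_split = 10/14 * gini_D1 + 4/14 * gini_D2
print(round(gini_split, 3))                   # ≈ 0.443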

89
Q

What is the overall process of finding the splitting criterion and inducing a decision tree using the Gini index in the provided example?

A

The overall process involves computing the Gini index for each attribute, considering all possible binary splits. For each attribute, the subset that minimizes the Gini index is selected as the splitting subset. The Gini index values for the selected subsets are used to determine the best attribute for the root node. In the provided example, the process starts with the attribute “income,” and the Gini index is computed for subsets like (low, medium). The same process is then repeated for other attributes, and the attribute with the minimum Gini index becomes the splitting attribute for the root node. The decision tree is grown recursively based on these splitting criteria.

90
Q

What are the Gini index values for splits on the subsets {low, high}, {medium}, and {medium, high} for the attribute “income”?

A

The Gini index values for splits on the subsets are as follows:
  • {low, high} and {medium}: 0.458
  • {medium, high} and {low}: 0.450
Therefore, the best binary split for the “income” attribute is on {low, medium} (or {high}), as it minimizes the Gini index.

91
Q

What is the best binary split for the attribute “age,” and what is the corresponding Gini index?

A

The best binary split for the “age” attribute is on {youth, senior} (or {middle aged}), with a Gini index of 0.375.

92
Q

Are the attributes “student” and “credit rating” binary, and what are their respective Gini index values?

A

Yes, both “student” and “credit rating” are binary attributes. The Gini index values are 0.367 for “student” and 0.429 for “credit rating.”

93
Q

Which attribute and splitting subset give the minimum Gini index overall, and what is the reduction in impurity?

A

The attribute “age” and the splitting subset {youth, senior} give the minimum Gini index overall, with a reduction in impurity of 0.459 − 0.357 = 0.102.

94
Q

How is the final splitting criterion determined, and what is done with it in the context of building the decision tree?

A

The final splitting criterion is determined by selecting the attribute and its corresponding splitting subset that result in the minimum Gini index. In this example, the binary split “age ∈ {youth, senior}” yields the maximum reduction in impurity and is returned as the splitting criterion. Node N is labeled with this criterion, two branches are grown from it, and the tuples are partitioned accordingly during the construction of the decision tree.

95
Q

What is pruning in decision trees?

A

Pruning, by definition, is eliminating subtrees and replacing them with leaf nodes.

96
Q

How does pruning improve the performance of the tree?

A

When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers

Tree pruning methods address this problem of
overfitting the data

97
Q

How does pruning improve the performance of the tree?

A

reducing its size and removing the parts of the tree that do not provide power to classify instances.

98
Q

What problem do tree pruning methods address?

A

They address the complexity of the tree: pruning reduces overfitting and increases the tree’s predictive power.

99
Q

Out of all of the machine learning algorithms, which are the most susceptible to overfitting?

A

Decision trees

100
Q

What are the two common approaches to tree pruning?

A

prepruning and postpruning
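
A hedged scikit-learn sketch of the two approaches (the parameter values are arbitrary examples): prepruning corresponds to stopping criteria such as max_depth or min_impurity_decrease, while postpruning can be done with cost-complexity pruning via ccp_alpha.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Prepruning: halt the tree while it is growing, via stopping criteria.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                             min_impurity_decrease=0.01).fit(X, y)

# Postpruning: grow the full tree, then prune it back (cost-complexity pruning).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
post = DecisionTreeClassifier(random_state=0,
                              ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print(pre.get_depth(), post.get_depth())   # both much shallower than an unpruned tree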

101
Q

What is prepruning?

A

In prepruning, a decision tree is halted while growing so that it won’t get too complex.

102
Q

What is postpruning?

A

The tree is grown to its fullest and then pruned following a bottom-up or top-down strategy.

103
Q

Pre-Pruning & Post-Pruning

Which method is considered more interesting and why?

A

Pre-pruning, because it saves time: no time is wasted growing subtrees that would later be eliminated.

104
Q

What is the main idea behind the prepruning approach in decision tree algorithms?

A

The main idea behind the prepruning approach is that trees are not pruned in prepruning algorithms; instead, the algorithms are halted based on some stopping criterion. This criterion is often related to the goodness of the split, which is determined by metrics such as Information Gain, Gini Index, Gain Ratio, etc. If the information measured at a test node falls below a predefined threshold, the branching on that path is halted.

105
Q

How is the decision to halt the branching determined in prepruning algorithms?

A

The decision to halt the branching is determined based on the goodness of the split. If the information measured at a test node is below a specified threshold, the branching is stopped on that path.

106
Q

What are some common stopping criteria used in prepruning algorithms?

A

Common stopping criteria include:
* Information Gain below a threshold
* Gini Index below a threshold
* Gain Ratio below a threshold
* Limiting tree size
* Limiting instances in an internal node
* Halt if class distribution of instances is independent of the available feature

107
Q

What is the role of threshold values in prepruning, and how are they used in determining when to stop branching?

A

Threshold values play a crucial role in prepruning, as they define the conditions for stopping the branching process. If certain metrics (e.g., Information Gain, Gini Index) fall below the specified thresholds, the tree-growing process is halted on that path. Similarly, tree size and instances in internal nodes can be limited by threshold values.

108
Q

What is the main goal of the prepruning approach in decision tree construction?

A

The main goal of the prepruning approach is to “prune” the tree by halting its construction early, thus preventing further splitting or partitioning of the subset of training tuples at a given node.

109
Q

How does a node in the prepruning approach become a leaf in the decision tree?

A

Upon halting the construction at a node in the prepruning approach, that node becomes a leaf. The leaf may hold either the most frequent class among the subset tuples or the probability distribution of those tuples.

110
Q

What measures are commonly used to assess the goodness of a split in the prepruning approach?

A

Measures such as statistical significance, information gain, Gini index, and similar metrics are commonly used to assess the goodness of a split in the prepruning approach.

111
Q

How is the decision to halt further partitioning determined in the prepruning approach?

A

If partitioning the tuples at a node would result in a split that falls below a pre-specified threshold (e.g., in terms of information gain, Gini index), further partitioning of the given subset is halted.

112
Q

What challenges or difficulties are associated with choosing an appropriate threshold in the prepruning approach?

A

Choosing an appropriate threshold in the prepruning approach is challenging. High thresholds may lead to oversimplified trees, while low thresholds could result in very little simplification. Striking the right balance is crucial for achieving an optimal level of simplification without sacrificing the tree’s predictive capabilities.

113
Q

How does postpruning differ from prepruning in terms of restrictions?

A

Postpruning is not restricted by predefined thresholds, unlike prepruning, which relies on stopping criteria based on specific thresholds.

114
Q

Describe the process of subtree pruning in postpruning.

A

In postpruning, a subtree at a given node is pruned by removing its branches and replacing it with a leaf. The leaf is then labeled with the most frequent class among the subtree being replaced.

115
Q

What is the potential impact on accuracy when performing subtree pruning in postpruning?

A

Pruning a subtree in postpruning might lower the accuracy in the training data; however, it is expected to increase the accuracy in the test data.

116
Q

In terms of efficiency and accuracy, how do prepruning and postpruning compare?

A

Prepruning is considered more efficient as it halts tree growth early, producing trees faster. On the other hand, postpruning tends to provide better accuracy overall, according to most studies, despite being potentially less efficient.

117
Q

What are the challenges associated with pruned decision trees?

A

Pruned decision trees, although more compact than their unpruned counterparts, may still be large and complex, leading to challenges in interpretation.

118
Q

Explain the concepts of repetition and replication in decision trees.

A

Repetition occurs when an attribute is repeatedly tested along a given branch of the decision tree. Replication refers to the existence of duplicate subtrees within the tree. Both repetition and replication can make decision trees overwhelming to interpret

119
Q

How can the issues of repetition and replication in decision trees be addressed?

A

The issues of repetition and replication can be addressed by using multivariate splits (splits based on a combination of attributes). Another approach is to use a different form of knowledge representation, such as rules, instead of decision trees.

120
Q

How does the goal of a regression tree differ from that of a classification tree?

A

The goal of a regression tree is regression, focusing on predicting continuous values instead of class labels. Unlike classification trees, which aim to assign class labels, regression trees predict continuous values for the resulting leaf nodes.

121
Q

What impurity measure is used in regression trees, and why?

A

In regression trees, mean squared error is used as the impurity measure instead of entropy or similar measures. Mean squared error is more suitable for regression tasks where the goal is to minimize the difference between predicted and actual continuous values.
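
A small sketch of an MSE-based split evaluation, mirroring entropy-based splitting but with mean squared error as the impurity (toy target values invented for illustration):

def mse(values):
    # Mean squared error around the leaf prediction (the mean of the values).
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def split_mse(xs, ys, threshold):
    # Weighted MSE of the two partitions x >= threshold and x < threshold.
    left = [y for x, y in zip(xs, ys) if x < threshold]
    right = [y for x, y in zip(xs, ys) if x >= threshold]
    n = len(ys)
    return len(left) / n * mse(left) + len(right) / n * mse(right)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 7.8, 8.1]                 # continuous targets
print(round(split_mse(xs, ys, 2.5), 4))   # low weighted MSE -> a good split point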

122
Q

How are leaf nodes generated in a regression tree, and what information do they represent?

A

Leaf nodes in a regression tree are generated by taking an average over the distributed target values of the path that is taken after all the branching is done until that leaf node. These leaf nodes represent the predicted continuous values.

123
Q

Why is the resulting tree in a regression tree binary?

A

The resulting tree in a regression tree is binary because the nodes are always branched into two partitions: one with values greater than or equal to a specified value and another with values less than the specified value.

124
Q

What is the core principle of the Greedy method in decision-making?

A

The core principle of the Greedy method is to make locally-optimal choices at each step, hoping that these choices will lead to a globally-optimal solution. It focuses on making the best decision at the current moment without considering the long-term impact on future decisions.

125
Q

How does the Greedy method differ from considering the broader problem in decision-making?

A

The Greedy method makes decisions based on the information available at each phase without considering the broader problem. It focuses on local optimum choices at each stage, and there is a possibility that the greedy solution may not provide the best solution for every problem.

126
Q

How does the Greedy algorithm make decisions in the hope of finding the optimal solution?

A

The Greedy algorithm makes good local choices at each stage with the intention of finding the global optimum. It follows a strategy of making decisions based on the information available at each phase, aiming for the solution to be either feasible or optimal.

127
Q

What is the main aim of prepruning in Decision Tree learning?
(multiple answer)
A. To improve the training speed.
B. To improve the testing speed.
C. To reduce the memory requirement.
D. A and C

A

D

128
Q

True/False

In a decision tree, the more levels the final tree has,
the more accurate the prediction becomes. No
exception.

A

False

129
Q

Suppose we have a nominal attribute X with 4 values.
In decision tree learning algorithm, how many binary
split values are checked for that attribute?

A

In general, if we have a variable with n possible values, there are 2^(n−1) − 1 possible binary splits. In the above example, 4 values means 2^(4−1) − 1 = 7 possible splits.

130
Q

Learning a decision tree is a greedy algorithm. What does this mean, and is there any problem with such an algorithm?

A

It means that locally optimal decisions are made at
each node. Such algorithms cannot guarantee to return
the globally optimal decision tree.