MODULE 2 S3.1 Flashcards

Decision Tree

1
Q

They are widely used for classification and regression tasks.

A

Decision Trees

2
Q

Decision trees learn a hierarchy of _________ questions, leading to a ____________.

A

if/else
decision
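
As a rough illustration of this card, a decision tree can be written by hand as nested if/else tests. The animals and feature names below are hypothetical and chosen only for the sketch; a learned tree would pick its own tests from data.

```python
def classify_animal(has_feathers: bool, can_fly: bool, has_fins: bool) -> str:
    """Hand-written tree: each if/else is one test, each return is a leaf (decision)."""
    if has_feathers:            # root test
        if can_fly:             # second-level test on the "feathers" branch
            return "hawk"
        else:
            return "penguin"
    else:
        if has_fins:            # second-level test on the "no feathers" branch
            return "dolphin"
        else:
            return "bear"


answer = classify_animal(has_feathers=True, can_fly=False, has_fins=False)
print(answer)
```

Two questions are enough to reach any of the four decisions here, which is the sense in which a good tree reaches the answer quickly.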

3
Q

Learning a decision tree means learning the sequence of if/else questions that gets us to the __________ answer most quickly.

A

true

4
Q

In the machine learning setting, questions are called _________

A

tests

5
Q

T/F To build a tree, the algorithm searches over all possible tests and finds the one that is most informative about the target variable.

A

True
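
The search over tests can be sketched for a single numeric feature. This is a simplified illustration, not any library's implementation: it scores each candidate threshold by how many samples a majority vote would misclassify, whereas real algorithms typically use measures such as Gini impurity or entropy. The function names and data are assumptions made for the example.

```python
from collections import Counter


def misclassified(labels):
    """Number of samples a majority-vote leaf would get wrong."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_threshold(values, labels):
    """Try every midpoint between sorted distinct values; return (threshold, errors)."""
    distinct = sorted(set(values))
    best = (None, misclassified(labels))  # baseline: no split at all
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        errors = misclassified(left) + misclassified(right)
        if errors < best[1]:
            best = (t, errors)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, errors = best_threshold(values, labels)
print(threshold, errors)  # 6.5 0
```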

6
Q

The top node, which represents the whole dataset

A

root

7
Q

Decision tree classes

A

DecisionTreeRegressor
DecisionTreeClassifier

8
Q

We can visualize the tree using the ____________ function from the tree module.

A

export_graphviz

9
Q

It is a text file format for storing graphs.

A

.dot
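
The last three cards can be tied together in a short sketch, assuming scikit-learn is installed. The toy data, feature names, and class names are made up for illustration. With `out_file=None`, `export_graphviz` returns the DOT text as a string instead of writing a `.dot` file.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Toy data (an assumption for this sketch): the label simply follows the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Return the tree in DOT format as a string rather than writing a file.
dot_text = export_graphviz(
    clf,
    out_file=None,
    feature_names=["f0", "f1"],
    class_names=["no", "yes"],
)
print(dot_text[:40])
```

The resulting text can be saved with a `.dot` extension and rendered with the Graphviz tools.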

10
Q

It is a diagram or chart that people use to determine a course of action or show a statistical probability.

A

Decision tree

11
Q

____________ : feature (attribute)
____________ : decision (rule) or reaction
____________ : outcome

A

node
branch
leaf

12
Q

Types of Decision Tree in Machine Learning

A

Classification trees
Regression trees

13
Q

Decision Variables

Classification : _____________
Regression : ______________

A

categorical
continuous

14
Q

The topmost node of a decision tree that represents the entire message or decision.

A

Root node

15
Q

The process of dividing a node into two or more nodes. It is the point at which the decision branches off into variables.

A

Splitting

16
Q

A node within a decision tree where the prior node branches into two or more variables.

A

Decision (internal) node

17
Q

Also called the external or terminal node, it is the last node in the tree and the furthest from the root node.

A

Leaf (terminal) node

18
Q

Paths that connect the nodes and represent the different possible outcomes of the test.

A

Branch

19
Q

Nodes that precede other nodes in the tree hierarchy.

A

Parent node

20
Q

Nodes directly connected to the parent node, resulting from the split or decision made at the parent node.

A

Child node

21
Q

The opposite of splitting: the process of reducing the tree to only its most important nodes or outcomes.

A

Pruning
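
One way to see pruning in practice, assuming scikit-learn is installed, is its cost-complexity pruning option `ccp_alpha`. This is only one pruning approach among several, and the data and the value 0.15 below are arbitrary choices for the sketch.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption): one noisy point keeps a single split from being enough.
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 1, 0, 1, 1]

# Unpruned tree: keeps splitting until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pruned tree: subtrees whose benefit does not justify their size are collapsed.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.15).fit(X, y)

print(full.tree_.node_count, pruned.tree_.node_count)
```

The pruned tree has fewer nodes and no longer fits the training data perfectly, which is the trade made to reduce overfitting.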

22
Q

Decision Tree Algorithms

A

ID3 (Iterative Dichotomiser 3)
C4.5
CART (Classification and Regression Tree)
CHAID (Chi-square Automatic Interaction Detection)
MARS (Multivariate Adaptive Regression Splines)

23
Q

This algorithm uses the information gain metric to determine the best feature to split on at each node.

A

ID3 (Iterative Dichotomiser 3)
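
Information gain can be worked through in a few lines: it is the entropy of the labels minus the weighted entropy left after splitting on a feature. The sketch below uses made-up data and hypothetical function names; it shows the metric only, not the full ID3 algorithm.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rainy", "rainy"]   # separates the classes perfectly
windy = ["yes", "no", "yes", "no"]               # tells us nothing about the class

print(information_gain(outlook, play))  # 1.0
print(information_gain(windy, play))    # 0.0
```

ID3 would split on `outlook` here because it has the larger gain.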

24
Q

T/F ID3 is prone to underfitting.

A

False

25
Q

ID3 is prone to _________ and can create __________.

A

overfitting
huge trees

26
Q

Algorithm that continues splitting until all instances are perfectly classified or no further useful features are available.

A

ID3

27
Q

Successor of ID3

A

C4.5

28
Q

Algorithm that uses the gain ratio metric instead of information gain to account for the number of branches in a feature.

A

C4.5
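
The effect of gain ratio can be sketched by dividing information gain by the split information, that is, the entropy of the feature's own values. The data and function names are assumptions for illustration, and this covers the metric only, not the rest of C4.5.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain divided by split information (entropy of the feature itself)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rainy", "rainy"]  # two branches
row_id = [1, 2, 3, 4]                           # one branch per sample

print(gain_ratio(outlook, play))  # 1.0
print(gain_ratio(row_id, play))   # 0.5
```

Both features have an information gain of 1.0, but the many-branched `row_id` is penalized by its larger split information.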

29
Q

Algorithm that can be used for classification and regression.

A

CART (Classification and Regression Tree)

30
Q

CART

classification : _____________
regression : ______________

A

Gini impurity
mean squared error
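
Both criteria are simple to compute for a single node. The following is a minimal sketch with hypothetical function names and made-up values, where the node's prediction for regression is assumed to be the mean of its targets.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def node_mse(values):
    """Mean squared error of predicting the node's mean for every sample in it."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
print(gini_impurity(["a", "a", "a", "a"]))  # 0.0
print(node_mse([2.0, 4.0]))                 # 1.0
```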

31
Q

Algorithm that uses chi-squared tests to find the best split.

A

CHAID (Chi-square Automatic Interaction Detection)
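
The statistic behind this card can be sketched as Pearson's chi-squared computed on the contingency table of a categorical feature against the class labels. This shows only the statistic, under made-up data and a hypothetical function name; CHAID itself also involves steps such as significance testing and merging categories, which are not covered here.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-squared statistic for feature category vs. class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    row_totals = Counter(feature_values)
    col_totals = Counter(labels)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected if feature and label were independent
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rainy", "rainy"]  # strongly associated with the label
windy = ["yes", "no", "yes", "no"]              # independent of the label

print(chi_square(outlook, play))  # 4.0
print(chi_square(windy, play))    # 0.0
```

A larger statistic indicates a stronger association, so `outlook` would be the preferred split in this toy case.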

32
Q

Algorithm typically used for categorical variables and can handle multiway splits. It performs multi-level splits when computing classification trees.

A

CHAID

33
Q

Algorithm primarily used for regression. It builds models by fitting piecewise linear regressions and combining them into a single model.

A

MARS
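
The piecewise linear pieces in MARS are usually expressed as hinge functions of the form max(0, x - t) and max(0, t - x). The sketch below only evaluates such a model; the knot at 3.0 and the coefficients are hand-picked assumptions, not fitted values, and no fitting procedure is shown.

```python
def hinge(z: float) -> float:
    """Hinge function: zero for negative input, linear for positive input."""
    return max(0.0, z)


def mars_like_model(x: float) -> float:
    """Sum of an intercept and two hinge terms sharing a knot at x = 3.0."""
    intercept = 1.0
    return intercept + 2.0 * hinge(x - 3.0) + 0.5 * hinge(3.0 - x)


for x in (1.0, 3.0, 5.0):
    print(x, mars_like_model(x))
```

The slope changes at the knot: 0.5 per unit to the left of 3.0 and 2.0 per unit to the right, which is how the combined model captures a nonlinear relationship.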

34
Q

T/F MARS is capable of modeling complex, nonlinear relationships and interactions between features.

A

True