MODULE 2 S3.1 Flashcards
Decision Tree
They are widely used for classification and regression tasks.
Decision Trees
Decision trees learn a hierarchy of _________ questions, leading to a ____________.
if/else
decision
Learning a decision tree means learning the sequence of if/else questions that gets us to the __________ answer most quickly.
true
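As a minimal sketch, a learned tree boils down to nested if/else tests that end in a decision. The features and animal labels here are hypothetical and chosen only for illustration.

```python
def classify_animal(has_feathers: bool, can_fly: bool, has_fins: bool) -> str:
    """Hypothetical hand-built tree: each if/else is a test, each return is a decision."""
    if has_feathers:            # root test
        if can_fly:             # follow-up test on the "has feathers" branch
            return "hawk"
        return "penguin"
    if has_fins:                # follow-up test on the "no feathers" branch
        return "dolphin"
    return "bear"

print(classify_animal(has_feathers=True, can_fly=False, has_fins=False))  # penguin
```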
In the machine learning setting, questions are called _________
tests
T/F To build a tree, the algorithm searches over all possible tests and finds the one that is most informative about the target variable
True
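A simplified sketch of that search for a single numeric feature, assuming Gini impurity as the measure of how informative a test is (CART's choice; ID3 scores tests with information gain instead). Every midpoint between neighbouring values is tried as the test `x <= t`, and the threshold with the lowest weighted impurity wins. A full implementation repeats this over every feature.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(x, y):
    """Try every midpoint between consecutive distinct x values as the test
    'x <= t' and return (t, weighted_gini) with the lowest weighted impurity."""
    values = sorted(set(x))
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best

x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(x, y))  # (6.5, 0.0)
```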
The top node, which represents the whole dataset
root
Decision tree classes
DecisionTreeRegressor
DecisionTreeClassifier
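A short sketch, assuming scikit-learn (where both classes live in `sklearn.tree`) and a hypothetical toy dataset: the classifier predicts a class label, while the regressor predicts the mean target value of the leaf a sample lands in.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: one numeric feature, two kinds of target.
X = [[1], [2], [3], [10], [11], [12]]
y_class = [0, 0, 0, 1, 1, 1]              # categorical target -> classifier
y_reg = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]    # continuous target -> regressor

clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y_reg)

print(clf.predict([[2.5], [11.5]]))   # [0 1]
print(reg.predict([[2.5], [11.5]]))   # leaf means, roughly 1.03 and 4.97
```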
We can visualize the tree using the ____________ function from the tree module.
export_graphviz
It is a text file format for storing graphs.
.dot
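A sketch assuming scikit-learn and its bundled iris dataset: with `out_file=None`, `export_graphviz` returns the tree as DOT source text, which is then saved as a `.dot` file. The file name `tree.dot` is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# out_file=None returns the DOT source as a string instead of writing a file.
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
)
print(dot_source.splitlines()[0])  # digraph Tree {

# Save the DOT text as a .dot file.
with open("tree.dot", "w") as f:
    f.write(dot_source)
```

Rendering the file needs Graphviz installed separately, for example `dot -Tpng tree.dot -o tree.png`.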
It is a diagram or chart that people use to determine a course of action or show a statistical probability.
Decision tree
____________ : feature (attribute)
____________ : decision (rule) or reaction
____________ : outcome
node
branch
leaf
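A minimal sketch of the three parts as a data structure. The `Node`/`Leaf` classes and the weather features are hypothetical: a node holds the tested feature, each branch is the path followed for one answer to the test, and a leaf holds the outcome.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    outcome: str                     # leaf: the final outcome

@dataclass
class Node:
    feature: str                     # node: the feature (attribute) tested here
    threshold: float
    if_true: Union["Node", Leaf]     # branch followed when value <= threshold
    if_false: Union["Node", Leaf]    # branch followed when value > threshold

def predict(tree: Union[Node, Leaf], sample: dict) -> str:
    """Follow branches from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if sample[tree.feature] <= tree.threshold else tree.if_false
    return tree.outcome

# Hypothetical tree: go outside only if it is warm and dry.
root = Node("temperature", 15.0,
            if_true=Leaf("stay in"),
            if_false=Node("rain_mm", 1.0, if_true=Leaf("go out"), if_false=Leaf("stay in")))

print(predict(root, {"temperature": 22.0, "rain_mm": 0.0}))  # go out
```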
Types of Decision Tree in Machine Learning
Classification trees
Regression trees
Decision Variables
Classification : _____________
Regression : ______________
categorical
continuous
The topmost node of a decision tree, which represents the entire dataset or decision.
Root node
The process of dividing a node into two or more nodes. It’s the part at which the decision branches off into variables.
Splitting
A node within a decision tree where the prior node branches into two or more variables.
Decision (internal) node
Also called the external or terminal node, it is a node that is not split any further; it ends a path from the root node and holds the final outcome.
Leaf (terminal) node
Paths that connect the nodes and represent the different possible outcomes of the test.
Branch
Nodes that precede other nodes in the tree hierarchy.
Parent node
Nodes directly connected to the parent node, resulting from the split or decision made at the parent node.
Child node
The opposite of splitting, the process of going through and reducing the tree to only the most important nodes or outcomes.
Pruning
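A sketch of post-pruning, assuming scikit-learn's cost-complexity pruning through the `ccp_alpha` parameter and its bundled breast cancer dataset. The value 0.01 is arbitrary and would normally be tuned, for example with cross-validation. Pre-pruning parameters such as `max_depth` or `min_samples_leaf` stop growth early instead.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree: keeps splitting until its leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost-complexity pruning: a larger ccp_alpha cuts away more weak subtrees.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pruned", pruned)]:
    print(name, "leaves:", model.get_n_leaves(),
          "test accuracy:", round(model.score(X_test, y_test), 3))
```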
Decision Tree Algorithms
ID3 (Iterative Dichotomiser 3)
C4.5
CART (Classification and Regression Tree)
CHAID (Chi-square Automatic Interaction Detection)
MARS (Multivariate Adaptive Regression Splines)
This algorithm uses the information gain metric to determine the best feature to split on at each node.
ID3 (Iterative Dichotomiser 3)
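A minimal sketch of the metric, assuming categorical features stored in dicts (a hypothetical layout): information gain is the parent's entropy minus the size-weighted entropy of the child groups, one group per feature value.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Parent entropy minus the weighted entropy of the children produced by
    splitting on a categorical feature (one branch per distinct value)."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children

rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0
```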
T/F ID3 is prone to underfit.
False
ID3 is prone to _________ and can create __________.
overfitting
huge trees
Algorithm that continues splitting until all instances are perfectly classified or no further useful features are available.
ID3
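A compact ID3 sketch, assuming categorical features stored in dicts and hypothetical weather data. It stops only when a node is pure or no features remain, and it never prunes, which is why noisy data can make it grow very large, overfitted trees.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def id3(rows, labels, features):
    """Return a leaf label, or (feature, {value: subtree}) for an internal node."""
    if len(set(labels)) == 1:                  # perfectly classified -> leaf
        return labels[0]
    if not features:                           # no features left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:  # one branch per feature value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)

rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "stay", "play", "stay"]
tree = id3(rows, labels, ["outlook", "windy"])
print(tree)  # splits on "windy" only; both branches are already pure
```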
ID3 was developed by _________________ in ______
Ross Quinlan
1986
successor of ID3
C4.5
Who developed C4.5?
Ross Quinlan
Algorithm that uses the gain ratio metric instead of information gain to account for the number of branches in a feature.
C4.5
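A sketch of the gain ratio formula only (not the full C4.5 algorithm), on hypothetical data. Gain ratio divides information gain by the split information, which is the entropy of the branch sizes. Below, a unique row ID and a two-valued feature both have an information gain of 1.0, but the ID's four branches halve its gain ratio.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain divided by split information for one categorical feature."""
    n = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information grows with the number (and evenness) of branches.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]
row_id = ["r1", "r2", "r3", "r4"]     # unique per row: 4 branches
windy = ["no", "no", "yes", "yes"]    # 2 branches
print(gain_ratio(row_id, labels))  # 0.5
print(gain_ratio(windy, labels))   # 1.0
```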
It handles both categorical and continuous data and prunes trees to avoid overfitting, which makes it better at handling noisy data.
C4.5
Algorithm that can be used for classification and regression problems and uses Gini impurity or mean squared error.
CART (Classification and Regression Tree)
CART was developed by
Leo Breiman
Jerome Friedman
Richard Olshen
Charles Stone
CART uses :
classification : _____________
regression : ______________
Gini impurity
mean squared error
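A minimal sketch of both criteria in plain Python: Gini impurity is 1 minus the sum of squared class proportions, and the regression criterion is the mean squared error of a node's targets around their mean, which is also the value a regression leaf predicts.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2 (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def mse(values):
    """Mean squared error of the targets around the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(gini(["a", "a", "b", "b"]))   # 0.5
print(gini(["a", "a", "a", "a"]))   # 0.0
print(mse([1.0, 2.0, 3.0]))         # 0.6666666666666666
```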
It provides clear and interpretable models, and trees are pruned to prevent overfitting.
CART
Algorithm that uses chi-squared tests to find the best split.
CHAID (Chi-square Automatic Interaction Detection)
Algorithm that is typically used for categorical variables and can handle multiway splits. It performs multi-level splits when computing classification trees.
CHAID
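A sketch of the statistic behind CHAID's split choice, on hypothetical data: Pearson's chi-squared compares the observed counts in a feature-by-class table with the counts expected if feature and class were independent. Because the table has one row per category, a feature with many values maps naturally onto a multiway split. Full CHAID also merges similar categories and compares adjusted p-values across features; that part is omitted here.

```python
from collections import Counter

def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic for a categorical feature vs. the class label."""
    n = len(labels)
    cells = Counter(zip(feature_values, labels))   # observed counts per (value, class)
    row_totals = Counter(feature_values)
    col_totals = Counter(labels)
    stat = 0.0
    for value, r in row_totals.items():
        for cls, k in col_totals.items():
            expected = r * k / n                   # count expected under independence
            observed = cells[(value, cls)]         # Counter returns 0 for unseen pairs
            stat += (observed - expected) ** 2 / expected
    return stat

region = ["north", "north", "north", "south", "south", "south"]
bought = ["yes", "yes", "yes", "no", "no", "no"]
print(chi_squared(region, bought))  # 6.0
```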
CHAID was developed by ____________
Gordon Kass
Algorithm that is primarily used for regression. It builds models by fitting piecewise linear regressions and combining them into a single model.
MARS (Multivariate Adaptive Regression Splines)
T/F MARS is capable of modeling complex, nonlinear relationships and interactions between features.
True
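A sketch of MARS's building block, assuming NumPy and synthetic noise-free data: a mirrored pair of hinge functions, max(0, x - c) and max(0, c - x), lets an ordinary least-squares fit bend at the knot c. Real MARS picks knots itself in a forward pass and removes terms in a backward pass; here the knot is fixed at 5 by hand.

```python
import numpy as np

def hinge_basis(x, knot):
    """Design matrix: intercept plus the hinge pair max(0, x - knot), max(0, knot - x)."""
    return np.column_stack([
        np.ones_like(x),
        np.maximum(0.0, x - knot),
        np.maximum(0.0, knot - x),
    ])

# Synthetic piecewise-linear target with a bend at x = 5.
x = np.linspace(0.0, 10.0, 21)
y = np.where(x < 5.0, 10.0 - x, 5.0 + 2.0 * (x - 5.0))

B = hinge_basis(x, knot=5.0)                      # knot chosen by hand (assumption)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(np.round(coef, 6))  # intercept 5, slope 2 right of the knot, slope 1 for (5 - x) on the left
```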
MARS was developed by ______________
Jerome Friedman
Full form of ID3
Iterative Dichotomiser 3
Full form of CART
Classification and Regression Tree
Full form of CHAID
Chi-square Automatic Interaction Detection
Full form of MARS
Multivariate Adaptive Regression Splines
T/F CART is an n-ary tree
False
Binary tree
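A sketch assuming scikit-learn, whose documentation describes its trees as an optimized version of CART: inspecting the fitted `tree_` structure shows that every internal node has exactly one left and one right child.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

tree = clf.tree_
left, right = tree.children_left, tree.children_right
# A leaf stores -1 in both child arrays; an internal node stores one left and one right child.
internal = [i for i in range(tree.node_count) if left[i] != -1]
two_children = all(right[i] != -1 for i in internal)
print(len(internal), "internal nodes; all binary:", two_children)
```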