decision trees Flashcards
why do people use decision trees
simplicity and interpretability
extensibility
what is the predictive model structure of a decision tree
root or internal node: a feature
leaf node : target value
branch : represents a decision rule
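A minimal sketch of this structure, using an assumed nested-dict representation and a made-up "outlook" feature: internal nodes hold a feature, branches encode decision rules keyed by feature value, and leaves hold target values.

```python
# Assumed toy representation of a decision tree:
# internal node = a feature; branch = a decision rule; leaf = a target value.
tree = {
    "feature": "outlook",          # root/internal node: a feature
    "branches": {                  # each key acts as a decision rule
        "sunny": {"leaf": "no"},   # leaf node: target value
        "rainy": {"leaf": "yes"},
    },
}

def predict(node, sample):
    """Follow branches matching the sample's feature values until a leaf."""
    while "leaf" not in node:
        node = node["branches"][sample[node["feature"]]]
    return node["leaf"]

print(predict(tree, {"outlook": "sunny"}))  # -> no
```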
what does a root or internal node represent in a decision tree
root or internal node: a feature
what does a leaf node represent in a decision tree
leaf node : target value
what does a branch represent in a decision tree
branch : represents a decision rule
what are the types of decision trees
classification : target variable takes categorical data
regression : target variable takes continuous data
what is the difference between classification and regression trees
classification : target variable takes categorical data
regression : target variable takes continuous data
what are classification trees
DTs where target variable takes categorical data
what are regression trees
DTs where target variable takes continuous data
what does information gain for DTs do
measures the reduction in entropy after splitting a dataset on a single feature (not multiple features)
what is the equation for information gain
I(Y,X) = H(Y) - H(Y|X)
what does each part of the equation stand for in information gain
I(Y,X) = H(Y) - H(Y|X)
Y - random variable representing target
X - random variable representing feature of input sample
H(Y) - entropy of Y
H(Y|X) - conditional entropy of Y given X
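A worked computation of I(Y,X) = H(Y) - H(Y|X) on a made-up toy dataset, where the feature perfectly separates the labels so the gain equals the full entropy H(Y) = 1 bit:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y): Shannon entropy of a list of target labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y|X): entropy of Y within each value of X, weighted by frequency."""
    n = len(ys)
    h = 0.0
    for x in set(xs):
        subset = [y for xi, y in zip(xs, ys) if xi == x]
        h += (len(subset) / n) * entropy(subset)
    return h

def information_gain(xs, ys):
    """I(Y, X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Toy data (assumed): each feature value maps to exactly one label,
# so splitting on X removes all uncertainty about Y.
xs = ["a", "a", "b", "b"]
ys = [0, 0, 1, 1]
print(information_gain(xs, ys))  # -> 1.0
```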
what does information gain do
quantifies the improvement in classifying labels after using a feature to split the dataset
the feature that maximises information gain is chosen for the split
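The split-selection step can be sketched as follows, on an assumed toy dataset where a hypothetical "wind" feature predicts the label perfectly and a "colour" feature is pure noise:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y): Shannon entropy of a list of target labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """I(Y, X) = H(Y) - H(Y|X) for one candidate feature X."""
    n = len(labels)
    h_cond = 0.0
    for value in {r[feature] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[feature] == value]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_cond

# Toy dataset (assumed): "wind" separates the labels; "colour" does not.
rows = [
    {"wind": "low",  "colour": "red"},
    {"wind": "low",  "colour": "blue"},
    {"wind": "high", "colour": "red"},
    {"wind": "high", "colour": "blue"},
]
labels = ["no", "no", "yes", "yes"]

# choose the feature that maximises information gain
best = max(rows[0], key=lambda f: information_gain(rows, labels, f))
print(best)  # -> wind
```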