Trees Flashcards
what is a main issue with logistic regression?
the coefficient indicates the effect of a variable, not how the decision is made
decisions we make in real life are:
made in sequential order, like in regression trees
main benefit of regression trees
they are easy to understand
in trees we know the direction in which a variable affects the probability, but can't tell
the size of its impact on the y variable
we split the data using the independent variables (IVs) into
yes or no decisions
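To make the yes/no splitting concrete, here is a minimal Python sketch of a tree as a chain of questions, each on one IV, ending in a terminal node that outputs a label. The feature names `x1`, `x2`, the thresholds, and the `Node`/`Leaf`/`predict` names are hypothetical, chosen only for illustration; the red/grey labels echo the terminal-node example in these cards.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # terminal node: an output (e.g. a color), not a condition


@dataclass
class Node:
    feature: str      # the independent variable (IV) this question uses
    threshold: float  # the yes/no question: is x[feature] < threshold?
    yes: "Union[Leaf, Node]"
    no: "Union[Leaf, Node]"


def predict(tree, x):
    """Answer yes/no questions in order until a terminal node is reached."""
    path = []
    while isinstance(tree, Node):
        answer = x[tree.feature] < tree.threshold
        path.append(f"{tree.feature} < {tree.threshold}? {'yes' if answer else 'no'}")
        tree = tree.yes if answer else tree.no
    return tree.label, path


# Hypothetical tree: x1 and x2 are made-up IVs, not from any real data set.
tree = Node("x1", 5, yes=Leaf("red"),
            no=Node("x2", 3, yes=Leaf("grey"), no=Leaf("red")))

label, path = predict(tree, {"x1": 7, "x2": 1})
print(label)  # grey
print(path)   # ['x1 < 5? no', 'x2 < 3? yes']
```

The returned `path` is the sequence of decisions, which is why a tree is easy to read: you can say exactly which answers led to the prediction.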
trees don't assume the model is
linear
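A small sketch of why not assuming linearity matters, on made-up data where y = 1 only for middle values of x. A logistic regression on x alone gives a probability that is monotonic in x, so its predicted class can change at most once; the code brute-forces the best such single-cutoff rule and compares it with a tree that uses two splits. The data and function names are hypothetical.

```python
# Hypothetical data: y = 1 only in the middle range of x (a non-monotonic pattern).
xs = list(range(10))
ys = [1 if 3 <= x <= 6 else 0 for x in xs]


def accuracy(rule):
    return sum(rule(x) == y for x, y in zip(xs, ys)) / len(xs)


def best_single_cutoff():
    """Best accuracy of any rule whose prediction changes at most once in x
    (this includes the constant rules, at t = 0)."""
    best = 0.0
    for t in range(len(xs) + 1):
        for below in (0, 1):
            rule = lambda x, t=t, below=below: below if x < t else 1 - below
            best = max(best, accuracy(rule))
    return best


def two_split_tree(x):
    """Two yes/no splits carve out the middle interval."""
    if x < 3:
        return 0
    if x < 7:
        return 1
    return 0


print(best_single_cutoff())      # 0.7
print(accuracy(two_split_tree))  # 1.0
```

The tree never fits a line or a slope; it just keeps asking yes/no questions, so a bump in the middle of x costs it nothing extra.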
adding more splits will
increase accuracy (on the training data)
3 splits in the tree
3 decision tree levels
a terminal node is an
output, not a condition; e.g., it tells you the color: red or grey
the root node is
the condition most strongly related to the y variable; the most important split goes at the top
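One way to pick the root, sketched below under an assumption: each IV's best yes/no split is scored by weighted Gini impurity (a common criterion in CART-style trees; the course may use a different measure), and the IV with the purest split goes at the top. The rows and the `x1`/`x2` names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, feature, target):
    """Best yes/no question 'feature < t?' for one IV, scored by weighted Gini."""
    values = sorted({r[feature] for r in rows})
    best = (float("inf"), None)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r[target] for r in rows if r[feature] < t]
        right = [r[target] for r in rows if r[feature] >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[0]:
            best = (score, t)
    return best


def choose_root(rows, features, target):
    """The root uses the IV whose best split leaves the least impurity."""
    scored = {f: best_split(rows, f, target) for f in features}
    root = min(scored, key=lambda f: scored[f][0])
    return root, scored[root][1], scored


# Hypothetical data: x1 separates the colors cleanly, x2 barely relates to them.
rows = [
    {"x1": 1, "x2": 5, "color": "red"},
    {"x1": 2, "x2": 1, "color": "red"},
    {"x1": 3, "x2": 4, "color": "red"},
    {"x1": 6, "x2": 2, "color": "grey"},
    {"x1": 7, "x2": 5, "color": "grey"},
    {"x1": 8, "x2": 3, "color": "grey"},
]

root, threshold, scored = choose_root(rows, ["x1", "x2"], "color")
print(root, threshold)  # x1 4.5
```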
we can reach 100% (training) accuracy if we keep adding splits until there are no errors, but the issue is
the tree uses too many splits and variables, which leads to overfitting
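A minimal sketch of both effects, assuming a greedy one-variable tree scored by Gini impurity (a hypothetical stand-in, not the course's exact algorithm). The labels are pure random noise, so there is nothing real to learn: training accuracy still never drops as depth grows and hits 100% once the tree is fully grown, while accuracy on fresh noise has no reason to improve.

```python
import random


def gini(labels):
    """Gini impurity for 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    return int(2 * sum(labels) >= len(labels))  # ties go to class 1


def grow(points, max_depth=None, depth=0):
    """Greedy 1-D tree on (x, y) pairs; stops at max_depth or when a node is pure."""
    labels = [y for _, y in points]
    if depth == max_depth or gini(labels) == 0.0:
        return majority(labels)  # terminal node
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        left, right = points[:i], points[i:]
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right]))
        if best is None or score < best[0]:
            best = (score, (points[i - 1][0] + points[i][0]) / 2, left, right)
    _, threshold, left, right = best
    return (threshold, grow(left, max_depth, depth + 1), grow(right, max_depth, depth + 1))


def predict(tree, x):
    while isinstance(tree, tuple):  # internal node: (threshold, yes-branch, no-branch)
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)


rng = random.Random(0)
# Labels are pure noise, so any pattern the tree "finds" is memorization.
train = [(x, rng.randint(0, 1)) for x in range(30)]
test = [(x + 0.25, rng.randint(0, 1)) for x in range(30)]

for max_depth in [0, 1, 2, 4, 8, None]:  # None = keep splitting until no errors
    tree = grow(train, max_depth)
    print(max_depth, accuracy(tree, train), accuracy(tree, test))
```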
what is the fix for the overfitting issue?
set a lower bound on the number of points in each subset
each split divides points into
buckets
if we set the minimum bucket size = lower bound, then we
won't split if the number of points in the split is less than the minimum bucket size
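A minimal sketch of the minimum-bucket rule: a split is allowed only if both resulting buckets keep at least `min_bucket` points. The function name is hypothetical; R's `rpart` has a `minbucket` setting and scikit-learn's `DecisionTreeClassifier` has `min_samples_leaf`, which play a similar role, but this code does not call either. Note that the check looks only at counts, never at the labels in each bucket.

```python
def allowed_splits(n_points, min_bucket):
    """Split positions i (left bucket = first i sorted points, right = the rest)
    that leave at least min_bucket points in BOTH buckets.
    Only counts are checked; the outcomes (labels) in each bucket play no role."""
    return [i for i in range(1, n_points)
            if i >= min_bucket and n_points - i >= min_bucket]


for min_bucket in (1, 3, 5):
    print(min_bucket, allowed_splits(10, min_bucket))
# 1 [1, 2, 3, 4, 5, 6, 7, 8, 9]
# 3 [3, 4, 5, 6, 7]
# 5 [5]

# A node with 5 points and min_bucket = 3 cannot be split at all,
# so it stays a terminal node.
print(allowed_splits(5, 3))  # []
```

Raising `min_bucket` removes the tiny splits that chase individual points, which is what keeps the tree from overfitting.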
buckets only tell you
the size of the split, not the outcome of the bucket