Decision Trees Flashcards
What are categorical and what are numeric attributes?
Attributes can be numeric (e.g., number of legs) or categorical (e.g., delicious / not delicious).
What is the difference between classification trees and regression trees?
Classification trees produce categorical outputs; regression trees produce numeric outputs.
What is entropy?
Entropy represents the uncertainty of the data.
If all the data points belong to a single class, there is no real uncertainty, so entropy is low. If the data points are spread evenly across the classes, there is a lot of uncertainty, so entropy is high.
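The two extremes described above can be checked with a small sketch; the function below computes the Shannon entropy of a list of class labels (a standard formula, not something specific to this deck):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in probabilities)

# A single class: no uncertainty, entropy 0.
print(entropy(["yes", "yes", "yes"]))       # 0.0
# Two classes evenly split: maximal uncertainty for two classes, entropy 1 bit.
print(entropy(["yes", "no", "yes", "no"]))  # 1.0
```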
What do we have to take care of regarding entropy when we partition data?
A partition has low entropy if it splits the data into subsets that themselves have low entropy (i.e., are highly certain), and high entropy if it contains subsets that (are large and) have high entropy (i.e., are highly uncertain).
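This weighting by subset size can be sketched as follows; `partition_entropy` is a hypothetical helper name, and the subset entropies are weighted by the fraction of data points each subset contains:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in probabilities)

def partition_entropy(subsets):
    """Entropy of a partition: each subset's entropy weighted by its size."""
    total = sum(len(subset) for subset in subsets)
    return sum(entropy(subset) * len(subset) / total for subset in subsets)

# A split into pure subsets: each subset is certain, so the partition entropy is 0.
print(partition_entropy([["a", "a"], ["b", "b"]]))  # 0.0
# A split into evenly mixed subsets: nothing was gained, entropy stays at 1 bit.
print(partition_entropy([["a", "b"], ["a", "b"]]))  # 1.0
```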
What do decision nodes and leaf nodes do?
Decision nodes lead us through the tree; leaf nodes are the endpoints, which give us the predictions.
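A minimal sketch of that traversal, assuming a toy tree representation (a decision node is a tuple of an attribute name and its branches; a leaf is a bare prediction — this encoding is an illustration, not a standard library):

```python
# Decision node: (attribute_name, {attribute_value: subtree}); leaf: a bare label.
tree = ("has_legs", {
    False: "snake",
    True: ("num_legs", {2: "bird", 4: "dog"}),
})

def classify(tree, example):
    """Follow decision nodes down the tree until a leaf, then return its prediction."""
    if not isinstance(tree, tuple):  # leaf node: it *is* the prediction
        return tree
    attribute, branches = tree
    return classify(branches[example[attribute]], example)

print(classify(tree, {"has_legs": True, "num_legs": 2}))  # bird
```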
What can random forests do?
They can avoid overfitting by building multiple decision trees and letting them vote on how to classify.
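The two ingredients of that voting scheme can be sketched in isolation — bootstrap sampling to give each tree a different view of the data, and a majority vote to combine their predictions (a toy sketch, not a full random-forest implementation):

```python
import random
from collections import Counter

def bootstrap_sample(data):
    """Draw len(data) points with replacement: one tree's training set."""
    return [random.choice(data) for _ in data]

def majority_vote(predictions):
    """Combine the individual trees' predictions into a single class."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical trees voted on one example; the majority class wins.
print(majority_vote(["delicious", "not delicious", "delicious"]))  # delicious
```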