Random Forest Flashcards
Inductive Learning
Also known as discovery learning: a process where the learner discovers rules by observing examples. This differs from deductive learning, where the learner is given rules and then applies them.
Decision Tree Structure
Consists of a root node (where the tree starts)
Branches (splits with children)
Leaf nodes (end of the tree; represent possible outcomes)
Internal nodes (where a parent and its children meet)
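A minimal sketch of this structure in Python (the `Node` class and the "outlook" example are illustrative, not from any particular library):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None   # attribute tested at this node (None on leaves)
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch value -> child
    label: Optional[str] = None       # predicted class (set only on leaf nodes)

# Root tests "outlook"; each branch value leads to a child node or a leaf.
root = Node(attribute="outlook", children={
    "sunny": Node(label="no"),        # leaf node
    "overcast": Node(label="yes"),    # leaf node
})
```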
Experience Table
A labeled data set with your target variable and all of the features for which data was collected
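As a hedged illustration, a tiny experience table with made-up values ("outlook" and "humidity" features plus a "play" target), re-used by the sketches on later cards:

```python
# Each row holds the observed feature values plus the target ("play").
# The values are invented for illustration only.
experience_table = [
    {"outlook": "sunny",    "humidity": "high",   "play": "no"},
    {"outlook": "sunny",    "humidity": "normal", "play": "no"},
    {"outlook": "overcast", "humidity": "high",   "play": "yes"},
    {"outlook": "rain",     "humidity": "normal", "play": "yes"},
]
```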
What kind of algorithm will we use for our decision trees?
ID3
Decision Tree Algorithm
(1) Choose the best attribute to split the remaining instances; that attribute becomes the root
(2) Repeat the process with the children
(3) Stop when all instances have the same target attribute value, there are no more attributes, or there are no more instances (see the sketch after this list)
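A minimal, self-contained sketch of these steps in Python (ID3-style; the nested-dict tree representation and helper names are illustrative choices):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction achieved by splitting `rows` on `attribute`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

def id3(rows, attributes, target):
    """Grow a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:        # stop: all instances share one target value
        return labels[0]
    if not attributes:               # stop: no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    # (1) choose the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # (2) repeat the process on each child's subset of instances
    return {best: {value: id3([r for r in rows if r[best] == value], remaining, target)
                   for value in {r[best] for r in rows}}}

# With the made-up experience_table from the earlier card:
# print(id3(experience_table, ["outlook", "humidity"], "play"))
# -> {'outlook': {'sunny': 'no', 'overcast': 'yes', 'rain': 'yes'}}
```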
How do you identify the best attribute to become the root of your decision tree?
Information gain
What makes a good decision tree?
It must be small AND classify accurately
Small trees are less susceptible to overfitting and are easier to understand
Information Gain and Impurity Levels
{xxxxxyxxxxyxxx} not pure
{xxxxxxxxxxxxxx} as pure as it gets
{xxxxxxxyyyyyyyy} least pure
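To make these purity levels concrete, Shannon entropy scores each set (0 bits = perfectly pure). The helper is restated here so this card runs on its own, and each character is treated as a class label:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure set, ~1 for a 50/50 binary split."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(round(entropy("xxxxxyxxxxyxxx"), 2))   # 0.59 -> not pure
print(round(entropy("xxxxxxxxxxxxxx"), 2))   # 0.0  -> as pure as it gets
print(round(entropy("xxxxxxxyyyyyyyy"), 2))  # 1.0  -> least pure (near 50/50)
```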
Information Gain
We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between classes to be learned.
Information gain tells us how important a given attribute of the feature vectors is
We use it to decide the order of attributes in the nodes of a decision tree
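As a quick check, re-using the `information_gain` helper and the made-up `experience_table` from the earlier cards:

```python
# "outlook" separates the classes perfectly, so it wins the root split;
# "humidity" carries no information about the target in that tiny table.
print(information_gain(experience_table, "outlook", "play"))   # 1.0
print(information_gain(experience_table, "humidity", "play"))  # 0.0
```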
Decision Tree CONS
They suffer from errors propagating throughout the tree (this becomes more of an issue as the number of classes increases)
Error Propagation
Since decision trees work by a series of local decisions, what happens when one of these local decisions is wrong? Everything beyond that point is incorrect, and we may never return to the right path
Noisy data in decision trees
When two instances have the same attribute/value pairs but different classifications
Some values of the attributes are incorrect because of errors in the data acquisition process or the preprocessing phase
Some attributes may be irrelevant to the decision-making process (e.g., the color of the die being rolled)
Overfitting in Decision Trees
Irrelevant attributes can VERY EASILY lead to overfitting
Too little training data can also lead to overfitting
How to avoid overfitting in Decision Trees
Stop growing the tree when the data split is not statistically significant
Acquire more training data
Remove irrelevant attributes
Grow a full tree, then post-prune (see the sketch after this list)
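If you use scikit-learn, these ideas map onto `DecisionTreeClassifier` hyperparameters. A sketch, assuming scikit-learn 0.22+ (for `ccp_alpha`) and its bundled iris data as a stand-in for real training data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing when a split would cover too few instances
# (a rough stand-in for "the split is not statistically significant").
pre_pruned = DecisionTreeClassifier(min_samples_split=20, min_samples_leaf=10).fit(X, y)

# Post-pruning: grow a full tree, then prune it back with minimal
# cost-complexity pruning (larger ccp_alpha -> smaller tree).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_n_leaves(), post_pruned.get_n_leaves())
```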
How to select the best decision tree
Measure performance over training data
Measure performance over separate validation sets
Add a complexity penalty to the performance measure (see the sketch after this list)
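A sketch combining the last two ideas: score candidate trees on a held-out validation set, minus an arbitrary penalty of 0.001 per leaf (the penalty weight is an assumption, not a standard value):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_score, best_tree = float("-inf"), None
for alpha in (0.0, 0.01, 0.02, 0.05):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    # Accuracy on the validation set, minus a small penalty per leaf so
    # that, between equally accurate trees, the smaller one wins.
    score = tree.score(X_val, y_val) - 0.001 * tree.get_n_leaves()
    if score > best_score:
        best_score, best_tree = score, tree

print(best_tree.get_n_leaves(), round(best_score, 3))
```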