Session 4 Flashcards
A fitting graph shows
the accuracy (or error rate) of a model as a function of model complexity
Fitting graph
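A minimal sketch of the fitting-graph idea (not from the session materials; the data, the noisy point, and the choice of k-NN as the model are invented): with k-nearest neighbors, a smaller k means a more flexible model, so sweeping k from large to small traces out the complexity axis.

```python
# Fitting-graph sketch: training vs. holdout accuracy as model complexity
# grows. Model: k-nearest neighbors on 1-D data; a SMALLER k means a MORE
# complex (more flexible) model.

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    ones = sum(label for _, label in nearest)
    return 1 if ones * 2 > k else 0

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# True rule: label = 1 iff x >= 5.  One noisy training point: (2, 1).
train   = [(0,0),(1,0),(2,1),(3,0),(4,0),(5,1),(6,1),(7,1),(8,1),(9,1)]
holdout = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,1),(6,1),(7,1),(8,1),(9,1)]

train_acc = {k: accuracy(train, train, k)   for k in (5, 3, 1)}
hold_acc  = {k: accuracy(train, holdout, k) for k in (5, 3, 1)}
```

Reading the two curves left (k=5) to right (k=1): training accuracy keeps rising with complexity (0.8 → 0.9 → 1.0), while holdout accuracy peaks at intermediate complexity and then drops (0.9 → 1.0 → 0.9). The region where the curves diverge is overfitting.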
Generally, there will be more overfitting as…
one allows the model to be more complex
Complexity is a measure of
the flexibility of a model
If the model is a mathematical function, complexity is measured by
the number of parameters
If the model is a tree, complexity is measured by
the number of nodes
To look for overfitting, you examine performance on the
holdout data (high training accuracy alone cannot reveal overfitting)
Generally: A procedure that grows trees until the leaves are pure tends to overfit
- If allowed to grow without bound, decision trees can fit any data to arbitrary precision
- The complexity of a tree lies in the number of nodes
Ways to avoid overfitting for tree induction
- Stop growing the tree before it gets too complex
- Prune back a tree that is too large (reduce its size)
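The first bullet ("stop growing early") can be sketched with a toy 1-D tree inducer (invented for illustration; real pruning criteria are more sophisticated than a bare depth limit): grown without bound it isolates a noisy point, while a depth limit keeps it from doing so.

```python
# Pre-pruning sketch: a recursive 1-D decision tree with a depth limit.
# Data are made up: true rule is label = 1 iff x >= 5, plus one noisy
# training point (2, 1).

def majority(labels):
    return 1 if sum(labels) * 2 > len(labels) else 0

def grow(points, depth):
    """Split on the threshold with fewest misclassifications;
    stop when the node is pure or the depth limit is reached."""
    labels = [y for _, y in points]
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)                        # leaf
    def errors(t):
        left  = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        return (sum(y != majority(left) for y in left)
                + sum(y != majority(right) for y in right))
    xs = sorted({x for x, _ in points})
    t = min(((a + b) / 2 for a, b in zip(xs, xs[1:])), key=errors)
    return (t,
            grow([(x, y) for x, y in points if x <= t], depth - 1),
            grow([(x, y) for x, y in points if x > t], depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def acc(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

train   = [(0,0),(1,0),(2,1),(3,0),(4,0),(5,1),(6,1),(7,1),(8,1),(9,1)]
holdout = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,1),(6,1),(7,1),(8,1),(9,1)]

deep   = grow(train, depth=10)   # grows until every leaf is pure
pruned = grow(train, depth=1)    # pre-pruned: a single split
```

The unbounded tree fits the training data perfectly (including the noise point) but does worse on holdout; the depth-limited tree misses the noise point on the training data yet generalizes perfectly.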
Tuning model parameters
While tuning, each choice is evaluated on a "nested" holdout set, often called the
"validation" set (to differentiate it from the final test set); only when all choices are made is the model evaluated on the test set.
Cross-validation
the dataset is first shuffled, then divided into k partitions ("folds"; here, ten); each fold serves once as the test set while the model is trained on the remaining folds, and performance is averaged across the folds
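The shuffle-then-partition step can be sketched as follows (fold assignment by striding is one simple choice among several):

```python
import random

def kfold_partitions(n_examples, k=10, seed=0):
    """Shuffle example indices, then split them into k (near-)equal folds."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_partitions(100, k=10)
# Each fold serves once as the test set; the other nine form the training set:
splits = [(sorted(set(range(100)) - set(test)), test) for test in folds]
```

Every example lands in exactly one test fold, so each example is used for testing once and for training k-1 times.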
How to find which variables are the most important for the model?
There are multiple ways of determining how important a variable is:
- (Weighted) sum of the information gain of each split in which the variable is used (tree-based models)
- Difference in model performance with and without using that variable (all models)
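The second bullet can be sketched with a toy "drop-column" experiment (data and the one-rule learner are invented; unlike real practice, accuracy is measured on the training rows to keep the sketch short): retrain with and without each variable and take the accuracy difference.

```python
def stump_accuracy(rows, labels, features):
    """Best single-feature rule: predict the majority label per feature value."""
    best = 0.0
    for f in features:
        correct = 0
        for v in set(row[f] for row in rows):
            ys = [y for row, y in zip(rows, labels) if row[f] == v]
            correct += max(ys.count(0), ys.count(1))
        best = max(best, correct / len(rows))
    return best

# Toy data: f0 predicts the label perfectly, f1 agrees 6/8 times, f2 is noise.
rows = [{'f0': a, 'f1': b, 'f2': c}
        for a, b, c in [(0,0,0),(0,0,1),(0,0,0),(0,1,1),
                        (1,0,0),(1,1,1),(1,1,0),(1,1,1)]]
labels = [r['f0'] for r in rows]

full = stump_accuracy(rows, labels, ['f0', 'f1', 'f2'])
importance = {f: full - stump_accuracy(rows, labels,
                                       [g for g in ['f0', 'f1', 'f2'] if g != f])
              for f in ['f0', 'f1', 'f2']}
```

Dropping f0 hurts (importance 0.25); dropping f1 or f2 costs nothing because f0 alone already gives perfect accuracy, which also shows a known weakness of drop-column importance with correlated variables.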
“Random Forest”
is a tree-based model that uses multiple decision trees simultaneously
A “Random Forest” model is equivalent to a decision tree if:
- We set the parameter “number of trees” to 1; and
- We set the parameter “subset ratio” to 1
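A toy illustration of the equivalence claim (not RapidMiner's actual implementation; the stump learner stands in for a full tree inducer, and "subset ratio" is simplified to sampling that fraction of examples without replacement): with one tree and ratio 1, the tree sees all the data, so the forest's predictions match a lone tree's exactly.

```python
import random

def fit_stump(points):
    """Best single split for 1-D data: (threshold, left_label, right_label)."""
    def split(t):
        left  = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ll = 1 if sum(left) * 2 > len(left) else 0
        rl = 1 if sum(right) * 2 > len(right) else 0
        err = sum(y != ll for y in left) + sum(y != rl for y in right)
        return err, (t, ll, rl)
    xs = sorted({x for x, _ in points})
    return min(split((a + b) / 2) for a, b in zip(xs, xs[1:]))[1]

def predict_stump(stump, x):
    t, ll, rl = stump
    return ll if x <= t else rl

def fit_forest(points, n_trees, subset_ratio, seed=0):
    """Each tree trains on a random fraction subset_ratio of the examples."""
    rng = random.Random(seed)
    m = max(1, round(subset_ratio * len(points)))
    return [fit_stump(rng.sample(points, m)) for _ in range(n_trees)]

def forest_predict(forest, x):
    votes = sum(predict_stump(s, x) for s in forest)
    return 1 if votes * 2 > len(forest) else 0

train = [(0,0),(1,0),(2,0),(3,0),(4,0),(5,1),(6,1),(7,1),(8,1),(9,1)]
single  = fit_stump(train)
forest1 = fit_forest(train, n_trees=1, subset_ratio=1.0)
same = all(forest_predict(forest1, x) == predict_stump(single, x)
           for x in range(10))
```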
Rapidminer has an operator called “Weight by Tree Importance” that calculates the variable importance based on information gain (among others)
This operator only works with models of type “Random Forest”
A learning curve is…
a plot of the generalization performance (testing data) against the amount of training data
Learning curve
Generalization performance improves as…
more training data are available
The curve is steep initially, but the marginal advantage of more data then decreases
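A toy learning-curve sketch (the data stream and the one-parameter learner are invented): the learner predicts 1 for any x above the largest x it has seen labeled 0, and its holdout accuracy is tracked as the training set grows.

```python
# True rule: label = 1 iff x >= 10.  The training pool is a fixed stream;
# the "model" simply remembers the largest x it has seen with label 0.
pool    = [(3,0),(18,1),(5,0),(15,1),(8,0),(13,1),(9,0),(11,1)]
holdout = [(x, int(x >= 10)) for x in range(20)]

def fit(train):
    return max(x for x, y in train if y == 0)   # learned threshold

curve = []
for n in (2, 4, 6, 8):                          # growing training set
    threshold = fit(pool[:n])
    acc = sum((x > threshold) == bool(y) for x, y in holdout) / len(holdout)
    curve.append(acc)
```

The resulting curve (0.7 → 0.8 → 0.95 → 1.0) climbs as training data accumulate and flattens as the learned threshold converges on the true boundary.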
The ROC graph shows
the entire space of performance possibilities for a given model, independent of class balance
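Not from the session materials, but a minimal sketch of how the points behind an ROC graph are computed (labels and scores are made up; ties between scores would need extra handling): sort by score descending, sweep the threshold, and record (FPR, TPR) at each step.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping the classification threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts, tp, fp = [(0.0, 0.0)], 0, 0
    for y, _ in sorted(zip(labels, scores), key=lambda p: -p[1]):
        tp, fp = tp + y, fp + (1 - y)      # this example is now predicted 1
        pts.append((fp / neg, tp / pos))
    return pts

pts = roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.3])
```

TPR and FPR are each normalized within their own class (positives and negatives respectively), which is why the graph is independent of class balance.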