Week 8 - ensemble and AutoML Flashcards
why can ensemble methods be useful?
- sometimes we don’t know in advance whether a complex non-linear model or a simple linear model will do best
- They can protect against overfitting
What is a decision tree?
A tree whose nodes each make a yes/no decision based on a specific question about a feature; following the answers from the root down to a leaf gives a prediction
what is Gini impurity in the context of decision trees?
If we randomly pick a datapoint in our dataset and label it at random according to the class distribution of the dataset, the Gini impurity is the probability that this label is incorrect
For any split, the Gini impurity is computed from the proportion of data belonging to each class within each branch of the ‘question’ or split
To calculate the Gini impurity of a node, you iterate over classes, take the proportion p_k of data belonging to each class k, multiply it by (1 − p_k), and sum: Gini = Σ_k p_k(1 − p_k) = 1 − Σ_k p_k²
For a node, a lower Gini impurity indicates that the node contains mostly elements from a single class (more pure), while a higher Gini impurity indicates a more mixed node (less pure).
We can combine the Gini impurities of the child nodes (weighted by the number of datapoints in each node) to calculate the Gini impurity of a split on each feature.
We split the tree where the Gini impurity is lowest, so the first (root) split in the decision tree uses the feature with the lowest Gini impurity (a sketch of the calculation follows below).
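As an illustration (not from the course materials, and assuming NumPy), a minimal sketch of the node and weighted-split Gini calculations described above:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(labels_left, labels_right):
    """Gini impurity of a split: child-node impurities weighted by child size."""
    n_left, n_right = len(labels_left), len(labels_right)
    n = n_left + n_right
    return (n_left / n) * gini_impurity(labels_left) + (n_right / n) * gini_impurity(labels_right)
```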
How does gini impurity work with continuous variables?
You try out every possible split/threshold for the feature and calculate the Gini impurity of the resulting split
The threshold with the lowest Gini impurity determines our split (see the threshold-search sketch below)
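Continuing the sketch above (again illustrative, reusing the Gini helpers; `values` is one feature column and `labels` the class labels, both NumPy arrays):

```python
def best_threshold(values, labels):
    """Try a 'value <= t' split at every observed value and keep the threshold with the lowest split Gini."""
    best_t, best_g = None, np.inf
    for t in np.unique(values)[:-1]:          # the largest value would send every point left
        mask = values <= t
        g = split_gini(labels[mask], labels[~mask])
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g
```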
what is a problem with decision trees?
Decision trees can be very large and very complex
As a result, they can be very prone to overfitting
Deep trees can be very problematic as they make predictions based on very specific combinations of features. You can mitigate this by limiting max_depth.
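For example, if scikit-learn is used (an assumption on my part; the card only names the max_depth parameter), the depth can be capped when building the tree:

```python
from sklearn.tree import DecisionTreeClassifier

# A shallow tree cannot memorise highly specific combinations of features.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
# tree.fit(X_train, y_train)   # X_train / y_train are placeholder names for your data
```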
what is the random sample method of making random forest models from decision trees?
You randomly sample the data with replacement (bootstrapping) and then fit a decision tree to each random sample of the data
This gives us several different decision trees that make slightly different predictions because they’re trained on slightly different samples of the data (a sketch follows below)
this can help protect against overfitting
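A minimal sketch of this bootstrap-and-fit idea, assuming NumPy arrays and scikit-learn (neither is named in the cards):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bootstrap_trees(X, y, n_trees=3, random_state=0):
    """Fit one decision tree per bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(random_state)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sample row indices with replacement
        trees.append(DecisionTreeClassifier(random_state=random_state).fit(X[idx], y[idx]))
    return trees
```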
What is the random feature method of making a random forest from decision trees?
You take random samples (subsets) of the features in the data
you then train a tree on each different subset of features to make an ensemble model
How might a random forest make a prediction?
It might use a majority voting system
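In practice, scikit-learn's RandomForestClassifier (assumed here; the cards don't name a library) combines both ideas: it bootstraps the rows, considers a random subset of features at each split via max_features, and aggregates the trees' predictions:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,      # number of bootstrapped trees
    max_features="sqrt",   # random subset of features considered at each split
    random_state=0,
)
# forest.fit(X_train, y_train)
# forest.predict(X_test)   # aggregates the trees' votes (sklearn averages their predicted probabilities)
```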
how can you assess feature importance?
You can look at the average location (distance from the root) of each feature’s splits across the trees (see the sketch below).
This is because features whose splits give the lowest Gini impurity (the biggest gain in purity) tend to lie near the top of the trees, because they tend to be the most powerful in predicting the outcome
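A sketch of that depth-based measure for a fitted scikit-learn forest (an illustration under that assumption; note that sklearn's built-in feature_importances_ instead uses mean decrease in impurity, a related but different measure):

```python
import numpy as np

def average_split_depth(forest, n_features):
    """Average depth at which each feature is used to split, over all trees (lower = closer to the root)."""
    depth_sum = np.zeros(n_features)
    n_splits = np.zeros(n_features)
    for est in forest.estimators_:
        tree = est.tree_
        depths = np.zeros(tree.node_count, dtype=int)
        stack = [(0, 0)]                                 # (node id, depth), starting from the root
        while stack:
            node, d = stack.pop()
            depths[node] = d
            if tree.children_left[node] != -1:           # internal (split) node
                stack.append((tree.children_left[node], d + 1))
                stack.append((tree.children_right[node], d + 1))
        for node in range(tree.node_count):
            f = tree.feature[node]
            if f >= 0:                                   # leaf nodes store feature = -2
                depth_sum[f] += depths[node]
                n_splits[f] += 1
    return depth_sum / np.maximum(n_splits, 1)           # per-feature mean split depth
```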
what is bagging vs. boosting of ensembles?
BAGGING
- Combining randomised versions of many strong classifiers, each trained on a random (bootstrap) sample of the data
- e.g. random forests of full-depth decision trees
- this also works for regression problems
BOOSTING
- Combining many weak classifiers to make a strong one
- This often uses short decision ‘stumps’, where each new stump added is chosen adaptively to correct the errors of the ensemble so far
- e.g. LightGBM (LGBM)
bagging takes a random selection of strong classifiers, whereas boosting chooses each new classifier in an intelligent, adaptive way (see the sketch below)
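A hedged side-by-side using scikit-learn (an assumption; the cards only name LGBM): BaggingClassifier fits full-depth trees on random bootstrap samples, while AdaBoostClassifier adds depth-1 stumps one at a time, each chosen adaptively by reweighting the points the current ensemble gets wrong:

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: many strong learners (full-depth trees), each fit on a bootstrap sample.
bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: many weak learners (depth-1 stumps), each added to correct the ensemble's current errors.
booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0)

# bagger.fit(X_train, y_train); booster.fit(X_train, y_train)   # X_train / y_train are placeholders
```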
why can decision trees outperform neural networks?
They are able to model very detailed non-linear decision boundaries that MLPs struggle to capture
what is an example of AutoML?
predicting brain age from cortical anatomy