Decision Trees and Overfitting Flashcards
What is:
A hyperplane?
A hyperplane is a decision boundary in a multidimensional instance space, imposed by a particular node in a corresponding decision tree. The node's test 'splits' the instance space into as many pieces as that node has children.
For a two-dimensional instance space, it is a vertical or horizontal line perpendicular to the axis of the variable tested at the node.
For a three-dimensional space, it is a two-dimensional plane, since one variable is held constant while the others can vary.
Thus, in a problem with n variables, each node imposes an (n-1)-dimensional "hyperplane" decision boundary.
What is:
A node?
A node is a part of a decision tree; it is either an interior node or a terminal node.
Interior nodes contain a 'test' of a certain attribute/feature/variable, and each 'branch' leaving the node corresponds to one particular value (or outcome) of that test. The terminal nodes, or leaf nodes, contain the categories into which data instances are sorted after passing through the tree.
Each data instance corresponds to exactly one leaf node.
What is:
The Laplace correction?
The Laplace correction is a method in which a frequency-based estimate of class membership probability is "smoothed" by adding 1 to the numerator and 2 to the denominator (for a two-class problem). This ensures that pure leaf nodes with very few data instances do not receive an extremely high probability score for a class despite having much less evidence than leaf nodes with more data instances.
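A minimal Python sketch of the correction (the function name is a hypothetical helper, not from the source):

```python
def laplace_corrected_probability(n_class, n_total):
    """Smoothed estimate p = (n + 1) / (N + 2) for a two-class problem,
    where n is the count of the class in the leaf and N the leaf's size."""
    return (n_class + 1) / (n_total + 2)

# A pure leaf with 1 instance: raw estimate 1/1 = 1.0, smoothed 2/3 ≈ 0.67.
# A pure leaf with 100 instances: smoothed 101/102 ≈ 0.99 — more evidence,
# so the smoothed estimate stays close to the raw frequency.
```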
What is:
Entropy?
Entropy is a measure of disorder, or in data mining, a measure of impurity. Applied in supervised segmentation, it is a measure of how impure a segment/node is with respect to the value of the target variable.
A segment has high entropy when its data instances are spread across many different categories; it has low entropy when most instances share one category.
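A minimal Python sketch of the entropy calculation for a segment, assuming the segment is given as a list of class labels (the function name is a hypothetical helper):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i
    in the segment; 0 for a pure segment, 1 for a 50/50 two-class split."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```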
What is:
Information gain?
Information gain is a splitting criterion: it measures how much the entropy decreases after adding more information to the model.
In supervised segmentation, it measures how much purer the children nodes are than the parent node after splitting the set in the parent node on the values of a single attribute/feature/variable.
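A minimal Python sketch, assuming the parent and each child segment are lists of class labels (hypothetical helper names):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a segment: -sum(p_i * log2(p_i)) over class proportions."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """IG = entropy(parent) minus the weighted average entropy of the
    children, each child weighted by its share of the parent's instances."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted
```

A split that produces two pure children from a 50/50 parent has the maximum gain of 1 bit; a split that leaves the children as mixed as the parent gains nothing.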
What is:
A linear discriminant function?
A linear discriminant function is a function that uses a linear decision boundary to compute likelihood scores for instances falling into certain categories, based on the attributes of interest. This gives us a ranking by likelihood score rather than an exact probability of belonging to a category for each data instance.
What is:
Hinge loss?
Hinge loss is a loss function, used in Support Vector Machines, that penalizes data instances on the wrong side of the margin. The penalty increases linearly with the instance's distance beyond the margin.
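A minimal Python sketch of the standard hinge loss, assuming labels coded as -1/+1 and a signed score from the discriminant function (hypothetical helper name):

```python
def hinge_loss(y_true, score):
    """y_true is -1 or +1; score is the signed output of the discriminant.
    Loss is 0 for examples correctly beyond the margin (y * score >= 1)
    and grows linearly as the example moves to the wrong side."""
    return max(0.0, 1.0 - y_true * score)
```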
What is:
Zero-one loss?
Zero-one loss is a loss function that applies a penalty of 1 to every incorrectly classified example and a penalty of 0 to every correct one, regardless of how far the example is from the decision boundary.
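A minimal Python sketch (hypothetical helper name):

```python
def zero_one_loss(y_true, y_pred):
    """Penalty of 1 for a misclassified example, 0 for a correct one --
    the size of the error does not matter, only whether one occurred."""
    return 0 if y_true == y_pred else 1
```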
What are:
Support Vector Machines?
Support Vector Machines are linear discriminant models that fit a decision boundary with a margin (as wide as possible) to distinguish between data instances of different classes of a particular target feature.
What is:
A logistic regression model?
A logistic regression model is not -despite its name- a regression model; rather, it is a class probability estimation model that estimates the log-odds (thus the odds, thus the probability) of an example data instance belonging to a class of a categorical target variable.
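A minimal Python sketch of how log-odds become a probability, assuming a fitted weight vector and bias (hypothetical helper name and parameters):

```python
import math

def predict_probability(x, weights, bias):
    """The log-odds are a linear function of the features; the logistic
    (sigmoid) function maps them back to a probability in (0, 1)."""
    log_odds = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-log_odds))

# Log-odds of 0 correspond to even odds, i.e. a probability of 0.5.
```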
What is:
Pruning?
Pruning is a tree induction technique for tree models in which an extremely large tree is first grown and then cut back, node by node, to a smaller model that generalizes better.
What is:
Base rate?
The base error rate is the percentage of new cases that a model would predict wrongly if it were always to assign the majority class to those new cases. A classifier that does this is called a base rate classifier.
When a model overfits, its accuracy on the training dataset keeps improving, while its accuracy on the holdout/test set does not necessarily improve.
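A minimal Python sketch of the base rate error, assuming the data is given as a list of class labels (hypothetical helper name):

```python
from collections import Counter

def base_rate_error(labels):
    """Error rate of a classifier that always predicts the majority class:
    the fraction of instances that do NOT belong to the majority class."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)
```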
What is:
Cross-Validation?
Cross-validation is a procedure for estimating a model's generalization performance: the dataset is split into k folds, and the model is trained k times, each time on k-1 folds and tested on the remaining fold. The performance estimates from the k test folds are then averaged.
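A minimal Python sketch of generating k-fold train/test splits over instance indices (hypothetical helper name; a strided assignment of indices to folds is one simple choice):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds; each fold serves once as the
    test set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```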
What is:
A Learning Curve?
A Learning Curve is a curve that shows generalization performance on test data, plotted against the amount of training data used to build the model.
What is:
Tree Stopping?
Tree stopping halts tree growth before the tree becomes too complex, for example by requiring a minimum number of instances per leaf or a maximum depth, instead of growing a full tree and pruning it back afterwards.