Decision Trees and Overfitting Flashcards

1
Q

What is:

A hyperplane?

A

A hyperplane is a multidimensional decision boundary in an instance space, imposed by a particular node of the corresponding decision tree. The node 'splits' the instance space into as many pieces as that node has child nodes.

In a two-dimensional space, it is a vertical or horizontal line, perpendicular to the axis of the variable tested at that node.

In a three-dimensional space, it is a two-dimensional plane, since one variable is held constant while the others can vary freely.

Thus, in a problem with n variables, each node imposes an (n-1)-dimensional "hyperplane" decision boundary.
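
A minimal Python sketch of the idea (the feature name and threshold are made-up examples): a single node's test splits the instance space with an axis-parallel boundary.

    # A node tests one feature against a threshold; that test defines an
    # axis-parallel hyperplane in instance space (hypothetical feature/threshold).
    def node_test(instance, feature="age", threshold=45):
        # Instances with feature value <= threshold fall on one side of the
        # hyperplane, all other instances fall on the other side.
        return "left" if instance[feature] <= threshold else "right"

    print(node_test({"age": 30}))  # left
    print(node_test({"age": 60}))  # right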

2
Q

What is:

A node?

A

A node is a part of a decision tree that is either an interior node or a terminal node.
Interior nodes contain a 'test' of a certain attribute/feature/variable, and each branch leaving the node corresponds to one particular value (or range of values) of that attribute. The terminal nodes, or leaf nodes, contain the categories that data instances are assigned to after passing through the tree.
Each data instance corresponds to exactly one leaf node.
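
A minimal Python sketch of this structure (an assumed representation, not a specific library): an interior node holds a feature test and children, a leaf node holds a class label.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        feature: Optional[str] = None                   # attribute tested at an interior node
        children: dict = field(default_factory=dict)    # branch value -> child Node
        label: Optional[str] = None                     # class label, set only at a leaf node

        def is_leaf(self):
            return self.label is not None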

3
Q

What is:

The Laplace-correction

A

The Laplace correction is a method where a frequency-based estimate of class-membership probability is "smoothed" by adding 1 to the numerator and 2 to the denominator (in the two-class case). This ensures that pure leaf nodes with extremely few data instances do not receive an extremely high probability estimate for a class, since they carry much less evidence than leaf nodes with more data instances.
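
A minimal Python sketch of the two-class case, p(c) = (n + 1) / (n + m + 2), where n is the number of instances of class c in the leaf and m the number of instances of the other class:

    def laplace_estimate(n, m):
        # Frequency-based estimate n / (n + m), smoothed by adding 1 to the
        # numerator and 2 to the denominator.
        return (n + 1) / (n + m + 2)

    print(laplace_estimate(2, 0))    # pure leaf with 2 instances  -> 0.75
    print(laplace_estimate(20, 0))   # pure leaf with 20 instances -> ~0.95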

4
Q

What is:

Entropy?

A

Entropy is a measure of disorder or, in data mining, a measure of impurity. Applied in supervised segmentation, it measures how impure a segment/node is with respect to the values of the target variable.

Entropy is high when the data instances in a segment are spread over many different categories, and zero when all instances in the segment belong to the same class.
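
A minimal Python sketch of the formula, entropy = -sum(p_i * log2(p_i)) over the classes i, where p_i is the proportion of instances in the segment belonging to class i:

    from math import log2

    def entropy(class_counts):
        total = sum(class_counts)
        probs = [count / total for count in class_counts if count > 0]
        return -sum(p * log2(p) for p in probs)

    print(entropy([5, 5]))   # maximally mixed two-class segment -> 1.0
    print(entropy([10, 0]))  # pure segment -> 0.0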

5
Q

What is:

Information gain?

A

Information gain is a splitting criterion: it measures how much the entropy decreases when more information is added to the model, i.e. when the data are split on an attribute.

In supervised segmentation, it measures how much purer the child nodes are than the parent node after splitting the parent node's set on the values of a single attribute/feature/variable.
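
A minimal Python sketch, reusing the entropy() helper sketched on the entropy card (an assumption of these notes, not library code): IG = entropy(parent) - weighted average of the children's entropies.

    def information_gain(parent_counts, children_counts):
        parent_total = sum(parent_counts)
        weighted_children = sum(
            (sum(child) / parent_total) * entropy(child)
            for child in children_counts
        )
        return entropy(parent_counts) - weighted_children

    # Splitting a 50/50 parent into two pure children gives the maximum gain of 1.0.
    print(information_gain([10, 10], [[10, 0], [0, 10]]))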

6
Q

What is:

A linear discriminant function?

A

A linear discriminant function is a function that classifies instances using a linear decision boundary: it computes a weighted sum of the attribute values, and the resulting score determines on which side of the boundary an instance falls. This gives a ranking of instances by likelihood score rather than an exact probability of belonging to a category for each data instance.
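
A minimal Python sketch with made-up weights: the function is a weighted sum f(x) = w0 + w1*x1 + ... + wn*xn, whose score ranks the instances and decides the class by its sign.

    def linear_discriminant(x, weights=(1.0, -1.5, 2.0), bias=-0.5):
        # Hypothetical weights and bias, chosen only for illustration.
        return bias + sum(w * xi for w, xi in zip(weights, x))

    score = linear_discriminant([1.0, 0.4, 0.2])
    print(score, "positive class" if score > 0 else "negative class")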

7
Q

What is:

Hinge loss?

A

Hinge loss is a loss function, used in Support Vector Machines, that penalizes data instances that fall on the wrong side of the margin (*). The penalty increases linearly the further the example lies beyond the margin, away from the boundary.
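
A minimal Python sketch of the usual form, L = max(0, 1 - y * f(x)), where y is the true label (+1 or -1) and f(x) the discriminant score:

    def hinge_loss(y, score):
        return max(0.0, 1.0 - y * score)

    print(hinge_loss(+1, 2.0))   # comfortably on the correct side -> 0.0
    print(hinge_loss(+1, 0.5))   # inside the margin               -> 0.5
    print(hinge_loss(+1, -1.0))  # on the wrong side               -> 2.0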

8
Q

What is:

Zero-one loss?

A

Zero-one loss is a loss function that penalizes data instances that are classified incorrectly, i.e. fall on the wrong side of the decision boundary. It applies a penalty of 1 to every misclassified example and a penalty of 0 to every correctly classified one.
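
A minimal Python sketch, judging correctness by the sign of the discriminant score f(x) against the true label y (+1 or -1):

    def zero_one_loss(y, score):
        return 0.0 if y * score > 0 else 1.0

    print(zero_one_loss(+1, 0.5))   # correct side -> 0.0
    print(zero_one_loss(+1, -2.0))  # wrong side   -> 1.0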

9
Q

What are:

Support Vector Machines?

A

Support Vector Machines are linear discriminant models (linear classifiers) that fit a decision boundary with as wide a margin as possible, in order to distinguish between data instances belonging to different classes of a particular target variable.
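
A minimal usage sketch with scikit-learn (assumed to be installed; the toy data is made up). LinearSVC fits a maximum-margin linear boundary:

    from sklearn.svm import LinearSVC

    X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
    y = [0, 0, 1, 1]

    clf = LinearSVC(C=1.0)
    clf.fit(X, y)
    print(clf.predict([[0.1, 0.0], [1.0, 0.9]]))   # expected [0 1] on this separable toy data
    print(clf.decision_function([[0.1, 0.0]]))     # signed score relative to the boundary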

10
Q

What is:

A logistic regression model?

A

A logistic regression model is, despite its name, not a regression model; rather, it is a class probability estimation model that estimates the log-odds (and thus the odds, and thus the probability) of an example data instance belonging to a particular class of a categorical target variable.
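
A minimal Python sketch: the model sets the log-odds equal to a linear function of the attributes, log(p / (1 - p)) = f(x), so the class probability is p = 1 / (1 + exp(-f(x))):

    from math import exp

    def logistic_probability(score):
        # Converts a log-odds score f(x) into a class-membership probability.
        return 1.0 / (1.0 + exp(-score))

    print(logistic_probability(0.0))   # log-odds 0 -> probability 0.5
    print(logistic_probability(2.0))   # log-odds 2 -> probability ~0.88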

11
Q

What is:

Pruning?

A

Pruning is a tree induction technique for tree models where an extremely large tree is grown first and then cut back, node by node, to a smaller model in order to reduce overfitting.

12
Q

What is:

Base rate?

A

The base error rate is the percentage of new cases that a model would predict wrongly if it always assigned the majority class to those new cases. A classifier that does this is called a base rate classifier.

When a model overfits, its accuracy on the training dataset keeps improving, while its accuracy on the holdout/test set does not necessarily improve.
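
A minimal Python sketch: a base rate classifier always predicts the majority class, so its error rate is simply the share of cases that do not belong to the majority class.

    from collections import Counter

    def base_rate_error(labels):
        majority_count = Counter(labels).most_common(1)[0][1]
        return 1.0 - majority_count / len(labels)

    print(base_rate_error(["yes", "yes", "yes", "no"]))  # -> 0.25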

13
Q

What is:

Cross-Validation?

A

Cross-validation is a procedure for estimating a model's generalization performance: the dataset is split into k equal-sized 'folds', and the model is trained and tested k times, each time training on k-1 folds and testing on the remaining fold, so that every instance is used for testing exactly once. The k performance estimates are then averaged (and their variance inspected).
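
A minimal Python sketch of how k-fold splits can be generated (sequential folds for simplicity; real implementations usually shuffle the data first):

    def k_fold_indices(n_instances, k=5):
        indices = list(range(n_instances))
        fold_size = n_instances // k
        for i in range(k):
            test = indices[i * fold_size:(i + 1) * fold_size]
            train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
            yield train, test

    for train, test in k_fold_indices(10, k=5):
        print("train:", train, "test:", test)
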
14
Q

What is:

A Learning Curve?

A

A Learning Curve is a plot of a model's generalization performance on test data against the amount of training data used to build the model.

15
Q

What is:

Tree Stopping?

A

Tree stopping is the strategy of avoiding overfitting by stopping tree growth before the tree becomes too complex, instead of growing a very large tree and pruning it back afterwards. A common stopping criterion is to require a minimum number of data instances at a leaf before that leaf may be split further.

16
Q

What is:

Recursive partitioning?

A

Recursive partitioning is the process used to build decision tree models: first the single most informative variable/feature is found (e.g. by information gain), then the data are split into subsets based on that variable, and the same process is repeated on each subset until the nodes are sufficiently pure or another stopping criterion is met.
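
A minimal Python sketch for categorical features, reusing the entropy() helper sketched on the entropy card (an assumption of these notes, not library code). Each call picks the attribute whose split yields the purest children and then recurses on the resulting subsets.

    from collections import Counter

    def best_attribute(rows, attributes, target):
        # Attribute whose split leaves the lowest weighted child entropy,
        # i.e. the highest information gain.
        def split_impurity(attr):
            groups = {}
            for row in rows:
                groups.setdefault(row[attr], []).append(row[target])
            return sum(
                len(group) / len(rows) * entropy(list(Counter(group).values()))
                for group in groups.values()
            )
        return min(attributes, key=split_impurity)

    def build_tree(rows, attributes, target):
        labels = [row[target] for row in rows]
        if len(set(labels)) == 1 or not attributes:       # stopping criterion
            return Counter(labels).most_common(1)[0][0]   # leaf: majority class
        attr = best_attribute(rows, attributes, target)
        subsets = {}
        for row in rows:
            subsets.setdefault(row[attr], []).append(row)
        remaining = [a for a in attributes if a != attr]
        return {attr: {value: build_tree(subset, remaining, target)
                       for value, subset in subsets.items()}}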