Decision Trees and Overfitting Flashcards
What is:
A hyperplane?
A hyperplane is a decision boundary in a multidimensional instance space, imposed by a particular node in a corresponding decision tree. The node's test 'splits' the instance space into as many pieces as that node has children.
For a two-dimensional instance space, it is a vertical or horizontal line perpendicular to the axis of the variable tested at the node.
For a three-dimensional space, it is a two-dimensional plane, since one variable is held constant while the others can vary.
Thus, in a problem with n variables, each node imposes an (n-1)-dimensional "hyperplane" decision boundary.
What is:
A node?
A node is a part of a decision tree; it is either an interior node or a terminal node.
Interior nodes contain a 'test' of a certain attribute/feature/variable, and each 'branch' leaving the node corresponds to one particular value (or outcome) of that test. The terminal nodes, or leaf nodes, contain the categories into which data instances are sorted after passing through the tree.
Each data instance corresponds to exactly one leaf node.
What is:
The Laplace correction?
The Laplace correction is a method in which a frequency-based estimate of class membership probability is "smoothed" by adding 1 to the numerator and 2 to the denominator (for a two-class problem). This ensures that pure leaf nodes with very few data instances do not receive an extremely high probability score for a class despite having much less evidence than leaf nodes with more data instances.
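A minimal Python sketch of the correction (the function name is a hypothetical helper, not from the source):

```python
def laplace_corrected_probability(n_class, n_total):
    """Smoothed estimate p = (n + 1) / (N + 2) for a two-class problem,
    where n is the count of the class in the leaf and N the leaf's size."""
    return (n_class + 1) / (n_total + 2)

# A pure leaf with 1 instance: raw estimate 1/1 = 1.0, smoothed 2/3 ≈ 0.67.
# A pure leaf with 100 instances: smoothed 101/102 ≈ 0.99 — more evidence,
# so the smoothed estimate stays close to the raw frequency.
```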
What is:
Entropy?
Entropy is a measure of disorder, or in data mining, a measure of impurity. Applied in supervised segmentation, it is a measure of how impure a segment/node is with respect to the value of the target variable.
A segment has high entropy when its data instances are spread across many different categories; it has low entropy when most instances share one category.
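A minimal Python sketch of the entropy calculation for a segment, assuming the segment is given as a list of class labels (the function name is a hypothetical helper):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i
    in the segment; 0 for a pure segment, 1 for a 50/50 two-class split."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```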
What is:
Information gain?
Information gain is a splitting criterion: it measures how much the entropy decreases after adding more information to the model.
In supervised segmentation, it measures how much purer the children nodes are than the parent node after splitting the set in the parent node on the values of a single attribute/feature/variable.
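A minimal Python sketch, assuming the parent and each child segment are lists of class labels (hypothetical helper names):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a segment: -sum(p_i * log2(p_i)) over class proportions."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """IG = entropy(parent) minus the weighted average entropy of the
    children, each child weighted by its share of the parent's instances."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted
```

A split that produces two pure children from a 50/50 parent has the maximum gain of 1 bit; a split that leaves the children as mixed as the parent gains nothing.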
What is:
A linear discriminant function?
A linear discriminant function is a function that uses a linear decision boundary to compute likelihood scores for instances falling into certain categories, based on the attributes of interest. This gives us a ranking by likelihood score rather than an exact probability of belonging to a category for each data instance.
What is:
Hinge loss?
Hinge loss is a loss function, used in Support Vector Machines, that penalizes data instances on the wrong side of the margin. The penalty increases linearly with the instance's distance beyond the margin.
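A minimal Python sketch of the standard hinge loss, assuming labels coded as -1/+1 and a signed score from the discriminant function (hypothetical helper name):

```python
def hinge_loss(y_true, score):
    """y_true is -1 or +1; score is the signed output of the discriminant.
    Loss is 0 for examples correctly beyond the margin (y * score >= 1)
    and grows linearly as the example moves to the wrong side."""
    return max(0.0, 1.0 - y_true * score)
```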
What is:
Zero-one loss?
Zero-one loss is a loss function that applies a penalty of 1 to every incorrectly classified example and a penalty of 0 to every correct one, regardless of how far the example is from the decision boundary.
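A minimal Python sketch (hypothetical helper name):

```python
def zero_one_loss(y_true, y_pred):
    """Penalty of 1 for a misclassified example, 0 for a correct one --
    the size of the error does not matter, only whether one occurred."""
    return 0 if y_true == y_pred else 1
```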
What are:
Support Vector Machines?
Support Vector Machines are linear discriminant models that fit a decision boundary with a margin (as wide as possible) to distinguish between data instances of different classes of a particular target feature.
What is:
A logistic regression model?
A logistic regression model is not -despite its name- a regression model; rather, it is a class probability estimation model that estimates the log-odds (thus the odds, thus the probability) of an example data instance belonging to a class of a categorical target variable.
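A minimal Python sketch of how log-odds become a probability, assuming a fitted weight vector and bias (hypothetical helper name and parameters):

```python
import math

def predict_probability(x, weights, bias):
    """The log-odds are a linear function of the features; the logistic
    (sigmoid) function maps them back to a probability in (0, 1)."""
    log_odds = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-log_odds))

# Log-odds of 0 correspond to even odds, i.e. a probability of 0.5.
```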
What is:
Pruning?
Pruning is a tree induction technique for tree models in which an extremely large tree is first grown and then cut back, node by node, to a smaller model that generalizes better.
What is:
Base rate?
The base error rate is the percentage of new cases that a model would predict wrongly if it were always to assign the majority class to those new cases. A classifier that does this is called a base rate classifier.
When a model overfits, its accuracy on the training dataset keeps improving, while its accuracy on the holdout/test set does not necessarily improve.
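A minimal Python sketch of the base rate error, assuming the data is given as a list of class labels (hypothetical helper name):

```python
from collections import Counter

def base_rate_error(labels):
    """Error rate of a classifier that always predicts the majority class:
    the fraction of instances that do NOT belong to the majority class."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)
```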
What is:
Cross-Validation?
Cross-validation is a procedure for estimating a model's generalization performance: the dataset is split into k folds, and the model is trained k times, each time on k-1 folds and tested on the remaining fold. The performance estimates from the k test folds are then averaged.
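A minimal Python sketch of generating k-fold train/test splits over instance indices (hypothetical helper name; a strided assignment of indices to folds is one simple choice):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds; each fold serves once as the
    test set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```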
What is:
A Learning Curve?
A Learning Curve is a curve that shows generalization performance on test data, plotted against the amount of training data used to build the model.
What is:
Tree Stopping?
Tree stopping halts tree growth before the tree becomes too complex, for example by requiring a minimum number of instances per leaf or a maximum depth, instead of growing a full tree and pruning it back afterwards.