Decision Trees, Boosting, SVMs Flashcards

1
Q

Gini Impurity

A

The most common splitting criterion (impurity measure) used in classification trees within random forests. Gini impurity measures the likelihood of incorrectly classifying a randomly chosen element if it were randomly labeled according to the distribution of labels in the node. The goal is to minimize Gini impurity at each split, producing more homogeneous child nodes.
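In symbols, for class proportions p_k within a node, Gini impurity is G = 1 - sum_k p_k^2. A minimal sketch of the calculation (the label values are illustrative):

    from collections import Counter

    def gini_impurity(labels):
        # 1 minus the sum of squared class proportions; 0 for a pure node
        n = len(labels)
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    gini_impurity(["blue"] * 10)               # 0.0 (pure node)
    gini_impurity(["blue"] * 5 + ["red"] * 5)  # 0.5 (maximally mixed two-class node)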

2
Q

Hinge Loss

A

The most common loss function for classification SVMs. Hinge loss penalizes predictions that are wrong, as well as correct predictions that are not far enough on the correct side of the decision boundary (i.e., within the margin). This encourages the model to create a larger margin between classes.
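In symbols, with labels y in {-1, +1} and a raw decision score f(x), the loss is max(0, 1 - y * f(x)). A minimal sketch:

    def hinge_loss(y, score):
        # Zero only when the prediction is correct AND outside the margin (y * score >= 1)
        return max(0.0, 1.0 - y * score)

    hinge_loss(+1, 2.0)   # 0.0 -> correct, outside the margin
    hinge_loss(+1, 0.5)   # 0.5 -> correct, but inside the margin
    hinge_loss(+1, -1.0)  # 2.0 -> wrong side of the boundary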

3
Q

Pruning

A
  • Removing branches that have weak predictive power in order to reduce the complexity of the model and improve the generalization accuracy of a decision tree.
  • Can happen bottom-up or top-down, with approaches such as reduced error pruning and cost complexity pruning (see the sketch after this list).
  • Reduced error pruning is perhaps the simplest, and it directly optimizes for held-out accuracy:
    • Replace each node with its most common class; if that doesn’t decrease predictive accuracy on a validation set, keep it pruned.
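A minimal scikit-learn-flavored sketch of cost complexity pruning; the alpha chosen here is illustrative (in practice it is selected on a validation set):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # Enumerate the effective alphas at which subtrees get collapsed...
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    # ...then refit with a nonzero alpha to obtain a smaller, pruned tree.
    pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=0).fit(X, y)
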
4
Q

What is a decision tree?

A

A tree structure where internal nodes represent feature tests, branches represent test outcomes, and leaf nodes represent decisions or classifications.

5
Q

What is information gain?

A

The reduction in entropy after a dataset is split on an attribute. It’s used to build decision trees.

Entropy can be thought of as how mixed or uncertain the data is. For example, a dataset of only blues would have very low entropy, while a dataset of mixed blues, greens, and reds would have relatively high entropy. High entropy means more uncertainty, while low entropy means more predictability.

Information gain is a measure of how much information a feature provides about a class. It’s calculated using entropy and is used to determine which feature should be used to split the data at each internal node of the decision tree. The greater the information gain, the greater the decrease in entropy or uncertainty.
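In symbols, entropy is H = -sum_k p_k * log2(p_k) over the class proportions p_k, and information gain is the parent's entropy minus the size-weighted entropy of the children. A minimal sketch:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(parent, children):
        # Parent entropy minus the size-weighted entropy of the child splits
        n = len(parent)
        return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

    # A perfectly separating split recovers all of the parent's uncertainty:
    information_gain(["b", "b", "r", "r"], [["b", "b"], ["r", "r"]])  # 1.0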

6
Q

What is a random forest?

A

An ensemble of decision trees where each tree is built on a random subset of the data and features. Predictions are made by averaging (regression) or majority voting (classification) over the trees.
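A minimal scikit-learn sketch (the hyperparameter values are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    # Each tree sees a bootstrap sample and considers sqrt(n_features) features per split.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X, y)
    forest.predict(X[:3])  # majority vote across the 100 trees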

7
Q

What is bagging?

A

Bootstrap Aggregating (bagging) is a method that trains multiple models on different bootstrap samples of the training data (random samples drawn with replacement) and combines their predictions to improve accuracy.

This is essentially what a random forest is. Bagging is good for high-variance, low-bias models, since averaging many models reduces variance.
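A minimal scikit-learn sketch; BaggingClassifier bags decision trees by default:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier

    X, y = load_iris(return_X_y=True)
    # 50 trees, each fit on its own bootstrap sample of the training data
    bagger = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
    bagger.predict(X[:3])  # aggregated (voted) prediction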

8
Q

What is boosting?

A

A technique that combines weak learners (usually decision trees) sequentially, with each learner correcting errors of its predecessors.

This is what XGBoost does. Boosting is good for high-bias, low-variance situations, since sequential error-correction reduces bias.

9
Q

What is AdaBoost?

A

A boosting technique that increases the weights of incorrectly classified instances, so subsequent models focus on those harder cases.
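A minimal scikit-learn sketch (by default the weak learners are depth-1 trees, i.e., decision stumps):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier

    X, y = load_iris(return_X_y=True)
    # After each stage, misclassified samples are up-weighted for the next stump.
    ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
    ada.predict(X[:3])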

10
Q

What is gradient boosting?

A

A boosting technique where new models are trained to predict the residual errors of the existing model in a gradient descent manner.
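A minimal from-scratch sketch for squared-error regression, where the negative gradient is simply the residual (the tree depth and learning rate are illustrative choices):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost(X, y, n_rounds=100, lr=0.1):
        pred = np.full(len(y), y.mean())  # start from the mean prediction
        trees = []
        for _ in range(n_rounds):
            residuals = y - pred  # negative gradient of squared error
            tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
            pred = pred + lr * tree.predict(X)  # take a small step toward the residuals
            trees.append(tree)
        return trees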

11
Q

What is XGBoost?

A

An optimized implementation of gradient boosting that adds regularization and highly efficient tree construction; it is widely used in machine learning competitions.
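A minimal sketch using the scikit-learn-style wrapper that ships with the xgboost package (assuming it is installed; the hyperparameters are illustrative):

    from sklearn.datasets import load_iris
    from xgboost import XGBClassifier

    X, y = load_iris(return_X_y=True)
    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X, y)
    model.predict(X[:3])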

12
Q

What is LightGBM?

A

A gradient boosting framework that uses tree-based learning algorithms and is designed for speed and efficiency.
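A minimal sketch using its scikit-learn-style wrapper (assuming the lightgbm package is installed):

    from sklearn.datasets import load_iris
    from lightgbm import LGBMClassifier

    X, y = load_iris(return_X_y=True)
    # LightGBM bins features into histograms and grows trees leaf-wise for speed.
    model = LGBMClassifier(n_estimators=200, num_leaves=31).fit(X, y)
    model.predict(X[:3])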

13
Q

What is a kernel in SVMs?

A

A function that lets an SVM operate in a high-dimensional feature space by implicitly mapping the input space into a higher-dimensional one, computing only inner products rather than the mapped coordinates themselves.

Hence, the “kernel trick.”
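A minimal sketch of the widely used RBF (Gaussian) kernel, which scikit-learn's SVC uses by default:

    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        # K(x, z) = exp(-gamma * ||x - z||^2): an inner product in an implicit
        # infinite-dimensional feature space, computed without ever visiting it.
        return np.exp(-gamma * np.sum((x - z) ** 2))

    # In scikit-learn this is the default: SVC(kernel="rbf")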

14
Q

What is the margin in SVM?

A

The distance between the hyperplane and the nearest data points from each class (the support vectors). SVMs aim to maximize this margin, i.e., the separation between the classes.
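For a linear SVM with decision function f(x) = w·x + b, scaled so the nearest points satisfy |f(x)| = 1, the margin width is 2/||w||; maximizing the margin is therefore equivalent to minimizing ||w||.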

15
Q

What is soft margin in SVM?

A

A concept in SVMs where some misclassifications are allowed in order to balance the tradeoff between margin maximization and classification accuracy.
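Formally, slack variables ξ_i ≥ 0 let points sit inside the margin or on the wrong side, and the objective becomes minimizing (1/2)||w||² + C · Σ ξ_i. A large C punishes violations heavily (closer to a hard margin), while a small C tolerates them (a softer margin); in scikit-learn this is the C parameter of SVC.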

16
Q

What is the difference between bagging and boosting?

A

Bagging trains models independently and aggregates them, while boosting trains models sequentially, with each one focusing on the errors of the previous model.

17
Q

What is feature importance in decision trees?

A

A measure of the contribution of each feature to the model’s predictions, often based on Gini impurity reduction or information gain.
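A minimal scikit-learn sketch reading the impurity-based importances off a fitted forest:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    data = load_iris()
    forest = RandomForestClassifier(random_state=0).fit(data.data, data.target)
    # Impurity-based importances, normalized to sum to 1:
    for name, score in zip(data.feature_names, forest.feature_importances_):
        print(f"{name}: {score:.3f}")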

18
Q

What is the out-of-bag error in random forests?

A

The error rate estimated using the samples that were not included in the bootstrap samples (out-of-bag data) used to train individual trees.
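A minimal scikit-learn sketch; each tree's bootstrap sample leaves out roughly a third of the data, which serves as that tree's built-in validation set:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(oob_score=True, random_state=0).fit(X, y)
    print(forest.oob_score_)  # accuracy measured only on out-of-bag samples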

19
Q

What is the CART algorithm?

A

Classification and Regression Trees (CART) is a decision tree algorithm that recursively splits data into subsets based on the values of input features. At each node, CART picks the split with the lowest weighted impurity score (e.g., Gini impurity for classification, mean squared error for regression).

20
Q

What is early stopping in gradient boosting?

A

A technique that stops training when performance on a validation set stops improving, in order to avoid overfitting.
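A minimal scikit-learn sketch (the holdout fraction and patience are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = load_iris(return_X_y=True)
    # Hold out 10% internally; stop once 10 consecutive rounds fail to improve on it.
    model = GradientBoostingClassifier(
        n_estimators=1000, validation_fraction=0.1, n_iter_no_change=10, random_state=0
    ).fit(X, y)
    print(model.n_estimators_)  # rounds actually trained before stopping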