Decision Trees Flashcards
Decision Tree - decision making
- Start with the whole dataset and create all possible binary decisions based on each feature:
- discrete feature: does the example belong to this category or not?
- continuous feature: is the feature value below or above a threshold?
- calculate the Gini impurity for every candidate decision
- pick the decision that reduces the impurity the most (see the sketch below)
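A minimal sketch of this greedy split search, assuming NumPy and a single continuous feature; the helpers gini and best_threshold_split and the toy arrays are made up for illustration:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold_split(x, y):
    """Score every threshold on one continuous feature and return the
    split with the lowest weighted Gini impurity."""
    best = (None, np.inf)
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if weighted < best[1]:
            best = (t, weighted)
    return best

x = np.array([2.0, 3.5, 1.0, 4.2, 3.0])
y = np.array([0, 1, 0, 1, 1])
print(best_threshold_split(x, y))  # best threshold and its weighted impurity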
classification trees vs regression trees
classification: outputs are discrete. Leaf values are set to the most common outcome in the leaf
regression: outputs are numerical. Leaf values are set to the mean of the outcomes in the leaf. Use MSE or RSS instead of Gini impurity
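A minimal sketch contrasting the two tree types, assuming scikit-learn and a made-up toy dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification: leaves predict the most common class; splits use Gini impurity.
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(X, y_class)
print(clf.predict([[2.5], [5.5]]))  # -> [0 1]

# Regression: leaves predict the mean target; splits use squared error.
y_reg = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=2).fit(X, y_reg)
print(reg.predict([[2.5], [5.5]]))  # mean of the training targets in each leaf
```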
Decision tree - how to avoid overfitting
Prepruning (prune while you build the tree)
- leaf size: stop splitting when the number of examples in a node gets small enough
- depth: stop splitting at a certain depth
- purity: stop splitting if a large enough fraction of the examples are the same class
- gain threshold: stop splitting when the information gain becomes too small
Postpruning (prune after you’ve finished building the tree)
- merge leaves if doing so decreases validation-set error (a scikit-learn mapping is sketched below)
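A minimal sketch, assuming scikit-learn and its built-in breast-cancer dataset, of how these pruning ideas map onto DecisionTreeClassifier hyperparameters; the values are illustrative, and ccp_alpha does post-pruning via cost-complexity pruning rather than literal leaf merging:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    min_samples_leaf=5,          # leaf size: no leaf smaller than 5 examples
    max_depth=4,                 # depth: stop splitting at depth 4
    min_impurity_decrease=0.01,  # gain threshold: require a minimum impurity decrease
    ccp_alpha=0.005,             # post-pruning: prune back after the tree is built
    random_state=0,
).fit(X_train, y_train)

print(tree.get_depth(), tree.score(X_test, y_test))
```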
ensemble methods
combining many weak models to form a strong model.
We train multiple models on the data, and each model is different: they could be trained on different subsets of the data, trained in different ways, or even be completely different types of models.
In order for an ensemble to work, each model has to capture something new and different so it can add incremental insight
Decision Tree - bagging
creating each model from a bootstrap sample (sampling rows with replacement) and aggregating the results. Can be used with any sort of model, but is generally used with decision trees
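A minimal sketch of bagging decision trees, assuming scikit-learn and its built-in breast-cancer dataset; note the estimator argument was called base_estimator in older scikit-learn versions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each of the 100 trees is trained on a bootstrap sample of the rows;
# predictions are aggregated by majority vote.
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    random_state=0,
)
print(cross_val_score(bag, X, y, cv=5).mean())
```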
Random Forest
It takes bagging further: it doesn't just bootstrap the rows, it also considers only a random subset of the features at each split. So some trees get to split on the more important features while others are forced to split on less important ones, which makes the trees less correlated with each other
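A minimal sketch, assuming scikit-learn, of a random forest that bootstraps rows and also restricts each split to a random subset of features via max_features:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bootstrap the rows and consider only a random subset of features at each
# split (max_features="sqrt"), which decorrelates the individual trees.
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",
    bootstrap=True,
    random_state=0,
)
print(cross_val_score(forest, X, y, cv=5).mean())
```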
Random Forest - pros and cons
pros
- no feature scaling needed
- good performance
- models nonlinear relationships
cons
- can be expensive to train
- harder to interpret than a single tree (no simple inference)
Gradient Boosting Regressor
Goal: to minimize sum of square errors.
start with the mean, subtract it from y, then fit a tree to the residuals (target = residuals, inputs = features). Each new tree's prediction is added to the running prediction, scaled by a learning rate, and the process repeats on the new residuals. The learning rate slows down the reduction in residuals so we can be more precise in our prediction.
good for capturing non-linearity
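A minimal from-scratch sketch of boosting on residuals with a learning rate, assuming NumPy, scikit-learn trees, and a made-up 1-D non-linear dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)  # non-linear target

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start with the mean
trees = []

for _ in range(100):
    residuals = y - prediction                      # what the model still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # small step toward the residuals
    trees.append(tree)

print(np.mean((y - prediction) ** 2))  # training MSE shrinks as trees are added
```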
Gradient Boosting Regressor Hyperparameters
- loss - controls the loss function to minimize
- n_estimators - how many decision trees to grow
- learning_rate - how much each tree's contribution is shrunk; start with 0.1 and go down
- max_depth - how deep to grow each tree
- subsample - similar to bagging in random forest: the fraction of rows used for each tree. 1 = use 100% of the data, 0.5 = 50%, etc. (wired up in the sketch below)
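A minimal sketch wiring these hyperparameters into GradientBoostingRegressor, assuming scikit-learn and its built-in diabetes dataset; the values are illustrative, not tuned:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

gbr = GradientBoostingRegressor(
    loss="squared_error",  # loss function to minimize
    n_estimators=500,      # how many trees to grow
    learning_rate=0.05,    # shrink each tree's contribution
    max_depth=3,           # depth of each individual tree
    subsample=0.5,         # train each tree on a random 50% of the rows
    random_state=0,
)
print(cross_val_score(gbr, X, y, cv=5).mean())  # mean R^2 across folds
```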
Gradient Boosting Classifier
Goal: minimize the residual between y and the predicted probability of class y (what predict_proba returns)
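A minimal sketch, assuming scikit-learn and its built-in breast-cancer dataset, showing the classifier and the class probabilities it boosts toward:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbc = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
).fit(X_train, y_train)

# Each boosting stage nudges the predicted class probabilities toward the labels.
print(gbc.predict_proba(X_test[:3]))  # class probabilities for the first 3 rows
print(gbc.score(X_test, y_test))      # accuracy
```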
Optimization
Throughout machine learning we have a constant goal: find the model that best predicts the target from the features. We generally define "best" as minimizing some cost function or maximizing a score function.
derivative
slope of the line - when our graph is non-linear and we want to know the slope at a specific point on the curve, we can find it by calculating the derivative
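A minimal sketch of estimating the slope at a point numerically with a finite difference; the helper derivative is made up for illustration:

```python
def derivative(f, x, h=1e-6):
    """Approximate the slope of f at the point x with a finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(derivative(lambda x: x ** 2, 3.0))  # ~6, since d/dx x^2 = 2x
```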
gradient descent
The gradient gives us the direction of steepest increase, so its negative points in the direction of steepest decrease. Gradient descent repeatedly steps in that downhill direction until we reach the bottom.
We apply a learning rate to make the steps smaller.
If our learning rate is small enough, gradient descent should lead us to a minimum (the global minimum when the cost function is convex)
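A minimal sketch of the gradient descent loop with a learning rate, assuming NumPy and a made-up quadratic cost:

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Follow the negative gradient from x0 in small steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is (2(x - 3), 2(y + 1)).
grad = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])
print(gradient_descent(grad, x0=[0.0, 0.0]))  # approaches [3, -1]
```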
Neural networks - forward propagation
We calculate the output from the feature values and weights by passing them through each layer (a weighted sum followed by an activation function) until we arrive at the output neuron
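A minimal sketch of a forward pass, assuming NumPy, made-up weights, and a tiny 2-input, 2-hidden-unit, 1-output network with sigmoid activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights for a tiny 2-2-1 network.
W1, b1 = np.array([[0.5, -0.3], [0.8, 0.2]]), np.array([0.1, -0.1])
W2, b2 = np.array([[1.2], [-0.7]]), np.array([0.05])

def forward(x):
    hidden = sigmoid(x @ W1 + b1)       # hidden layer: weighted sum + activation
    output = sigmoid(hidden @ W2 + b2)  # output layer
    return hidden, output

x = np.array([0.6, 0.9])
hidden, output = forward(x)
print(output)  # the network's prediction for this input
```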
Neural networks - backward propagation
Moving from the output back to the beginning, we work out how the error changes with respect to each weight and apply gradient descent to find the weights that minimize the error at the end
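A minimal sketch of backpropagation, assuming NumPy, the same made-up 2-2-1 sigmoid network as above, and a squared-error loss; the error is pushed from the output layer back to the input layer and every weight is updated by gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same made-up 2-2-1 network as the forward-propagation sketch.
W1, b1 = np.array([[0.5, -0.3], [0.8, 0.2]]), np.array([0.1, -0.1])
W2, b2 = np.array([[1.2], [-0.7]]), np.array([0.05])
x, y = np.array([0.6, 0.9]), np.array([1.0])
learning_rate = 0.5

for _ in range(1000):
    # Forward pass.
    hidden = sigmoid(x @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: push the error gradient from the output back to the hidden layer.
    delta2 = (output - y) * output * (1 - output)   # error signal at the output neuron
    delta1 = (W2 @ delta2) * hidden * (1 - hidden)  # error signal at the hidden layer

    # Gradient-descent updates on every weight and bias.
    W2 -= learning_rate * np.outer(hidden, delta2)
    b2 -= learning_rate * delta2
    W1 -= learning_rate * np.outer(x, delta1)
    b1 -= learning_rate * delta1

print(sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2))  # prediction moves toward the target y
```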