Session 4.1 Flashcards

1
Q

Bias-Variance tradeoff

When searching for the optimal model, we are in fact trying to find…

A

the optimal tradeoff between bias and variance
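
For squared-error loss this tradeoff can be written out explicitly; a standard decomposition of the expected test error (standard notation, not from the slides) is:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \mathrm{Bias}\!\left[\hat{f}(x)\right]^2
  + \mathrm{Var}\!\left[\hat{f}(x)\right]
  + \sigma^2
```

where \(\sigma^2\) is the irreducible noise: making a model more flexible typically lowers the bias term but raises the variance term.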

2
Q

Bias-Variance tradeoff

We can reduce variance by

A

putting many models together and aggregating their outcomes
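
A minimal sketch of why this works, using a toy unbiased "model" rather than a real learner: the variance of an average of k independent estimates is roughly 1/k of a single estimate's variance.

```python
import random
import statistics

random.seed(0)

def noisy_estimate():
    # a toy "model": unbiased guess of the true value 10, with high variance
    return 10 + random.gauss(0, 3)

# spread of a single model's output
singles = [noisy_estimate() for _ in range(2000)]

# spread of an ensemble that averages 25 independent models
ensembles = [statistics.mean(noisy_estimate() for _ in range(25))
             for _ in range(2000)]

print(statistics.variance(singles))    # near sigma^2 = 9
print(statistics.variance(ensembles))  # near sigma^2 / 25
```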

3
Q

Bagging (or bootstrap aggregation) creates

A

multiple data sets from the original training data by bootstrapping, i.e. re-sampling with replacement.

It then trains a model on each bootstrapped data set and aggregates their outputs with a voting system.
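
A toy sketch of the two steps; the "model" here just predicts the majority label of its bootstrap sample (real bagging would fit e.g. decision trees):

```python
import random
from collections import Counter

random.seed(0)

def bootstrap(data):
    # re-sample with replacement, same size as the original training set
    return [random.choice(data) for _ in data]

def majority_vote(outputs):
    return Counter(outputs).most_common(1)[0][0]

def train(sample):
    # stand-in for a real learner: always predict the sample's majority class
    label = majority_vote([y for _, y in sample])
    return lambda x: label

training_data = [(x, 0) for x in range(8)] + [(x, 1) for x in range(8, 10)]

# step 1: many bootstrapped data sets, one model per set
models = [train(bootstrap(training_data)) for _ in range(11)]

# step 2: aggregate the models' outputs with a voting system
print(majority_vote([m(3) for m in models]))  # almost surely 0, the majority class
```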

4
Q

Other ensemble methods

Random Forest

A

combines bagging with random selection of features (or predictors)
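
The feature-subsampling part can be sketched as follows (the predictor names are hypothetical; the subset size sqrt(p) is a common default, not the only choice):

```python
import math
import random

random.seed(1)

# hypothetical predictor names for illustration
features = ["age", "income", "tenure", "usage", "region", "plan"]

def split_candidates(features):
    # at each split, a random forest considers only a random subset of the
    # p predictors -- a common default size is about sqrt(p)
    k = round(math.sqrt(len(features)))
    return random.sample(features, k)

print(split_candidates(features))  # a random pair of the 6 features
```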

5
Q

Other ensemble methods

Boosting

A

applies classifiers sequentially, assigning higher weights to observations that were misclassified by the previous classifiers
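
The reweighting step can be sketched with an AdaBoost-style update (tiny made-up labels and predictions, one round only, not a full booster):

```python
import math

labels      = [1, 1, -1, -1, 1]   # true classes (made-up)
predictions = [1, -1, -1, 1, 1]   # one weak classifier's outputs
weights     = [0.2] * 5           # start uniform

# weighted error of this classifier
error = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)

# the classifier's say in the final vote (AdaBoost's alpha)
alpha = 0.5 * math.log((1 - error) / error)

# up-weight misclassified observations, down-weight correct ones, renormalize
new_weights = [w * math.exp(-alpha * y * p)
               for w, y, p in zip(weights, labels, predictions)]
total = sum(new_weights)
new_weights = [w / total for w in new_weights]

print(new_weights)  # the misclassified cases (indices 1 and 3) now weigh more
```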

6
Q

A table model

A

memorizes the training data and performs no generalization

Useless in practice! Previously unseen customers would all end up with “0% likelihood of churning”
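
A sketch of such a table model on made-up churn data: perfect recall of the training set, a fixed default for everyone else.

```python
# memorized training data: customer id -> churned? (made-up ids and labels)
memorized = {"cust_001": 1, "cust_002": 0, "cust_003": 1}

def table_model(customer_id):
    # no generalization: unseen customers all get the default "did not churn"
    return memorized.get(customer_id, 0)

print(table_model("cust_001"))  # 1, memorized from training
print(table_model("cust_999"))  # 0, i.e. "0% likelihood of churning"
```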

7
Q

Generalization

A

is the property of a model or modeling process whereby
the model applies to data that were not used to build the model

If models do not generalize at all, they fit perfectly to the training data!
→ they overfit

8
Q

Overfitting

A

is the tendency to tailor models to the training data, at the expense of generalization to previously unseen data points.

9
Q

Holdout Validation

A
  1. Given only one data set, we split it into a training set used for fitting the model and a test set used for evaluating the model
  2. Performance is evaluated based on accuracy in the test data a.k.a. “holdout accuracy”
  3. Holdout accuracy is an estimate of “generalization accuracy”
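
Step 1 can be sketched as a shuffle-and-slice split (the 70/30 split below is an assumed, common choice, not prescribed by the slides):

```python
import random

random.seed(42)

def holdout_split(data, test_fraction=0.3):
    # shuffle a copy, then slice into a training set and a test set
    shuffled = data[:]
    random.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train_set, test_set = holdout_split(data)
print(len(train_set), len(test_set))  # 70 30
```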
10
Q

As a model gets more complex, it is allowed to pick up harmful spurious correlations

A
  • These correlations do not represent characteristics of the population in general
  • They may become harmful when they produce incorrect generalizations in the model

This phenomenon is not particular to decision trees

  • It is also not because of atypical training data
  • There is no general analytic way to avoid overfitting
11
Q

Simplest method to limit tree size:

A

specify a minimum number of instances that must be present in a leaf
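
In scikit-learn, for instance, this rule is the `min_samples_leaf` parameter of `DecisionTreeClassifier` (the toy data below is made up):

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

# pre-pruning: no leaf may hold fewer than 3 training instances
tree = DecisionTreeClassifier(min_samples_leaf=3, random_state=0).fit(X, y)

is_leaf = tree.tree_.children_left == -1
print(list(tree.tree_.n_node_samples[is_leaf]))  # each count >= 3
```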

12
Q

Just as with trees, as you increase the dimensionality,

A

you can perfectly fit larger and larger sets of arbitrary points

  • Often, modelers manually prune the attributes in order to avoid overfitting
  • There are ways to select attributes automatically
13
Q

Why is overfitting bad?

A

A small imbalance in the training data can be ‘learned’ by the tree and erroneously propagated

14
Q

Why is the phenomenon of overfitting not particular to decision trees?

A
  • It is also not because of atypical training data
  • There is no general analytic way to avoid overfitting
