ML - Bias vs. Variance Flashcards

1
Q

Explain each component of this equation:

Y = f(X) + e

A

We know that we want to find a function f(X) to predict Y:

Y = f(X) + e

Where e is the prediction error term and it’s normally distributed with a mean of 0.

2
Q

What is the expected squared error of the following equation?

Y = f(X) + e

A

The expected squared error at a point x is the average squared difference between the true value and our prediction:

Err(x) = E[ (Y - f̂(x))² ]

where f̂(x) is our estimate of the target function f(x).
3
Q

Decompose the expected squared error, Err(x) = E[ (Y - f̂(x))² ], into its 3 main error components:

A

Prediction error can be broken down into 3 parts (see the decomposition below):

  1. Bias Error
  2. Variance Error
  3. Irreducible Error = noise in the data; may be caused by unknown variables that influence the mapping of input variables to output variables
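A sketch of the decomposition in the same notation as the earlier cards, where f̂(x) is the model's estimate of f(x) (fit on a random training set) and σ²_e is the variance of the error term e:

  Err(x) = E[ (Y - f̂(x))² ]
         = ( E[f̂(x)] - f(x) )²  +  E[ ( f̂(x) - E[f̂(x)] )² ]  +  σ²_e
         =        Bias²         +         Variance             +  Irreducible Error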
4
Q

Draw the picture with 4 targets and label each one with High / Low Bias and High / Low Variance…which target is considered underfitting / overfitting?

A

The four targets (bullseye = the true value, dots = a model's predictions over different training sets):

  1. Low bias + low variance: shots tightly clustered on the bullseye (the ideal)
  2. Low bias + high variance: shots centered on the bullseye but widely scattered (overfitting)
  3. High bias + low variance: shots tightly clustered but off-center (underfitting)
  4. High bias + high variance: shots both scattered and off-center (worst case)
5
Q

Draw 3 scatterplot diagrams.

Which one is overfitting / underfitting / just right?

Which one is high variance / high bias / low variance+bias?

A

  1. Underfitting: a straight line drawn through clearly curved data = high bias
  2. Just right: a smooth curve that follows the overall trend = low bias + low variance
  3. Overfitting: a wiggly curve that chases nearly every point = high variance
6
Q

Plot Error vs Model Complexity.

Draw 3 curves, one each representing Total Error / Variance / Bias².

A

As model complexity increases, Bias² falls while Variance rises. Total Error = Bias² + Variance (+ Irreducible Error) is U-shaped, with its minimum at the optimal model complexity: the sweet spot between underfitting (left of the minimum) and overfitting (right of the minimum).
7
Q

Bias adds / subtracts terms from the model to make the target function easier to learn.

A

Bias refers to the simplifying assumptions (subtracting terms) made by the model to make the target function easier to learn.

8
Q

Bias makes models fast / slow.

A

Bias makes models fast (or simpler).

9
Q

Bias makes models more simple / complex.

A

Bias makes models more simple.

10
Q

Bias leads to overfitting / underfitting of the training data.

A

Bias leads to underfitting of the training data.

11
Q

Bias leads to low / high error on the training + test data.

A

Bias leads to high error on the training + test data.

12
Q

Bias can occur because of high / low # of parameters.

A

Bias can occur because of low # of parameters.

13
Q

Bias can occur because of high / low amount of training data.

A

Bias can occur because of low amount of training data.

14
Q

Bias can occur because of fitting a ______ function to _____ data.

A

Bias can occur because of fitting a linear function to non-linear data.

15
Q

More assumptions made about the target function lead to high / low bias.

A

More assumptions made about the target function lead to high bias.

16
Q

Fewer assumptions made about the target function lead to high / low bias.

A

Fewer assumptions made about the target function lead to low bias.

17
Q

Models with low bias are:

A

Models with low bias are:

  1. Decision Trees
  2. k-Nearest Neighbors
  3. SVMs
18
Q

Models with high bias are:

A

Models with high bias are:

  1. Linear / Logistic Regression
19
Q

What is Variance?

A

Variance is the amount that the estimate of the target function will change if different training data are used. Ideally, we don’t want the target function to change too much from one training set to the next.
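A minimal Python sketch of this definition, assuming scikit-learn and NumPy are available (the noisy sine-curve data and decision-tree model are illustrative choices, not from the deck): refit the same model on several resampled training sets and see how much its prediction at one fixed point moves around.

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor

  rng = np.random.default_rng(0)

  # Synthetic data: a noisy sine curve (illustrative only).
  X = np.linspace(0, 6, 200).reshape(-1, 1)
  y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

  x_query = np.array([[3.0]])  # fixed point where we inspect the prediction
  preds = []

  # Refit the same flexible (high-variance) model on 100 bootstrap resamples.
  for _ in range(100):
      idx = rng.integers(0, len(X), size=len(X))
      model = DecisionTreeRegressor().fit(X[idx], y[idx])
      preds.append(model.predict(x_query)[0])

  # The spread of these predictions across training sets is the variance in question.
  print("std of prediction across training sets:", np.std(preds))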

20
Q

Models with high / low variance are strongly influenced by the specifics of the training data.

A

Models with high variance are strongly influenced by the specifics of the training data.

21
Q

Another word for high variance is _______ to the data.

A

Another word for high variance is sensitivity to the data.

22
Q

High variance models are flexible / rigid.

A

High variance models are flexible.

(many weights / parameters that get tuned when learning)

23
Q

High variance models are ________ to the training data.

A

High variance models are sensitive to the training data.

24
Q

Models with high variance underfit / overfit to the training data.

A

Models with high variance overfit to the training data.

25
Q

High variance models do / do not generalize well to unseen data.

A

High variance models do not generalize well to unseen data.

26
Q

High variance models do well / do not do well on:

  1. training data
  2. test data
A

High variance models:

  1. Do well on training data
  2. Do not do well on test data
27
Q

Models where changes to the training data cause big changes in the estimate of the target function are high / low variance models.

A

Models where changes to the training data cause big changes in the estimate of the target function are high variance models.

28
Q

Models where changes to the training data cause small changes in the estimate of the target function are high / low variance models.

A

Models where changes to the training data cause small changes in the estimate of the target function are low variance models.

29
Q

Examples of low variance models are:

A

Examples of low variance models are:

  1. Linear / Logistic Regression
  2. Naive Bayes
30
Q

Examples of high variance models are:

A

Examples of high variance models are:

  1. Decision Trees
  2. k-Nearest Neighbors
  3. SVMs
  4. NNs
31
Q

Does not generalize + model overfits to training data is…

high / low bias + high / low variance

A

Does not generalize + model overfits to training data is…

low bias + high variance

32
Q

Does not capture the true relationship between predictors and target variable + model is underfitting training data is…

high / low bias + high / low variance

A

Does not capture the true relationship between predictors and target variable + model is underfitting training data is…

high bias + low variance

33
Q

This picture represents…

high / low bias + high / low variance

A

low bias + high variance

34
Q

This picture represents…

high / low bias + high / low variance

A

This picture represents…

high bias + low variance

35
Q

Name 6 possible solutions to fix a low bias + high variance model:

A

Name 6 possible solutions to fix a low bias + high variance model:

  1. Choose Simpler Model
  2. Feature Selection (reduce # of features)
  3. Dimensionality Reduction
  4. Regularize (penalize model complexity)
  5. Bagging + Resampling techniques
  6. Training on larger dataset
36
Q

Name 2 possible solutions to fix a high bias + low variance model:

A

Name 2 possible solutions to fix a high bias + low variance model:

  1. Add more features
  2. Make model more flexible / sensitive
37
Q

What do we do near the end of our modeling step to check whether or not our model has high bias / variance?

A

We do k-fold Cross-Validation.

We do this because, to evaluate the performance of our model, we need to test it on unseen data. This tells us how well our model generalizes and whether we over- or under-fit.

We do this by repeatedly splitting our data into training and validation folds.

38
Q

Explain how picking the size of k in k-fold cross validation affects bias / variance.

A

In k-fold cross-validation, k is the number of splits of the training data. A higher value of k leads to less bias but higher variance in the error estimate; a lower value of k leads to more bias but lower variance.

In effect, making k small (the minimum is k = 2) means each fold's model is trained on only a fraction of the data, so the performance estimate is overly pessimistic (high bias, low variance).

Bumping up k toward n (leave-one-out) trains each fold's model on nearly the full training set, which makes the estimate more sensitive to that particular data (less bias, higher variance).
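A minimal scikit-learn sketch of this trade-off (the diabetes dataset, linear model, and k values are illustrative assumptions, not part of the deck): as k grows, each training fold gets closer to the full training set, but the per-fold scores scatter more.

  from sklearn.datasets import load_diabetes
  from sklearn.linear_model import LinearRegression
  from sklearn.model_selection import cross_val_score

  X, y = load_diabetes(return_X_y=True)
  model = LinearRegression()

  # Small k: each fold's model sees only a fraction of the data (more biased estimate).
  # Large k: each fold's model sees almost all of the data (less bias, noisier scores).
  for k in (2, 5, 10):
      scores = cross_val_score(model, X, y, cv=k, scoring="r2")
      print(f"k={k:2d}  mean R^2 = {scores.mean():.3f}  std = {scores.std():.3f}")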

39
Q

Regression leads to a high / low variance model.

What can we do to this to fix it?

A

Regression leads to a high variance model.

Regression can be regularized to reduce model complexity. We do this by adding a penalty term for model complexity.

This reduces variance.
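A minimal sketch of this with scikit-learn, using ridge regression as one common regularizer (the dataset and the penalty strength alpha are illustrative assumptions):

  from sklearn.datasets import load_diabetes
  from sklearn.linear_model import LinearRegression, Ridge
  from sklearn.model_selection import cross_val_score

  X, y = load_diabetes(return_X_y=True)

  # Plain least squares vs. ridge, which adds an L2 penalty (alpha * ||w||²)
  # on the coefficients to shrink model complexity and therefore variance.
  for name, model in [("OLS", LinearRegression()), ("Ridge", Ridge(alpha=1.0))]:
      scores = cross_val_score(model, X, y, cv=5, scoring="r2")
      print(f"{name:5s}  mean R^2 = {scores.mean():.3f}  std = {scores.std():.3f}")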

40
Q

Decision trees are usually a high / low variance model.

We can fix this by doing what?

A

Decision trees are usually a high variance model.

Decision Trees can be pruned to reduce model complexity. This reduces variance.
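A minimal sketch of pruning with scikit-learn, here by capping tree depth and by cost-complexity pruning via ccp_alpha (the dataset and both values are illustrative assumptions):

  from sklearn.datasets import load_breast_cancer
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_breast_cancer(return_X_y=True)

  trees = {
      "unpruned":       DecisionTreeClassifier(random_state=0),
      "max_depth=3":    DecisionTreeClassifier(max_depth=3, random_state=0),
      "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
  }

  # Pruned trees are simpler, so their scores vary less across folds (lower variance).
  for name, tree in trees.items():
      scores = cross_val_score(tree, X, y, cv=5)
      print(f"{name:15s}  mean acc = {scores.mean():.3f}  std = {scores.std():.3f}")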

41
Q

k-NN models are usually high / low bias + high / low variance models.

How can we fix this?

A

k-NN has low bias + high variance, but we can change this by increasing the value of k, which increases the number of neighbors that contribute to the prediction (and therefore increases the bias); see the sketch below.

  • large k = simpler model = underfit = low variance & high bias
  • small k = more complex model = overfit = high variance & low bias
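A minimal sketch of this effect with scikit-learn (the dataset and the k values are illustrative assumptions): compare training-fold and validation-fold accuracy as k grows.

  from sklearn.datasets import load_breast_cancer
  from sklearn.model_selection import cross_validate
  from sklearn.neighbors import KNeighborsClassifier

  X, y = load_breast_cancer(return_X_y=True)

  # Small k: nearly memorizes the training folds (low bias, high variance).
  # Large k: smoother decision boundary, training accuracy drops (higher bias, lower variance).
  for k in (1, 5, 25, 101):
      cv = cross_validate(KNeighborsClassifier(n_neighbors=k), X, y,
                          cv=5, return_train_score=True)
      print(f"k={k:3d}  train = {cv['train_score'].mean():.3f}"
            f"  validation = {cv['test_score'].mean():.3f}")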
42
Q

For k-NNs, when k increases to infinity, our model becomes more / less complex.

This leads to over / under -fitting.

This leads to high / low bias.

This leads to high / low variance.

A

For k-NNs, when k increases to infinity, our model becomes less complex.

This leads to under-fitting.

This leads to high bias.

This leads to low variance.

In the limit, every test data point is assigned to the same class: the majority class of the training data. Conversely, if the "granularity" is too fine (k is very small), outliers and noise in the training data dominate the decision process.

43
Q

Bagging / Boosting is an ensemble method that aggregates models in parallel.

A

Bagging is an ensemble method that aggregates models in parallel.

44
Q

Bagging / Boosting is an ensemble method that aggregates models in sequential order.

A

Boosting is an ensemble method that aggregates models in sequential order.
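A minimal scikit-learn sketch of the two ideas side by side (the dataset and estimator counts are illustrative assumptions; both classes default to decision-tree base learners):

  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
  from sklearn.model_selection import cross_val_score

  X, y = load_breast_cancer(return_X_y=True)

  ensembles = {
      # Parallel: each tree is trained independently on a bootstrap resample,
      # and the ensemble averages them (an equally weighted vote).
      "bagging":  BaggingClassifier(n_estimators=50, random_state=0),
      # Sequential: each new learner focuses on the examples the previous ones
      # got wrong, and better learners get more weight in the final vote.
      "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
  }

  for name, model in ensembles.items():
      scores = cross_val_score(model, X, y, cv=5)
      print(f"{name:8s}  mean acc = {scores.mean():.3f}")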

45
Q

Bagging and Boosting increase / decrease the variance of your single estimate, as they combine several estimates from different models.

So the result may be a model with higher / lower stability.

A

Bagging and Boosting decrease the variance of your single estimate, as they combine several estimates from different models.

So the result may be a model with higher stability.

46
Q

Both bagging + boosting generate several training data sets by random sampling, but only bagging / boosting determines the weights for the data to tip the scales in favor of the most difficult cases.

A

Both bagging + boosting generate several training data sets by random sampling, but only boosting determines the weights for the data to tip the scales in favor of the most difficult cases.

47
Q

Both bagging + boosting make the final decision by averaging the N learners (or taking the majority of them), but it is an equally weighted average for bagging / boosting, and a weighted average for bagging / boosting (ie - more weight is given to those models with better performance on training data)

A

Both bagging + boosting make the final decision by averaging the N learners (or taking the majority of them), but it is an equally weighted average for bagging, and a weighted average for boosting (ie - more weight is given to those models with better performance on training data)

48
Q

Both bagging + boosting are good at reducing variance and providing higher stability, but only bagging / boosting tries to reduce bias.

A

Both bagging + boosting are good at reducing variance and providing higher stability, but only boosting tries to reduce bias.

49
Q

On the other hand, bagging / boosting may solve the over-fitting problem, while bagging / boosting can increase it.

A

On the other hand, bagging may solve the over-fitting problem, while boosting can increase it.