ML - Over/Under fitting, Bias and Variance Flashcards

1
Q

What are the two major sources of error in ML?

A

Bias - An algorithm's error rate on the training set.

Variance - How much worse an algorithm does on the test set than on the training set.
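A minimal Python sketch of these two definitions, assuming error rates are given as percentages. The function name `bias_and_variance` is hypothetical, not from any library:

```python
def bias_and_variance(train_error: float, test_error: float) -> tuple[float, float]:
    """Split error rates (in percent) into the two sources of error.

    bias     = error rate on the training set
    variance = how much worse the test set is than the training set
    """
    bias = train_error
    variance = test_error - train_error
    return bias, variance


# Example: 1% training error, 11% test error
print(bias_and_variance(1.0, 11.0))  # → (1.0, 10.0)
```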

2
Q

Explain Bias

A

The overall inaccuracy of the model caused by erroneous or overly simple assumptions in the learning algorithm, for example fitting a straight line to data that follows a curve.

3
Q

Why should we aim to reduce Bias?

A

We try to reduce bias because high bias means the algorithm building the model fails to capture the relationships between the features and the desired output: underfitting.

4
Q

Explain Variance

A

Variance is the error we get as a result of sensitivity to small, unrepresentative fluctuations (noise) in the training data set. It describes the case where random fluctuations in the training data become part of the model.

5
Q

Why should we aim to reduce Variance?

A

We try to reduce variance because high variance means the algorithm is failing to generalise from the training set to the test set, a sign of possible overfitting.

6
Q

What is Overfitting?

A

Overfitting occurs when a model has captured some of the random ‘noise’ in the data as well as (or instead of) the ‘real’ underlying relationships.
As a model becomes more complex, the danger of overfitting increases.

7
Q

What is Underfitting?

A

Underfitting occurs when the algorithm is too insensitive and overlooks the underlying patterns in the data. It is likely to miss significant trends, which makes the model's predictions less accurate on both current and future data.

8
Q

As model complexity increases, does bias decrease and variance increase, or does bias increase and variance decrease?

A

As a general rule, as the complexity of the model increases, bias decreases but variance increases.
There is a sweet spot in between where the total error is lowest.
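The trade-off can be seen in a small, illustrative experiment. This sketch assumes synthetic data (a sine curve plus Gaussian noise, chosen only for illustration) and uses polynomial degree as the measure of model complexity, fitting with NumPy's `np.polyfit`. Training error never rises as the degree grows (up to rounding), while test error typically falls at first and then rises again once the model starts fitting the noise; the exact numbers depend on the random seed.

```python
import numpy as np

# Assumed synthetic data: y = sin(2*pi*x) + Gaussian noise (illustration only)
rng = np.random.default_rng(0)


def make_data(n):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


x_train, y_train = make_data(20)
x_test, y_test = make_data(200)


def mse(coeffs, x, y):
    # Mean squared error of the fitted polynomial on (x, y)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in range(1, 9):  # model complexity = polynomial degree
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")

# The degree with the lowest test error is the "sweet spot" for this sample
best_degree = min(results, key=lambda d: results[d][1])
print("lowest test MSE at degree", best_degree)
```

Low degrees underfit (high bias, high error on both sets); high degrees drive training error down while the gap to test error tends to widen (high variance).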

9
Q

If your training set has an error rate of 15% and you need it to be 5%, should you add more training data?

A

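No. A high training error points to high bias, and adding more data by itself will not reduce it (more examples usually make the training set slightly harder to fit). Focus on changes that reduce bias instead.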

10
Q

Suppose your algorithm has error rates as follows:

Training error = 1%
Test error = 11%

A

Overfitting

Low Bias = 1%
High Variance = 10% (11% - 1%)

11
Q

Suppose your algorithm has error rates as follows:

Training error = 15%
Test error = 16%

A

Underfitting

High Bias = 15%
Low Variance = 1%

12
Q

Suppose your algorithm has error rates as follows:

Training error = 15%
Test error = 30%

A

Both overfitting and underfitting at once, so neither label describes the error cleanly.

High Bias = 15%
High Variance = 15%

13
Q

Suppose your algorithm has error rates as follows:

Training error = 0.5%
Test error = 1%

A

It is doing very well: both bias (0.5%) and variance (0.5%) are low.
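The four cases above can be summarised in a small diagnostic function. This is a sketch under a stated assumption: the 5-percentage-point cut-off for "high" is hypothetical, and in practice what counts as high depends on the problem.

```python
# HYPOTHETICAL cut-off (percentage points) for calling bias or variance "high"
THRESHOLD = 5.0


def diagnose(train_error: float, test_error: float, threshold: float = THRESHOLD) -> str:
    """Label a model from its training and test error rates (in percent)."""
    bias = train_error
    variance = test_error - train_error
    high_bias = bias > threshold
    high_variance = variance > threshold
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "low bias and low variance"


# The four example cards: (training error, test error)
for train, test in [(1, 11), (15, 16), (15, 30), (0.5, 1)]:
    print(f"train {train}%, test {test}% -> {diagnose(train, test)}")
```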

14
Q

What is the Optimal Error Rate?

A

The optimal error rate (also called the Bayes error rate) is the unavoidable bias: the error rate inherent in the problem we are trying to model. Even the best possible algorithm could not get an error rate lower than this.
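A common refinement is to subtract the optimal error rate from the training error, splitting bias into an unavoidable part and an avoidable part that better modelling can still remove. A minimal sketch, assuming the optimal error rate is known or can be estimated; the function name `decompose_error` and the 14% figure are hypothetical:

```python
def decompose_error(train_error: float, test_error: float, optimal_error: float) -> dict:
    """Split error rates (percent) into unavoidable bias, avoidable bias and variance."""
    return {
        "unavoidable_bias": optimal_error,               # cannot be removed
        "avoidable_bias": train_error - optimal_error,   # room left on the training set
        "variance": test_error - train_error,            # generalisation gap
    }


# Hypothetical example: optimal error 14%, training error 15%, test error 30%
print(decompose_error(15.0, 30.0, 14.0))
# → {'unavoidable_bias': 14.0, 'avoidable_bias': 1.0, 'variance': 15.0}
```

Here most of the 15% training error is unavoidable, so the 15-point gap to the test set is the part worth working on.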

15
Q

How do you address Bias?

A

Increase the size of your model (for example, more neurons or layers) so that it can fit the training set better.

16
Q

How do you address Variance?

A

Add data to your training set
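The last two cards can be combined into a simple rule of thumb: work on whichever error source is larger. This is a hypothetical heuristic sketch, not a prescribed procedure; `suggest_fix` and its tie-breaking rule are assumptions, and `optimal_error` defaults to 0% when it is unknown.

```python
def suggest_fix(train_error: float, test_error: float, optimal_error: float = 0.0) -> str:
    """Pick the remedy for the larger error source (all values in percent)."""
    avoidable_bias = train_error - optimal_error
    variance = test_error - train_error
    # Assumption: ties are treated as a bias problem
    if avoidable_bias >= variance:
        return "increase the size of your model"
    return "add data to your training set"


print(suggest_fix(15, 16))  # high bias case
print(suggest_fix(1, 11))   # high variance case
```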