ML - Over/Under fitting, Bias and Variance Flashcards
What are the two major sources of error in ML?
Bias - An algorithm's error rate on the training set.
Variance - How much worse an algorithm does on the test set than on the training set.
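A minimal sketch of these working definitions in Python (the variable names and the 1%/11% figures are illustrative, taken from the worked examples later in these cards):

```python
# Rough working definitions, with error rates expressed as fractions.
train_error = 0.01
test_error = 0.11

bias = train_error                    # error on data the model has already seen
variance = test_error - train_error   # extra error from failing to generalise
print(f"bias={bias:.0%}, variance={variance:.0%}")  # bias=1%, variance=10%
```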
Explain Bias
The overall inaccuracy of the model, caused by erroneous assumptions made during training.
Why should we aim to reduce Bias?
We try to reduce bias because high bias means the algorithm building the model fails to capture the relationships between the features and the ideal output - this is underfitting.
Explain Variance
Variance is the error we get as a result of sensitivity to small, unrepresentative fluctuations (noise) in the training data set. Variance describes the case where random fluctuations in the training data become part of the model.
Why should we aim to reduce Variance?
We try to reduce variance because high variance means the algorithm is failing to generalise from the training set to the test set - a sign of overfitting.
What is Overfitting?
Overfitting occurs when a model has captured some of the random ‘noise’ in the data as well as (or instead of) the ‘real’ underlying relationships.
As a model becomes more complex, the danger of overfitting increases.
What is Underfitting?
Underfitting occurs when the algorithm is too insensitive and overlooks the underlying patterns in the data. It is likely to miss significant trends, causing the model to yield less accurate predictions on both current and future data.
As complexity increases does Bias decrease/Variance increase or Bias increase/Variance decrease?
As a general rule, as the complexity of the model increases, the bias decreases, however the variance increases.
There is a sweet spot.
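A minimal sketch of this trade-off, assuming NumPy is available; polynomial degree stands in for model complexity. Typically the training error keeps falling as the degree grows, while the test error falls and then rises again - the sweet spot is an intermediate degree:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # True signal plus noise: y = sin(2*pi*x) + Gaussian noise
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = sample(30)
x_test, y_test = sample(1000)

for degree in (1, 3, 9):  # low, moderate, and high model complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```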
If your training set has an error rate of 15% and you require it to be 5%, should you add more training data?
No. Adding more training data by itself will not reduce the training error - the 15% reflects bias, and more data mainly helps with variance. You should focus on other changes, such as increasing the capacity of your model.
Suppose your algorithm has error rates as follows:
Training error = 1%
Test error = 11%
Overfitting
Low Bias = 1%
High Variance = 10% (the gap between the 11% test error and the 1% training error)
Suppose your algorithm has error rates as follows:
Training error = 15%
Test error = 16%
Underfitting
High Bias = 15%
Low Variance = 1%
Suppose your algorithm has error rates as follows:
Training error = 15%
Test error = 30%
Both overfitting and underfitting at the same time - high bias and high variance, so the error is hard to attribute to a single cause.
High Bias = 15%
High Variance = 15%
Suppose your algorithm has error rates as follows:
Training error = 0.5%
Test error = 1%
It is doing very well, as both bias (0.5%) and variance (0.5%) are low.
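A small helper, sketched in Python, that classifies the four scenarios above from the training and test error rates (the function name and the 5% threshold are assumptions for illustration, not a standard rule):

```python
def diagnose(train_error, test_error, threshold=0.05):
    bias = train_error
    variance = test_error - train_error
    if bias > threshold and variance > threshold:
        return "high bias AND high variance (under- and overfitting)"
    if bias > threshold:
        return "high bias (underfitting)"
    if variance > threshold:
        return "high variance (overfitting)"
    return "low bias and low variance (doing well)"

for train, test in [(0.01, 0.11), (0.15, 0.16), (0.15, 0.30), (0.005, 0.01)]:
    print(f"train={train:.1%}, test={test:.1%} -> {diagnose(train, test)}")
```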
What is the Optimal Error Rate?
This is unavoidable bias, also known as the Bayes error rate. It refers to the error rate inherent in whatever we are trying to model; even the best algorithm in the world could not achieve an error rate lower than this.
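Using the 15%/30% card above, and assuming (purely for illustration) that the optimal error rate for that task is 14%, one can split the bias into unavoidable and avoidable parts:

```python
optimal_error = 0.14  # assumed optimal / Bayes error rate (illustrative figure)
train_error = 0.15
test_error = 0.30

avoidable_bias = train_error - optimal_error  # 1% -> little to gain from reducing bias
variance = test_error - train_error           # 15% -> focus on reducing variance instead
print(f"avoidable bias={avoidable_bias:.0%}, variance={variance:.0%}")
```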
How do you address Bias?
Increase the size of your model (for example, add more parameters or features) so it can capture more complex relationships between the features and the output.
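A small sketch of the idea, assuming scikit-learn is available (the dataset and the choice of a decision tree are just for illustration): making the model bigger - here, deeper - drives the training error, and hence the bias, down, though as the cards above note it also raises the risk of higher variance.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

for depth in (1, 3, 10, None):  # None lets the tree grow fully (the "largest" model)
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    train_error = 1 - clf.score(X, y)  # error rate on the training set = bias
    print(f"max_depth={depth}: training error = {train_error:.1%}")
```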