Week 3 Flashcards
When is a model underfitting?
When training error and cross-validation error are both high.
When is a model overfitting?
When training error is low but cross-validation error is high (see the sketch below).
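A minimal sketch of this diagnosis, assuming scikit-learn and a synthetic dataset (the data and model choice here are illustrative, not from the cards):

```python
# Illustrative only: synthetic data and a deliberately too-simple model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy nonlinear target

model = LinearRegression().fit(X, y)
train_mse = mean_squared_error(y, model.predict(X))
cv_mse = -cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5).mean()

# Both errors high          -> underfitting (high bias)
# Train low, CV much higher -> overfitting (high variance)
print(f"train MSE: {train_mse:.3f}, CV MSE: {cv_mse:.3f}")
```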
What is bias?
Being systematically wrong.
The model does not capture the relationship between the feature variables and the outcome variable.
Predictions are consistent, but poor model choices lead to systematically wrong predictions.
What is variance?
Being unstable.
The model fits the relationship between the features and the outcome variable too closely on the training data,
incorporating random noise besides the underlying function, so predictions vary with the training sample.
What is reducible error?
The part of the error that comes from bias and variance, which better modelling choices can reduce.
The goal is to find a model that captures the underlying relationship while avoiding incorporating random noise.
What causes high bias?
The model misrepresents the data because it is missing relevant information.
An overly simple model (a bias toward the simplicity of the model).
ASSOCIATED WITH UNDERFITTING
What causes high variance?
Caused by an overly complex or poorly fitted model (e.g. an order-14 polynomial).
ASSOCIATED WITH OVERFITTING
What is irreducible error?
- Due to intrinsic uncertainty or randomness: real-world data will always contain some randomness in the data points.
- It is impossible to perfectly model the majority of real-world data, so we have to be comfortable with some measure of error.
- The error is present even in the best model.
Summary of bias-variance tradeoff
Model adjustments that decrease bias often increase variance, and vice versa.
This is essentially a complexity tradeoff: simpler models carry more bias, more complex models carry more variance.
Choosing the right level of complexity
We want a model complex enough not to underfit, but not so complex that it overfits (see the sketch below).
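One way to pick that level in practice is to sweep a complexity parameter and watch the cross-validation error. A minimal sketch, assuming scikit-learn's validation_curve and the same kind of synthetic data as above:

```python
# Illustrative only: sweep polynomial degree and compare train vs CV error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

pipe = make_pipeline(PolynomialFeatures(), LinearRegression())
degrees = np.arange(1, 15)
train_scores, cv_scores = validation_curve(
    pipe, X, y,
    param_name="polynomialfeatures__degree", param_range=degrees,
    scoring="neg_mean_squared_error", cv=5)

# Pick the degree with the best (least negative) mean CV score:
# complex enough to fit the signal, not so complex that it fits the noise.
print("chosen degree:", degrees[np.argmax(cv_scores.mean(axis=1))])
```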
What is linear model regularisation (or shrinkage)?
Adds an adjustable regularisation strength parameter (lambda) directly into the cost function.
Adds a penalty proportional to the size of the estimated model parameters.
When lambda is large, large parameter values are penalised more strongly (a sketch of the cost function follows).
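A minimal sketch of the idea, written with the squared (ridge-style) penalty; the function name, `lam`, and the 1/2 scaling convention are illustrative assumptions, not a fixed definition:

```python
# Illustrative only: a ridge-style regularised cost function written out.
import numpy as np

def regularised_cost(theta, X, y, lam):
    """Mean squared error plus a penalty on the parameter sizes."""
    residuals = X @ theta - y
    fit_term = (residuals ** 2).mean() / 2     # how well the model fits
    penalty = lam * (theta[1:] ** 2).sum()     # intercept left unpenalised
    return fit_term + penalty                  # larger lam -> stronger shrinkage
```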
What does more regularisation do?
More regularisation yields a simpler model, i.e. more bias.
Less regularisation yields a more complex model and increases variance.
If the model overfits (variance is too high), regularisation can reduce variance and improve the generalisation error.
What are the two regularisation methods?
L1 (LASSO)
L2 (Ridge)
What is Ridge and what does it do?
Imposes bias on the model and reduces variance by applying a penalty proportional to the squared coefficient values. The best value for lambda can be selected by cross-validation. (Features should be scaled first; see the sketch below.)
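A minimal sketch, assuming scikit-learn's RidgeCV with scaled features (the data and the alpha grid are illustrative):

```python
# Illustrative only: scale features, then let cross-validation pick alpha
# (scikit-learn's name for the regularisation strength lambda).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X, y)
print("alpha chosen by CV:", model[-1].alpha_)
```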
What is Lasso?
The penalty is applied proportionally to the absolute coefficient values.
How does regularisation perform feature selection?
L1 (Lasso) regularisation drives some coefficients exactly to zero, effectively removing those features from the model (see the sketch below).
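A minimal sketch of this effect, assuming scikit-learn's Lasso on synthetic data where two of the five features are irrelevant:

```python
# Illustrative only: two of the five true coefficients are zero; lasso
# typically recovers that by zeroing the corresponding fitted coefficients.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)
print(lasso.coef_)  # coefficients for the irrelevant features land at exactly 0
```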