Week 3 Flashcards

1
Q

When is a model underfitting?

A

When training error and cross-validation error are both high.

2
Q

When is a model overfitting?

A

When training error is low but cross-validation error is high.

3
Q

What is bias?

A

Being wrong.
The model does not capture the relationship between the feature variables and the outcome variable.
Predictions are consistent, but poor model choices make them consistently wrong.

4
Q

What is variance?

A

Being unstable.
The model fits the relationship between the features and the outcome variable in the training data too closely.
It incorporates random noise in addition to the underlying function.

5
Q

What is irreducible error?

A

Unavoidable randomness.
Real-world data will always contain some randomness in the data points.
Goal: find a model that captures the relationship without incorporating the random noise.

6
Q

What causes high bias?

A

The model misrepresents the data because of missing information.
The model is overly simple (biased towards simplicity).

ASSOCIATED WITH UNDERFITTING

7
Q

What causes high variance?

A

Overly complex or poorly fit models (e.g. a degree-14 polynomial model).

ASSOCIATED WITH OVERFITTING

8
Q

What is irreducible error?

A
  • Error due to intrinsic uncertainty or randomness in the data.
  • It is impossible to perfectly model the majority of real-world data, so we have to be comfortable with some measure of error.
  • The error is present even in the best model.
9
Q

Summary of bias-variance tradeoff

A

Model adjustments that decrease bias often increase variance and vice versa
Similar to a complexity tradeoff
Choosing the right level of complexity
Want a model complex enough to not underfit, but not so complex that it overfits.
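A minimal sketch of the tradeoff, assuming scikit-learn and a synthetic noisy sine curve (the data, the degrees 1, 4 and 14, and the 5-fold split are all illustrative assumptions, not part of the course material). It compares training error with cross-validation error as model complexity grows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for degree in (1, 4, 14):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Cross-validation error: mean squared error averaged over the folds.
    cv_mse = -cross_val_score(model, X, y, cv=cv,
                              scoring="neg_mean_squared_error").mean()
    # Training error: fit on all the data and score on the same data.
    model.fit(X, y)
    train_mse = np.mean((model.predict(X) - y) ** 2)
    results[degree] = (train_mse, cv_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  CV MSE={cv_mse:.3f}")
```

Both errors high suggests underfitting; a low training error with a much higher cross-validation error suggests overfitting.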

10
Q

What is linear model regularisation (or shrinkage)?

A

Adds an adjustable regularisation strength parameter directly into the cost function.
Adds a penalty proportional to the size of the estimated model parameters.
The larger the regularisation strength, the more heavily large parameter values are penalised.

11
Q

What does more regularisation do?

A

More regularisation gives a simpler model with more bias.
Less regularisation gives a more complex model with more variance.

If the model overfits (variance is too high), regularisation can improve generalisation error and reduce
variance.
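A small sketch of the shrinkage effect, assuming scikit-learn's `Ridge` (which calls the regularisation strength `alpha`) and made-up data with a hypothetical set of true coefficients. As the strength increases, the fitted coefficients are pulled towards zero:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])  # hypothetical ground truth
y = X @ true_coef + rng.normal(0, 0.5, 50)

norms = []
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    # The size of the coefficient vector shrinks as alpha grows.
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:7.2f}  ||coef||={norms[-1]:.3f}")
```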

12
Q

What are the two regularisation methods?

A

L1 (LASSO)
L2 (Ridge)
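In one common notation (the symbols are assumptions, not taken from the cards), with coefficients \beta_j, regularisation strength \lambda, predictions \hat{y}_i, and the residual sum of squares as the unpenalised cost, the two penalised cost functions can be written as:

```latex
J_{\text{L1}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\qquad
J_{\text{L2}}(\beta) = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^{2}
```

Setting \lambda = 0 recovers ordinary least squares in both cases.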

13
Q

What is Ridge and what does it do?

A

Ridge (L2) imposes bias on the model and reduces variance by applying a penalty proportional to the squared coefficient values. The best value for lambda (the regularisation strength) can be selected by cross-validation. Features should be scaled first.
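A minimal sketch of scaling followed by cross-validated selection of lambda, assuming scikit-learn's `StandardScaler` and `RidgeCV` (where lambda is called `alpha`). The data and the candidate values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumption for illustration).
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
y = X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, 80)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # candidate lambda values
# Scale first so the penalty treats every coefficient on the same footing.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=5))
model.fit(X, y)

best_alpha = model.named_steps["ridgecv"].alpha_
print("lambda chosen by cross-validation:", best_alpha)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information leaks from the validation fold.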

14
Q

What is Lasso?

A

L1 regularisation: the penalty is applied in proportion to the absolute coefficient values.

15
Q

How does regularisation perform feature selection?

A

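It drives some coefficients towards zero. With L1 (Lasso) coefficients can become exactly zero, which removes those features from the model; L2 (Ridge) only shrinks them.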
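A sketch of this effect, assuming scikit-learn's `Lasso` and `Ridge` and synthetic data in which only the first two of six features matter (the data and the value `alpha=0.5` are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = StandardScaler().fit_transform(rng.normal(size=(100, 6)))
# Only the first two features affect the outcome (hypothetical ground truth).
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.5, 100)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero under each penalty.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("exact zeros (Lasso, Ridge):", lasso_zeros, ridge_zeros)
```

Features whose Lasso coefficient is zero contribute nothing to the prediction, which is why the L1 penalty acts as a form of feature selection.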

16
Q

What are the 4 preprocessing steps?

A

Data integration
Data cleaning
Data transformation
Data visualisation

17
Q

What is data integration?

A

Putting all the data in the same place
(physically: all in one computer, all in compatible data structures)
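One possible illustration, assuming pandas and two made-up tables that share a `customer_id` column (all names and values here are hypothetical). It brings two sources into a single compatible structure:

```python
import pandas as pd

# Two hypothetical sources describing the same customers.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [20.0, 35.5, 12.0, 8.0],
})
profiles = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "east"],
})

# Bring both sources into a single table keyed on the shared identifier.
combined = orders.merge(profiles, on="customer_id", how="left")
print(combined)
```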

18
Q

What is data cleaning?

A

Dealing with incorrect, incomplete, outlying, irrelevant or missing parts of the data
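A rough sketch of a few cleaning steps, assuming pandas and a made-up table (the column names, the plausible age range of 0 to 120, and the median fill are illustrative choices, not a prescribed method):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, an implausible outlier,
# and a column that is irrelevant to the analysis.
raw = pd.DataFrame({
    "age": [25.0, 31.0, np.nan, 40.0, 29.0, 410.0],
    "income": [30_000, 42_000, 38_000, 51_000, 36_000, 45_000],
    "row_note": ["a", "b", "c", "d", "e", "f"],
})

# Irrelevant: drop the column that carries no useful information.
clean = raw.drop(columns=["row_note"])
# Incorrect or outlying: keep missing ages for now, drop implausible ones.
clean = clean[clean["age"].isna() | clean["age"].between(0, 120)]
# Missing: fill the remaining gaps with the median of the valid ages.
clean = clean.assign(age=clean["age"].fillna(clean["age"].median()))
print(clean)
```

Whether to drop, cap, or impute a value depends on the data and the question being asked.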

19
Q

What is data transformation?

A

Transforming your data into what you will actually use (feature extraction, scaling, etc.)

20
Q

Name 3 types of data transformation

A

Standardisation
Normalisation
Log scale
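A short sketch of the three transformations, assuming scikit-learn and NumPy. "Normalisation" is taken here to mean min-max scaling to [0, 1], which is one common reading but an assumption, and the feature values are made up:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[1.0], [10.0], [100.0], [1000.0]])  # hypothetical skewed feature

standardised = StandardScaler().fit_transform(x)  # zero mean, unit variance
normalised = MinMaxScaler().fit_transform(x)      # rescaled to the range [0, 1]
logged = np.log10(x)                              # log scale compresses large values

print(standardised.ravel())
print(normalised.ravel())
print(logged.ravel())
```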