Cross Validation Flashcards
1
Q
- What does the "cross" in cross-validation mean?
A
- Learning from the training data and mapping that learning across to separate test data the model has not seen
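The crossing over can be illustrated with k-fold cross-validation. This is one common scheme: the data are split into k folds, and each fold takes a turn as the test set while the remaining folds are used for training. The sketch below is minimal pure Python. `k_fold_indices` is a hypothetical helper; for real use, scikit-learn provides `KFold` in `sklearn.model_selection`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is held out exactly once."""
    # The first n_samples % k folds get one extra sample so all samples are used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current fold is used for training.
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(n_samples=6, k=3):
    print("train:", train_idx, "test:", test_idx)
```

With 6 samples and k=3, each pair of consecutive indices is held out once, and the model is fitted on the other four.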
2
Q
- What are 2 ways a dataset could be split into training and test sets?
A
- By part of the domain: in space (hold out some locations or regions) or in time (hold out a later period)
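Both kinds of split take only a few lines. This sketch assumes each record is a dict with hypothetical `time` and `region` fields. The key point is that whole stretches of the domain are held out, rather than randomly chosen rows.

```python
def split_by_time(records, cutoff):
    """Train on records before `cutoff`; test on records at or after it."""
    train = [r for r in records if r["time"] < cutoff]
    test = [r for r in records if r["time"] >= cutoff]
    return train, test


def split_by_space(records, test_regions):
    """Hold out whole regions so test locations are never seen in training."""
    train = [r for r in records if r["region"] not in test_regions]
    test = [r for r in records if r["region"] in test_regions]
    return train, test


# Hypothetical records; field names are assumptions for illustration.
records = [
    {"time": 2001, "region": "north", "y": 1.2},
    {"time": 2002, "region": "south", "y": 0.7},
    {"time": 2003, "region": "north", "y": 1.5},
    {"time": 2004, "region": "east", "y": 0.9},
]
train_t, test_t = split_by_time(records, cutoff=2003)
train_s, test_s = split_by_space(records, test_regions={"north"})
```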
3
Q
- What is the difference between pseudoprospective and prospective forecasting?
A
- Pseudoprospective – mimics a prospective test but uses historical data from past events that the model was not trained on
- Prospective – forecasts are made in the present, and data collected afterwards is used as the test data
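A pseudoprospective test can be replayed on historical data. Forecasts are issued one step at a time, and each uses only the data that preceded it. A prospective test would instead wait for new observations to arrive. The sketch below assumes a univariate series and a naive persistence forecaster. Both function names are hypothetical.

```python
def pseudoprospective_errors(series, start, forecaster):
    """Replay history: for each t >= start, forecast series[t] using only series[:t]."""
    errors = []
    for t in range(start, len(series)):
        history = series[:t]  # only data that would have been available at time t
        prediction = forecaster(history)
        errors.append(abs(series[t] - prediction))
    return errors


def persistence(history):
    """Naive forecaster: the next value equals the last observed value."""
    return history[-1]


series = [3, 4, 6, 5, 7, 8]
errors = pseudoprospective_errors(series, start=3, forecaster=persistence)
print(errors)  # [1, 2, 1]
```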
4
Q
- Why are hindcasts unreliable for measuring a model's predictive skill?
A
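- They may have data leakage: data from the period being hindcast may have been seen during model development, which inflates the apparent skill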
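A deliberately extreme toy example shows why leakage inflates hindcast skill. The hypothetical `fit_lookup` "model" below memorises every point it is fitted on. It therefore scores perfectly on a hindcast period it has already seen, but not on a period kept out of fitting. The data are made up for illustration.

```python
def fit_lookup(pairs):
    """Toy 'model' that memorises every (x, y) pair; unseen x gets the mean y."""
    table = dict(pairs)
    fallback = sum(y for _, y in pairs) / len(pairs)
    return lambda x: table.get(x, fallback)


def mae(model, pairs):
    """Mean absolute error of `model` over (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in pairs) / len(pairs)


history = [(1, 2.0), (2, 2.5), (3, 3.5), (4, 3.0)]
test_period = [(5, 4.0), (6, 4.5)]

leaky = fit_lookup(history + test_period)  # hindcast: test period seen during fitting
honest = fit_lookup(history)               # test period kept out of fitting
print(mae(leaky, test_period), mae(honest, test_period))  # 0.0 1.5
```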
5
Q
- How does out-of-sample testing assist with hyperparameter selection?
A
- Comparing errors on held-out data shows which hyperparameter value generalises better
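This minimal sketch assumes 1-D k-nearest-neighbour regression, where the hyperparameter is k. Each candidate is fitted on the training points and scored on a separate validation set, and the candidate with the lowest out-of-sample error is kept. The names and data are hypothetical. A final test set should still be kept apart from this selection step.

```python
def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x (ties keep list order)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_mse(train, valid, k):
    """Mean squared error on held-out (out-of-sample) points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)


def select_k(train, valid, candidates):
    """Keep the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: validation_mse(train, valid, k))


# Noisy training data alternating around a true value of 0.5.
train = [(0, 0.0), (1, 1.0), (2, 0.0), (3, 1.0), (4, 0.0), (5, 1.0)]
valid = [(1.5, 0.5), (3.5, 0.5)]

best_k = select_k(train, valid, candidates=[1, 3, 5])
print(best_k)  # 5
```

Here k=1 chases the noise and scores worst on the validation points. Larger k averages the noise away.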
6
Q
- What features of data create challenges for model generalization?
A
- Imbalanced data
- Outliers
- Noise
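Imbalance is easy to underestimate because a trivial model can look accurate. This minimal sketch uses made-up labels with a 95% majority class. A model that always predicts the majority class scores 95% accuracy while catching none of the minority cases, which is why per-class metrics such as recall are worth checking.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total


y_true = [0] * 95 + [1] * 5   # 5% minority class (hypothetical labels)
y_pred = [0] * len(y_true)    # trivial model: always predict the majority class
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.95 0.0
```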
7
Q
- How can you watch out for information leakage?
A
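- Perform data cleaning, e.g., remove records duplicated across the train and test sets, and drop features that would not be available at prediction time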
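One simple cleaning check against leakage is to drop test records that also appear verbatim in the training set. Without this check, the model would effectively be tested on data it has already seen. The sketch assumes rows are lists of values. `drop_leaked_duplicates` is a hypothetical helper, and it only catches exact duplicates.

```python
def drop_leaked_duplicates(train, test):
    """Remove test rows that also appear exactly in the training set."""
    seen = set(map(tuple, train))  # lists are unhashable, so compare as tuples
    return [row for row in test if tuple(row) not in seen]


train = [[1.0, 0.5, 1], [2.0, 0.1, 0]]
test = [[2.0, 0.1, 0], [3.0, 0.9, 1]]
clean_test = drop_leaked_duplicates(train, test)
print(clean_test)  # [[3.0, 0.9, 1]]
```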
8
Q
- What features of models create challenges for their generalization?
A
- Overfitting: a model that fits the training data too closely, capturing noise rather than the underlying pattern
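The usual symptom of overfitting is a gap between training and test error. This minimal sketch uses hypothetical data and 1-D k-nearest-neighbour regression. With k=1, the model reproduces every training point (zero training error) but does worse on new points than a smoother k=6 model that averages out the noise.

```python
def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x (ties keep list order)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error over `data` of the k-NN model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


# Alternating noise around a true value of 0.5.
train = [(0, 0.0), (1, 1.0), (2, 0.0), (3, 1.0), (4, 0.0), (5, 1.0)]
test = [(0.5, 0.5), (2.5, 0.5), (4.5, 0.5)]

for k in (1, 6):
    print(f"k={k}: train MSE={mse(train, train, k)}, test MSE={mse(train, test, k)}")
```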
9
Q
- Why is a pipeline good for describing an ML workflow?
A
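- It describes the workflow as a procedure: an explicit, ordered sequence of steps that can be followed, inspected, and repeated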
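Writing the workflow as an ordered list of steps makes the procedure explicit. Each step is fitted on training data and then reapplied unchanged to new data. This is a minimal pure-Python sketch. `Standardize`, `Clip`, and `Pipeline` are hypothetical classes that loosely mirror the (name, step) design of scikit-learn's `sklearn.pipeline.Pipeline`.

```python
class Standardize:
    """Learns mean and standard deviation from the data it is fitted on."""

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        var = sum((x - self.mean) ** 2 for x in xs) / len(xs)
        self.std = var ** 0.5 or 1.0  # guard against zero variance
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]


class Clip:
    """Stateless step that clamps values into [lo, hi]."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def fit(self, xs):
        return self  # nothing to learn

    def transform(self, xs):
        return [min(max(x, self.lo), self.hi) for x in xs]


class Pipeline:
    """Runs named steps in a fixed order: fit on training data, then reuse on new data."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, xs):
        for _, step in self.steps:
            xs = step.fit(xs).transform(xs)
        return xs

    def transform(self, xs):
        for _, step in self.steps:
            xs = step.transform(xs)
        return xs


pipe = Pipeline([("scale", Standardize()), ("clip", Clip(-2.0, 2.0))])
train_out = pipe.fit_transform([1.0, 3.0])  # [-1.0, 1.0]
test_out = pipe.transform([5.0])            # scaled to 3.0, then clipped to [2.0]
```

Test data only ever passes through `transform`, so the statistics it sees were learned from the training data alone.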
10
Q
- At which steps of a pipeline can bias enter the analysis?
A
- Every step, from data collection and preprocessing through model training and evaluation