Cross Validation Flashcards

1
Q
  • What does "cross" mean in cross validation?
A
  • Learning from the training data and carrying that across to the test data, i.e. the model is evaluated on data it was not trained on
2
Q
  • What are 2 ways a dataset could be split into train and test sets?
A
  • By partitioning the domain: either in space or in time
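
As a minimal sketch of the two kinds of split, the snippet below partitions a small made-up dataset once by time and once by space. The `Observation` type, its field names, and the cutoff values are assumptions for illustration only.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    time: float   # hypothetical time stamp
    x: float      # hypothetical spatial coordinate
    value: float  # hypothetical measured quantity


def split_by_time(data, cutoff):
    """Train on observations before `cutoff`, test on those at or after it."""
    train = [d for d in data if d.time < cutoff]
    test = [d for d in data if d.time >= cutoff]
    return train, test


def split_by_space(data, boundary):
    """Train on observations with x below `boundary`, test on the rest."""
    train = [d for d in data if d.x < boundary]
    test = [d for d in data if d.x >= boundary]
    return train, test


# Made-up data: ten observations at times 0..9 with scattered x positions.
data = [Observation(time=t, x=(t * 3) % 10, value=t * 0.5) for t in range(10)]

time_train, time_test = split_by_time(data, cutoff=7)
space_train, space_test = split_by_space(data, boundary=5)
```

In the time split every training observation precedes every test observation, which is what a forecasting test needs.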

3
Q
  • What is the difference between pseudoprospective and prospective forecasting?
A
  • Pseudoprospective: mimics a prospective test, but uses historical data from past events (data the model was not trained on)
  • Prospective: data are collected in present time and used as the testing data
4
Q
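  • Why are hindcasts unreliable for measuring the predictive skill of a model?
A
  • They may suffer from data leakage: information about the events being "predicted" may already have influenced the model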
5
Q
  • How does out-of-sample testing assist with hyperparameter selection?
A
  • It shows which hyperparameter value generalises better to data not used in training
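
One way to picture this is a small k-fold search, sketched below with a hand-written nearest-neighbour regressor. The model, the synthetic data, and the candidate values are all assumptions chosen for illustration; the hyperparameter here is the number of neighbours `k`.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, held_out, k):
    """Mean squared error on points that were not used for fitting."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in held_out]
    return sum(errors) / len(errors)


def k_fold_score(data, k, n_folds=5):
    """Mean out-of-sample MSE over n_folds train/validation splits."""
    scores = []
    for i in range(n_folds):
        held_out = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        scores.append(mse(train, held_out, k))
    return sum(scores) / n_folds


# Synthetic data: a noisy straight line (assumed, for illustration).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100)]
data = [(x, 0.5 * x + random.gauss(0, 1)) for x in xs]

candidates = [1, 3, 5, 10, 20]
cv_scores = {k: k_fold_score(data, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)  # lowest out-of-sample error
```

Each candidate is scored only on folds it was not fitted to, so the chosen value is the one that generalised best within this search.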

6
Q
  • What features of data create challenges for model generalization?
A
  • Imbalanced data
  • Outliers
  • Noise
7
Q
  • How can you watch out for information leakage?
A
  • Perform data cleaning
8
Q
  • What features of models create challenges for their generalization?
A
  • A model that fits the training data too closely (overfitting)
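
A small illustration of the gap, assuming a hand-written 1-nearest-neighbour regressor and synthetic noisy data (both are hypothetical choices, not part of the deck): the model reproduces its training targets exactly but does not do so on fresh data.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the model fitted on `train`, scored on `points`."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in points]
    return sum(errors) / len(errors)


def sample(n):
    """Draw n points from a noisy straight line (assumed data-generating process)."""
    return [(x, 0.5 * x + random.gauss(0, 1))
            for x in (random.uniform(0, 10) for _ in range(n))]


random.seed(1)
train, test = sample(50), sample(50)

# With k=1 each training point is its own nearest neighbour,
# so the model memorises the noise in the training set.
train_err = mse(train, train, k=1)
test_err = mse(train, test, k=1)
```

Here `train_err` is zero while `test_err` is not, which is why training error alone says little about generalization.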
9
Q
  • Why is "pipeline" a good term for describing an ML workflow?
A
  • It is a procedure: an ordered sequence of steps, each feeding into the next
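
The idea of a workflow as a procedure can be sketched as a list of steps applied in order. The `Standardise`, `Clip`, and `Pipeline` classes below are hypothetical, written only for this illustration; libraries such as scikit-learn provide their own pipeline objects with a similar fit/transform pattern.

```python
class Standardise:
    """Rescale values using the mean and standard deviation seen during fit."""

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        variance = sum((x - self.mean) ** 2 for x in xs) / len(xs)
        self.std = variance ** 0.5 or 1.0  # avoid dividing by zero
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]


class Clip:
    """Limit values to the range [-limit, limit]; nothing is learned in fit."""

    def __init__(self, limit):
        self.limit = limit

    def fit(self, xs):
        return self

    def transform(self, xs):
        return [max(-self.limit, min(self.limit, x)) for x in xs]


class Pipeline:
    """Apply each step in order; every step is fitted on training data only."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, xs):
        for step in self.steps:
            xs = step.fit(xs).transform(xs)
        return self

    def transform(self, xs):
        for step in self.steps:
            xs = step.transform(xs)
        return xs


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 100.0]

pipe = Pipeline([Standardise(), Clip(limit=3.0)]).fit(train)
test_out = pipe.transform(test)
```

Because the statistics are learned from `train` alone and then reused on `test`, the test data never influences the fitted steps.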
10
Q
  • In a pipeline, at what steps can bias enter the analysis?
A
  • At every step