9. A Protocol in the Era of ML Flashcards
1
Q
- Motivation
A
- Inputs are guided by a reasonable hypothesis
- Hypothesis before looking at data!
(Important because large out-of-sample tests are not possible in ML)
2
Q
- Multiple Testing and statistical methods
A
- Keep track of tried strategies
- Beware of parallel universe problem
(A 2-sigma result is not necessarily meaningful: when many strategies are tried, one that works may do so purely by chance and is not necessarily significant)
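A minimal sketch of why 2-sigma loses meaning once many strategies are tried. It assumes the trials are independent and the test statistics are standard normal under the null, which real backtests rarely satisfy exactly. The Bonferroni adjustment is one common correction chosen here for illustration; it is not named in the cards, and both function names are hypothetical.

```python
import math
from statistics import NormalDist


def prob_false_discovery(n_trials: int, z: float = 2.0) -> float:
    """P(at least one of n independent null strategies shows |t| > z)."""
    # Two-sided tail probability of a standard normal beyond +/- z.
    p_single = math.erfc(z / math.sqrt(2))
    return 1.0 - (1.0 - p_single) ** n_trials


def bonferroni_threshold(n_trials: int, alpha: float = 0.05) -> float:
    """Two-sided z threshold that keeps the family-wise error rate at alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_trials))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, prob_false_discovery(n), bonferroni_threshold(n))
```

With one trial the chance of a spurious 2-sigma result is under 5%; with a hundred trials it is close to certain, which is why the number of tried strategies has to be tracked.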
3
Q
- Sample choice and data
A
- Test sample defined and justified ex ante
–> Justify in advance and never change the sample
- Winsorization (capping extreme values at a certain threshold) defined and justified ex ante
- Pre-processing to ensure data quality
- Pre-processing choices regarding Data transformation or Outlier exclusion need to be justified and documented.
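A minimal sketch of winsorization with thresholds fixed ex ante. The 1st/99th percentile cut-offs, the simulated Student-t data, and the function names are illustrative assumptions, not values from the cards. The bounds are estimated on the training sample only and then applied unchanged to the test sample.

```python
import numpy as np

# Hypothetical thresholds, written down before any results are inspected.
LOWER_PCT, UPPER_PCT = 1.0, 99.0


def fit_winsor_bounds(train: np.ndarray) -> tuple:
    """Estimate the clipping bounds from the training sample only."""
    lo, hi = np.percentile(train, [LOWER_PCT, UPPER_PCT])
    return float(lo), float(hi)


def winsorize(x: np.ndarray, bounds: tuple) -> np.ndarray:
    """Cap values at the pre-defined bounds (values are clipped, not dropped)."""
    lo, hi = bounds
    return np.clip(x, lo, hi)


rng = np.random.default_rng(0)
train = rng.standard_t(df=3, size=1000)  # simulated fat-tailed data
test = rng.standard_t(df=3, size=200)

bounds = fit_winsor_bounds(train)
test_w = winsorize(test, bounds)
```

Keeping the percentiles as named constants makes the pre-processing choice easy to document and hard to change silently.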
4
Q
- Cross-validation
A
- Acknowledge that no true out-of-sample data exists
(in trading, the only true out-of-sample data is live trading data)
- Iterative out-of-sample testing == overfitting
5
Q
- Model dynamics
A
- Beware of structural changes over time
- NO tweaking (tweaking –> overfitting)
6
Q
- Model Complexity
A
- Beware of dimensionality curse
( more predictor variables == more data needed )
- Aim for simplicity, regularization, and interpretability ( ML application should not be a black box )
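A minimal sketch of the dimensionality problem, under illustrative assumptions: the target is pure noise, unrelated to the predictors, yet the in-sample fit improves as predictors are added at a fixed sample size. Ridge regression is used here as one example of regularization; the function name, sample sizes, and penalty value are hypothetical.

```python
import numpy as np


def in_sample_r2(n_obs: int, n_predictors: int,
                 ridge_lambda: float = 0.0, seed: int = 0) -> float:
    """In-sample (uncentered) R^2 of a no-intercept linear fit on pure noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_obs, n_predictors))
    y = rng.standard_normal(n_obs)  # target is noise, independent of X
    # Closed-form ridge solution; ridge_lambda = 0 gives ordinary least squares.
    gram = X.T @ X + ridge_lambda * np.eye(n_predictors)
    beta = np.linalg.solve(gram, X.T @ y)
    resid = y - X @ beta
    return float(1.0 - (resid @ resid) / (y @ y))


if __name__ == "__main__":
    for p in (5, 25, 50):
        print(p, in_sample_r2(100, p), in_sample_r2(100, p, ridge_lambda=10.0))
```

The apparent fit with many predictors is spurious by construction, and the ridge penalty can only lower the in-sample fit relative to least squares, which is the intended trade against overfitting.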
7
Q
- Research Culture
A
- Establish a Research Culture That Rewards Quality Science
- Be Careful with Delegated Research (assistants have an incentive to support the supervisor's hypothesis)
8
Q
Protocol Summary
A
- Motivation
- Multiple Testing and statistical methods
- Sample choice and data
- Cross-validation
- Model dynamics
- Model Complexity
- Research Culture