9. A Protocol in the Era of ML Flashcards

1
Q
  1. Motivation
A
  1. Inputs are guided by a reasonable hypothesis
  2. State the hypothesis before looking at the data!

(Important because large out-of-sample tests are not possible in ML)

2
Q
  1. Multiple Testing and statistical methods
A
  1. Keep track of every strategy tried
  2. Beware of the parallel universe problem

(A 2-sigma result is not necessarily meaningful: when many strategies are tried, some will clear that bar by chance alone)
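A minimal sketch of why 2-sigma loses its meaning once many strategies have been tried. It assumes the tests are independent and the test statistic is standard normal under the null. The Bonferroni correction is used here only as one common, conservative adjustment, not as the method prescribed by the protocol. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def prob_any_false_positive(n_tests: int, z: float = 2.0) -> float:
    """Chance that at least one of n_tests worthless strategies exceeds |z|.

    Assumes independent tests (an assumption made for illustration).
    """
    return 1.0 - (1.0 - two_sided_p(z)) ** n_tests


def bonferroni_z(n_tests: int, alpha: float = 0.05) -> float:
    """z-threshold that keeps the family-wise error rate at alpha (Bonferroni)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2 * n_tests))


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(n, round(prob_any_false_positive(n), 3), round(bonferroni_z(n), 2))
```

The required threshold depends on the number of strategies tried, which is why the count has to be tracked honestly.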

3
Q
  1. Sample choice and data
A
  1. Test sample defined and justified ex ante
    –> Justify in advance and never change the sample
  2. Winsorization (capping extreme values at a chosen threshold) defined and justified ex ante
  3. Pre-processing to ensure data quality
  4. Pre-processing choices regarding data transformation or outlier exclusion need to be justified and documented.
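A minimal sketch of winsorization with thresholds fixed before any results are inspected. The 1%/99% cut-offs, the constant names, and the simple rank-based percentile rule are assumptions chosen for illustration; the source does not specify them.

```python
# Thresholds are written down once, ex ante, and not revisited after seeing results.
LOWER_PCT = 0.01  # assumed cut-off, for illustration only
UPPER_PCT = 0.99  # assumed cut-off, for illustration only


def winsorize(values, lower_pct=LOWER_PCT, upper_pct=UPPER_PCT):
    """Cap values below/above the given percentiles instead of dropping them."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple rank-based percentile: pick the order statistic nearest to pct * (n - 1).
    lo = ordered[round(lower_pct * (n - 1))]
    hi = ordered[round(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    data = list(range(1, 101)) + [1000]  # one extreme outlier
    capped = winsorize(data)
    print(min(capped), max(capped), len(capped))
```

Every observation is kept; only the extreme values are pulled in to the threshold.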
4
Q
  1. Cross-validation
A
  1. Acknowledge that no true out-of-sample data exists

(In trading, the only true out-of-sample data is live trading data)

  2. Iterative out-of-sample testing == overfitting
5
Q
  1. Model dynamics
A
  1. Beware of structural changes over time
  2. Do NOT tweak the model –> overfitting
6
Q
  1. Model Complexity
A
  1. Beware of the curse of dimensionality
    (more predictor variables == more data needed)
  2. Aim for simplicity, regularization, and interpretability (an ML application should not be a black box)
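A rough sketch of why more predictors demand more data. It assumes each predictor is split into a fixed number of bins (10 here, an arbitrary choice) and counts how thinly a fixed sample is spread over the resulting grid. The function names are hypothetical.

```python
def cells_to_cover(n_predictors: int, bins_per_predictor: int = 10) -> int:
    """Number of grid cells when every predictor is split into equal bins."""
    return bins_per_predictor ** n_predictors


def avg_points_per_cell(n_obs: int, n_predictors: int, bins_per_predictor: int = 10) -> float:
    """Average observations per cell for a fixed sample size."""
    return n_obs / cells_to_cover(n_predictors, bins_per_predictor)


if __name__ == "__main__":
    for d in (1, 2, 4, 6):
        print(d, cells_to_cover(d), avg_points_per_cell(10_000, d))
```

With the sample size held fixed, each added predictor divides the data available per region of the input space by the number of bins.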
7
Q
  1. Research Culture
A
  1. Establish a Research Culture That Rewards Quality Science
  2. Be Careful with Delegated Research (assistants have an incentive to support the supervisor's hypothesis)
8
Q

Protocol Summary

A
  1. Motivation
  2. Multiple Testing and statistical methods
  3. Sample choice and data
  4. Cross-validation
  5. Model dynamics
  6. Model Complexity
  7. Research Culture