Hypothesis Testing in AI Flashcards

1
Q

What are the 4 steps in hypothesis testing?

A
  1. State the hypothesis
  2. Design the statistical test
  3. Calculate the test statistic
  4. Reject or fail to reject the null hypothesis

Mnemonic: State, Design, Calculate, R or A (Reject or Accept, where "accept" means fail to reject)
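
The four steps can be sketched with a two-sided one-sample z-test. This is a minimal illustration only: the function name `z_test_decision`, the sample values, and the assumption of a known population standard deviation `sigma` are hypothetical and not part of the card.

```python
import math
from statistics import NormalDist

def z_test_decision(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known."""
    # Step 1 (state): H0 is mean == mu0, H1 is mean != mu0.
    # Step 2 (design): z-test at significance level alpha.
    n = len(sample)
    sample_mean = sum(sample) / n
    # Step 3 (calculate): the test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 4 (decide): compare with the two-sided critical value.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit
    return z, z_crit, reject

# Hypothetical sample, chosen only to make the arithmetic easy to follow.
z, z_crit, reject = z_test_decision([1.0, 2.0, 3.0, 4.0], mu0=0.0, sigma=2.0)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Here the sample mean is 2.5 and the standard error is 1.0, so z = 2.5 exceeds the critical value of about 1.96 and the null hypothesis is rejected.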

2
Q

What is a test statistic?

A

It is a function of the observed values of the random variables of interest, computed from the sample and compared against its distribution under the null hypothesis.

3
Q

What is meant by the p-value?

A

The probability in the tail (or tails) of the test statistic's distribution.

The probability of observing a sample estimate at least as extreme as the one observed, assuming the null hypothesis is true.
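
As a rough sketch of "the probability in the tail", the p-value for a z statistic can be read off the standard normal distribution. The helper name `p_value_from_z` and the choice of a normally distributed test statistic are assumptions made for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, two_sided=True):
    """Tail probability of a standard normal test statistic under H0."""
    std_normal = NormalDist()
    if two_sided:
        # Area in both tails beyond |z|.
        return 2 * (1 - std_normal.cdf(abs(z)))
    # Upper-tail area only: P(Z >= z).
    return 1 - std_normal.cdf(z)

p_two_sided = p_value_from_z(1.96)
p_upper_tail = p_value_from_z(1.645, two_sided=False)
print(f"two-sided p at z=1.96: {p_two_sided:.3f}")
print(f"upper-tail p at z=1.645: {p_upper_tail:.3f}")
```

Both values come out close to 0.05, which is why 1.96 and 1.645 are the familiar two-sided and one-sided cutoffs at the 5% level.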

4
Q

What is meant by the level of significance?

A

The probability of making a Type I error

5
Q

What is Type I error?

What is Type II error?

A

There are two types of error:

Type I: Wrongly rejecting a true null hypothesis.

Type II: Failing to reject (wrongly "accepting") an untrue null hypothesis.

The probability of a Type I error is the level of significance.
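
The link between the Type I error rate and the level of significance can be checked with a small simulation. The sketch below assumes standard normal data, a two-sided z-test with known variance, and a fixed random seed; `type_i_error_rate` and its parameters are hypothetical names.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of tests that reject H0 when H0 (mean == 0) is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are drawn with true mean 0, so every rejection is a Type I error.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Empirical Type I error rate: {rate:.3f}")
```

With enough trials the empirical rejection rate should settle near the chosen `alpha`.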

6
Q

What are the reasons for unrepresentative datasets?

A
  1. Selection bias (leaving certain observations out);
  2. Survivorship bias (leaving out funds no longer in existence);
  3. Self-selection bias (fund managers choosing which funds to report and leaving out others).

The 3 S’s
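
Survivorship bias in particular is easy to illustrate numerically. The fund names and returns below are made up for the example; the point is only that averaging the survivors overstates the average of the full set.

```python
from statistics import mean

# Hypothetical annual returns; the flag marks whether the fund still exists.
funds = [
    ("Fund A", 0.12, True),
    ("Fund B", 0.08, True),
    ("Fund C", -0.25, False),  # closed
    ("Fund D", 0.03, True),
    ("Fund E", -0.30, False),  # closed
]

all_mean = mean([ret for _, ret, _ in funds])
survivor_mean = mean([ret for _, ret, alive in funds if alive])

print(f"All funds:      {all_mean:.3f}")
print(f"Survivors only: {survivor_mean:.3f}")
```

In this made-up dataset the full-sample average is negative while the survivors-only average is positive.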

7
Q

Data mining

Data dredging (Data snooping)

A

Data mining: Vigorously testing data until valid relationships are established.

Data dredging: Overusing statistical tests, that is, running hundreds of tests to identify significant relationships without regard to economic rationale.
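
One way to see why running hundreds of tests is dangerous: if the tests are independent and every null hypothesis is true, the chance of at least one false "significant" result grows quickly with the number of tests. The independence assumption and the function name `prob_at_least_one_false_positive` are simplifications introduced here.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """P(at least one Type I error) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 10, 100):
    print(f"{m:>3} tests: {prob_at_least_one_false_positive(m):.3f}")
```

With 100 such tests at the 5% level, a spurious finding is almost guaranteed, and the expected number of false positives is about five.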

8
Q

Backtesting, overfitting and backfilling

A
  • Backtesting: Applying a model to historical data to determine how well it would have explained the actual results. Sometimes good.
  • Overfitting: Using too many parameters to fit a model to historical data. Not good.
  • Backfilling: Updating a database by inserting returns that pre-date the date of entry in the database. Backfill bias is also known as instant history bias. Generally not good.
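
A simple way to see why an overfitted backtest misleads is to pick the best of many candidate parameter settings on a short history and then check it on fresh data. In the sketch below every candidate is pure noise with zero true mean; the function `best_backtest`, the sample sizes, and the seed are all hypothetical choices.

```python
import random

def best_backtest(candidates=1000, in_sample=24, out_sample=240, seed=0):
    """Pick the candidate with the best in-sample mean; report both means."""
    rng = random.Random(seed)
    best_in, best_out = float("-inf"), 0.0
    for _ in range(candidates):
        # Each candidate's returns are noise: no setting has real skill.
        returns = [rng.gauss(0.0, 1.0) for _ in range(in_sample + out_sample)]
        mean_in = sum(returns[:in_sample]) / in_sample
        mean_out = sum(returns[in_sample:]) / out_sample
        if mean_in > best_in:
            best_in, best_out = mean_in, mean_out
    return best_in, best_out

best_in, best_out = best_backtest()
print(f"Best in-sample mean:     {best_in:.3f}")
print(f"Same candidate, out-of-sample: {best_out:.3f}")
```

The winning candidate looks strong on the history it was selected on, while its out-of-sample mean falls back toward zero, since the apparent edge came from the search rather than from the model.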
9
Q

Cherry picking and chumming

A
  • Cherry picking: Selectively reporting results.
  • Chumming: Making enough predictions that some are bound to be correct.
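
Chumming can be illustrated with a stylized scheme in which half of the recipients are told the market will rise and half that it will fall, and only those who received the correct call are contacted again. This setup and the name `chumming_survivors` are assumptions for the example, not a description from the card.

```python
def chumming_survivors(recipients, rounds):
    """Recipients who have seen only correct predictions after each round."""
    history = []
    for _ in range(rounds):
        # Whatever the market does, half of the remaining group got the right call.
        recipients //= 2
        history.append(recipients)
    return history

survivors = chumming_survivors(1024, 10)
print(survivors)
```

Starting from 1,024 recipients, one person has seen ten correct predictions in a row after ten rounds, even though no forecasting skill was involved.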
10
Q

Why are cumulative return charts deceptive?

How is deception avoided?

A

Because of compounding effects: the gap between two series widens when one has an early-period advantage, even if their later returns are the same.

Deception is avoided by using a cumulative log-return chart, because log returns are additive, not multiplicative.
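
A short numerical sketch of this effect, using two hypothetical funds: Fund A earns 10% in the first period and Fund B earns 0%, after which both earn 5% per period. The return series and helper names are invented for illustration.

```python
import math

returns_a = [0.10] + [0.05] * 9  # early advantage, then identical returns
returns_b = [0.00] + [0.05] * 9

def cumulative_wealth(returns):
    """Growth of 1 unit invested: returns compound multiplicatively."""
    wealth, path = 1.0, []
    for r in returns:
        wealth *= 1 + r
        path.append(wealth)
    return path

def cumulative_log_return(returns):
    """Running sum of log returns: additive across periods."""
    total, path = 0.0, []
    for r in returns:
        total += math.log(1 + r)
        path.append(total)
    return path

wealth_gap = [a - b for a, b in zip(cumulative_wealth(returns_a),
                                    cumulative_wealth(returns_b))]
log_gap = [a - b for a, b in zip(cumulative_log_return(returns_a),
                                 cumulative_log_return(returns_b))]

print(f"Wealth gap: first {wealth_gap[0]:.3f}, last {wealth_gap[-1]:.3f}")
print(f"Log gap:    first {log_gap[0]:.3f}, last {log_gap[-1]:.3f}")
```

The gap in cumulative wealth keeps growing although the funds perform identically after the first period, while the gap in cumulative log returns stays constant at ln(1.10).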
