7 - Validating and Evaluating Data Models Flashcards

1
Q

How can forecasts be evaluated?

A
  • compare the forecast to what actually happened
  • compare to naive forecast

2
Q

How can forecasts be evaluated?

Compare the forecast to what actually happened

A
  • distance between observations and forecast should be minimal
  • fit can change if the forecasted market changes
  • careful: self-fulfilling prophecies can lead to a perfect fit that is still not helpful
3
Q

How can forecasts be evaluated?

Compare to naive forecast

A
  • naive forecast: assume that what happened in the previous period will also happen in this period
  • any more complex forecast should be better than the naive forecast, otherwise why pay for a complex method?
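
A minimal sketch of a naive (persistence) forecast; the demand figures are invented purely for illustration:

```python
# Naive (persistence) forecast: predict for each period the value
# observed in the previous period. Numbers are illustrative only.
demand = [100, 110, 105, 120, 115]

naive_forecast = demand[:-1]   # forecasts for periods 2..5 (period 1 has none)
actuals        = demand[1:]    # the observations they are compared against

for actual, forecast in zip(actuals, naive_forecast):
    print(f"forecast {forecast:>3} vs. actual {actual:>3}")
```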
4
Q

Measuring forecast performance

Error measures for numerical values

A

Absolute: RMSE (root mean squared error)
-> depends on scale

Percentage: MAPE (mean absolute percentage error)
-> independent of scale
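
A minimal sketch of both error measures, assuming paired lists of actuals and forecasts; the numbers are invented (e.g. the naive forecast from the previous card):

```python
from math import sqrt

def rmse(actuals, forecasts):
    # Root mean squared error: absolute measure, depends on the scale of the data.
    return sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))

def mape(actuals, forecasts):
    # Mean absolute percentage error: relative measure, independent of scale.
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)

actuals   = [110, 105, 120, 115]   # illustrative observations
forecasts = [100, 110, 105, 120]   # illustrative forecasts

print(f"RMSE: {rmse(actuals, forecasts):.2f}")    # in the unit of the data
print(f"MAPE: {mape(actuals, forecasts):.2f} %")  # in percent
```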

5
Q

Self-fulfilling forecasts can have bad consequences

Example

A
  • a firm offers several products at different prices
  • customers always buy the cheapest product, substituting it for higher-priced products
  • the firm sells fewer expensive products than expected
  • the forecast therefore predicts little demand for expensive products
  • the firm stocks more cheap products, which makes the forecast confirm itself
  • profit spirals down
6
Q

Measuring classification performance

Error measures for categorical values

A
  • error rate per category
  • error rate across categories
  • comparing error rates
7
Q

Measuring classification performance

Error measures for categorical values

Error rate per category

A

Recall = no. of instances correctly assigned to the class / no. of instances actually in the class
(starts from the true assignment)

Precision = no. of instances correctly assigned to the class / no. of instances assigned to the class
(starts from the predicted assignment)
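
A minimal sketch of both measures computed for a single class from actual and predicted labels; the labels and helper names are invented for this sketch:

```python
def recall(actuals, predictions, cls):
    # Start from the true assignment: of all instances actually in cls,
    # how many were assigned to cls?
    in_class = [p for a, p in zip(actuals, predictions) if a == cls]
    return sum(p == cls for p in in_class) / len(in_class)

def precision(actuals, predictions, cls):
    # Start from the predicted assignment: of all instances assigned to cls,
    # how many are actually in cls?
    assigned = [a for a, p in zip(actuals, predictions) if p == cls]
    return sum(a == cls for a in assigned) / len(assigned)

actuals     = ["spam", "spam", "ham", "ham", "spam"]   # invented labels
predictions = ["spam", "ham",  "ham", "spam", "spam"]

print(recall(actuals, predictions, "spam"))     # 2/3
print(precision(actuals, predictions, "spam"))  # 2/3
```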

8
Q

Measuring classification performance

Error measures for categorical values

Error rate across categories

A
  • average or weighted average of the per-category error rates
  • weights reflect the exogenous or endogenous importance of a class
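
A minimal sketch of the simple vs. weighted average of per-class error rates; both the error rates and the importance weights are invented:

```python
# Per-class error rates and importance weights (invented for illustration).
error_rates = {"A": 0.05, "B": 0.20, "C": 0.10}
weights     = {"A": 0.7,  "B": 0.2,  "C": 0.1}   # e.g. revenue share per class

simple_average   = sum(error_rates.values()) / len(error_rates)
weighted_average = sum(weights[c] * error_rates[c] for c in error_rates)

print(f"{simple_average:.3f}")    # 0.117
print(f"{weighted_average:.3f}")  # 0.085
```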

9
Q

Measuring classification performance

Error measures for categorical values

Comparing error rates

A
  • error on training set vs. validation set vs. test set
  • expected error (probability) vs. observed error vs. error from benchmark approaches

10
Q

Measuring classification performance

Benchmarking

Possible benchmarks

A
  • statistically expected error rate - probabilistic distribution of instances
  • naive rules
  • expert assignment
11
Q

Measuring classification performance

Benchmarking

Benchmark factors beyond accuracy

A
  • effort - computational, financial, …
  • reliability - over time, data sets, …
  • acceptance - who gets to override the model’s decisions?
12
Q

Cross-Validation and Bootstrapping

Splitting the data set for evaluation

Example: Decision tree

A

Training set: build the tree

Validation set: prune the tree

Test set: evaluate the tree’s predictions

13
Q

Cross-Validation and Bootstrapping

Splitting the data set for evaluation

A

Split the data set:

  • training set
  • validation set
  • test set

Build the model: use the training set and the validation set (the building step spans both)

Evaluate the model: use the test set
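
A minimal sketch of such a three-way split; the 60/20/20 proportions and the helper name split_dataset are assumptions for illustration, not prescribed by the card:

```python
import random

def split_dataset(instances, train=0.6, validation=0.2, seed=42):
    # Shuffle once, then cut into three disjoint parts; the remaining
    # share (here 0.2) becomes the test set. Proportions are illustrative.
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(train * len(shuffled))
    n_val   = int(validation * len(shuffled))
    return (shuffled[:n_train],                 # training set: build the model
            shuffled[n_train:n_train + n_val],  # validation set: tune/prune it
            shuffled[n_train + n_val:])         # test set: evaluate it once

training, validation, test = split_dataset(list(range(100)))
print(len(training), len(validation), len(test))  # 60 20 20
```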

14
Q

Cross-Validation and Bootstrapping

Hold out 1: k-fold Cross validation

A

Split the data set into k partitions of equal size

  • use k-1 partitions for training (and validation)
  • use the k-th partition for evaluation (“hold-out”)
  • common: k=10

Repeat the cross validation k times, where the hold-out partition alternates across all k partitions

Average the result over the k repetitions for a single measure
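
A minimal sketch of k-fold cross validation; the train_and_evaluate callback is a placeholder standing in for whatever model and error measure are being assessed:

```python
import random

def k_fold_cross_validation(instances, k, train_and_evaluate, seed=42):
    # Split the data into k partitions of (roughly) equal size, hold out one
    # partition per round for evaluation, and average the k results.
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        hold_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_evaluate(training, hold_out))
    return sum(scores) / k

# Placeholder "model": report the hold-out size, just to show the flow.
score = k_fold_cross_validation(list(range(100)), k=10,
                                train_and_evaluate=lambda train, test: len(test))
print(score)  # 10.0 -- every hold-out fold contains 10 of the 100 instances
```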

15
Q

Cross-Validation and Bootstrapping

Hold out 2: Bootstrap

A
  • alternative to cross validation, applicable for small data sets
  • n = size of the original data set
  • draw n instances with replacement from the data set to generate a training set
    -> drawing with replacement: the same instance can be included multiple times, others are ignored
  • use the instances that were never drawn for the test set
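
A minimal sketch of the bootstrap hold-out; the data set is just a range of integers and the helper name bootstrap_split is made up for this sketch:

```python
import random

def bootstrap_split(instances, seed=42):
    # Draw n instances with replacement as the training set; the instances
    # that were never drawn form the test set.
    rng = random.Random(seed)
    n = len(instances)
    training = [rng.choice(instances) for _ in range(n)]
    drawn = set(training)
    test = [x for x in instances if x not in drawn]
    return training, test

training, test = bootstrap_split(list(range(1000)))
print(len(training), len(test))  # n and the never-drawn rest (about 36.8 % of n on average)
```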
16
Q

What’s a lift factor?

A
  • describes the increase in response rate yielded by the learning tool
  • describes only the relative (percentage) increase, not an increase in the absolute number of respondents
  • but assuming that every additionally sampled instance costs money, computing lift factors enables cost-benefit analyses
17
Q

Computing the lift factor for deterministic classification

Steps

A
  1. consider for all instances the prediction and the actual class
  2. compute the overall share of the desired class (e.g. “positive response to the newsletter”)
  3. compute the share of the desired class among the instances predicted to belong to the class
  4. lift factor = share within the in-class predicted instances / overall share
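
A minimal sketch of these steps; the class labels, predictions, and the helper name lift_factor_deterministic are invented for illustration:

```python
def lift_factor_deterministic(actuals, predictions, desired="positive"):
    # Step 2: overall share of the desired class across all instances.
    overall_share = sum(a == desired for a in actuals) / len(actuals)
    # Step 3: share of the desired class among instances predicted to be in it.
    predicted_in_class = [a for a, p in zip(actuals, predictions) if p == desired]
    share_in_predicted = sum(a == desired for a in predicted_in_class) / len(predicted_in_class)
    # Step 4: lift factor = share within the in-class predictions / overall share.
    return share_in_predicted / overall_share

# Invented example: 2 of 8 customers respond overall, but 2 of 3 predicted responders do.
actuals     = ["positive", "negative", "negative", "positive",
               "negative", "negative", "negative", "negative"]
predictions = ["positive", "negative", "positive", "positive",
               "negative", "negative", "negative", "negative"]
print(lift_factor_deterministic(actuals, predictions))  # (2/3) / (2/8) ≈ 2.67
```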
18
Q

Computing the lift factor for probabilistic classification

Steps

A
  1. consider for all instances the predicted class probability and the actual class
  2. order the instances by descending probability of belonging to the desired class (e.g. “positive response to the newsletter”)
  3. choose a sample size and take the corresponding number of instances from the top of the ordered list
  4. compute the share of the desired class in the selected instances
  5. lift factor = share within the sample / overall share
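
A minimal sketch of these steps, assuming each instance is a pair of (predicted probability of the desired class, actual membership); the probabilities and outcomes are invented:

```python
def lift_factor_probabilistic(instances, sample_size):
    # instances: list of (predicted probability of desired class, actually in desired class?)
    overall_share = sum(actual for _, actual in instances) / len(instances)
    # Order by descending predicted probability and take the top of the list.
    ranked = sorted(instances, key=lambda x: x[0], reverse=True)
    sample = ranked[:sample_size]
    sample_share = sum(actual for _, actual in sample) / sample_size
    return sample_share / overall_share

# Invented values (True = positive response to the newsletter).
instances = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
             (0.4, False), (0.3, False), (0.2, False), (0.1, True)]
print(lift_factor_probabilistic(instances, sample_size=4))  # (3/4) / (4/8) = 1.5
```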
19
Q

Lift chart

A
  • lift charts can be computed when classification is probabilistic
  • compute the lift factor for increasing sample sizes, possibly comparing it to the increase in cost caused by the larger sample
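
A minimal sketch of the points of a lift chart, assuming the instances are already ordered by descending predicted probability; the outcomes are invented:

```python
# Lift chart points: lift factor for growing sample sizes, using outcomes
# already sorted by descending predicted probability (invented data).
outcomes = [True, True, False, True, False, False, False, True]
overall_share = sum(outcomes) / len(outcomes)

for sample_size in range(1, len(outcomes) + 1):
    sample_share = sum(outcomes[:sample_size]) / sample_size
    lift = sample_share / overall_share
    print(f"sample size {sample_size}: lift {lift:.2f}")
```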