7 - Validating and Evaluating Data Models Flashcards
How can forecasts be evaluated?
- compare the forecast to what actually happened
- compare to naive forecast
How can forecasts be evaluated?
Compare the forecast to what actually happened
- distance between observations and forecast should be minimal
- fit can change if the forecasted market changes
- careful: self-fulfilling prophecies can lead to a perfect fit that is still not helpful
How can forecasts be evaluated?
Compare to naive forecast
- naive forecast: assume what happened in the previous period will happen in this period
- any more complex forecast should be better than the naive forecast, otherwise why pay for a complex method?
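A minimal Python sketch of that comparison, using made-up demand numbers and mean absolute error as an (assumed) error measure:

# Hypothetical demand figures; the naive forecast is simply the previous period's actual.
actuals = [100, 110, 105, 120, 118]   # observed demand in periods 2..6
model   = [102, 108, 109, 117, 121]   # forecast from the complex method
naive   = [ 98, 100, 110, 105, 120]   # naive forecast = actual of the previous period

def mean_abs_error(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

print("model MAE:", mean_abs_error(model, actuals))   # 2.8
print("naive MAE:", mean_abs_error(naive, actuals))   # 6.8
# The complex method only pays off if its error is clearly below the naive error.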
Measuring forecast performance
Error measures for numerical values
Absolute: RMSE (root mean squared error)
-> depends on scale
Percentage: MAPE (mean absolute percentage error)
-> independent of scale
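A minimal Python sketch of both measures on hypothetical actuals and forecasts:

import math

actuals  = [100.0, 110.0, 105.0, 120.0]
forecast = [ 98.0, 113.0, 104.0, 125.0]

rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actuals)) / len(actuals))
mape = 100 * sum(abs(f - a) / abs(a) for f, a in zip(forecast, actuals)) / len(actuals)

print(f"RMSE: {rmse:.2f}")    # in the unit of the data -> depends on scale
print(f"MAPE: {mape:.2f} %")  # a percentage -> independent of scale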
Self-fulfilling forecasts can have bad consequences
Example
- a firm offers several products at different prices
- customers always buy the cheapest product, substituting higher-priced products
- the firm sells fewer expensive products than expected
- the forecast predicts little demand for expensive products
- the firm stocks more cheap products
- profit spirals down
Measuring classification performance
Error measures for categorical values
- error rate per category
- error rate across categories
- comparing error rates
Measuring classification performance
Error measures for categorical values
Error rate per category
Recall = no. of instances correctly assigned to class / no. of instances that are actually in class
(starts from the true assignment)
Precision = no. of instances correctly assigned to class / no. of instances assigned to class
(starts from the predicted assignment)
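A minimal Python sketch of both definitions, applied to hypothetical true and predicted labels:

truth     = ["A", "A", "A", "B", "B", "C", "C", "C"]
predicted = ["A", "B", "A", "B", "B", "C", "A", "C"]

def recall(cls):
    # start from the true assignment: all instances actually in the class
    in_class = [p for t, p in zip(truth, predicted) if t == cls]
    return sum(p == cls for p in in_class) / len(in_class)

def precision(cls):
    # start from the predicted assignment: all instances assigned to the class
    assigned = [t for t, p in zip(truth, predicted) if p == cls]
    return sum(t == cls for t in assigned) / len(assigned)

for cls in sorted(set(truth)):
    print(cls, "recall:", round(recall(cls), 2), "precision:", round(precision(cls), 2))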
Measuring classification performance
Error measures for categorical values
Error rate across categories
- average or weighted average
- weighted according to the exogenous (e.g. business cost) or endogenous (e.g. class frequency) importance of a class
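A minimal Python sketch of the averaging options; the per-class error rates and weights are hypothetical, with class frequency as an endogenous weight and misclassification cost as an exogenous one:

error_per_class = {"A": 0.10, "B": 0.30, "C": 0.05}
frequency = {"A": 0.70, "B": 0.20, "C": 0.10}   # endogenous: share of each class in the data
cost      = {"A": 1.0,  "B": 5.0,  "C": 2.0}    # exogenous: business cost of an error in each class

simple_avg    = sum(error_per_class.values()) / len(error_per_class)
freq_weighted = sum(error_per_class[c] * frequency[c] for c in error_per_class)
cost_weighted = sum(error_per_class[c] * cost[c] for c in error_per_class) / sum(cost.values())

print(simple_avg, freq_weighted, cost_weighted)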
Measuring classification performance
Error measures for categorical values
Comparing error rates
- error on training set vs. validation set vs. test set
- expected error (probability) vs. observed error vs. error from benchmark approaches
Measuring classification performance
Benchmarking
Possible benchmarks
- statistically expected error rate - probabilistic distribution of instances
- naive rules
- expert assignment
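A minimal Python sketch comparing a classifier's observed error rate against two cheap benchmarks: a naive rule (always predict the most frequent class) and the statistically expected error of guessing along the class distribution; all labels are hypothetical:

from collections import Counter

truth     = ["A", "A", "A", "A", "B", "B", "C", "C", "C", "C"]
predicted = ["A", "A", "B", "A", "B", "B", "C", "C", "A", "C"]

model_error = sum(t != p for t, p in zip(truth, predicted)) / len(truth)

majority = Counter(truth).most_common(1)[0][0]              # naive rule
naive_error = sum(t != majority for t in truth) / len(truth)

shares = [n / len(truth) for n in Counter(truth).values()]  # class distribution
expected_error = 1 - sum(s * s for s in shares)             # error of distribution-based guessing

print(model_error, naive_error, expected_error)             # 0.2  0.6  0.64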
Measuring classification performance
Benchmarking
Benchmark factors beyond accuracy
- effort - computational, financial, …
- reliability - over time, data sets, …
- acceptance - who gets to override the model?
Cross-Validation and Bootstrapping
Splitting the data set for evaluation
Example: Decision tree
Training set: build the tree
Validation set: prune the tree
Test set: evaluate the tree’s predictions
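A minimal sketch of this workflow with scikit-learn (assumed available); the iris data, the 60/20/20 split, and pruning via cost-complexity alphas are illustrative choices, not prescribed here:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Training set: build candidate trees with increasing pruning strength.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas]

# Validation set: pick the pruning level that generalises best.
best = max(trees, key=lambda t: t.score(X_val, y_val))

# Test set: report the final, unbiased performance estimate.
print("test accuracy:", best.score(X_test, y_test))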
Cross-Validation and Bootstrapping
Splitting the data set for evaluation
Split the data set:
- training set
- validation set
- test set
Build the model:
- Training set
- Validation set
(both overlapping)
Evaluate the model:
- test set
Cross-Validation and Bootstrapping
Hold out 1: k-fold cross-validation
Split the data set into k partitions of equal size
- use k-1 partitions for training (and validation)
- use the k-th partition for evaluation (“hold-out”)
- common: k=10
Repeat the cross validation k times, where the hold-out partition alternates across all k partitions
Average the result over the k repetitions for a single measure
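A minimal plain-Python sketch of these mechanics; train_and_score is a hypothetical placeholder for whatever model building and error measure are actually used:

import random

def k_fold_cv(instances, k, train_and_score):
    data = list(instances)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]     # k partitions of (roughly) equal size
    scores = []
    for i in range(k):                         # the hold-out partition alternates
        hold_out = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_score(training, hold_out))
    return sum(scores) / k                     # average over the k repetitions

# e.g. k_fold_cv(my_instances, 10, my_train_and_score)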
Cross-Validation and Bootstrapping
Hold out 2: Bootstrap
- alternative to cross validation, applicable for small data sets
- n = size of the original data set
- draw n instances with replacement from the data set to generate a training set
-> drawing with replacement: the same instance can be included multiple times, while other instances are never drawn
- use the instances that were never drawn as the test set
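A minimal plain-Python sketch of the bootstrap split:

import random

def bootstrap_split(data):
    n = len(data)
    drawn = [random.randrange(n) for _ in range(n)]   # draw n indices with replacement
    drawn_set = set(drawn)
    training = [data[i] for i in drawn]               # the same instance can appear several times
    test = [x for i, x in enumerate(data) if i not in drawn_set]   # instances never drawn
    return training, test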