11 - Validating simulation models Flashcards

1
Q

Motivation: Why validate simulations?

A
  • when not rigorously calibrated and validated, simulations are neither a reliable research method nor a reliable tool for practical decision support
2
Q

Definition: Verification, Validation, Calibration
(Steps that encompass validation)

Verification

A

Does the code do what the specification asks?

- the code could contain faults

3
Q

Definition: Verification, Validation, Calibration
(Steps that encompass validation)

Structural Validation

A

Does the model correctly represent the problem space?

- are all relevant relationships included in the model, etc.?

4
Q

Definition: Verification, Validation, Calibration
(Steps that encompass validation)

Input/Output validation

A

Does the simulation correctly represent the problem space?

  • some kind of data-based description of the status quo
  • parametrise the shapes of the input distributions

Calibration: adjusting input values to get valid output values
-> danger: overfitting the model during calibration (see the sketch below)

-> agent-based models are prone to over-parametrization, e.g. customer learning: we can make assumptions, but we do not know the parameters for sure (no empirical input parameters exist)
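
A minimal calibration sketch (Python; the queue model, numbers, and the empirical target are hypothetical): a single input parameter is adjusted by grid search until the simulated output matches an empirically observed value.

```python
import numpy as np

def simulate_avg_wait(service_rate, n_customers=5_000, arrival_rate=1.0, seed=0):
    """Toy single-server FIFO queue: returns the average waiting time."""
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(rng.exponential(1 / arrival_rate, n_customers))
    service = rng.exponential(1 / service_rate, n_customers)
    finish, total_wait = 0.0, 0.0
    for arr, srv in zip(arrivals, service):
        start = max(arr, finish)       # wait until the server is free
        total_wait += start - arr
        finish = start + srv
    return total_wait / n_customers

# Hypothetical empirically observed average waiting time.
empirical_avg_wait = 2.5

# Calibration by grid search: adjust the input value (service rate)
# until the simulated output matches the empirical output.
candidates = np.linspace(1.1, 2.0, 10)
errors = [abs(simulate_avg_wait(mu) - empirical_avg_wait) for mu in candidates]
best = candidates[int(np.argmin(errors))]
print(f"calibrated service rate: {best:.2f}")

# Overfitting danger: validating against the same data used for calibration
# proves nothing; hold out separate data for output validation.
```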

5
Q

Simulation Data

Different layers of data

A

Input data
Process data
Output (result) data

6
Q

Simulation Data

Different layers of data

Input data

A
  • collected or inferred from the empirical status quo

- e.g. customer arrival rate, range of products a company offers

7
Q

Simulation Data

Different layers of data

Process data

A
  • generated during the simulation process; provides insights into the simulation model that are not empirically available but are relevant for the simulation purpose
  • can be compared with process or event logs from the real world
8
Q

Simulation Data

Different layers of data

Output (Results) data

A
  • indicators calculated for validation or what-if analysis, matching empirical indicators
  • in the status quo, these are the indicators we are interested in -> what happens to them when we change the inputs?
9
Q

Simulation data

Input data

A

Assumption:
- Structure of relevant simulation components and parameters has been determined

Direct observation:

  • for transparent systems
  • example: machine run times
  • or: what is the actual layout of the shop floor?

Indirect inference:

  • based on empirical process and result indicators
  • example: customer choice
  • > we know what happened but not why, so we apply data analysis to find a model of how things work in the real world
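
A sketch of indirect inference (Python; all data and parameter names are hypothetical): a customer's price sensitivity is not directly observable, so it is estimated from observed purchase decisions by fitting a simple logistic choice model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Observed result data: prices offered and whether a purchase happened.
prices = np.array([10, 12, 15, 18, 20, 22, 25, 28, 30, 35], dtype=float)
bought = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 0], dtype=float)

def purchase_prob(price, sensitivity, ref_price):
    """Logistic choice model: probability of buying at a given price."""
    return 1.0 / (1.0 + np.exp(sensitivity * (price - ref_price)))

# Fit the unobservable parameters to the observed choices (least squares).
(sens, ref), _ = curve_fit(purchase_prob, prices, bought, p0=[0.5, 20.0])
print(f"estimated sensitivity={sens:.2f}, reference price={ref:.1f}")
# The estimates then parametrize the simulated customers.
```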
10
Q

Simulation data

Input data: scenarios

A

Stochastic scenarios
Worst- and best case scenarios
Qualitatively discrete scenarios

11
Q

Simulation data

Input data: scenarios

Stochastic scenarios

A
  • follow empirical distributions
  • stochastic input scenarios lead to stochastic process and result data
  • > when we are uncertain about the data -> consider stochastic scenarios
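
A minimal sketch of a stochastic input scenario (Python; the empirical sample and the exponential assumption are hypothetical): inter-arrival times are drawn from a distribution fitted to data, so each run sees a different but statistically equivalent input stream.

```python
import numpy as np

# Hypothetical empirical inter-arrival times (minutes).
empirical = np.array([1.2, 0.4, 2.1, 0.9, 1.7, 0.6, 1.1, 2.4, 0.8, 1.3])

# Maximum-likelihood fit of an exponential distribution: rate = 1 / mean.
rate = 1.0 / empirical.mean()

def stochastic_scenario(n_customers, seed):
    """One stochastic input scenario: a fresh draw from the fitted distribution."""
    rng = np.random.default_rng(seed)
    return rng.exponential(1.0 / rate, n_customers)

# Stochastic inputs lead to stochastic process and result data:
# each seed yields a different simulation run.
for seed in range(3):
    print(stochastic_scenario(5, seed).round(2))
```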
12
Q

Simulation data

Input data: scenarios

Worst- and best-case scenarios

A
  • model extreme cases
  • results indicate the modeled systems’ robustness
  • > robustness = does it behave the same in each of these cases?
13
Q

Simulation data

Input data: scenarios

Qualitatively discrete scenarios

A
  • model discrete alternative cases

- test robustness and the necessity of individualized strategies

14
Q

Simulation data

Example: railway ticket as simulation input

A

Directly observable:

  • supply: products, capacity, price categories, availabilities
  • demand: historical sales

Indirect inference (also: estimation):

  • customer loyalty (e.g. the name is on the ticket -> we can analyse how often that customer has bought in the past)
  • reference prices
  • willingness to pay
  • > can use this to calibrate customers in a model
15
Q

Simulation data

Input data on the future

A
  1. Insights:
    - parametrizing the simulation input data is based on insights gained from analyzing empirical data; these insights are the precondition for predictive analytics
  2. Forecast:
    - to simulate future scenarios, the future values of input data have to be forecasted
16
Q

Simulation data

Process data

A

Simulation systems are fully transparent - all data that is generated in the process of the simulation can be observed

We may look at: empirically-available process data

  • forecasts
  • plans
  • documented events

And we can compare it to: simulation-exclusive process data (= process data that would not have been available in the real world):

  • decisions
  • learned experiences
  • communication
17
Q

Simulation data

Result data

A

Usually empirically-available

  • system reports
  • transaction data

Simulation-exclusive

  • long-term developments
  • what-if developments
  • internal agent- and system-states: e.g. customer satisfaction, emergent strategies
18
Q

Simulation data

Result data

Usage

A
  • Output validation: compares result data to empirical data
  • sensitivity and meta-modeling: analyse relationships in the “black box”
  • data farming: simulations generate artificial transaction data -> real-world data is not sufficient, so we need more data
19
Q

Simulation data

Data farming

A

= running simulation models thousands of times can provide insights into the consequences of different options

  • generates as much reproducible data as desired (the amount is limited only by time and storage space)
  • success depends on validity - which is difficult to determine for human, social, cultural and behavioral modeling
20
Q

Simulation data

Data farming

Which types of data farming are there?

A

Simple: Monte Carlo
- Create data from known distributions

Mechanic: Discrete event-based white box

  • create data given varying scenarios
  • we observe everything that happens in the real world and use it as input data

Emergent: Agent-based black box

  • create data based on assumptions and theories
  • we don’t know what’s actually going on in the empirical agents
21
Q

Simulation data

Data farming with Monte-Carlo Simulations

Problem:
Idea:
Chance:
Risk:
Example:
A

Problem:
- empirical data set is not large enough to allow for significant statements about variable relationships

Idea:
- create additional data based on distributions fitted to the empirical sample (fill in potential gaps)

Chance:
- draw more meaningful conclusions from the enriched data set

Risk:
- enriched data set is “tainted” by a priori assumptions about distributions (gaps are filled based on assumptions)

Example:
- survey responses on the influence of consultative committees
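A minimal data-farming sketch for the survey example (Python; the sample, the normal/linear assumptions, and all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

# Small hypothetical empirical sample: (committee size, perceived influence).
sizes = np.array([3.0, 5.0, 5.0, 7.0, 9.0, 11.0])
influence = np.array([2.1, 3.0, 2.8, 3.9, 4.2, 5.0])

# Fit simple distributions/relationships to the empirical sample.
size_mean, size_sd = sizes.mean(), sizes.std(ddof=1)
slope, intercept = np.polyfit(sizes, influence, 1)
resid_sd = np.std(influence - (slope * sizes + intercept), ddof=1)

# Farm as many synthetic records as desired.
n = 10_000
synth_sizes = rng.normal(size_mean, size_sd, n).clip(min=1).round()
synth_influence = slope * synth_sizes + intercept + rng.normal(0, resid_sd, n)

# Risk made visible: every conclusion drawn from the enriched data is
# "tainted" by the fitted normal/linear assumptions above.
print(np.corrcoef(synth_sizes, synth_influence)[0, 1])
```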

22
Q

Verification

A

aims to ensure that the code does what the specification asks - identify and eliminate “bugs”

Problem: when to stop testing?

  • testing takes a creative and destructive mind
  • avoid testing your own code
  • test cases should be formulated by subject matter experts (do not use the same people to develop and test -> they will not notice their own mistakes)
  • test cases only prove the absence of those errors that they were designed to test
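
A minimal verification sketch (Python; the function and its specification are hypothetical, tests assume pytest): each test proves only the absence of the error it was designed for.

```python
# test_queue.py -- run with: pytest test_queue.py
def average_waiting_time(arrivals, service_times):
    """Single FIFO server: average time customers wait before service starts."""
    finish, total_wait = 0.0, 0.0
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, finish)
        total_wait += start - arrival
        finish = start + service
    return total_wait / len(arrivals)

def test_no_waiting_at_idle_server():
    # Specification: customers arriving at an idle server must not wait.
    assert average_waiting_time([0.0, 10.0], [1.0, 1.0]) == 0.0

def test_second_customer_waits_for_first():
    # Specification: arrival at t=1 while server is busy until t=5 -> waits 4.
    assert average_waiting_time([0.0, 1.0], [5.0, 1.0]) == 2.0
```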
23
Q

Structural validation

A

Does the model correctly represent the problem space?

  • systematically explicate model components and relationships
  • expert walkthrough with model stakeholders

Questions:

  • Are all concepts and structures relevant to the problem included?
  • Is the model structure consistent with relevant knowledge of the system?
24
Q

Structural validation

ODD protocol

A
  • used to describe individual-based models, agent-based models, and simulation models in general

Overview:

  1. Purpose
  2. Entities, state variables, and scales
  3. Process overview and scheduling

Design:
4. Design Concepts (Basic principles, emergence, adaptation, objectives, learning, prediction …)

Details:

  5. Initialization
  6. Input data
  7. Submodels
25
Q

Input data validation

Input validation

A
  • sometimes defined as an aspect of structural validation
  • design input probability distributions that match the empirical observations
  • design input parameter values that match expected future scenarios

Questions:

  • Do inputs correspond to empirical observations?
  • Are the parameter values consistent with descriptive numerical knowledge?
  • > parametrizing can show that there’s a lack of information
  • > e.g. building a simulation to test different settings for the shop floor -> you find out that the manufacturer does not know the machine run times
  • > if input variables cannot be validated, output cannot be validated either
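
A sketch of checking an input distribution against empirical observations (Python; the data and the normal assumption are hypothetical), using a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

# Hypothetical empirically observed machine run times (minutes).
observed = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 5.2, 4.8, 5.0])

# Candidate input distribution for the simulation.
mu, sigma = observed.mean(), observed.std(ddof=1)

# Kolmogorov-Smirnov test: are the observations consistent with the candidate?
stat, p_value = stats.kstest(observed, "norm", args=(mu, sigma))
print(f"KS statistic={stat:.3f}, p={p_value:.3f}")
# A very small p-value argues against using this input distribution.
# (Estimating mu and sigma from the same sample makes the test optimistic.)
```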
26
Q

Output validation

A
  • Does the simulation output match empirical observations?
  • also called behavioral validation
  • > run the simulation model and generate output data -> output validation can only be done at the very end, once the model runs
  • > compare whether the simulation does what we observe in the real world
  • > e.g. does the model reproduce the observed queue behaviour correctly?
27
Q

Output validation

Questions of output validation

A

What is the average distance between empirical observations and simulation results? (see the sketch below)

  • percentage: MAPE (mean absolute percentage error)
  • absolute: RMSE (root mean squared error)
  • bias (systematic over- or underestimation)
  • > How much error is acceptable?

Do confidence intervals overlap?
-> How much confidence is enough?

Does a meta-model fitting the empirical observations also fit the simulation? (E.g. a regression model)

  • for this, empirical input and output information needs to be available
  • > What fit suffices for the empirical model, anyway?
  • > How closely does the regression model match?
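
A minimal sketch of the error measures named above (Python; all values are hypothetical):

```python
import numpy as np

empirical = np.array([100.0, 120.0, 80.0, 95.0])  # observed indicators
simulated = np.array([98.0, 126.0, 75.0, 99.0])   # matching simulation output

error = simulated - empirical
bias = error.mean()                                      # systematic deviation
rmse = np.sqrt((error ** 2).mean())                      # absolute distance
mape = (np.abs(error) / np.abs(empirical)).mean() * 100  # percentage distance

print(f"bias={bias:.2f}, RMSE={rmse:.2f}, MAPE={mape:.1f}%")
# How much error is acceptable remains a judgment call for the modeler.
```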
28
Q

Output validation

Cross-validation

A
  • the data set is split into a training, a validation, and a test set
  • the training and validation sets are used to calibrate the simulation
  • the test set is then used to test validity
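
A minimal sketch of such a split (Python; the 60/20/20 proportions are an assumption, not from the lecture):

```python
import numpy as np

def split_dataset(data, seed=0, train=0.6, valid=0.2):
    """Shuffle and split a data set into training, validation, and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(train * len(data))
    n_valid = int(valid * len(data))
    return (data[idx[:n_train]],                   # calibrate on this
            data[idx[n_train:n_train + n_valid]],  # tune/validate on this
            data[idx[n_train + n_valid:]])         # final validity test only

data = np.arange(100)
train_set, valid_set, test_set = split_dataset(data)
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```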