11 - Validating simulation models Flashcards
Motivation: Why validate simulations?
- when not rigorously calibrated and validated, simulations are neither a reliable research method nor a reliable tool for practical decision support
Definition: Verification, Validation, Calibration
(steps subsumed under validation)
Verification
Does the code do what the specification asks?
- code could include faults
Definition: Verification, Validation, Calibration
(steps subsumed under validation)
Structural Validation
Does the model correctly represent the problem space?
- are all relevant relationships included in the model, etc.?
Definition: Verification, Validation, Calibration
(steps subsumed under validation)
Input/Output validation
Do the simulation's inputs and outputs match empirical observations?
- some kind of description of the status quo based on data
- parametrise the shape of the distribution
Calibration: adjusting input values to obtain valid output values (see the sketch below)
-> danger: overfitting the model while calibrating it
-> agent-based models are prone to over-parametrization, e.g. customer learning: we can make assumptions about how customers learn, but we do not know the parameters for sure (they cannot be observed as input parameters)
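A minimal calibration sketch in Python, assuming a hypothetical simulate() stand-in for the actual simulation model and an invented empirical indicator; it only illustrates the idea of tuning one input value until the simulated output matches the observed output.

```python
# Hypothetical calibration sketch: tune one input parameter (arrival rate)
# until a simulated indicator matches its empirical counterpart.
import numpy as np

rng = np.random.default_rng(42)

def simulate(arrival_rate: float, n_runs: int = 50) -> float:
    """Toy stand-in for a stochastic simulation: returns a mean queue length."""
    return float(np.mean(rng.poisson(arrival_rate * 2.0, size=n_runs)))

empirical_queue_length = 9.0                    # observed in the real system
candidate_rates = np.linspace(1.0, 10.0, 91)    # input values to try

# grid search: pick the input value whose output is closest to the observation
errors = [abs(simulate(r) - empirical_queue_length) for r in candidate_rates]
best_rate = candidate_rates[int(np.argmin(errors))]
print(f"calibrated arrival rate: {best_rate:.2f}")
# overfitting danger: with many free parameters, some combination will always
# reproduce the status quo, even if the model structure is wrong
```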
Simulation Data
Different layers of data
Input data
Process data
Output (result) data
Simulation Data
Different layers of data
Input data
- collected or inferred from the empirical status quo
- e.g. customer arrival rate, range of products a company offers
Simulation Data
Different layers of data
Process data
- generated during the simulation run; provides insights into the simulation model that are not empirically available or relevant for the simulation purpose
- can be compared to process or event logs from the real world
Simulation Data
Different layers of data
Output (Results) data
- indicators calculated for validation or what-if analysis, matching empirical indicators
- in the status quo, these are the indicators we are interested in -> what happens to them when we change the inputs?
Simulation data
Input data
Assumption:
- Structure of relevant simulation components and parameters has been determined
Direct observation:
- for transparent systems
- example: machine run times
- or: the actual layout of the shop
Indirect inference:
- based on empirical process and result indicators
- example: customer choice
-> we know what happened but not why, so we apply data analysis to find a model of how things work in the real world (see the sketch below)
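A sketch of such indirect inference, using invented purchase data and a standard choice model (logistic regression); the fitted coefficient is the kind of parameter that would then feed the simulated customers.

```python
# Illustrative indirect inference: we observe which offers were bought, but not
# the customers' decision rule, so we fit a simple choice model and use its
# parameters as simulation input. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
prices = rng.uniform(10, 100, size=500)                      # observed offer prices
true_prob = 1 / (1 + np.exp(0.05 * (prices - 60)))           # hidden decision rule
bought = (rng.uniform(0, 1, 500) < true_prob).astype(int)    # observed outcomes

model = LogisticRegression().fit(prices.reshape(-1, 1), bought)
print("estimated price sensitivity:", model.coef_[0][0])
# the estimate becomes an input parameter for simulated customers, even though
# the real decision process was never observed directly
```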
Simulation data
Input data: scenarios
Stochastic scenarios
Worst- and best case scenarios
Qualitatively discrete scenarios
Simulation data
Input data: scenarios
Stochastic scenarios
- follow empirical distributions
- stochastic input scenarios lead to stochastic process and result data
-> when we are uncertain about the data, consider stochastic scenarios (see the sketch below)
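A small sketch of a stochastic input scenario, assuming made-up inter-arrival observations and an exponential distribution fitted to them; each simulation run then draws its own input values.

```python
# Illustrative stochastic scenario: parametrise a distribution from an
# empirical sample and sample fresh input values for each simulation run.
import numpy as np
from scipy import stats

observed_interarrivals = np.array([2.1, 0.8, 3.4, 1.2, 0.5, 2.9, 1.7, 0.9, 2.2, 1.4])

# fit the shape of the distribution to the empirical sample
loc, scale = stats.expon.fit(observed_interarrivals, floc=0)

# each run gets its own stochastic input scenario
scenario = stats.expon.rvs(loc=loc, scale=scale, size=100, random_state=1)
print("mean inter-arrival time in this scenario:", scenario.mean())
```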
Simulation data
Input data: scenarios
Worst- and best-case scenarios
- model extreme cases
- results indicate the modeled systems’ robustness
-> robustness = does the system behave the same in each of these cases?
Simulation data
Input data: scenarios
Qualitatively discrete scenarios
- model discrete alternative cases
- test robustness and the necessity of individualized strategies
Simulation data
Example: railway ticket as simulation input
Directly observable:
- supply: products, capacity, price categories, availabilities
- demand: historical sales
Indirect inference (also: estimation):
- customer loyalty (e.g. we have the name on the ticket -> we can analyse how often they bought in the past)
- reference prices
- willingness to pay
-> these estimates can be used to calibrate the customers in the model
Simulation data
Input data on the future
Insights:
- analyzing empirical data to parametrize the simulation input data is based on insights, the pre-condition for predictive analytics
Forecast:
- to simulate future scenarios, the future values of input data have to be forecasted
Simulation data
Process data
Simulation systems are fully transparent - all data that is generated in the process of the simulation can be observed
We may look at: empirically-available process data
- forecasts
- plans
- documented events
And we can compare it to: simulation-exclusive process data (= process data that would not have been available in the real world):
- decisions
- learned experiences
- communication
Simulation data
Result data
Usually empirically-available
- system reports
- transaction data
Simulation-exclusive
- long-term developments
- what-if developments
- internal agent- and system-states: e.g. customer satisfaction, emergent strategies
Simulation data
Result data
Usage
- Output validation: compares result data to empirical data
- sensitivity analysis and meta-modeling: analyse relationships inside the "black box"
- data farming: simulations generate artificial transaction data -> when real-world data is not sufficient, we generate more
Simulation data
Data farming
= simulation models run thousands of times can provide insights into the different consequences of different options
- generates as much reproducible data as desired (the amount is limited only by time and storage space)
- success depends on validity - which is difficult to determine for human, social, cultural and behavioral modeling
Simulation data
Data farming
Which types of data farming are there?
Simple: Monte Carlo
- Create data from known distributions
Mechanic: Discrete event-based white box
- create data given varying scenarios
- look at everything that happens in the real world and use it as input data
Emergent: Agent-based black box
- create data based on assumptions and theories
- we don’t know what’s actually going on in the empirical agents
Simulation data
Data farming with Monte-Carlo Simulations
Problem:
- empirical data set is not large enough to allow for significant statements about variable relationships
Idea:
- create additional data based on distributions fitted to the empirical sample (fill in potential gaps; see the sketch after this card)
Chance:
- draw more meaningful conclusions from the enriched data set
Risk:
- enriched data set is “tainted” by a priori assumptions about distributions (gaps are filled based on assumptions)
Example:
- survey responses on the influence of consultative committees
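A minimal Monte Carlo data-farming sketch under stated assumptions: the empirical sample, its size, and the normality assumption are all invented for illustration, and the caveat from the Risk item applies to the farmed data.

```python
# Illustrative Monte Carlo data farming: enrich a small empirical sample by
# drawing additional values from a distribution fitted to it. The farmed data
# inherits the a priori distributional assumption (here: normality).
import numpy as np
from scipy import stats

empirical_sample = np.array([3.2, 4.1, 2.8, 5.0, 3.7, 4.4, 3.1, 4.8])  # too small on its own

mu, sigma = stats.norm.fit(empirical_sample)
farmed_data = stats.norm.rvs(mu, sigma, size=10_000, random_state=7)

print(f"empirical mean {empirical_sample.mean():.2f}, farmed mean {farmed_data.mean():.2f}")
```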
Verification
aims to ensure that the code does what the specification asks - identify and eliminate “bugs”
Problem: when to stop testing?
- testing takes a creative and destructive mind
- avoid testing your own code
- test cases should be formulated by subject matter experts (don't use the same people to develop and test -> they won't notice their own mistakes)
- test cases only prove the absence of those errors that they were designed to test
Structural validation
Does the model correctly represent the problem space?
- systematically explicate model components and relationships
- expert walkthrough with model stakeholders
Questions:
- Are all concepts and structures relevant to the problem included?
- is the model structure consistent with relevant knowledge of the system?
Structural validation
ODD protocol
- used to describe individual-based models, agent-based models, and simulation models
Overview:
- Purpose
- Entities, state variables, and scales
- Process overview and scheduling
Design:
- Design concepts (basic principles, emergence, adaptation, objectives, learning, prediction, ...)
Details:
- Initialization
- Input data
- Submodels
Input data validation
Input validation
- sometimes defined as an aspect of structural validation
- design input probability distributions that match the empirical observations
- design input parameter values that match expected future scenarios
Questions:
- Do inputs correspond to empirical observations?
- Are the parameter values consistent with descriptive numerical knowledge?
-> parametrizing can reveal a lack of information
-> e.g. building a simulation to test different settings for the shop floor, you may find out that the manufacturer does not know the machine run times
-> if the input variables cannot be validated, the output cannot be validated either (see the sketch below)
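A short sketch of checking whether a designed input distribution is consistent with empirical observations, using invented machine run times and a standard Kolmogorov-Smirnov goodness-of-fit test; the distribution and its parameters are assumptions for illustration.

```python
# Illustrative input validation: does the designed input distribution
# correspond to the empirical observations?
import numpy as np
from scipy import stats

observed_runtimes = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 4.8, 5.2, 5.4, 5.1])

# designed input assumption: machine run times ~ Normal(5.1, 0.3)
result = stats.kstest(observed_runtimes, "norm", args=(5.1, 0.3))
print(f"KS statistic {result.statistic:.3f}, p-value {result.pvalue:.3f}")
# a very small p-value would indicate that the chosen input distribution
# does not match the empirical observations
```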
Output validation
- Does the simulation output match empirical observations?
- also called behavioral validation
-> run the simulation model and generate output data -> output validation can only happen at the very end, once the model actually runs
-> compare whether the simulation reproduces what we observe in the real world
-> e.g. are queue lengths modelled correctly?
Output validation
Questions of output validation
What is the average distance between empirical observations and simulation results?
- percentage: MAPE (mean absolute percentage error)
- absolute: RMSE (root mean squared error)
- bias
-> How much error is acceptable?
Do confidence intervals overlap?
-> How much confidence is enough?
Does a meta-model fitting the empirical observations also fit the simulation? (E.g. a regression model)
- for this, empirical input and output information needs to be available
-> What fit suffices for the empirical model, anyway?
-> How closely does the regression model match? (see the metrics sketch below)
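A compact sketch of these output-validation checks with invented numbers: MAPE, RMSE, and bias between empirical and simulated indicator values, plus a rough normal-approximation check of whether the 95% confidence intervals overlap.

```python
# Illustrative output-validation metrics on made-up indicator values.
import numpy as np

empirical = np.array([102.0, 98.5, 110.2, 95.0, 101.3])
simulated = np.array([100.1, 99.0, 108.0, 97.5, 103.0])

mape = np.mean(np.abs((empirical - simulated) / empirical)) * 100  # percentage error
rmse = np.sqrt(np.mean((empirical - simulated) ** 2))              # absolute error
bias = np.mean(simulated - empirical)                              # systematic over-/underestimation
print(f"MAPE {mape:.2f}%  RMSE {rmse:.2f}  bias {bias:.2f}")

def ci95(x: np.ndarray) -> tuple[float, float]:
    """Rough 95% confidence interval for the mean (normal approximation)."""
    half = 1.96 * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - half, x.mean() + half

(e_lo, e_hi), (s_lo, s_hi) = ci95(empirical), ci95(simulated)
print("confidence intervals overlap:", e_lo <= s_hi and s_lo <= e_hi)
```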
Output validation
Cross-validation
- the data set is split into training, validation, and test sets
- to calibrate the simulation, use the training and validation sets
- then use the test set to assess validity (see the sketch below)
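A minimal sketch of such a split, assuming 100 empirical observation periods and a 60/20/20 split; calibration would use the first two subsets, and validity would be reported only on the held-out test set.

```python
# Illustrative train/validation/test split for simulation calibration.
import numpy as np

observations = np.arange(100)          # e.g. 100 empirical observation periods
rng = np.random.default_rng(3)
shuffled = rng.permutation(observations)

train, validation, test = np.split(shuffled, [60, 80])   # 60 / 20 / 20 split

# calibrate the simulation against `train`, tune parameters against `validation`,
# and judge output validity only on `test`, which the model has never "seen"
print(len(train), len(validation), len(test))
```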