The Simulation Model Life Cycle (Part 2) - Working the Data

Jaco-Ben Vosloo
Dec 16, 2021
Updated: Jul 14, 2022

From start to finish, best practices and practical advice for doing a simulation-based project

This post is part 2 of a 7-part series about the Simulation Model Life Cycle. You can catch up on the previous post here.

Life cycle of a simulation model

Defining the Problem
Working the Data
Building the Model
Verification and Validation
Experimentation
Analyze the Results
Report on the Findings

This post focuses on the second step and future posts will focus on the remaining steps - you can start watching the lecture from step 2 here, or continue reading the blog post...

Note: Although you do go "back" in later steps to redo some activities from previous steps, you by no means redo the entire step. The steps listed above are guidelines for the general path that a typical project follows. At any step you can go back and revisit previous steps, but it is unlikely that you will skip a step or move forward before completing a step to at least 80%-90%.

For a more insightful discussion on the matter read here

2) Working the data

There are essentially two parts to this phase. The first is actually getting the data, which can sometimes be a complete project on its own ;-) The second is using the data to drive your model.

2.1) Getting data

Use scope as a guideline

Especially on big projects, making a big data request to the right people can be a daunting task. Use your scope as a guideline to figure out who to ask, e.g. the sales department, and what specifically to ask them for, e.g. orders for the last 12 months, including the customer service duration.

Using your scope, you will also be able to explain why you need the data to people who are not part of the core project team.

Get more than you need, unless the retrieval time increases significantly

If you are unsure whether a piece of readily available data will be useful to you, be sure to ask for it in the first data request.

In most projects there are numerous iterations during the data collection phase, and you can potentially reduce them by asking for all the available data upfront in the first request. More often than not, the people retrieving the data would rather get it all in one go than go back for more of the same data or update previous queries.

If this is not the case, for example because the data are in different databases or require additional work to retrieve, then rather delay the collection until you are sure that you need it.

Remember: it is always easier to have the data and not use it, than to want it and have to bother a person (again) to go and get it ;-)

Validate the data

Validate, validate, validate!

Ensuring that the data you get is not only the data you asked for, but also usable, is a key step and not to be overlooked!

In big organizations the data often contains anomalies or special events that can dramatically impact your model output. You need to be aware of these so that you don't go looking for errors in your model when you see their impact on its output. Always inquire whether these anomalies can be removed from your dataset.

In my experience it is best to start with a simple exploratory data analysis on every column in a data table that you get, e.g. a simple min, max and average, before doing more "thoughtful" analysis such as histograms, correlations between columns etc.
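A minimal Java sketch of such a first pass on one numeric column. The class name ColumnSummary and the sample values are made up for illustration; the column is assumed to be parsed to doubles already.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// First-pass summary of one numeric column: count, min, max and average.
final class ColumnSummary {

    // Summarizes a single column that has already been parsed to doubles.
    static DoubleSummaryStatistics summarize(double[] column) {
        return Arrays.stream(column).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical service times in minutes; -1.0 and 480.0 are planted anomalies.
        double[] serveTime = {3.5, 4.0, 2.5, -1.0, 480.0, 5.0};
        DoubleSummaryStatistics stats = summarize(serveTime);
        System.out.printf("n=%d min=%.1f max=%.1f avg=%.2f%n",
                stats.getCount(), stats.getMin(), stats.getMax(), stats.getAverage());
    }
}
```

Even this crude summary makes the negative minimum and the very large maximum stand out, which is exactly the kind of thing you want to ask the data owner about before building anything.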

2.2) Data drive your model

Everything must be data-driven

In a good model, all the setup values, input data and anything else that is not logic are, as far as possible, sourced in some way from an input source.

Nothing must be hardcoded

If some setup values do not belong in the user input data, then they must be kept in a central location, typically as model configuration settings.

Use a data object

Ideally, make use of a data object as an intermediary between source and model. This is a key requirement to make your models safer, easier to work with and more robust! The best practice is not to have the model read user input or configuration settings directly from an external file or an internal or external database, but rather to have a data object that is set up from these sources.

Your model only ever cares about whether the data object has the data it requires. How you set the object up, and from where, is something the model never needs to know. This means that the model is completely decoupled from sourcing the data, which enables much better control and flexibility.

P.S. The added benefit here is that this also allows you to create the data object programmatically, which in turn means that you can easily create simple data objects for your unit tests... but more on this in part 3...
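To make the idea concrete, here is a rough plain-Java sketch of a data object. The names Order, ScenarioData and fromCsvLines, the field layout and the numberOfServers setting are all assumptions for illustration and are not taken from the example model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One customer order as the model sees it (field names are hypothetical).
final class Order {
    final String customerId;
    final double arrivalTime; // minutes since the start of the day
    final double serveTime;   // minutes

    Order(String customerId, double arrivalTime, double serveTime) {
        this.customerId = customerId;
        this.arrivalTime = arrivalTime;
        this.serveTime = serveTime;
    }
}

// The data object: the only place the model reads its inputs from.
final class ScenarioData {
    private final List<Order> orders;
    private final int numberOfServers; // a configuration setting, not user input

    // Programmatic construction, e.g. for unit tests.
    ScenarioData(List<Order> orders, int numberOfServers) {
        this.orders = Collections.unmodifiableList(new ArrayList<>(orders));
        this.numberOfServers = numberOfServers;
    }

    List<Order> getOrders() { return orders; }
    int getNumberOfServers() { return numberOfServers; }

    // One possible loader: CSV text lines with a header row and three columns.
    // A database or Excel loader would return the same type.
    static ScenarioData fromCsvLines(List<String> lines, int numberOfServers) {
        List<Order> parsed = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // start at 1 to skip the header
            String[] cells = lines.get(i).split(",");
            parsed.add(new Order(cells[0].trim(),
                    Double.parseDouble(cells[1].trim()),
                    Double.parseDouble(cells[2].trim())));
        }
        return new ScenarioData(parsed, numberOfServers);
    }
}
```

The model would only ever call getOrders() and getNumberOfServers(). Whether the object was built from a file, a database or a hand-written list in a test makes no difference to it.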

Now for some examples...

Example: Data Validation

Continuation of the example covered in the previous post here.

Let's assume we got the following data from our retail client. See if you can spot all the data validation issues that need to be verified, removed or corrected.

Here are some of the checks you can perform on this piece of data

Data Validation

Check customer ID format
Check that the format in column B is all dates
Check that the arrival_time values make sense
Check that serve_time contains reasonable positive values
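The checks above could be sketched in Java roughly as follows. The ID pattern, the ISO date and time formats, the opening hours and the upper limit on serve_time are all assumptions that you would replace with whatever the client confirms.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RowValidator {
    // Assumed formats and limits; replace with what the client actually uses.
    private static final Pattern CUSTOMER_ID = Pattern.compile("C\\d{4}");
    private static final LocalTime OPEN = LocalTime.of(8, 0);
    private static final LocalTime CLOSE = LocalTime.of(18, 0);
    private static final double MAX_SERVE_MINUTES = 120.0;

    // Returns a list of problems found in one row; an empty list means the row passed.
    static List<String> validate(String customerId, String date,
                                 String arrivalTime, String serveTime) {
        List<String> problems = new ArrayList<>();

        if (customerId == null || !CUSTOMER_ID.matcher(customerId).matches()) {
            problems.add("unexpected customer ID format: " + customerId);
        }
        try {
            LocalDate.parse(date); // ISO format, e.g. 2021-12-16
        } catch (DateTimeParseException | NullPointerException e) {
            problems.add("not a date: " + date);
        }
        try {
            LocalTime t = LocalTime.parse(arrivalTime); // ISO format, e.g. 09:30
            if (t.isBefore(OPEN) || t.isAfter(CLOSE)) {
                problems.add("arrival_time outside opening hours: " + arrivalTime);
            }
        } catch (DateTimeParseException | NullPointerException e) {
            problems.add("arrival_time is not a time: " + arrivalTime);
        }
        try {
            double minutes = Double.parseDouble(serveTime);
            if (!(minutes > 0 && minutes <= MAX_SERVE_MINUTES)) {
                problems.add("serve_time out of range: " + serveTime);
            }
        } catch (NumberFormatException | NullPointerException e) {
            problems.add("serve_time is not a number: " + serveTime);
        }
        return problems;
    }
}
```

Returning a list of problems instead of throwing on the first one lets you report everything that is wrong with a dataset in a single pass, which saves a few rounds of back and forth with the data owner.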

Example: Data Driving The Model

There are several ways to data drive your model in AnyLogic. One option is to read directly from source files, like the example below:
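As a rough stand-in, here is a plain-Java sketch of reading a column straight from a source file. It uses only the standard library rather than AnyLogic's own connectivity elements, and the file name orders.csv, the header row and the position of the serve_time column are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ServeTimeReader {

    // Extracts serve_time (assumed to be the third column) from CSV lines with a header row.
    static List<Double> parseServeTimes(List<String> lines) {
        List<Double> serveTimes = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // start at 1 to skip the header
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines, e.g. at the end of the file
            }
            String[] cells = line.split(",");
            serveTimes.add(Double.parseDouble(cells[2].trim()));
        }
        return serveTimes;
    }

    // Reads the whole file into memory and parses it.
    static List<Double> readServeTimes(Path csvFile) throws IOException {
        return parseServeTimes(Files.readAllLines(csvFile, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; point this at your own export.
        System.out.println(readServeTimes(Paths.get("orders.csv")));
    }
}
```

Keeping the parsing separate from the file access means the parsing can be tested with a handful of in-memory lines, without needing a file on disk.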

Alternatively, you can import the Excel file to the internal AnyLogic Database and then use it directly in a number of process modeling and other library blocks. See steps below:

Step 1: Import Excel file to AnyLogic internal database

Step 2: Use the database inside library blocks

In the end, the way you source your data greatly depends on the model environment.

One of the biggest benefits of using AnyLogic is that you get to use all the standard and readily available Java libraries to source data in many different ways. You can use Java to import from standard sources like web APIs, JSON files, XML etc.

But more on these in future posts...

Pro Tip: Using a Data Object

As mentioned before it is best practice to have a separate data object that data drives your model. This allows for:

Easy input data verification and validation
Data manipulation and scenario set up before the model uses the data
Easier setup for Unit Testing since you can create the object programmatically
Multiple independent parallel scenario executions since each experiment has its own data object
Encapsulates the setup data into a single manageable object
Your model needs to be parameterized with a single object instead of a number of parameters
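A few of these benefits can be sketched in plain Java. The class InputData and its methods are hypothetical and only show the pattern: validate once up front, and derive each scenario as a separate copy so that runs cannot affect each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class InputData {
    final List<Double> serveTimes; // minutes
    final int servers;

    InputData(List<Double> serveTimes, int servers) {
        this.serveTimes = Collections.unmodifiableList(new ArrayList<>(serveTimes));
        this.servers = servers;
    }

    // Verification hook: fail early, before the model ever sees bad input.
    void validate() {
        if (servers < 1) {
            throw new IllegalArgumentException("servers must be at least 1");
        }
        for (double t : serveTimes) {
            if (t <= 0) {
                throw new IllegalArgumentException("non-positive serve time: " + t);
            }
        }
    }

    // Scenario setup: each method returns a modified copy and leaves the original untouched.
    InputData withServers(int newServers) {
        return new InputData(serveTimes, newServers);
    }

    InputData withServeTimesScaledBy(double factor) {
        List<Double> scaled = new ArrayList<>();
        for (double t : serveTimes) {
            scaled.add(t * factor);
        }
        return new InputData(scaled, servers);
    }

    public static void main(String[] args) {
        InputData base = new InputData(Arrays.asList(2.0, 4.0, 3.0), 1);
        base.validate();
        // Two independent scenarios derived from the same base case.
        InputData extraServer = base.withServers(2);
        InputData fasterService = base.withServeTimesScaledBy(0.8);
        System.out.println(extraServer.servers + " servers, first serve time "
                + fasterService.serveTimes.get(0));
    }
}
```

Because every scenario owns its own immutable object, handing them to experiments that run in parallel needs no extra coordination.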

Here is a typical workflow when using a data object to separate the source data from the model.

Example: Using a Data Object

I created a simple model to show how you can data drive your model.

Step 1: Import data from Excel to AnyLogic DB

Step 2: Create a Scenario Object

Step 3: Provide Data Object to model using parameter
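In plain Java terms, step 3 amounts to something like the sketch below, where a constructor argument stands in for the model parameter. QueueInput and SingleServerModel are made-up names, and the single-server FIFO queue is a toy calculation chosen for illustration, not the logic of the downloadable model.

```java
// Data object for a toy single-server queue (hypothetical layout).
final class QueueInput {
    final double[] arrivalTimes; // minutes, in ascending order
    final double[] serveTimes;   // minutes, one entry per arrival

    QueueInput(double[] arrivalTimes, double[] serveTimes) {
        if (arrivalTimes.length != serveTimes.length) {
            throw new IllegalArgumentException("one serve time is required per arrival");
        }
        this.arrivalTimes = arrivalTimes.clone();
        this.serveTimes = serveTimes.clone();
    }
}

// The model is parameterized with one object and never touches files or databases.
final class SingleServerModel {
    private final QueueInput input;

    SingleServerModel(QueueInput input) {
        this.input = input;
    }

    // Average time customers wait before service starts, first come first served.
    double averageWait() {
        int n = input.arrivalTimes.length;
        double serverFreeAt = 0.0;
        double totalWait = 0.0;
        for (int i = 0; i < n; i++) {
            double start = Math.max(input.arrivalTimes[i], serverFreeAt);
            totalWait += start - input.arrivalTimes[i];
            serverFreeAt = start + input.serveTimes[i];
        }
        return n == 0 ? 0.0 : totalWait / n;
    }

    public static void main(String[] args) {
        QueueInput input = new QueueInput(new double[]{0, 1, 2}, new double[]{2, 2, 2});
        System.out.println(new SingleServerModel(input).averageWait());
    }
}
```

With arrivals at 0, 1 and 2 minutes and a 2-minute service each, the customers wait 0, 1 and 2 minutes, so the average wait comes out at 1.0 minute. Swapping in a different QueueInput changes the scenario without touching the model.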

You can download the model below.

Since it uses the simulation object for most of the functionality, which is not available in the AnyLogic Cloud, we have not uploaded it to our account.

Looking forward to the next post on best practices for building your model?

Remember to subscribe, so that you can get notifications via email or you can also join the mobile app here!

Watch the full lecture here, the video below starts at step 2.

What next?

If you liked this post, you are welcome to read more by following the links below to similar posts. Why not subscribe to our blog or follow us on any of the social media accounts for future updates? The links are in the Menu bar at the top or the footer at the bottom.

If you want to contact us for some advice, maybe a potential partnership or project or just to say "Hi!", feel free to get in touch here and we will get back to you soon!