The harmful focus on data

Stefan Bengtsson
Jan 31, 2022

Updated: Feb 1, 2022

This is a special guest blog post by Stefan Bengtsson. His previous blog post about the dangers of "blockification" in simulation can be found here: Pitfalls with Blocks.

In this post, Stefan challenges the relationship between simulation modeling and data. He discusses how to overcome this and have your simulation models add more value.

After getting a taste for "blogging", writing a follow-up felt natural. In the first post I addressed what I feel are the downsides of the blockification of dynamic modeling. Like a rampaging bull in a china shop I will now, having gotten warmed up, continue to challenge some key concepts in simulation modeling. I will focus on what I think is possibly the most severe root cause of the under-utilization of the infinite possibilities of dynamic modeling and simulation. I think this is related to - as indicated in the title - a totally illogical relationship to "data".

As the picture here tries to illustrate, simulation and related modeling is about navigating better towards the unknown future. This has NOTHING - and I repeat, NOTHING - to do with whether we have any data or not. I claim that words like validation and verification bear a big chunk of the blame for the fact that the vast majority of simulation projects that should have added value (if handled by talented individuals) have not even been considered! So now the die is cast - "Alea iacta est"!

Most projects are not even born

Referring back to the picture, focusing on data tends to force us to stay on the safe cliff, on the left side of the picture. Data captures what is safe, what is known, what has happened, what is history (or in the best cases - what is present). But what we focus on with simulation is the opposite: the unknown, what has not yet happened, how the future might be. This is anything but safe and it has nothing to do with historical data. To add value, we must accept that if we want safety, we should not bother with simulation, but rather build a career in data mining, statistics, or database development. But unfortunately, this "safety approach" also to a large extent affects the simulation profession - and that leaves the vast majority of opportunities untouched. "We lack data!", "We can't verify and validate!", "We ...!" have been expressed over and over and over.

In a way I am unfair when I blame verification and validation, which are both sound concepts when they are possible ...! I have hardly ever used these terms myself, but prefer to speak about quality assurance. This is of course a must during any professional modeling process - to continuously try to ensure that newly added logic is in line with the logic one wants to mimic. Exactly how this is done will vary from individual to individual. So why do I have problems with these terms, found in almost every book written about simulation modeling (which, by the way, is part of the problem!)?

Well, I claim there are a number of reasons:

For starters, the real world - the one we are supposed to support and add value to - is rude enough to "not give a damn" whether we fancy elaborate data - having everything predictable, nice and tidy - or not. It will continue to be messy, chaotic, and certainly not deterministic, independent of if we like it or not. So 99.999...% of the issues, challenges, and problems out there will to a large extent lack relevant data. Should we then say: "Sorry, we do not like the premises. Give us a call when you can present a tidy problem for us!"? Or should we accept the reality as it is and jump down in the mud, getting our boots dirty? I know what I think the answer should be.
There is a tendency to relate both validation and verification to a need for data - to conduct these quality assurance processes. And it is really that tendency I criticize. Of course, if we have a project that focuses on a system that exists, and if we for that system have historical data on: 1) the demands/input that the system was given to handle; 2) the prerequisites related to resources, decision strategies, value-adding time for various activities (with variation), ... during the time period the input was handled; and 3) all relevant results/outputs from handling the input during the relevant time period - then great! Then we should of course check whether our model, with parameter values according to point 2, produces output in line with point 3 when we challenge it with point 1. And if not, we probably have to improve our model and the logic of it. But being in this situation, I claim, is unusual - since our focus should be systems that don't exist, systems of the future. Unless we stay on the safe cliff, that is ...!
And even if we have a project where the focus is something that truly exists - like a production line, a factory, a traffic system, an emergency ward, a consumer market, ... - and we are able to verify that the model mimics the known history well enough, our efforts should be viewed with some skepticism. Because we have verified that the logic/model is constructed well enough under the historical circumstances - but the purpose of the model is not to look backward, it is to allow us to experiment with changed circumstances. As soon as we start changing the input and/or prerequisites, the situation is no longer fully "verified" (even though it of course is more likely that the logic is better than without this verification effort), since we now might pass some dynamic tipping point that our logic was not prepared for. Or whatever.
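
As a minimal sketch of the historical check described above (run the model with the recorded input and prerequisites, then compare with the recorded output), consider the Python snippet below. Everything in it is an assumption made for illustration: the single-station model `simulate_line`, the helper `mean_abs_error`, the figures, and the tolerance of 0.5 are hypothetical and not taken from any real project.

```python
def simulate_line(arrival_times, service_time):
    """Hypothetical single-station FIFO model: completion time of each job."""
    completions = []
    free_at = 0.0  # time at which the station becomes idle again
    for arrival in arrival_times:
        start = max(arrival, free_at)  # a job waits if the station is busy
        free_at = start + service_time
        completions.append(free_at)
    return completions


def mean_abs_error(simulated, recorded):
    """Average absolute gap between simulated and recorded outputs."""
    return sum(abs(s - r) for s, r in zip(simulated, recorded)) / len(recorded)


# Invented "history" (assumption): point 1 = input, point 2 = prerequisites,
# point 3 = recorded output.
historical_arrivals = [0.0, 1.0, 2.0, 6.0]       # point 1
historical_service_time = 2.0                    # point 2
historical_completions = [2.0, 4.0, 6.0, 8.0]    # point 3

simulated = simulate_line(historical_arrivals, historical_service_time)
error = mean_abs_error(simulated, historical_completions)

# The tolerance is an arbitrary illustrative choice.
matches_history = error <= 0.5
print(f"mean absolute error: {error:.2f}, matches history: {matches_history}")
```

Passing such a check only says that the logic holds under the historical circumstances. Calling `simulate_line` with denser arrivals or a different service time moves into territory the history never covered.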

So my point quite simply is that we can't verify the future, like it or not. And the value of simulation is about understanding consequences in the future better. And we should therefore not - hardly ever - have any wishes or demands related to data when we ponder upon whether a simulation model should be considered. As a matter of fact - we should rather embrace the situation when there is a lack of "data"!

Potential inversely related to the amount of data

During the first international AnyLogic Conference (2012, in Berlin), I gave a presentation in which I, among other issues, raised this subject. It was by no means new to me, since I have thought of this all along, but here I voiced it more clearly in public. It led to a good discussion with a member of the audience, who partly questioned what I said, but in the end I think we could agree! :-)

The picture I used then was similar to the one below.

One of the points I raise here is that the less (relevant) data we have, the more good we can do with simulation! Why is that? Well, we should always compare with the alternative! In a situation when we have a lot of relevant data (generated or extracted somehow) and don't use simulation, then we will anyhow be fairly well equipped to handle our challenges, make our decisions, and move forward. It is true that we would be even wiser if we in a competent way constructed a simulation model and loaded it with the data, so it might definitely be worthwhile and valuable. But let us now compare with the opposite situation - a real-life challenge without a trace of "relevant data". We still might have to make decisions. We still might have to move forward. But now we will be more or less totally in the dark if we decide to go forward without the assistance of simulation. We will have to pray that we have an oracle, a guru, someone infinitely wise, or at least a very lucky guy making the decisions. If we compare this with a situation where we capture the issue, the challenge, the system, in a simulation model, including relevant parameters and indicators, the difference will be massive! Even though we do not know exact data to load the model with, we can guess, estimate, test worst-and-best situations and so on. By doing this we can raise the understanding of the system (and its dynamics) and the problem at hand dramatically. Especially compared to the alternative.
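
To make the idea of guessing, estimating, and testing worst-and-best situations a bit more concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the backlog model `simulate_backlog`, the three scenarios, and all figures are invented, and the same random seed is reused across scenarios so that they differ only in the guessed parameters.

```python
import random


def simulate_backlog(mean_demand, capacity, days=250, seed=1):
    """Hypothetical model: peak backlog at a station with fixed daily capacity."""
    rng = random.Random(seed)  # same seed in every scenario, same daily noise
    backlog = 0.0
    peak = 0.0
    for _ in range(days):
        # Daily demand fluctuates +/- 50% around the guessed mean (assumption).
        demand = mean_demand * rng.uniform(0.5, 1.5)
        backlog = max(0.0, backlog + demand - capacity)
        peak = max(peak, backlog)
    return peak


# Guessed parameter sets (assumptions, not data).
scenarios = {
    "best": {"mean_demand": 80, "capacity": 110},
    "expected": {"mean_demand": 100, "capacity": 100},
    "worst": {"mean_demand": 120, "capacity": 90},
}

results = {name: simulate_backlog(**params) for name, params in scenarios.items()}
for name, peak in results.items():
    print(f"{name:>8}: peak backlog {peak:.0f} units")
```

Even with nothing but guesses as input, the spread between the scenarios says something about how sensitive the system is and which parameters deserve a better estimate.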

So I think - no, claim - no, say - that the less data you have, the keener you should be to consider creating a simulation model!!! The amount of data is inversely related to the value we can add - if we think in relative terms (and we always should!). Data focus will glue us to that cliff - and to constantly stay there is for cowards!

Simulation is 100% about the logic of the system

Another way to formulate the headline above is that simulation is 0% about data. If Bill Gates - or Microsoft in general - had been fixated on the idea that you had to have data before you did anything, Excel would never have been created. Excel is also 0% about data, but it allows us to analyze, summarize, structure, ... data in cells and spreadsheets. Whether the figures put into the cells are relevant, quality assured, guessed, estimated, totally unrealistic or whatever is up to the user of Excel. Excel does not care!

I of course made the analogy with Excel to point out that exactly the same logic is relevant for simulation modeling. Lacking data should never stop us from considering creating a dynamic model. I would even claim that it is preferable and that the process should be:

1) Start modeling: so you start to understand the challenge and system better;

2) Work the data: establish what relevant data is available, what can easily be extracted, and what must be estimated or handled in another manner.

The reason for this specific order is that the modeling as such will help us realize what type of data is relevant - and what is not. The opposite order will lead to wasting energy and time on data that later turns out to be irrelevant.

Systems Thinking tends to be heavily tilted towards a more academic culture, MIT and System Dynamics (SD). I have a much wider and less formal view of this concept, based on the simple observation that it often adds value to start our thinking by trying to grasp the whole span of an issue, a "whole", a system, the big picture - and the logic, structure, and dimensions that are relevant to consider given this. This is not at all limited to SD, but should at least as much be influenced by Operations Research, Management Science, and Management in general. The strongest competence to add value when it comes to Systems Thinking, without comparison, is talent in dynamic modeling and simulation. There is a tendency - not just related to simulation - to start with the data, the figures, the details. That might be relevant for more narrow, specialist-tilted challenges, but very seldom so for more complex ones, where many perspectives must be considered. I like to think that I stand for an applied or practiced view of Systems Thinking.

Looking forward

In a way, a lot of what I claim above can be summarized with the picture below. Statistics and Probability theory are by many taken to be more or less the same - and you often find these subjects in the same institution in the academic world. And yes - the theoretical and mathematical baseline makes them similar. But the applications and mindset certainly don't! I even claim that you often see different personalities among individuals tilting to either side in the picture below and, as you might have guessed, I very clearly tilt to the right!

Being too leftwards-leaning will lead to a tendency to steer towards the future using the rear-view mirror - and that might work for a short time, when the road is straight, no other vehicles are in sight, and no moose or pedestrian decides to use the road! But after this? I am therefore a bit worried about the overemphasis on data that I feel our society is tilting towards. It feels like disciplines such as Analytics, Big Data, etc. claim "just feed us with more and more data, and we will solve the world's problems for you!". I know - I am oversimplifying, but I think you get my point. Having a too data-focused attitude will in the long run lead to a decreased ability to "think" and evaluate what has generated the data and what is relevant/irrelevant. There will be a machine-like approach, rather than a reflective approach, and there is a danger with this mindset.

In the picture, I also refer to a writer who is knowledgeable when it comes to cultural differences, Richard D. Lewis. He claims that one of the things that clearly separates various cultures is how we think about time, some more circular (Asian cultures primarily), some more horizontal, and some more mixed. In western cultures we tend to think in terms of a timeline, pointing forwards, where we can plan and jot down our busy schedules. We often draw an arrow in the direction we look - towards the future. In Madagascar, Lewis points out, the arrow is drawn in the opposite direction, "backward". Or possibly hitting us from behind. Why is that? Well, thanks to a big chunk of wisdom, I claim, they realize that the only thing we know anything about - what we "see" - is the past. The future is unknown - and can therefore not be in the direction we are looking. I think it might be a good idea to discriminate a bit when we employ modelers-to-be. Individuals from Madagascar seem to be formed in the right mold!

What-if and causality

I am almost done, but felt I wanted to touch upon this subject too - since it is very much related. The slide below gets a bit hard to grasp when you look at it like this (I almost always let my slides be dynamic so that I can animate step-by-step), but I hope the messages are conveyed. As a rule, we can never capture causality with a focus on data, mathematics, ... alone. The only thing we can grasp that way is correlation. To understand cause-and-effect we must, by necessity, focus on the logic of the system as such - exactly what we do (or should do) with simulation!

We are not even interested in historical data when we make decisions for the future, I claim. We are only interested in forecasts related to the situation in the future. Historical data might - if the data is relevant - help us in making those forecasts (at least if the future is nice enough and doesn't change too much ...!). But it is the forecast (or guesstimate) that is relevant. Data is only indirectly relevant - sometimes.

With competence comes responsibility

Yes, in my early teenage years I liked superhero comics. And yes, Spider-Man was my favorite. I could therefore not stop myself from concluding with a headline, inspired by one of the more famous quotes from the movies! In the slide I also include how I try to summarize what simulation modeling is all about, being a form of art.

The possibilities to add value with competence in the field of dynamic modeling and simulation, given the right platform and an aptitude for modeling, are enormous. Don't stay on the cliff! If data is missing, be even more inclined to help out! If the customer or manager does not understand, try to convince them! Not trying to contribute with this type of competence is partly irresponsible - so go out there and do some good!

Best of luck!

Stefan Bengtsson

Stefan Bengtsson is a guest writer for the AnyLogic Modeler. Feel free to connect with him over LinkedIn.
