Introduction
In chapter 4, we covered the use of diagramming techniques to represent the explanatory
model that provides a structural representation of the system. In general, pure explanatory
models are seldom found, and in a DES model, we usually depend on descriptive sub-
models. For conceptual modelling, when employing a descriptive model, the emphasis is on
the collection and analysis of quantitative data from observations of system behaviour. There
are a number of descriptive modelling (also termed input modelling in simulation studies)
methods, but we may be constrained in our choice by the availability and time required
for data collection. Greasley and Owen (2016) show examples from the literature of where
descriptive approaches to modelling people’s behaviour have been applied to model differ-
ent aspects of model content such as resource availability, process durations and decisions.
The following eight input modelling methods are presented:
1 Estimation
2 Theoretical distribution
3 Empirical distribution
4 Bootstrapping
5 Mathematical equation
6 Cognitive architectures
7 Machine learning
8 Trace
1. Estimation
When little or no data exists because the system does not currently exist or there is no time
for data collection, then an estimate must be made. This option may also be used in the
simulation model development phase before further data collection is undertaken when time
permits. One approach is to simply use a fixed value representing an estimate of the mean
time of the process, but this does not represent the stochastic variability in the process dura-
tion. However, this treatment of what appears to be probabilistic behaviour as deterministic
may be acceptable if it attains the level of detail required by the conceptual model.
DOI: 10.4324/9781003124092-5
72 Conceptual Modelling (Descriptive Model)
2. Theoretical Distribution
Deriving a theoretical distribution is the most common descriptive modelling method in
simulation and may be feasible when around 25 or more sample data points are available. This
method can ‘smooth out’ certain irregularities in the data to show the underlying distribution
and is not restricted to generating values within the data sample used. Another important
reason to employ a theoretical distribution is that it allows us to easily change the parameters
of our distribution for scenario analysis. For example, if we are using the exponential distri-
bution for the interarrival rate, we can simply change the mean parameter of the exponential
distribution to represent either slower or faster arrival rate scenarios.
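This scenario-scaling idea can be sketched in Python (rather than a simulation package, with illustrative parameter values): only the mean parameter of the exponential distribution changes between the base and faster-arrivals scenarios.

```python
import random

def interarrival_times(mean_gap, n, seed=42):
    """Sample n exponential interarrival times with the given mean."""
    rng = random.Random(seed)
    # random.expovariate takes a rate, i.e. 1 / mean.
    return [rng.expovariate(1.0 / mean_gap) for _ in range(n)]

# Base scenario: arrivals on average every 5 minutes.
base = interarrival_times(mean_gap=5.0, n=10000)
# Faster-arrivals scenario: only the mean parameter changes.
busy = interarrival_times(mean_gap=3.0, n=10000)

print(sum(base) / len(base))  # close to 5
print(sum(busy) / len(busy))  # close to 3
```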
There are two options for choosing an appropriate theoretical distribution. The choice
can be made based on known properties of the variability of the modelling area being rep-
resented. For example, times between arrivals (interarrival times) are often modelled using
an exponential distribution. The overview of the main theoretical distributions given in this
chapter includes common applications of these distributions to different modelling areas.
Some examples of common usage are shown in table 5.1.
The second approach and standard procedure is to match a sample distribution to a theo-
retical distribution by constructing a histogram of the data and comparing the shape of the
histogram with a range of theoretical distributions. A histogram is a bar chart that shows the
frequency distribution of the sample data. It should be noted that the histogram summarises
the frequency distribution and does not show how it varies over time.
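The construction of such a histogram can be sketched in Python with hypothetical process duration data; the function simply counts observations in equal-width bins.

```python
def histogram(data, n_bins):
    """Equal-width frequency counts for a sample of observations."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Clamp the maximum observation into the last bin.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts

# Hypothetical process durations (minutes).
durations = [3.1, 3.4, 3.6, 4.0, 4.2, 4.4, 4.5, 5.1, 5.8, 7.9]
edges, counts = histogram(durations, 4)
print(counts)  # [5, 3, 1, 1]
```

The shape of the bar chart produced from these counts is then compared against candidate theoretical distributions.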
Further examples of choosing a theoretical distribution are provided here:
Figure 5.1 Histogram of breakdowns and fitted Poisson distribution using the chi-square test
Most simulation practitioners use a statistical software package to undertake the fitting process, such as the Input Analyzer for Arena (see chapter 5A) and Stat:Fit for Simio (see chapter 5B) and Simul8 (see chapter 5C). It should be noted that if a statistical software
package is employed, then the software will simply find the best mathematical fit of the data
to a distribution. Because the software can check against a wide range of theoretical distribu-
tions, then in some instances, it may make sense to choose a distribution which offers a very
close fit rather than the closest fit. This is in order to employ a distribution that is suggested
by statistical theory. For example, this may mean choosing an exponential distribution for
interarrivals even if it is not the closest fit. This decision will need to be made based on the
experience and intuition of the simulation practitioner.
The following provides details of some of the potential continuous and discrete theoreti-
cal distributions that could be used to represent variability. These represent a subset of the
distributions available in simulation software packages, such as Arena, Simio and Simul8.
A wider selection of distributions is covered in Law (2015).
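The "close fit versus closest fit" comparison that such packages automate can be illustrated with a hand-rolled Kolmogorov-Smirnov distance in Python (the function names and parameter values are illustrative, not taken from any package):

```python
import math
import random

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        d = max(d, abs((i + 1) / n - fx), abs(i / n - fx))
    return d

def exponential_cdf(mean):
    return lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0

rng = random.Random(3)
sample = [rng.expovariate(1.0 / 5.0) for _ in range(500)]

# Candidate distributions: one suggested by theory, one clearly wrong.
print(ks_statistic(sample, exponential_cdf(5.0)))  # small distance
print(ks_statistic(sample, exponential_cdf(1.0)))  # much larger distance
```

A practitioner might still prefer a theory-suggested candidate whose distance is slightly larger than the mathematically closest fit.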
Continuous Distributions
Continuous distributions can return any real value quantity and are often used to model
interarrival times, timing and duration of breakdown events, and process durations. For a
continuous random variable, the mapping from real numbers to probability values is governed by a probability density function f(x). A number of continuous distributions are now
shown. Please note that these are examples and the distribution shape will change depending
on the parameter values chosen. The shape of the distribution is described by its skewness
(the asymmetry of the distribution about its mean) and its kurtosis (the degree of peakedness
of the distribution).
Beta
The beta distribution is used in project management networks for process duration. It is most
often used when there is limited data available from which to derive a distribution. It can also
be used to model proportions, such as the proportion of defective items, as the distribution is bounded between 0 and 1. The parameters alpha and beta provide a wide range of possible distribution shapes
(figure 5.2).
Gamma
The gamma distribution has parameters alpha (shape) and beta (scale), which determine a
wide range of distribution shapes. The gamma distribution is used to model process duration. When the shape parameter is close to 1, exponential-like distributions may be a close
fit, and when the shape parameter is close to 2, Weibull-like distributions may be a close fit
(figure 5.4).
Log-Normal
The log-normal distribution is used to model the product of a large number of random quanti-
ties. Its shape is similar to the gamma distribution, and it can also be used to represent process
durations, in particular those that have a relatively low amount of variability (figure 5.6).
Normal
The normal distribution has a symmetrical bell-shaped curve. It is used to represent quanti-
ties that are sums of other quantities using the rules of the central limit theorem. The nor-
mal distribution is used to model manufacturing and service process times and travel times.
Because the theoretical range covers negative values, the distribution should be used with
care for positive quantities such as process durations. A truncated normal distribution may
be used to eliminate negative values, or the log-normal distribution may be an appropriate
alternative (figure 5.7).
Triangular
The triangular distribution is difficult to match with any physical process but is useful for
an approximate match when few data points are available, and the minimum, mode (most
likely) and maximum values can be estimated. Thus, the triangular distribution might be
used when requesting an estimate of minimum, maximum and most likely times from the
process owner. What can be a useful property of the triangular distribution is that the values
it generates are bounded by the minimum and maximum value parameters. The mean value
is calculated by (minimum + mode + maximum)/3 (figure 5.8).
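These properties can be checked with Python's built-in triangular sampler, using hypothetical process-owner estimates for the three parameters:

```python
import random

rng = random.Random(1)
# Hypothetical process-owner estimates: minimum 2, most likely 4, maximum 9 minutes.
low, mode, high = 2.0, 4.0, 9.0
samples = [rng.triangular(low, high, mode) for _ in range(100000)]

# Generated values are bounded by the minimum and maximum parameters.
print(min(samples) >= low and max(samples) <= high)  # True
# Sample mean approaches (minimum + mode + maximum) / 3 = 5.0.
print(sum(samples) / len(samples))
```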
Weibull
The Weibull distribution can be used to model the reliability of a system made up of a number of parts. The assumptions are that the parts fail independently and that a single part failure will
cause a system failure. If a failure is more likely to occur as the activity ages, then an alpha
(shape) value of more than one should be used. The distribution can also be used to model
process duration (figure 5.10).
Binomial
The binomial distribution is used to model repeated independent trials such as the number
of defective items in a batch, the number of people in a group or the probability of error
(figure 5.11).
Poisson
The Poisson distribution can model independent events separated by an interval of time. If the time interval is exponentially distributed, then the number of events that occur in the interval has a Poisson distribution. It can also be used to model random variation in batch sizes (figure 5.12).
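The link between exponential gaps and Poisson counts can be demonstrated in a short Python sketch (parameter values are illustrative): counting arrivals with exponential gaps in a fixed window yields counts whose variance is close to their mean, a signature of the Poisson distribution.

```python
import random

def arrivals_in_window(mean_gap, window, rng):
    """Count arrivals in a fixed time window when gaps are exponential."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(1.0 / mean_gap)
        if t > window:
            return count
        count += 1

rng = random.Random(9)
counts = [arrivals_in_window(mean_gap=2.0, window=10.0, rng=rng) for _ in range(5000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean)        # close to 10 / 2 = 5
print(var / mean)  # close to 1 for a Poisson count
```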
Details are provided on how to model non-stationary processes in Arena (chapter 7A) and
Simio (chapter 7B).
3. Empirical Distribution
An empirical or user-defined distribution is a distribution that has been obtained directly
from raw data. This could take the form of a summary of event log data as used by process
mining software. An empirical distribution is usually chosen if a reasonable fit cannot be made between the data and a theoretical distribution. A disadvantage is that it lacks the capability
of a theoretical distribution to be easily scaled (by adjusting its parameters) for simulation
scenario analysis. Most simulation practitioners use a statistical software package to define
an empirical distribution, such as the Input Analyzer for Arena (see chapter 7A) and Stat:Fit
for Simio (see chapter 7B) and Simul8 (see chapter 7C).
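Outside such packages, a discrete empirical distribution can be built directly from raw data in Python; in this sketch (with hypothetical observed batch sizes) each value's probability comes from its observed frequency.

```python
import random
from collections import Counter

def empirical_sampler(data, seed=None):
    """Turn raw observations into a discrete empirical distribution sampler."""
    freq = Counter(data)
    values = sorted(freq)
    weights = [freq[v] / len(data) for v in values]
    rng = random.Random(seed)
    return lambda: rng.choices(values, weights)[0]

# Hypothetical observed batch sizes.
observed = [3, 3, 4, 5, 5, 5, 5, 8]
draw = empirical_sampler(observed, seed=11)
draws = [draw() for _ in range(10000)]
print(draws.count(5) / len(draws))  # close to 4/8 = 0.5
```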
4. Bootstrapping
This approach involves sampling randomly from raw data, which may be held in a table
or spreadsheet. This creates a discrete distribution whose values are drawn from the data set and whose probabilities are determined by the number of times each value appears in the data set.
This method does not involve the traditional methods of fitting the data to a theoretical
or empirical distribution and thus may be relevant when these traditional methods do not
produce a distribution that appears to fit the data acceptably. This approach will only gen-
erate data values that occur within the sampled data set, and so the method benefits from
large data sets to ensure that the full range of values that might be generated are present in
the data set. For example, occasional large service times may have an important effect
on waiting time metrics, and so the data set should be large enough to ensure these are
represented.
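A minimal bootstrapping sketch in Python (with hypothetical observed service times) draws each required value directly from the raw data with replacement, so only observed values can ever be generated:

```python
import random

# Hypothetical observed service times (minutes), including one rare long service.
observed = [2.1, 2.4, 2.4, 3.0, 3.2, 3.2, 3.2, 4.8, 9.5]

rng = random.Random(7)
# Each time the model needs a service time, draw directly from the raw data.
draws = [rng.choice(observed) for _ in range(10000)]

# Only values present in the raw data can ever be generated.
print(set(draws) <= set(observed))  # True
print(sum(draws) / len(draws))      # close to the raw-data mean
```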
5. Mathematical Equation
Examples of this approach in DES include the use of mathematical equations to compute model input values such as process durations.
The most common use of mathematical equations is the use of learning curve equations
which predict the improvement in productivity that can occur as experience of a process is
gained. If an estimate can be made of the rate at which a process time will decrease due to
increased competence over time, then more accurate model predictions can be made. Mathematically, the learning curve is represented by the function

Y = ax^b

where
Y represents the process time of the xth process activation,
a is the initial process time,
x is the number of process activations and
b = ln p / ln 2.
Here, ln denotes the natural logarithm and p is the learning rate (e.g. 80% = 0.8).
Thus, for an 80% learning curve,
b = ln 0.8 / ln 2 = −0.322.
To implement the learning curve effect in the simulation, a matrix can be used, which
holds the current process time for the combination of each process and each person undertak-
ing this process. When a process is activated in the simulation, the process time is updated,
taking into account the learning undertaken by the individual operator on that particular
process. The use of the learning curve equation avoids the assumption that all operators within a system are equally competent at each process at a point in time. The log-linear learning curve is the most popular and simplest learning curve equation and takes into account only the initial process time and the worker's learning rate. Other learning curve equations
can be used, which take into account other factors such as the fraction of machine time that is
used to undertake the process. Examples of constructing a mathematical equation for a DES,
such as a learning curve equation, are contained in Nembhard (2014), Dode et al. (2016) and
Malachowski and Korytkowski (2016).
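The equation and the matrix-style lookup described above can be sketched together in Python (operator and process names are illustrative):

```python
import math

def process_time(initial_time, learning_rate, activation):
    """Log-linear learning curve: time for the x-th activation, Y = a * x^b."""
    b = math.log(learning_rate) / math.log(2)
    return initial_time * activation ** b

activations = {}  # (operator, process) -> completions so far

def next_time(operator, process, initial_time, learning_rate):
    """Matrix-style lookup: track learning per operator and per process."""
    key = (operator, process)
    activations[key] = activations.get(key, 0) + 1
    return process_time(initial_time, learning_rate, activations[key])

# 80% curve: each doubling of activations cuts the time to 80%.
print(round(next_time("op1", "pack", 10.0, 0.8), 2))  # 10.0 (first activation)
print(round(next_time("op1", "pack", 10.0, 0.8), 2))  # 8.0
print(round(next_time("op2", "pack", 10.0, 0.8), 2))  # 10.0 (op2 has not learned yet)
```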
6. Cognitive Architectures
When attempting to model human behaviour in a simulation model, cognitive models can be employed. An example is Psi theory, a theory about
how cognition, motivation and emotion control human behaviour. Here, task performance
can be modelled over time, showing how competence oscillates based on experienced suc-
cess and failure at a new task. This approach provides an alternative to the traditional learn-
ing curve effect, which implies a continuous improvement over time without any setbacks
in performance due to failures. Another cognitive model that has been used is the theory of
planned behaviour, which takes empirical data on demographic variables and personality
traits and transforms these into attitudes toward behaviour, subjective norms and perceived
behavioural control. Cognitive architectures used to represent the cognitive process include
the physical, emotional, cognitive and social (PECS) architecture (Schmidt, 2000) and the
theory of planned behaviour (TPB) (Ajzen, 1991). Examples of their use in the context of
DES are in the works of Riedel et al. (2009) and Brailsford et al. (2012).
7. Machine Learning
Machine learning (ML) techniques are increasingly being used in simulation projects. While
simulation is based on a model of a system, ML generates a model of behaviour from the
relationship between input data and output data. The combined use of simulation and ML
falls under a hybrid modelling approach (see chapter 13).
A decision tree is an example of an ML model representation that is not a 'black box', and thus it is possible to trace the path by which an outcome is reached. This makes decision trees well suited for simulation modelling as they can be translated into if-then-else statements for use in the simulation model. Figure 5.13 provides a simple example of a credit application decision tree.
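As a sketch of this translation, the following Python function encodes a hypothetical credit-application tree (its splits and thresholds are invented for illustration, not taken from figure 5.13) as the if-then-else statements a simulation model could call directly:

```python
def credit_decision(income, existing_debt, years_employed):
    """Hypothetical credit-application decision tree written as
    if-then-else statements for use in a simulation model."""
    if income < 20000:
        return "reject"
    if existing_debt > 0.5 * income:
        return "reject"
    if years_employed < 2:
        return "refer"
    return "approve"

print(credit_decision(30000, 5000, 5))   # approve
print(credit_decision(15000, 0, 10))     # reject
print(credit_decision(40000, 1000, 1))   # refer
```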
8. Trace
Trace-driven simulations use historical process data or real-time information directly. An
example would be a simulation of a historical demand pattern (for example, actual customer
or order arrival times) in order to compare the current process design with a proposed process
design. The advantage of a trace or data-driven simulation is that validation can be achieved by
a direct comparison of model performance metrics to real system performance over a historical
time period. In addition, model credibility is enhanced as the client can observe model behav-
iour replicating real-life events. The disadvantages are the need for suitable data availability
and the possible restriction in scope derived from a single trace that may not reflect the full
variability of the process behaviour. Trace simulation is normally used for understanding pro-
cess behaviour (descriptive analytics) and for checking conformance to the ‘official’ process
design and cannot be used for predictive analytics such as scenario analysis as it only contains
historical data. Trace simulation is similar to process mining which uses historical event logs,
although process mining can check conformance of the process map in addition to operational
process performance. A summary of the trace can be used to form an empirical distribution.
Table 5.2 (extract)
Estimation. Disadvantage: lack of accuracy. Advantages: may be the only option; may be used in the model development phase before further data collection is undertaken.
When modelling variability, we need to consider both the granularity of our modelling and the need to distinguish between the 'normal' variability we wish to model and variability caused by a change in the process.
In terms of granularity, we need to ensure that we model variability at a sufficient
level of detail to ensure that it can be represented correctly. For example, if data is collected on the checkout process at a supermarket, this will entail placing goods onto a
conveyor, the scanning of the items by the checkout operative and then payment for the items. Collecting data on the overall process ignores the fact that the conveyor and scanning process
times are dependent on the number of items being bought while the payment process is
not. Thus, if we are using the method of fitting to a theoretical distribution, the overall
process should not be fitted to a single distribution but broken down for analysis. As
well as sequential processes, care should also be taken with parallel processes in which
different options may require fitting to separate distributions. For example, mutually
exclusive payment types (cash, card, touch payment) will have different distributions.
In addition, we need to note that the theoretical distribution represents the normal
variability, which is inevitable and what we would expect of a process design operat-
ing in a given environment. We can only remove this variability by improving the
design of the process. If the variability is normal (i.e. the variability is due to random
causes only), then the process is in a stable state of statistical equilibrium: if we are modelling it using a probability distribution, then the parameters of that distribution are unchanging. Conversely, if the performance variability
is abnormal (i.e. due to a change in process procedures or environment, for example),
then the abnormal variability disturbs the state of the statistical equilibrium of the pro-
cess leading to a change in the parameters of the distribution.
Summary
Eight input modelling methods are presented in this chapter: estimation, theoretical distribu-
tion, empirical distribution, bootstrapping, mathematical equation, cognitive architectures,
machine learning and trace. The most common approach employed in DES is the theoretical
distribution method, which provides a compact method of representing data values, with the
distribution easy to scale for scenario analysis.
Exercises
• Compare and contrast theoretical and empirical distribution input modelling methods.
• Evaluate the trace input modelling method for a simulation study.
• Evaluate the use of machine learning as an input modelling method.
• Compare the process of deriving a theoretical and empirical distribution.
• A number of observations have been made of arrivals to a supermarket (see table 5.3).
The frequency of arrivals in ten-minute periods is shown. Using a chi-square 'goodness of fit' test, determine whether the arrivals can be described by a Poisson distribution with a mean of 1 at a significance level of 0.1.
• A number of observations have been taken of customers being served at a cafeteria (see
table 5.4). Analysis has revealed a mean of 2 minutes and a standard deviation of 0.5
minutes for the data. Use a Kolmogorov-Smirnov test to investigate whether the observations are normally distributed at a significance level of 0.1.
• The following data has been collected on arrival times (see table 5.5). Find the param-
eters of an empirical distribution that fits the data.
References
Ajzen, I. (1991) The theory of planned behavior, Organizational Behavior and Human Decision Processes, 50(2), 179–211.
Brailsford, S.C., Harper, P.R. and Sykes, J. (2012) Incorporating human behaviour in simulation mod-
els of screening for breast cancer, European Journal of Operational Research, 219(3), 491–507.
Dode, P., Greig, M., Zolfaghari, S. and Neumann, W.P. (2016) Integrating human factors into dis-
crete event simulation: A proactive approach to simultaneously design for system performance and
employees’ well-being, International Journal of Production Research, 54(10), 3105–3117.
Greasley, A. (2004) Simulation Modelling for Business, Ashgate Limited.
Greasley, A. and Owen, C. (2016) Behavior in models: A framework for representing human behavior,
in M. Kunc, J. Malpass, and L. White (eds.) Behavioral Operational Research: Theory, Methodology
and Practice, Palgrave Macmillan.
Law, A.M. (2015) Simulation Modeling and Analysis, 5th edition, McGraw-Hill Education.
Malachowski, B. and Korytkowski, P. (2016) Competence-based performance model of multi-skilled
workers, Computers and Industrial Engineering, 91, 165–177.
Nembhard, D.A. (2014) Cross training efficiency and flexibility with process change, International
Journal of Operations and Production Management, 34(11), 1417–1439.
Riedel, R., Mueller, E., Von Der Weth, R. and Pflugradt, N. (2009) Integrating human behaviour into
factory simulation – A feasibility study, IEEM 2009 – IEEE International Conference on Industrial
Engineering and Engineering Management, 2089–2093.
Schmidt, B. (2000) The Modelling of Human Behaviour, SCS Publications.