
Heuristics Miners for Streaming Event Data

Andrea Burattin∗, Alessandro Sperduti†, Wil M. P. van der Aalst‡

Abstract
More and more business activities are performed using information systems. These systems produce such huge amounts of event data that existing approaches are unable to store and process them all. Moreover, few processes are in steady state: processes evolve due to changing circumstances, and systems need to adapt continuously. Since conventional process discovery algorithms have been defined for batch processing, it is difficult to apply them in such evolving environments. Existing algorithms cannot cope with streaming event data and tend to generate unreliable and obsolete results.
In this paper, we discuss the peculiarities of dealing with streaming event data in the context of process mining. Subsequently, we present a general framework for defining process mining algorithms in settings where it is impossible to store all events over an extended period or where processes evolve while being analyzed. We show how the Heuristics Miner, one of the most effective process discovery algorithms for practical applications, can be modified using this framework. Different stream-aware versions of the Heuristics Miner are defined and implemented in ProM. Moreover, experimental results on artificial and real logs are reported.

Keywords: process mining; control-flow discovery; online process mining

1 Introduction
One of the main aims of process mining is control-flow discovery, i.e., learning process models from example traces recorded in some event log. Many different control-flow discovery algorithms have been proposed in the past (see [20]). Basically, all such algorithms have been defined for batch processing, i.e., a complete event log containing all executed activities is supposed to be available at the moment the mining algorithm is executed. Nowadays, however, the information systems supporting business processes are able to produce huge amounts of events, thus creating new opportunities and challenges from a computational point of view. In fact, in the case of streaming data it may be impossible to store all events. Moreover, even if one is able to store all event data, it is often impossible to process them due to the exponential nature of most algorithms. In addition to that, a business process may evolve over time. Manyika et al. [15] report possible ways of exploiting large amounts of data to improve a company's business. In their paper, stream processing is defined as "technologies designed to process large real-time streams of event data" and one of the example applications is process monitoring. The challenge of dealing with streaming event data is also discussed in the Process Mining Manifesto¹ [10].

∗ Email: burattin@math.unipd.it. Affiliation: Department of Mathematics, University of Padua, Italy.
† Email: sperduti@math.unipd.it. Affiliation: Department of Mathematics, University of Padua, Italy.
‡ Email: w.m.p.v.d.aalst@tue.nl. Affiliation: Department of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands.
¹ The Process Mining Manifesto is authored by the IEEE Task Force on Process Mining (www.win.tue.nl/ieeetfpm/).
Currently, however, there are no process mining algorithms able to mine an event
stream. This paper is the first that presents algorithms for discovering process models
based on streaming event data. In the remainder of this paper we refer to this problem
as Streaming Process Discovery (or SPD).
According to [2, 3], a data stream consists of an unbounded sequence of data items with a very high throughput. In addition to that, the following assumptions are typically made: i) data is assumed to have a small and fixed number of attributes; ii) mining algorithms should be able to process an infinite amount of data, without exceeding memory limits or otherwise failing, no matter how many items are processed; iii) for classification tasks, data has a limited number of possible class labels; iv) the amount of memory available to a learning/mining algorithm is considered finite, and typically much smaller than the data observed in a reasonable span of time; v) there is a small upper bound on the time allowed to process an item, e.g. algorithms have to scale linearly with the number of processed items: typically the algorithms work with one pass over the data; and vi) stream "concepts" are assumed to be stationary or evolving [25, 27].
In SPD, a typical task is to reconstruct a control-flow model that could have generated the observed event log. The general representation of the SPD problem that we adopt in this paper is shown in Fig. 1: one or more sources emit events (represented as solid dots) which are observed by the stream miner that keeps the representation of the process model up-to-date. Obviously, no standard mining algorithm adopting a batch approach is able to deal with this scenario.
An SPD algorithm has to give satisfactory answers to the following two categories
of questions:
1. Is it possible to discover a process model while storing a minimal amount of
information? What should be stored? What is the performance of such methods
both in terms of model quality and speed/memory usage?
2. Can SPD techniques deal with changing processes? What is the performance
when the stream exhibits certain types of concept drift?
In this paper, we discuss the peculiarities of mining a stream of events in the context of process mining. Subsequently, we present a general framework for defining process mining algorithms for event streams. We show how the Heuristics Miner, one of the most effective algorithms for practical applications of process mining, can be adapted for stream mining according to our SPD framework.

A data stream is defined as a "real-time, continuous, ordered sequence of items" [7]. The ordering of the data items is expressed implicitly by the arrival timestamp of each item. Algorithms that are supposed to interact with data streams must respect some requirements, such as: a) it is impossible to store the complete stream; b) backtracking over a data stream is not feasible, so algorithms are required to make only one pass over the data; c) it is important to quickly adapt the model to cope with unusual data values; d) the approach must deal with variable system conditions, such as fluctuating stream rates. Due to these requirements, algorithms for data stream mining are divided into two categories: data based and task based [6].


Figure 1: General idea of SPD: the stream miner continuously receives events and,
using the latest observations, updates the process model.

The idea of the former is to use only a fragment of the entire dataset (by reducing the data to a smaller representation); the idea of the latter is to modify existing techniques (or invent new ones) to achieve time- and space-efficient solutions.
The main "data based" techniques are: sampling, load shedding, sketching and aggregation. Most of these are based on the idea of randomly selecting items or stream portions. The main drawback is that, since the dataset size is unknown, it is hard to define the number of items to collect; moreover, it is possible that some of the ignored items were actually interesting and meaningful. Other approaches, like aggregation, are slightly different: they are based on summarization techniques, where the idea is to maintain measures such as the mean and variance; with these approaches, problems arise when the data distribution contains many fluctuations.
The main "task based" techniques are: approximation algorithms, sliding windows and algorithm output granularity. Approximation algorithms aim to extract an approximate solution; it is possible to define error bounds on the procedure, and in this way one obtains an "accuracy measure". The basic idea of sliding windows is that users are more interested in the most recent data, thus the analysis is performed giving more importance to recent data and considering only summarizations of the old ones. The main characteristic of "algorithm output granularity" is the ability to adapt the analysis to resource availability.
The task of mining data streams is typically focused on specific types of algorithms [6, 27, 2]. In particular, there are techniques for: clustering; classification; frequency counting; time series analysis and change diagnosis (concept drift detection). All these techniques cope with very specific problems and cannot be directly adapted to the SPD problem. However, as this work shows, it is possible to reuse some of their principles or to reduce SPD to sub-problems that can be solved with the available algorithms.
Over the last decade dozens of process discovery techniques have been proposed [20], e.g., the Heuristics Miner [24]. However, these all work on a full event log and not on streaming data. Few works in the process mining literature touch on issues related to mining event data streams.
In [12, 13], the authors focus on incremental workflow mining and task mining (i.e. the identification of the activities starting from the documents accessed by users). The basic idea is to mine process instances as soon as they are observed; each new model is then merged with the previous one so as to refine the global process representation. The described approach is designed to deal with incremental process refinement based on logs generated by version management systems. However, as the authors state, only the initial idea is sketched.
An approach for mining legacy systems is described in [11]. In particular, after the introduction of monitoring statements into the legacy code, an incremental process mining approach is presented. The idea is to apply the same heuristics as the Heuristics Miner to the process instances and store these data in an AVL tree, which is used to find the relations that hold best. Actually, this technique operates on "log fragments" and not on single events, so it is not really suitable for an online setting. Moreover, the heuristics are based on frequencies, so they must be computed with respect to a set of traces and, again, this is not suitable for settings with streaming event data.
An interesting contribution to the analysis of evolving processes is given in the paper by Bose et al. [5]. The proposed approach, based on statistical hypothesis tests, aims at detecting concept drift, i.e. changes in event logs, and at identifying the regions of change in a process.
Solé and Carmona, in [18], describe an incremental approach for translating transition systems into Petri nets. This translation is performed using Region Theory. The approach tackles the complexity of the translation by splitting the log into several parts, applying Region Theory to each of them, and then combining the results. The resulting regions are finally converted into a Petri net.
The above review of the literature shows that there is no process mining technique for SPD that addresses the requirements listed in this section.
The remainder of this paper is organized as follows: Section 2 presents the basic concepts related to SPD; Section 3 describes the new algorithms designed to tackle stream process mining; Section 4 reports some details about the implementation of all the approaches in ProM and Section 5 presents the results of several experiments; Section 6 concludes the paper. This work contains two appendices: Appendix A summarizes the Heuristics Miner algorithm, Appendix B presents some details on error bounds.

2 Basic concepts
The main difference between classical process mining [20] and SPD lies in the assumed input format. For SPD we assume streaming event data, possibly coming from multiple sources, rather than a static event log containing historic data.
In this paper, we assume that each event received by the miner contains the name of the executed activity, the case id it belongs to, and a timestamp. A formal definition of these elements is as follows:

Definition 1 (Activity, Case, Time and Event Stream) Let A be a set of activities and C be a set of case identifiers. An event is a triplet (c, a, t) ∈ C × A × N, i.e., the occurrence of activity a for case c (i.e. the process instance) at time t (the timestamp of emission of the event). Actually, in the miner, rather than using an absolute timestamp, we consider a progressive number representing the number of events seen so far, so an event at time t is followed by another event at time t + 1, regardless of the time elapsed between them. S ∈ (C × A × N)∗ is an event stream, i.e., a sequence of events that are observed item by item. The events in S are sorted according to the order in which they are emitted, i.e. the event timestamp.
Starting from this definition, it is possible to define some functions:

Definition 2 (Case time scope) $t_{start}(c) = \min_{(c,a,t) \in S} t$, i.e. the time when the first activity for c is observed; $t_{end}(c) = \max_{(c,a,t) \in S} t$, i.e. the time when the last activity for c is observed.

Definition 3 (Subsequence) Given a sequence of events $S \in (C \times A \times \mathbb{N})^*$, i.e. a sorted series of events $S = \langle \dots, s_i, \dots, s_{i+j}, \dots \rangle$ with $s_i = (c, a, t) \in C \times A \times \mathbb{N}$, a subsequence $S_i^j$ of S is a sequence that identifies the elements of S starting at position i and finishing at position i + j: $S_i^j = \langle s_i, \dots, s_{i+j} \rangle$.

In order to relate classical control-flow discovery algorithms with new algorithms for streams, we can consider an observation period. An observation period O for an event stream S is a finite subsequence of S starting at time i and with size j: $O = S_i^j$. Basically, any observation period is a finite subsequence of a stream, and it can be understood as a classical log file (although the "head" and "tail" of some cases may be missing). A well-established control-flow discovery algorithm that can be applied to an observation period log is the Heuristics Miner, whose main features are reported in Appendix A.
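As an illustration of these definitions, the following Python sketch (type and function names are our own) represents events as (case, activity, time) triplets and extracts an observation period, i.e. a finite subsequence that can be fed to a batch miner.

```python
from typing import NamedTuple

class Event(NamedTuple):
    """An event (c, a, t) as in Definition 1."""
    case: str       # case identifier c
    activity: str   # activity name a
    time: int       # logical timestamp t (progressive event number)

def observation_period(stream, i, j):
    """The finite subsequence S_i^j = <s_i, ..., s_{i+j}> of Definition 3,
    usable as a classical event log."""
    return stream[i : i + j + 1]

# a tiny stream with two interleaved cases
S = [Event("c1", "A", 0), Event("c2", "A", 1), Event("c1", "B", 2),
     Event("c2", "B", 3), Event("c1", "C", 4)]
O = observation_period(S, 1, 2)   # <s_1, s_2, s_3>
```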

In analogy with classical data streams, an event stream can be defined as stationary or evolving. In our context, a stationary stream can be seen as generated by a business process that does not change over time. On the contrary, an evolving stream can be understood as generated by a process that changes over time. More precisely, different modes of change can be considered: i) drift of the process model; ii) shift of the process model; iii) change of the distribution of cases (i.e., execution instances of the process). Drift and shift of the process model correspond to the two classical modes of concept drift [5] in data streams: a drift of the model refers to a gradual change of the underlying process, while a model shift happens when the change between two process models is more abrupt. The change in case distribution represents another way in which an event stream can evolve: the original process may stay the same over time while the distribution of the cases is not stationary. With this we mean that the distribution of the features of the process cases changes with time. For example, in the production process of a company selling clothing, the items involved in incoming orders (i.e., case features) during winter will follow a completely different distribution than the items involved in incoming orders during the summer. Such a distribution change may significantly affect the relevance of specific paths in the control-flow of the involved process.
Going back to process model drift, there is a peculiarity of business event streams that cannot be found in traditional data streams. An event log records that a specific activity a_i of a business process P has been executed at time t for a specific case c_j. If the drift from P to P′ happens at time t∗ while the process is running, there might be cases for which all the activities have been executed within P (i.e., cases that terminated their execution before t∗), cases for which all the activities have been executed within P′ (i.e., cases that started their execution on or after t∗), and cases that have some activities executed within P and some others within P′ (i.e., cases that started their execution before t∗ and terminated after t∗). We will refer to the latter cases as transient cases. So, under this scenario, the stream will first emit events of cases executed within P, followed by events of transient cases, followed by events of cases executed within P′. On the contrary, if the drift does not occur while the process is running, the stream will first report events referring to complete executions (i.e. cases) of P, followed by events referring to complete executions of P′ (no transient cases). In any case, the drift is characterized by the fact that P′ is very similar to P, i.e. the change in the process which emits the events is limited.
Due to space limitations, we restrict our treatment to stationary streams and streams with concept drift with no generation of transient cases. The treatment of other scenarios is left for future work.

Figure 2: Two basic approaches for the definition of a finite log out of a stream of events: (a) periodic reset, (b) sliding window. The horizontal segments represent the time frames considered for the mining.

3 Heuristics Miners for Streams
In this section, we present variants of the Heuristics Miner algorithm (described in Appendix A) to address the SPD problem under different scenarios. First of all, we present two basic algorithms where the standard batch version of the Heuristics Miner is used on logs built as observation periods extracted from the stream. These algorithms will be used as a baseline reference for the experimental evaluation. Subsequently, a "fully online" version of the Heuristics Miner is introduced, to cope with stationary streams, drift of the process model with no transient cases, and shift of the process model.

3.1 Baseline Algorithm for Stream Mining
The simplest way to adapt the Heuristics Miner algorithm to deal with streams is to collect events during specific observation periods and then apply the batch version of the algorithm to the current log. This idea is described by Algorithm 1, in which two different policies to maintain events in memory are considered. Specifically, an event e from the stream S is observed (e ← observe(S)) and analyzed (analyze(e)) to decide whether the event has to be considered for mining. If this is the case, it is checked whether there is room in memory to accommodate the event. If the memory is full (size(M) = max_M) then the memory policy given as input is adopted. Two different policies are considered: periodic resets and sliding windows [2, Ch. 8]. In the case of periodic resets all the events contained in memory are deleted (reset), while in the case of sliding windows only the oldest event is deleted (shift). Subsequently, e is inserted in memory and it is checked whether it is necessary to perform a mining action. If mining has to be performed, the Heuristics Miner algorithm is executed on the events in memory (HeuristicsMiner(M)). Graphical representations of the two policies are reported in Fig. 2.

Algorithm 1: Sliding Window HM / Periodic Resets HM
Input: S event stream; M memory of size max_M; P_M memory policy (can be 'reset' or 'shift')
1  forever do
2    e ← observe(S)        /* observe a new event, where e = (c_i, a_i, t_i) */
     /* check if event e has to be used */
3    if analyze(e) then
       /* memory update */
4      if size(M) = max_M then
5        if P_M is reset then reset(M)
6        if P_M is shift then shift(M)
7      end
8      insert(M, e)
       /* mining update */
9      if perform mining then
10       HeuristicsMiner(M)
11     end
12   end
13 end
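A minimal Python rendering of Algorithm 1 could look as follows. The analyze(e) filter is assumed to accept every event, and mining is triggered every mine_every events; both are simplifying assumptions of this sketch, and all identifiers are ours.

```python
from collections import deque

def baseline_miner(stream, max_m, policy, mine_every, batch_miner):
    """Sketch of Algorithm 1. `stream` is any iterable of events and
    `batch_miner` any batch discovery algorithm taking a list of events."""
    memory = deque()
    for n, event in enumerate(stream, start=1):
        if len(memory) == max_m:
            if policy == "reset":
                memory.clear()       # periodic reset: drop the whole stored log
            elif policy == "shift":
                memory.popleft()     # sliding window: drop only the oldest event
        memory.append(event)
        if n % mine_every == 0:      # one possible 'perform mining' condition
            batch_miner(list(memory))
```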

A potential advantage of the two policies described above is the possibility to mine the log not only with the Heuristics Miner, but with any process mining algorithm (not only for control-flow discovery: for example, it is possible to extract information about the social network) already available for traditional batch process discovery. However, the notion of "history" is not very accurate: only the more recent events are considered, and an equal importance is assigned to all of them. Moreover, the model is not updated in real time, since each new event received triggers only an update of the log, not necessarily an update of the model: performing a model update for each new event would result in a significant computational burden, well outside the computational limitations assumed for a true online approach. In addition to that, the time required by these approaches is completely unbalanced: when a new event arrives, only inexpensive operations are performed; instead, when the model needs to be updated, the log retained in memory is mined from scratch. So, every event is handled at least twice: the first time to store it into the log and subsequently every time the mining phase takes place on it. In an online setting, a procedure that does not need to process each event more than once (a "one pass algorithm" [17]) is more desirable.

3.2 Stream-Specific Approaches
In this section, we suggest how to modify the scheme of the basic approaches so as to implement a real online framework; the final approach is described in Algorithm 2. In this framework, the "current" log is described in terms of the "latest observed activities" and the "latest observed dependencies". Specifically, we define three queues:
1. Q_A, with entries in A × R, stores the most recently observed activities jointly with a weight for each activity (representing its degree of importance with respect to mining);
2. Q_C, with entries in C × A, stores the most recently observed event for each case;
3. Q_R, with entries in A × A × R, stores the most recently observed direct succession relations jointly with a weight for each succession relation (representing its degree of importance with respect to mining).
These queues are used by the online algorithm to retain the information needed to perform mining.
The detailed description of the new algorithm is presented in Algorithm 2. Specifically, the algorithm runs forever, considering, at each round, the currently observed event e = (c_i, a_i, t_i). For each event, it is checked whether a_i is already in Q_A. If this is not the case, a_i is inserted in Q_A with weight 0. If a_i is already present in the queue, it is removed from its current position and moved to the front of the queue. In either case, before insertion, it is checked whether Q_A is full; if so, the oldest stored activity, i.e. the last in the queue, is removed. Subsequently, the weights of Q_A are updated by f_WA. After that, queue Q_C is examined to look for the most recent event observed for case c_i. If a pair (c_i, a) is found, it is removed from the queue, and an instance of the succession relation (a, a_i) is created and searched for in Q_R. If it is found, it is moved from its current position to the front of Q_R. If it is a new succession relation, its weight is set to 0. In either case, before insertion, it is checked whether Q_R is full; if so, the oldest stored relation, i.e. the last in the queue, is removed. Subsequently, the weights of Q_R are updated by f_WR. Next, after checking whether Q_C is full (in which case the oldest stored event is removed), the event e is stored in Q_C. Finally, it is checked whether a model has to be generated. If this is the case, the procedure generateModel(Q_A, Q_R) is executed, taking as input the current version of queues Q_A and Q_R and producing "classical" model representations, such as Causal Nets [21] or Petri nets.
Algorithm 2 is parametric with respect to: i) the way the weights of queues Q_A and Q_R are updated by f_WA and f_WR, respectively; ii) how a model is generated by generateModel(Q_A, Q_R). In the following, generateModel(·, ·) will correspond to the procedure defined by the Heuristics Miner (Appendix A). In particular, it is possible to consider Q_A as the counter of activities (to filter out only the most frequent ones) and Q_R as the counter of direct succession relations, which are used for the computation of the dependency values between pairs of activities. The following subsections present some specific instances of f_WA and f_WR.

3.2.1 Online Heuristics Miner (Stationary Streams)
In the case of stationary streams, we can reproduce the behavior of the Heuristics Miner as follows. Q_A should contain, for each activity a, the number of occurrences of a observed in S up to the current time. Similarly, Q_R should contain, for each succession (a, b), the number of occurrences of (a, b) observed in S up to the current time. Thus both f_WA and f_WR must just increment the weight of the first element of the queue:
$$f_{W_A}((a, w)) = \begin{cases} (a, w + 1) & \text{if } \mathit{first}(Q_A) = (a, w) \\ (a, w) & \text{otherwise} \end{cases}$$

$$f_{W_R}((a, b, w)) = \begin{cases} (a, b, w + 1) & \text{if } \mathit{first}(Q_R) = (a, b, w) \\ (a, b, w) & \text{otherwise} \end{cases}$$

Algorithm 2: Online HM
Input: S event stream; max_QA, max_QC, max_QR maximum memory sizes for queues Q_A, Q_C and Q_R, respectively; f_WA, f_WR model policy; generateModel(·, ·).
1  forever do
2    e ← observe(S)               /* observe a new event, where e = (c_i, a_i, t_i) */
     /* check if event e has to be used */
3    if analyze(e) then
4      if ∄(a, w) ∈ Q_A s.t. a = a_i then
5        if size(Q_A) = max_QA then
6          removeLast(Q_A)        /* removes the last entry of Q_A */
7        end
8        w ← 0
9      else
10       w ← get(Q_A, a_i)        /* get returns the old weight w of a_i and removes (a_i, w) */
11     end
12     insert(Q_A, (a_i, w))      /* inserts in front of Q_A */
13     Q_A ← f_WA(Q_A)            /* updates the weights of Q_A */
14     if ∃(c, a) ∈ Q_C s.t. c = c_i then
15       a ← get(Q_C, c_i)        /* get returns the old activity a of c_i and removes (c_i, a) */
16       if ∄(a_s, a_f, u) ∈ Q_R s.t. (a_s = a) ∧ (a_f = a_i) then
17         if size(Q_R) = max_QR then
18           removeLast(Q_R)      /* removes the last entry of Q_R */
19         end
20         u ← 0
21       else
22         u ← get(Q_R, a, a_i)   /* get returns the old weight u of relation a → a_i and removes (a, a_i, u) */
23       end
24       insert(Q_R, (a, a_i, u)) /* inserts in front of Q_R */
25       Q_R ← f_WR(Q_R)          /* updates the weights of Q_R */
26     else if size(Q_C) = max_QC then
27       removeLast(Q_C)          /* removes the last entry of Q_C */
28     end
29     insert(Q_C, (c_i, a_i))    /* inserts in front of Q_C */
       /* generate model */
30     if model then
31       generateModel(Q_A, Q_R)
32     end
33   end
34 end

where first(·) returns the first element of the queue.
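Under the stated assumptions, a compact Python sketch of Algorithm 2 with these stationary weight functions is shown below. The queues are modeled as OrderedDicts whose first entry is the most recently observed one; all identifiers are ours, and generateModel(Q_A, Q_R) would then run the Heuristics Miner computations on these counters.

```python
from collections import OrderedDict

class OnlineHM:
    """Sketch of Algorithm 2 with the stationary weight functions of
    Section 3.2.1."""

    def __init__(self, max_qa, max_qc, max_qr):
        self.QA = OrderedDict()   # activity -> weight
        self.QC = OrderedDict()   # case id  -> last observed activity
        self.QR = OrderedDict()   # (a, b)   -> weight of direct succession a -> b
        self.max_qa, self.max_qc, self.max_qr = max_qa, max_qc, max_qr

    def observe(self, case, activity):
        # update Q_A: evict the oldest entry if full, move a_i to the front,
        # then f_WA increments the weight of the first element
        if activity not in self.QA:
            if len(self.QA) == self.max_qa:
                self.QA.popitem(last=True)        # drop the least recent activity
            self.QA[activity] = 0
        self.QA.move_to_end(activity, last=False)
        self.QA[activity] += 1
        # update Q_R through the last activity observed for this case (Q_C)
        if case in self.QC:
            prev = self.QC.pop(case)
            rel = (prev, activity)                # direct succession prev -> a_i
            if rel not in self.QR:
                if len(self.QR) == self.max_qr:
                    self.QR.popitem(last=True)    # drop the least recent relation
                self.QR[rel] = 0
            self.QR.move_to_end(rel, last=False)
            self.QR[rel] += 1
        elif len(self.QC) == self.max_qc:
            self.QC.popitem(last=True)            # drop the least recent case
        self.QC[case] = activity
        self.QC.move_to_end(case, last=False)
```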
In the case of stationary streams, it is possible to use the Hoeffding bound to derive error bounds on the measures computed by the online version of the Heuristics Miner. These bounds become tighter and tighter as the number of processed events increases. Appendix B reports some details on that.
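For completeness, the bound referred to here is the standard Hoeffding bound: for a real-valued random variable with range R, after n independent observations the true mean deviates from the empirical mean by at most

$$\epsilon = \sqrt{\frac{R^2 \,\ln(1/\delta)}{2n}}$$

with probability 1 − δ; hence the error bound shrinks as $O(1/\sqrt{n})$ in the number of processed events.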
It must be noticed that, if the sizes of the queues are large enough, the Online Heuristics Miner collects all the needed statistics from the beginning of the stream up to the current time. So it performs very well, provided that the activity distribution of the stream is stationary. However, in real-world business processes it is natural to observe variations both in the distribution of events and in the workflow of the process generating the stream (concept drift).
In order to cope with concept drift, more importance should be given to recent events than to older ones. In the following we present a variant of the Online Heuristics Miner able to do that.

3.2.2 Online Heuristics Miner with Aging (Evolving Streams)
The idea, in this case, is to decrease the weights of the events (and relations) over time when they are not observed. So, every time a new event is observed, only the weight of its activity (and of the observed succession) is increased, while all the others are reduced. Given an "aging factor" α ∈ [0, 1), the weight functions f_WA (for activities) and f_WR (for succession relations) are modified so as to replace all the occurrences of w on the right-hand side of the equality with αw:
$$f_{W_A}((a, w)) = \begin{cases} (a, \alpha w + 1) & \text{if } \mathit{first}(Q_A) = (a, w) \\ (a, \alpha w) & \text{otherwise} \end{cases}$$

$$f_{W_R}((a, b, w)) = \begin{cases} (a, b, \alpha w + 1) & \text{if } \mathit{first}(Q_R) = (a, b, w) \\ (a, b, \alpha w) & \text{otherwise} \end{cases}$$
The basic idea of these new functions is to decay the "history" (i.e., the current number of observations) by an aging factor α (in the formula: αw) before increasing it by 1 (for the new observation).
These new functions decrease all the weights associated with activities and succession relations according to the aging factor α, which determines the "speed" of forgetting an activity or succession relation; only the most recent observation (the first in the respective queue) is additionally increased by 1. Notice that, if an activity or succession relation is not observed for t time steps, its weight is multiplied by α^t. Thus the value of α allows controlling the speed of "forgetting": the closer α is to 0, the faster the weight associated with an activity (or succession relation) that has not been observed for some time goes to 0, thus allowing the miner to assign larger values to recent events. In this way the miner is more sensitive to sharp variations of the event distribution (concept shift); however, the output (the generated models) may be less stable, because the algorithm becomes more sensitive to random fluctuations of the sampling distribution. When the value of α is close to 1, activities that have not been observed recently, but were seen often some time ago, are able to retain their significance, thus allowing the miner to cope with mild variations of the event distribution (concept drift), but making it less reactive in case of concept shift.
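A sketch of the aging update is shown below (dict-based, with illustrative names; the observed entry is assumed to already be in the queue, as guaranteed by Algorithm 2).

```python
def apply_aging(queue, observed_key, alpha):
    """Aging weight functions f_WA / f_WR: every weight decays by the factor
    alpha, then the entry just observed gains +1 (all others only decay)."""
    for key in queue:
        queue[key] *= alpha      # w -> alpha * w for every stored entry
    queue[observed_key] += 1     # the observed activity/relation gets alpha*w + 1
```

For example, with α = 0.997 a weight that stops being observed halves after roughly 230 events, since 0.997^230 ≈ 0.5.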
One drawback of this approach is that, while it is able to "forget" old events, it is not able, at time t, to preserve precise statistics for the last k observations while completely dropping observations that occurred before time t − k. This ability could be useful in case of a sudden drastic change in the event distribution.

3.2.3 Online Heuristics Miner with Self-Adapting Aging (Evolving Streams)
The third approach explored in this section treats α as a parameter controlling the importance of the "history" for the mining: the closer it is to 1, the more importance is given to the history. The value of α should be decided according to the known degree of "non-stationarity" of the stream; however, this information might not be available, or it might not be fixed (for example, the process may be stationary for a period, then evolve, and then become stationary again). To handle these cases, it is possible to dynamically adapt the value of α. In particular, the idea is to lower the value of α when a drift is observed and to increase it when the stream seems to be stationary.
A possible approach to detect a drift is to monitor variations of the fitness value. This measure, evaluated periodically, can be considered as the fraction of events (considering only the latest ones) that the currently mined process is able to explain. When the fitness value changes drastically, it is likely that a drift has occurred. Using this drift detection, it is possible to adapt α according to the following rules:
• if the fitness decreases (i.e. there is a drift), α should decrease too (down to 0), in order to allow the current model to adapt to the new data;
• if the fitness remains unchanged (i.e. it stays within a small interval), there is no drift, so the value of α should be increased (up to 1);
• if the fitness increases, α should be increased too (up to 1).
The experiments presented in the next section consider only variations of α by a constant factor, as in the sketch below. Alternative update policies (e.g. making the speed of change of α proportional to the observed fitness change) can be considered and are in fact a topic of future investigation.
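A sketch of such a constant-factor update policy follows; the step and tolerance values are arbitrary assumptions of ours, not taken from the experiments.

```python
def update_alpha(alpha, fitness_now, fitness_prev, step=0.005, tolerance=0.02):
    """Self-adapting aging: alpha changes by the constant `step` according
    to the observed fitness variation."""
    if fitness_now - fitness_prev < -tolerance:
        alpha = max(0.0, alpha - step)   # fitness dropped: likely drift, forget faster
    else:
        alpha = min(1.0, alpha + step)   # fitness stable or improving: trust history
    return alpha
```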
Early explorations seem to reveal that the effectiveness of the α update policy heavily depends on the problem type (i.e. the characteristics of the event stream); this topic, however, still requires more investigation.

3.3 Stream Process Mining with Lossy Counting (Evolving Streams)
The approach presented in this section is an adaptation of an existing technique for approximate frequency counting. In particular, we modified the "Lossy Counting" algorithm described in [14]. We preferred this approach to Sticky Sampling (described in the same paper) since its authors state that, in practice, Lossy Counting performs better. The entire procedure is presented in Algorithm 3.
The basic idea of the Lossy Counting algorithm is to conceptually divide the stream into buckets of width $w = \lceil 1/\epsilon \rceil$, where $\epsilon \in (0, 1)$ is an error parameter. The current bucket (i.e., the bucket of the last element seen) is identified by $b_{current} = \lceil N/w \rceil$, where N is the progressive event counter.
The basic data structure used by Lossy Counting is a set of entries of the form (e, f, Δ) where: e is an element of the stream; f is the estimated frequency of the item e; and Δ is the maximum possible error on f. Every time a new element e is observed, the algorithm checks whether the data structure contains an entry for the corresponding element. If such an entry exists, its frequency value f is incremented by one; otherwise
Algorithm 3: Lossy Counting HM
Input: S event stream; N the event counter (initial value 1); D_A activities set; D_C cases set; D_R relations set; generateModel(·, ·).
1  w ← ⌈1/ε⌉                       /* define the bucket width */
2  forever do
3    b_current ← ⌈N/w⌉             /* define the current bucket id */
4    e ← observe(S)                /* observe a new event, where e = (c_i, a_i, t_i) */
     /* update the D_A data structure */
5    if ∃(a, f, Δ) ∈ D_A such that a = a_i then
6      Remove the entry (a, f, Δ) from D_A
7      D_A ← D_A ∪ {(a, f + 1, Δ)}            /* update the frequency of element a_i */
8    else
9      D_A ← D_A ∪ {(a_i, 1, b_current − 1)}  /* insert the new observation */
10   end
     /* update the D_C data structure */
11   if ∃(c, a, f, Δ) ∈ D_C such that c = c_i then
12     Remove the entry (c, a, f, Δ) from D_C
13     D_C ← D_C ∪ {(c, a_i, f + 1, Δ)}       /* update the frequency and last activity of case c_i */
       /* update the D_R data structure */
14     Build relation r_i as a → a_i
15     if ∃(r, f, Δ) ∈ D_R such that r = r_i then
16       Remove the entry (r, f, Δ) from D_R
17       D_R ← D_R ∪ {(r, f + 1, Δ)}          /* update the frequency of element r_i */
18     else
19       D_R ← D_R ∪ {(r_i, 1, b_current − 1)}  /* add the new observation */
20     end
21   else
22     D_C ← D_C ∪ {(c_i, a_i, 1, b_current − 1)}  /* add the new observation */
23   end
     /* periodic cleanup */
24   if N ≡ 0 mod w then
25     foreach (a, f, Δ) ∈ D_A such that f + Δ ≤ b_current do
26       Remove (a, f, Δ) from D_A
27     end
28     foreach (c, a, f, Δ) ∈ D_C such that f + Δ ≤ b_current do
29       Remove (c, a, f, Δ) from D_C
30     end
31     foreach (r, f, Δ) ∈ D_R such that f + Δ ≤ b_current do
32       Remove (r, f, Δ) from D_R
33     end
34   end
35   N ← N + 1                     /* increment the event counter */
     /* generate model */
36   if model then
37     generateModel(D_A, D_R)
38   end
39 end

<log openxes.version="1.0RC7" xes.features="nested-attributes" xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case_id_0" />
    <event>
      <date key="time:timestamp" value="2012-04-23T10:33:04.004+02:00" />
      <string key="concept:name" value="A" />
      <string key="lifecycle:transition" value="Task_Execution" />
    </event>
  </trace>
</log>

Listing 1: OpenXES fragment streamed over the network.

a new tuple (e, 1, b_current − 1) is added. Every time N ≡ 0 mod w, the algorithm cleans the data structure by removing the entries that satisfy the inequality f + Δ ≤ b_current. Such a condition ensures that, every time the cleanup procedure is executed, b_current ≤ εN.
This algorithm has been adapted to the SPD problem using three instances of the basic data structure. In particular, it counts the frequencies of the activities (with the data structure D_A) and the frequencies of the direct succession relations (with the data structure D_R). In order to obtain the relations, a third instance of the same data structure, D_C, is used. In D_C, each item is of the type (c, a, f, Δ) where c ∈ C represents the case identifier; f and Δ, as in the previous cases, correspond to the frequency and the bucket id, respectively; and a ∈ A is the latest activity observed for the corresponding case. Every time a new activity is observed, D_A is updated. After that, the procedure checks whether, given the case identifier of the current event, there is an entry in D_C. If this is not the case, a new entry is added to D_C (recording the current case id and the observed activity). Otherwise, the f and a components of the entry in D_C are updated.
The Heuristics Miner can then be used to generate the model, since a set of dependencies between activities is available.
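A sketch of one such Lossy Counting set is shown below (interface names are ours; the logic follows Algorithm 3 and [14]). Algorithm 3 keeps three of these, one each for activities, cases and direct succession relations, with the case set additionally storing the last activity per case.

```python
import math

class LossyCounter:
    """Sketch of one Lossy Counting set of entries (e, f, Delta)."""

    def __init__(self, epsilon):
        self.w = math.ceil(1.0 / epsilon)   # bucket width w = ceil(1/epsilon)
        self.n = 0                          # number of items seen so far
        self.entries = {}                   # element -> (f, Delta)

    def add(self, element):
        self.n += 1
        b_current = math.ceil(self.n / self.w)
        if element in self.entries:
            f, delta = self.entries[element]
            self.entries[element] = (f + 1, delta)      # increment the frequency
        else:
            self.entries[element] = (1, b_current - 1)  # new observation
        if self.n % self.w == 0:                        # periodic cleanup
            self.entries = {e: (f, d) for e, (f, d) in self.entries.items()
                            if f + d > b_current}
```

For instance, an activity counter would be created as LossyCounter(epsilon=0.01), while a second instance counts (previous_activity, activity) pairs obtained through the per-case table.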

4 Implementation
All the approaches presented in this paper have been implemented in the ProM 6.1 toolkit [26]. Moreover, a "stream simulator" and a "logs merger" have also been implemented to allow for experimentation (to test new algorithms and to compose logs). Communication between stream sources and the stream miner is performed over the network: each emitted event consists of a "small log" (i.e., a trace which contains exactly one event), encoded as a XES string [8]. An example of a streamed event is presented in Listing 1. This approach is useful to simulate "many-to-many environments" where one source emits events to many miners and one miner can use many stream sources. The current implementation supports only the first scenario (currently it is not possible to mine streams generated by more than one source).
Fig. 3 presents the set of ProM plugins implemented and how they interact with each other. The available plugins can be split into two groups: plugins for the simulation of the stream and plugins to mine streaming event data. To simulate a stream there is the "Log Streamer" plugin. This plugin receives a static log file as input and streams each event over the network, according to its timestamp (in this context, timestamps are used only to determine the order of events).

Figure 3: Architecture of the plugins implemented in ProM and how they interact with
each other. Each rounded box represents a ProM plugin.

tamps are used only to determine the order of events). It is possible to define the time
between each event, in order to test the miner under different emission rates (i.e. to
simulate different traffic conditions). A second plugin, called “Logs Merger” can be
used to concatenate different log files generated by different process models, just for
testing purposes.
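Since the wire format is plain text, a minimal emitter is straightforward. The following sketch (host, port, delay and the trimmed-down XES template are our own assumptions, not the actual plugin code) sends one one-event trace per event:

```python
import socket, time

def stream_log(events, host="localhost", port=1234, delay=0.1):
    """Minimal event emitter: wraps each event in a one-event XES trace and
    sends it over TCP."""
    template = ('<log xmlns="http://www.xes-standard.org/"><trace>'
                '<string key="concept:name" value="{case}"/>'
                '<event><string key="concept:name" value="{activity}"/></event>'
                '</trace></log>')
    with socket.create_connection((host, port)) as conn:
        for case, activity in events:      # events assumed sorted by timestamp
            conn.sendall(template.format(case=case, activity=activity).encode())
            time.sleep(delay)              # emulates a configurable emission rate
```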
Once the stream is active (i.e. events are sent through the network), the clients can use these data to mine the model. There is a "Stream Tester" plugin, which just shows the events received. The other six plugins support the two basic approaches (Section 3.1) and the four stream-specific approaches (Sections 3.2 and 3.3).
In a typical session of testing a new stream process mining algorithm, we expect to have two separate ProM instances active at the same time: the first streams events over the network and the second collects and mines them.
Fig. 4 contains screenshots of the ProM plugins implemented. The first image, on top, shows the process streamer: the left bar describes the stream configuration options (such as the speed or the network port for new connections), while the central part contains a representation of the log as a dotted chart [19] (the x axis represents time, and each point with the same x value is an event that occurred at the same instant). Blue dots are the events not yet sent (future events), green ones are the events already streamed (past events). It is possible to change the color of the future events so that every event referring to the same activity or to the same process instance has the same color. The figure in the middle contains the Stream Tester: each event of a stream is appended to this list, which shows the timestamp of the activity, its name and its case id. The left bar contains some basic statistics (i.e. beginning of the streaming session, number of events observed and average number of events observed per second). The last picture, at the bottom, represents the Online HM miner. This view can be divided into three parts: the central part, where the process representation is shown (in this case, as a Causal Net); the left bar, which contains, on top, buttons to start/stop the miner plus some basic statistics (i.e., beginning of the streaming session, number of events observed and average number of events observed per second); and, at the bottom, a graph which shows the evolution of the fitness measure.

Figure 4: Screenshots of four implemented ProM plugins. The first image (top left)
shows the logs merger (it is possible to define the overlap level of the two logs); the
second image (top right) represents the log streamer, the bottom left image is the stream
tester and the image at the bottom right shows the Online HM.

Moreover, Command-Line Interface (CLI) versions of the miners are available too². In this case, events are read from a static file (one event per line) and the miners update the model incrementally (this implementation realizes an incremental version of the algorithms). These implementations can be run in batch and are used for automated experimentation.

² See http://www.processmining.it for more details.

5 Results
The algorithms presented in this paper have been tested using four datasets: event logs from two artificial processes (one stationary and one evolving); a synthetic example; and a real event log.

5.1 Models description
The two artificial processes are shown in Fig. 5 and Fig. 6; both are described as Petri nets. The first one (Model 1) is the complete model that is simulated to generate the stationary stream. The second one (Model 2) presents the three models which are used to generate three logs describing an evolving stream. In this case, the final stream is generated considering the hard shift between the three logs generated from the individual process executions.
The synthetic example (Model 3) is reported in Fig. 7. This example is taken from [4, Chap. 5] and is expressed as a YAWL [23] process. This model describes a possible health insurance claim process of a travel agency. The example is modified 4 times so, at the end, the stream contains traces from 5 different processes. Also in this case the type of drift is a shift. Due to space limitations, only the first variant is presented; the red rectangles indicate the areas that are modified over time.
Figure 5: Model 1. Process model used to generate the stationary stream.

Figure 6: Model 2. The three process models that generate the evolving stream. Red
rounded rectangles indicate areas subject to modification.


Figure 7: Model 3. The first variant of the third model. Red rounded rectangles indicate areas that will be subject to modification.


5.2 Algorithms evaluation
The streams generated from the described models are used for the evaluation of the presented techniques. There are various metrics to evaluate process models with respect to an event log. Typically, four quality dimensions are considered for comparing model and log: (a) fitness; (b) simplicity; (c) precision; and (d) generalization [20, 22]. In order to measure how well the model describes the log without allowing the replay of traces not generated by the target process, here we measure the performance both in terms of fitness (computed according to [1]) and in terms of precision (computed according to [16]). The first measure reaches its maximum when all the traces in the log can be properly replayed by the model, while the second one prefers models that describe a "minimal behavior" with respect to all the models that can be generated starting from the same log. In all experiments, the fitness and precision measures are computed over the last x observed events (where x varies according to log size), q refers to the maximum size of the queues, and the default parameters of the Heuristics Miner are used for model generation.
The main characteristics of the three streams are:
• Streams for Model 1: 3448 events, describing 400 cases;
• Streams for Model 2: 4875 events, describing 750 cases (250 cases and 2000
events for the first process model, 250 cases and 1750 events for the second, and
250 cases with 1125 events for the third one);
• Stream for Model 3: 58783 events, describing 6000 cases (1199 cases and 11838
events for the first variant; 1243 cases and 11690 events for the second variant;
1176 cases and 12157 events for the third variant; 1183 cases and 10473 events
for the fourth variant; and 1199 cases and 12625 events for the fifth variant).
We compare the basic approaches against the different online versions of the stream miner on the different streams.
Fig. 8 reports the aggregated experimental results for five streams generated by Model 1. The two charts on top report the averages of the fitness (left) and the variance (right) for the two basic approaches and the Online HM. The presented values are calculated varying the size of the window used to perform the mining (in the case of the Online HM, the size of the queues) and the number of events used to calculate the fitness measure (i.e. only the latest x events are supposed to fit the model). For each combination (number of events for the mining and number of events for fitness computation) a run of the miner has been executed (for each of the five streams), and the average and variance values of the fitness (which is calculated every 50 observed events) are reported. It is clear from the plots that the Online HM outperforms the basic approaches, both in terms of speed in finding a good model and in terms of fitness of the model itself. The bottom of the figure presents, on the left-hand side, a comparison of the evolution of the average fitness of the Online HM, the HM with Aging (α = 0.9985 and α = 0.997), the HM with Self-Adapting aging and Lossy Counting. For these runs a queue size of 100 has been used and, for the fitness computation, the latest 200 events are considered. In this case, Lossy Counting uses an error value ε = 0.01.

Figure 8: Aggregated experimental results for five streams generated by Model 1. Top: average (left) and variance (right) values of the fitness measures for the basic approaches and the Online HM. Bottom: evolution in time of the average fitness for the Online HM with queue size 100 and log size for fitness 200; curves for the HM with Aging (α = 0.9985 and α = 0.997), the HM with Self-Adapting aging (the evolution of the α value is shown at the bottom), Lossy Counting and different configurations of the basic approaches are reported as well.

The right-hand side of Fig. 8 compares the basic approaches, with different window and fitness sizes, against the Online HM and the Lossy Counting approach. As expected, since there is no drift, the Online HM outperforms the versions with aging. In fact, the HM with Aging, besides being less stable, degrades in performance as the value of α decreases, i.e. as less importance is given to less recent events. This is consistent with the poor performance reported for the basic approaches, which can exploit only the most recent events contained in the window. The self-adapting strategy, after an initial variation of the α parameter, is able to converge to the Online HM by eventually choosing a value of α equal to 1.
Fig. 9 reports the aggregated experimental results for five streams generated by Model 2. In this case we adopted exactly the same experimental setup, procedure and presentation of results as described before. In addition, the occurrences of drift are marked. As expected, the performance of the Online HM decreases at each drift, while the HM with Aging is able to recover from the drifts. The price paid for this ability is a less stable behavior. The HM with Self-Adapting aging seems to be the right compromise, being eventually able to recover from the drifts while showing a stable behavior. The α curve shows that the self-adapting strategy seems to be able to detect the concept drifts.

Figure 9: Aggregated experimental results for five streams generated by the evolving Model 2. Top: average (left) and variance (right) values of the fitness measures for the basic approaches and the Online HM. Bottom: evolution in time of the average fitness for the Online HM with queue size 100 and log size for fitness 200; curves for the HM with Aging (α = 0.997), the HM with Self-Adapting aging (the evolution of the α value is shown at the bottom), Lossy Counting and different configurations of the basic approaches are reported as well. Drift occurrences are marked with vertical bars.

Model 3, the synthetic example, has been tested with the basic approaches (Sliding Windows and Periodic Resets), the Online HM, the HM with Self-Adapting aging and Lossy Counting; the results are presented in Fig. 10. In this case, Lossy Counting and the Online HM outperform the other approaches. Lossy Counting reaches higher fitness values, but the Online HM is more stable and seems to better tolerate the drifts. The basic approaches and the HM with Self-Adapting aging, on the other hand, are very unstable; moreover, it is interesting to note that the value of α of the HM with Self-Adapting aging is always close to 1. This indicates that the short stable periods of the fitness values are sufficient to increase α, so the presented update policy (i.e. the increment/decrement speed of α), for this particular case, seems to be too fast. The second graph, at the bottom, presents three runs of Lossy Counting with different values of ε. As expected, the lower the value of the accepted error, the better the performance.
Due to the size of this dataset, it is interesting to evaluate the performance of the approaches also in terms of space and time requirements.
Fig. 11 presents the average memory required by the miner during the processing of the entire log. Different configurations are tested for the basic approaches, the Online HM, the HM with Self-Adapting aging, and the Lossy Counting algorithm.

Figure 10: Detailed results of the basic approaches, the Online HM, the HM with Self-Adapting aging and Lossy Counting (with different configurations) on data of Model 3. Vertical gray lines indicate points where concept drifts occur.

Clearly, as the windows grow, the space requirements grow too. Concerning Lossy Counting, again, as the ε value (accepted error) becomes lower, more space is required. If we pick the Online HM with window 1000 and Lossy Counting with ε = 0.01 (which, from Fig. 10, seem to behave similarly), the Online HM consumes less memory: it requires 128.3 MB whereas Lossy Counting needs 143.8 MB. Fig. 12 shows the time performance of the different algorithms and configurations. It is interesting to note, from the chart at the bottom, that the time required by the Online HM and the Self-Adapting variant is almost independent of the configuration. Instead, the basic approaches need to perform more complex operations: Periodic Resets has to add the new event and, periodically, reset the log; the Sliding Window has to update the log every time a new event is observed.
In order to study the dependence of the storage requirements of Lossy Counting on the error parameter ε, we ran experiments on the same log for different values of ε, recording the maximum size of the Lossy Counting sets during execution. Results for x = 1000 are reported in Fig. 13. Specifically, the figure compares the maximum size of the generated sets, the average fitness value and the average precision value. As expected, as the value of ε becomes larger, both the fitness value and the sets size quickly decrease. The precision value, on the contrary, initially decreases and then goes up to very high values. This indicates an over-specialization of the model to specific behaviors.

Figure 11: Average memory requirements, in MB, for a complete run over the entire
log of Model 3, of the approaches (with different configurations).
[Figure 12 about here. Top: processing time per event (ms) over the events observed,
for Sliding Windows HM (q = 1000; x = 1000), Periodic Resets HM (q = 1000; x = 1000),
Online HM (q = 1000; x = 1000), Online HM w/ Self Adapting Aging (q = 1000; x = 1000)
and Lossy Counting HM (ε = 0.01; x = 1000). Bottom: average processing time (ms) of
Online HM, Online HM w/ Self Adapting Aging, Sliding Windows HM and Periodic Resets HM,
for the configurations q, x = 10, 100, 500, 1000.]
Figure 12: Time performance over the entire log of Model 3. Top: time required to
process a single event by the different algorithms (logarithmic scale). Vertical gray
lines indicate points where concept drifts occur. Bottom: average time required to
process an event over the entire log, with different configurations of the algorithms.

[Figure 13 about here: maximum sets size, average fitness and average precision
plotted against increasing values of ε, from ε = 0.01 to ε = 0.3.]
Figure 13: Comparison of the average fitness, precision and space required, with
respect to different values of ε, for the Lossy Counting HM executed on the log
generated by Model 3.

                              Online    Online HM    Sliding       Lossy
                              HM        w/ Aging     Windows HM    Counting HM
    q = 10    Avg. Time (ms)    4.66      2.61         2.11          1.97
              Avg. Fitness      0.32      0.28         0.32          0.32
              Avg. Precision    0.44      0.87         0.38          0.38
    q = 100   Avg. Time (ms)    5.79      2.85         1.99          1.91
              Avg. Fitness      0.32      0.51         0.42          0.74
              Avg. Precision    0.42      0.65         0.68          0.71

Table 1: Performance of different approaches with queues/sets sizes of q = 10 and
q = 100 elements and x = 1000. Online HM with Aging uses α^(1/q) = 0.9. Time
values refer to the average number of milliseconds required to process a single event
with respect to Model 3.

and the sets size quickly decrease. The precision value, on the contrary, initially
decreases and then rises to very high values. This indicates an over-specialization of
the model to specific behaviors.
As an additional test, we decided to compare the proposed algorithms under extreme
storage conditions, which allow only limited information about the observed events to
be retained. Specifically, Table 1 reports the average time required to process a single
event, together with the average fitness and precision values, when queues of size 10
and 100, respectively, are used. For Lossy Counting we have used an ε value which
requires sets of approximately similar sizes. Please note that, for this log, a single
process trace is longer than 10 events, so with a queue of 10 elements it is not possible
to keep all the events of a case in the queue (because events of different cases are
interleaved). From the results it is clear that, under these conditions, the execution
time decreases from left to right across the columns of the table, while the fitness and
precision values tend to increase.
[Figure 14 about here: fitness over the events observed, together with the α value of
the Online HM w/ Self Adapting Aging, for Online HM (q = 100; x = 200), Online HM
w/ Aging (α = 0.998), Online HM w/ Self Adapting Aging (q = 100; x = 200), Sliding
Windows HM (q = 750; x = 750), Lossy Counting HM (ε = 0.01; x = 200) and Periodic
Resets HM (q = 750; x = 750).]
Figure 14: Fitness performance on the real stream dataset by different algorithms.

The online approaches presented in this work have also been tested against a real
dataset; the results are presented in Fig. 14. The reported results refer to 9000 events
generated by the document management system of Siav S.p.A., running at an Italian
banking institute. The observed process contains 8 activities and is assumed to be
stationary. The mining is performed using a queue size of 100 and, for the fitness
computation, the latest 200 events are considered. The behavior of the fitness curves
seems to indicate that some minor drifts occur.
As stated before, the main difference between Online HM and Lossy Counting is
that, whereas the main parameter of Online HM is the size of the queues (i.e. the
maximum space the application is allowed to use), the ε parameter of Lossy Counting
does not directly control the memory occupancy of the approach. Fig. 15 proposes two
comparisons of the approaches, with two different configurations, against the real stream
dataset. In particular, we defined the two configurations so that the average memory
required by Lossy Counting and by Online HM is very close. The results presented are
actually the average values over four runs of the approaches. Please note that the two
configurations validate the fitness against different window sizes (in the first case the
window contains 200 events, in the second one 1000), so the second configuration
validates results against a larger history.
The top part of the figure presents a configuration that uses, on average, about
100 MB. To obtain this behavior several tests have been made; in the end, for Lossy
Counting, the parameters ε = 0.2 and fitness queue size 200 have been used, while for
Online HM the same fitness queue size is used but the queue size is set to 500. As the
plot shows, it is interesting to note that, in terms of fitness, this configuration is entirely
sufficient for the Online HM approach, whereas for Lossy Counting it is not. The second
plot, at the bottom, presents a different configuration that uses about 170 MB. In this
case, the error ε for Lossy Counting is set to 0.01, the queue size of Online HM is set to
1500 and, for both, the fitness queue size is set to 1000. With this configuration the two
approaches generate very close results in terms of fitness.
As a final consideration, this empirical evaluation clearly shows that, at least on our
real dataset, both Online HM and Lossy Counting are able to reach very high
performance; however, the Online HM is able to better exploit the available information.
In particular, Online HM considers only a finite number of possible observations
(depending on the queue size) which, in this particular case, are sufficient to mine the
correct model. The Lossy Counting, on the contrary, keeps all the information for a
certain time-frame (derived from the error parameter) without considering how many
different behaviors have already been seen.
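To make the bookkeeping of the latter concrete, the following minimal sketch applies
the Lossy Counting scheme of Manku and Motwani [14] to direct-succession counting;
it is an illustrative approximation only, since the actual Lossy Counting HM also
maintains analogous budgeted sets for activities and cases:

    import math

    def lossy_count_successions(stream, epsilon):
        # Approximate counts of direct successions (a, b) over a stream of
        # (case_id, activity) events, with count error at most epsilon * n.
        bucket_width = math.ceil(1.0 / epsilon)
        current_bucket = 1
        counts = {}        # (a, b) -> [frequency, maximum possible error]
        last_event = {}    # case_id -> last activity observed for that case
        n = 0
        for case_id, activity in stream:
            n += 1
            if case_id in last_event:
                succ = (last_event[case_id], activity)
                if succ in counts:
                    counts[succ][0] += 1
                else:
                    # A succession first seen now may have been dropped by
                    # earlier cleanups: current_bucket - 1 bounds that error.
                    counts[succ] = [1, current_bucket - 1]
            last_event[case_id] = activity
            if n % bucket_width == 0:
                # Cleanup: drop entries whose true count cannot exceed
                # epsilon * n; this is what bounds the size of the set.
                counts = {s: fe for s, fe in counts.items()
                          if fe[0] + fe[1] > current_bucket}
                current_bucket += 1
        return counts

Note how the memory bound is only indirect: it follows from the error parameter ε
through the cleanup step, rather than being fixed in advance as with the queues of
Online HM.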
[Figure 15 about here: two configurations, each showing fitness and space requirement
(MB) over the events observed for Lossy Counting HM and Online HM, with the average
space requirements marked.]
(a) Configuration that requires about 100 MB. Lossy Counting: ε = 0.2, fitness queue
size 200; Online HM: queue size 500, fitness queue size 200.
(b) Configuration that requires about 170 MB. Lossy Counting: ε = 0.01, fitness queue
size 1000; Online HM: queue size 1500, fitness queue size 1000.
Figure 15: Performance comparison between Online HM and Lossy Counting, in terms
of fitness and memory consumption.

Note on fitness measure. The usage of fitness for the evaluation of stream process

mining algorithms seems to be an effective choice. However, this might not always be
the case: let us consider two very different processes P′ and P′′ and a stream composed
of events generated by alternate executions of P′ and P′′. Under specific conditions,
the stream miner will generate a model that contains both P′ and P′′, connected by
an initial XOR-split and merged with a XOR-join. This model will have a very high
fitness value (it can replay traces from both P′ and P′′); however, the mined model is
not the one expected, i.e. the alternation in time of P′ and P′′ is not well reflected.
In order to deal with the problem just presented, we also report the performance
of some approaches in terms of “precision”. This measure is designed to prefer
models that describe a “minimal behavior” with respect to all the models that can be
generated starting from the same log. In particular, we used the approach by Muñoz-Gama
and Carmona described in [16].

[Figure 16 about here: precision over the events observed, for Online HM (q = 1000;
x = 2000), Online HM w/ Self Adapting Aging (q = 1000; x = 2000), Lossy Counting HM
(ε = 0.01; x = 2000), Sliding Windows HM (q = 1000; x = 2000) and Periodic Resets HM
(q = 1000; x = 2000).]
Figure 16: Precision performance on the real stream dataset by different algorithms.

Fig. 16 presents the precision calculated for the different
approaches during the analysis of the dataset of real events. It should not be surprising
that the stream-specific approaches reach very good precision values, whereas the
basic approach with periodic resets needs to recompute the model from scratch every
1000 events. It is interesting to note that both Online HM and Lossy Counting are not
able to reach the top values, whereas the Self Adapting one, after some time, reaches
the best precision, even if its value fluctuates a bit. The basic approach with sliding
window, instead, seems to behave quite well, even if the stream-specific approaches
outperform it.

6 Conclusions and Future Work


In this paper, we addressed the problem of discovering processes from streaming event
data with different characteristics, i.e. stationary streams and streams with drift.
First, we considered basic window-based approaches, where the standard Heuristics
Miner algorithm is applied to static logs obtained by using a moving window on
the stream (we considered two different policies). Then we introduced a framework for
stream process mining which allows the definition of different approaches, all based
on the dependencies between activities. These can be seen as online versions of the
Heuristics Miner algorithm, and they differ from each other in the way they assign
importance to the observed events. The Online HM, an incremental version of the
Heuristics Miner, gives the same importance to all the observed events, and thus it is
specifically apt to mine stationary streams. HM with Aging gives less importance to
older events. This is obtained by weighting the statistics of an event by a factor, the
α value, which decreases exponentially with the age of the event. Because of that,
this algorithm is able to cope with streams exhibiting concept drift. The choice of the
“right” value for α, however, is difficult, and different values of α could also be needed
at different times. To address this issue, we then introduced a Heuristics Miner able to
automatically adapt the aging factor on the basis of the detection of concept drift (HM
with Self Adapting). Finally, we adapted a standard approach (Lossy Counting) to our
problem.
Experimental results on artificial, synthetic and real data show the efficacy of the
proposed algorithms with respect to the basic approaches. Specifically, the Online HM
turns out to be quite stable and performs well on streams, especially stationary ones,
while the HM with Self Adapting aging factor and the Lossy
Counting seem to be the right choice in case of concept drift. The largest log has also
been used for measuring performance in terms of time and space requirements.
As future work, we plan to conduct a deeper analysis of the influence of the different
parameters on the presented approaches. Moreover, we plan to extend the current
approach to also mine the organizational perspective of the process. Finally, from a
process analyst's point of view, it may be interesting not only to show the current
updated process model, but also to report the “evolution points” of the process.

References
[1] Arya Adriansyah, Boudewijn van Dongen, and Wil M. P. van der Aalst. Confor-
mance Checking Using Cost-Based Fitness Analysis. In 2011 IEEE 15th Interna-
tional Enterprise Distributed Object Computing Conference, pages 55–64. IEEE,
August 2011.
[2] Charu Aggarwal. Data Streams: Models and Algorithms, volume 31 of Advances
in Database Systems. Springer US, Boston, MA, 2007.
[3] Albert Bifet, Geoff Holmes, Richard Kirkby, and Bernhard Pfahringer. MOA:
Massive Online Analysis. Journal of Machine Learning Research, 11:1601–1604,
2010.
[4] R. P. Jagadeesh Chandra Bose. Process Mining in the Large: Preprocessing, Dis-
covery, and Diagnostics. PhD thesis, Technische Universiteit Eindhoven, 2012.
[5] R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indrė Žliobaitė, and Mykola
Pechenizkiy. Handling Concept Drift in Process Mining. In Conference on
Advanced Information Systems Engineering (CAiSE), pages 391–405. Springer
Berlin / Heidelberg, 2011.
[6] Mohamed Medhat Gaber, Arkady Zaslavsky, and Shonali Krishnaswamy. Mining
Data Streams: a Review. ACM Sigmod Record, 34(2):18–26, June 2005.
[7] Lukasz Golab and M. Tamer Özsu. Issues in Data Stream Management. ACM
SIGMOD Record, 32(2):5–14, June 2003.
[8] Christian W. Günther. XES Standard Definition. www.xes-standard.org, 2009.
[9] Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Vari-
ables. Journal of the American Statistical Association, 58(301):13–30, 1963.
[10] IEEE Task Force on Process Mining. Process Mining Manifesto. In Florian
Daniel, Kamel Barkaoui, and Schahram Dustdar, editors, Business Process Man-
agement Workshops, pages 169–194. Springer-Verlag, 2011.
[11] Andre Cristiano Kalsing, Gleison Samuel do Nascimento, Cirano Iochpe, and
Lucineia Heloisa Thom. An Incremental Process Mining Approach to Extract
Knowledge from Legacy Systems. In 2010 14th IEEE International Enterprise
Distributed Object Computing Conference, pages 79–88. IEEE, October 2010.
[12] Ekkart Kindler, Vladimir Rubin, and Wilhelm Schäfer. Incremental Workflow
Mining Based on Document Versioning Information. In International Software
Process Workshop, pages 287–301. Springer Verlag, 2005.

[13] Ekkart Kindler, Vladimir Rubin, and Wilhelm Schäfer. Incremental Workflow
Mining for Process Flexibility. In Proceedings of BPMDS2006, pages 178–187,
2006.

[14] Gurmeet Singh Manku and Rajeev Motwani. Approximate Frequency Counts
over Data Streams. In Proceedings of International Conference on Very Large
Data Bases, pages 346–357, Hong Kong, China, 2002. Morgan Kaufmann.
[15] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs,
Charles Roxburgh, and Angela Hung Byers. Big Data: The Next Frontier for
Innovation, Competition, and Productivity. Technical Report June, McKinsey
Global Institute, 2011.
[16] Jorge Muñoz-Gama and Josep Carmona. A Fresh Look at Precision in Process
Conformance. In Business Process Management, pages 211–226. Springer Berlin
/ Heidelberg, 2010.

[17] Nicole Schweikardt. Short-Entry on One-Pass Algorithms. In Ling Liu and
M. Tamer Özsu, editors, Encyclopedia of Database Systems, pages 1948–1949.
Springer-Verlag, 2009.
[18] Marc Solé and Josep Carmona. Incremental Process Mining. In Proceedings of
ACSD/Petri Nets Workshops, pages 175–190, 2010.
[19] Minseok Song and Wil M. P. van der Aalst. Supporting Process Mining by Show-
ing Events at a Glance. In Workshop on Information Technologies and Systems
(WITS), pages 139–145, 2007.
[20] Wil M. P. van der Aalst. Process Mining: Discovery, Conformance and En-
hancement of Business Processes. Springer Berlin Heidelberg, Berlin, Heidel-
berg, 2011.
[21] Wil M. P. van der Aalst, Arya Adriansyah, and Boudewijn van Dongen. Causal
Nets: A Modeling Language Tailored towards Process Discovery. In CONCUR -
Concurrency Theory, pages 28–42. Springer Verlag, 2011.

[22] Wil M. P. van der Aalst, Arya Adriansyah, and Boudewijn van Dongen. Replaying
History on Process Models for Conformance Checking and Performance Analy-
sis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery,
2(2):182–192, March 2012.

[23] Wil M. P. van der Aalst and Arthur H.M. ter Hofstede. YAWL: Yet Another
Workflow Language. Information Systems, 30(4):245–275, June 2005.
[24] Wil M. P. van der Aalst and Ton A. J. M. M. Weijters. Rediscovering Workflow
Models from Event-based Data Using Little Thumb. Integrated Computer-Aided
Engineering, 10(2):151–162, 2003.

[25] Matthijs van Leeuwen and Arno Siebes. StreamKrimp: Detecting Change in
Data Streams. In Walter Daelemans, Bart Goethals, and Katharina Morik, editors,
Machine Learning and Knowledge Discovery in Databases, volume LNCS 5211
of LNAI, pages 672–687. Springer, 2008.

[26] Eric H. M. W. Verbeek, Joos Buijs, Boudewijn van Dongen, and Wil M. P. van der
Aalst. ProM 6: The Process Mining Toolkit. In BPM 2010 Demo, pages 34–39,
2010.
[27] Gerhard Widmer and Miroslav Kubat. Learning in the Presence of Concept Drift
and Hidden Contexts. Machine Learning, 23(1):69–101, 1996.

A Heuristics Miner
A.1 Heuristics Miner metrics
Heuristics Miner (HM) [24] is a process mining algorithm that counts various types of
frequencies in a log in order to mine the dependency relations among its activities.
The relation a >_W b holds iff there is a trace σ = ⟨t_1, t_2, . . . , t_n⟩ in W and an index
i ∈ {1, . . . , n − 1} such that t_i = a and t_{i+1} = b. The notation |a >_W b| indicates
the number of times that a >_W b holds in W (i.e. the number of times activity b directly
follows activity a).
The following subsections present a detailed list of all the formulae required by
Heuristics Miner to build a process model.

A.1.1 Dependency Relations (⇒)


An edge (which usually represents a dependency relation) between two activities is added
if its dependency measure is above the value of the dependency threshold. This measure
is calculated, between activities a and b, as:

    a ⇒_W b = ( |a >_W b| − |b >_W a| ) / ( |a >_W b| + |b >_W a| + 1 )    (1)

The rationale of this rule is that two activities are in a dependency relation if, most of
the time, they are observed in the specifically required order.

A.1.2 AND/XOR Relations (∧, ⊗)


When an activity has more than one outgoing edge, the algorithm has to decide whether
the outgoing edges are in AND or XOR relation (i.e. the “type of split”). Specifically,
it calculates the following quantity:

    a ⇒_W (b ∧ c) = ( |b >_W c| + |c >_W b| ) / ( |a >_W b| + |a >_W c| + 1 )    (2)

If this quantity is above a given AND threshold, the split is an AND-split; otherwise the
edges are considered to be in XOR relation. The rationale, in this case, is that two
activities are in an AND relation if, most of the time, they are observed in no specific
order (so one before the other and vice versa).

A.1.3 Long Distance Relations (⇒l )


Two activities a and b are in a “long distance relation” if there is a dependency between
them, but they are not in direct succession. This relation is expressed by the formula:

    a ⇒^l_W b = |a ≫_W b| / ( |b| + 1 )    (3)

where |a ≫_W b| indicates the number of times that a is directly or indirectly (i.e.
possibly with other activities between a and b) followed by b in the log W. If this
formula's value is above a long distance threshold, then a long distance relation is
added to the model.

[Figure 17 about here]
Figure 17: Example of a possible process model that generates the log W.

A.1.4 Loops of Length One and Two


A loop of length one (i.e. a self-loop on a single activity) is introduced if the quantity:

    a ⇒_W a = |a >_W a| / ( |a >_W a| + 1 )    (4)

is above a length-one loop threshold. A loop of length two is treated differently: it is
introduced if the quantity:

    a ⇒²_W b = ( |a >²_W b| + |b >²_W a| ) / ( |a >²_W b| + |b >²_W a| + 1 )    (5)

is above a length-two loop threshold. Here the relation a >²_W b is observed when a is
directly followed by b and then by a again (i.e. for a trace σ = ⟨t_1, t_2, . . . , t_n⟩ ∈ W
there is an i ∈ {1, . . . , n − 2} such that t_i = a, t_{i+1} = b and t_{i+2} = a).

A.2 Running Example


Let us consider the process model shown in Fig. 17. Given the set of activities
{A, B1, B2, C, D}, a possible log W, with 10 process instances, is:

    W = [ ⟨A, B1, B2, C, D⟩^5 ; ⟨A, B2, B1, C, D⟩^5 ]

Please note that the notation ⟨· · ·⟩^n indicates n cases consisting of the same sequence.
Such a log can be generated by executions of the process model of Fig. 17.
Such log can be generated starting from executions of the process model of Fig. 17.
In the case reported in the figure, the main measure (the dependency relation) produces
the following matrix:

             A        B1       B2       C        D
    A        0        0.83     0.83     0        0
    B1      −0.83     0        0        0.83     0
    B2      −0.83     0        0        0.83     0
    C        0       −0.83    −0.83     0        0.909
    D        0        0        0       −0.909    0

For example, |A >_W B1| = 5 and |B1 >_W A| = 0, so A ⇒_W B1 = (5 − 0)/(5 + 0 + 1) ≈ 0.83.
Starting from this relation and considering, for example, a value 0.9 for the dependency
threshold, it is possible to identify the complete set of dependencies, including
29
the split from activity A to B1 and B2. In order to identify the type of the split, it is
necessary to use the AND measure of Eq. (2):

    A ⇒_W (B1 ∧ B2) = (5 + 5) / (5 + 5 + 1) = 0.909

So, considering, for example, an AND threshold of 0.1, the type of the split is
set to AND. In the ProM implementation, the default value for the dependency threshold
is 0.9, and for the AND threshold it is 0.1.
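For illustration, the measures of this running example can be reproduced with a few
lines of Python (a sketch for this appendix only; the actual implementation used in the
experiments is the ProM plug-in):

    from collections import Counter
    from itertools import pairwise  # Python 3.10+

    # The example log W: ten traces over the activities {A, B1, B2, C, D}.
    W = [["A", "B1", "B2", "C", "D"]] * 5 + [["A", "B2", "B1", "C", "D"]] * 5

    succ = Counter()  # succ[(a, b)] = |a >_W b|, direct-succession counts
    for trace in W:
        succ.update(pairwise(trace))

    def dependency(a, b):
        # Eq. (1): (|a >_W b| - |b >_W a|) / (|a >_W b| + |b >_W a| + 1)
        return (succ[a, b] - succ[b, a]) / (succ[a, b] + succ[b, a] + 1)

    def and_measure(a, b, c):
        # Eq. (2): (|b >_W c| + |c >_W b|) / (|a >_W b| + |a >_W c| + 1)
        return (succ[b, c] + succ[c, b]) / (succ[a, b] + succ[a, c] + 1)

    print(round(dependency("A", "B1"), 2))         # 0.83
    print(round(dependency("C", "D"), 3))          # 0.909
    print(round(and_measure("A", "B1", "B2"), 3))  # 0.909 -> AND-split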

B Error Bounds on Online Heuristics Miner


If we assume a stationary stream, i.e. a stream where the distribution of events does
not change over time (no concept drift), then it is possible to give error bounds on the
measures computed by the online version of Heuristics Miner.
Consider an execution of the online Heuristics Miner on the stream S.
Let Q_A(t), Q_C(t), and Q_R(t) be the contents of the queues used by Algorithm 2 at
time t. Let case_overlap(t) = {c ∈ C | t_start(c) ≤ t ∧ t_end(c) ≥ t} be the set of
cases that are active at time t; ∆_c = max_t |case_overlap(t)|; n̄_c(t) be the cumulative
number of cases which have been removed from Q_C(t) during the time interval [0, t];
and nc(t) = |Q_C(t)| + n̄_c(t). Given two activities a and b, let ρ_ab ∈ [0, ξ_ab] be the
random variable reporting the number of successions (a, b) contained in a randomly
selected trace of S. With A_S and R_S we denote the set of activities and the set of
successions, respectively, observed over the entire stream S. Then it is possible to state
the following theorem:
Theorem 1 (Error bounds) Let (a ⇒_S b) and a ⇒_S (b ∧ c) be the measures computed
by the Heuristics Miner algorithm on a time-stationary stream S, and let (a ⇒_{S_0^t} b)
and a ⇒_{S_0^t} (b ∧ c) be the measures computed at time t by the online version of the
Heuristics Miner algorithm on the stream S. If max_A ≥ |A_S|, max_R ≥ |R_S| and
max_C ≥ ∆_c, then with probability 1 − δ the following bounds hold:

    (a ⇒_S b) · E[ρ_ab + ρ_ba] / ( E[ρ_ab + ρ_ba] + ε_ab(t) + 1/nc(t) )
        − ε_ab(t) / ( E[ρ_ab + ρ_ba] + ε_ab(t) + 1/nc(t) )  ≤  (a ⇒_{S_0^t} b)

    (a ⇒_{S_0^t} b)  ≤  (a ⇒_S b) · E[ρ_ab + ρ_ba] / ( E[ρ_ab + ρ_ba] − ε_ab(t) + 1/nc(t) )
        + ε_ab(t) / ( E[ρ_ab + ρ_ba] − ε_ab(t) + 1/nc(t) )

and, similarly, for a ⇒ (b ∧ c):

    (a ⇒_S (b ∧ c)) · E[ρ_ab + ρ_ac] / ( E[ρ_ab + ρ_ac] + ε_abc(t) + 1/nc(t) )
        − ε_bc(t) / ( E[ρ_ab + ρ_ac] + ε_abc(t) + 1/nc(t) )  ≤  (a ⇒_{S_0^t} (b ∧ c))

    (a ⇒_{S_0^t} (b ∧ c))  ≤  (a ⇒_S (b ∧ c)) · E[ρ_ab + ρ_ac] / ( E[ρ_ab + ρ_ac] − ε_abc(t) + 1/nc(t) )
        + ε_bc(t) / ( E[ρ_ab + ρ_ac] − ε_abc(t) + 1/nc(t) )

where, for all d, e, f ∈ A_S,

    ε_de(t) = sqrt( (ξ_de + ξ_ed)² ln(2/δ) / (2 nc(t)) ),
    ε_def(t) = sqrt( (ξ_de + ξ_df)² ln(2/δ) / (2 nc(t)) ),

and E[x] is the expected value of x.
Proof 1 Consider the Heuristics Miner definition (a ⇒_S b) = ( |a >_S b| − |b >_S a| ) /
( |a >_S b| + |b >_S a| + 1 ), as presented in Eq. (1). Let N_c be the number of cases
contained in S_0^t; then

    (a ⇒_{S_0^t} b) = ( |a >_{S_0^t} b| − |b >_{S_0^t} a| ) / ( |a >_{S_0^t} b| + |b >_{S_0^t} a| + 1 )
                    = ( ( |a >_{S_0^t} b| − |b >_{S_0^t} a| ) / N_c ) / ( ( |a >_{S_0^t} b| + |b >_{S_0^t} a| ) / N_c + 1/N_c )

and

    (a ⇒_S b) = lim_{N_c → +∞} ( ( |a >_{S_0^t} b| − |b >_{S_0^t} a| ) / N_c ) / ( ( |a >_{S_0^t} b| + |b >_{S_0^t} a| ) / N_c + 1/N_c )
              = E[ρ_ab − ρ_ba] / E[ρ_ab + ρ_ba].

We recall that X̄ = ( |a >_{S_0^t} b| − |b >_{S_0^t} a| ) / N_c is the mean of the random
variable X = (ρ_ab − ρ_ba) computed over N_c independent observations, i.e. traces, and
that X ∈ [−ξ_ba, ξ_ab]. We can then use the Hoeffding bound [9], which states that, with
probability 1 − δ,

    |X̄ − E[X]| < ε_X = sqrt( r_X² ln(2/δ) / (2 N_c) ),

where r_X is the range of X, which in our case is r_X = (ξ_ab + ξ_ba).
By using the Hoeffding bound also for the variable Y = (ρ_ab + ρ_ba), we can state
that with probability 1 − δ

    ( E[X] − ε_X ) / ( E[Y] + ε_Y + 1/N_c )  ≤  X̄ / ( Ȳ + 1/N_c ) = (a ⇒_{S_0^t} b),

which after some algebra can be rewritten as

    ( E[X] / E[Y] ) · ( E[Y] / ( E[Y] + ε_Y + 1/N_c ) ) − ε_X / ( E[Y] + ε_Y + 1/N_c )  ≤  (a ⇒_{S_0^t} b).

By observing that (a ⇒_S b) = E[X]/E[Y], that r_X = r_Y = (ξ_ab + ξ_ba), and that at
time t, under the theorem hypotheses, no information has been removed from the queues
and N_c = nc(t), the first bound is proved. The second bound can be proved starting from

    (a ⇒_{S_0^t} b) ≤ ( E[X] + ε_X ) / ( E[Y] − ε_Y + 1/N_c ).

The last two bounds can be proved in a similar way by considering X = (ρ_bc +
ρ_cb) ∈ [0, ξ_bc + ξ_cb] and Y = (ρ_ab + ρ_ac) ∈ [0, ξ_ab + ξ_ac], which leads to
ε_X = sqrt( (ξ_bc + ξ_cb)² ln(2/δ) / (2 N_c) ) and ε_Y = sqrt( (ξ_ab + ξ_ac)² ln(2/δ) / (2 N_c) ).

Similar bounds can be obtained for the other measures computed by Heuristics
Miner. From the bounds it is possible to see that, as the number of observed cases nc(t)
increases, both 1/nc(t) and the errors ε_ab(t) and ε_abc(t) go to 0, and the
measures computed by the online version of Heuristics Miner consistently converge to
the “right” values.
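As a rough numeric illustration (all values here are hypothetical): with ξ_ab = ξ_ba = 1,
i.e. at most one (a, b) and one (b, a) succession per trace, δ = 0.05 and nc(t) = 1000
observed cases, the error term evaluates as follows:

    import math

    def epsilon_ab(xi_ab, xi_ba, delta, nc_t):
        # epsilon_ab(t) = sqrt((xi_ab + xi_ba)^2 * ln(2/delta) / (2 * nc(t)))
        return math.sqrt((xi_ab + xi_ba) ** 2 * math.log(2 / delta) / (2 * nc_t))

    print(epsilon_ab(1, 1, 0.05, 1000))   # ~0.086
    print(epsilon_ab(1, 1, 0.05, 10000))  # ~0.027, shrinking as nc(t) grows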
