Course Code: IS423 Course Name: Business Process Mining: Presented By: Dr. Iman Helal

This document provides information for the course "Business Process Mining" presented by Dr. Iman Helal of Cairo University's Information Systems Department. It lists the course code and name, and provides a link to the textbook "Process Mining: Data Science in Action, 2nd Edition" as well as the course Blackboard access code. The course outline covers an introduction to process mining, background topics like event logs and data mining, process discovery, conformance checking, and operational support using process mining.

Course Code: IS423

Course Name: Business Process Mining

Presented by:
Dr. Iman Helal

Information Systems Department


Faculty of Computers and Artificial Intelligence
Cairo University, Egypt
Information Systems Department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, 12613, Egypt

Before lecture

• Book: Process Mining: Data Science in Action (2nd edition). Springer. (Book Link)
• Free access to the book through ekb.eg

Access to Blackboard

• Access code: 124487
• Course ID: 212201.FCI.IS423
• Course Name: Business Process Mining

Outline

• Introduction and Overview


• Process Mining
• Background (as necessary)
o Event Logs
o Data Mining
• Process Discovery

• Conformance Checking
• Operational Support …
• Process Mining in the Large (if we have time)

Chapter 1:
Data Science in Action


Data is GOLD
Data is the new OIL

Internet of Events
From Bits to Zettabytes

• A “bit” is the smallest unit of information possible.
• One bit has two possible values: 1 (on) and 0 (off).
• A “byte” is composed of 8 bits and can represent 2^8 = 256 values.
• To talk about larger amounts of data, multiples of 1000 are used:
o 1 Kilobyte (KB) equals 1000 bytes,
o 1 Megabyte (MB) equals 1000 KB,
o 1 Gigabyte (GB) equals 1000 MB,
o 1 Terabyte (TB) equals 1000 GB,
o 1 Petabyte (PB) equals 1000 TB,
o 1 Exabyte (EB) equals 1000 PB, and
o 1 Zettabyte (ZB) equals 1000 EB.
o Hence, 1 Zettabyte is 10^21 = 1,000,000,000,000,000,000,000 bytes.

• Note that here we used the International System of Units (SI) set of unit prefixes, also known as SI prefixes, rather than binary prefixes. If we assume binary prefixes, then 1 Kilobyte is 2^10 = 1024 bytes, 1 Megabyte is 2^20 = 1048576 bytes, and 1 Zettabyte is 2^70 ≈ 1.18×10^21 bytes.
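
The unit arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `bytes_in` and the prefix list are illustrative choices, not part of the course material.

```python
# Prefixes in increasing order; position 0 corresponds to power 1.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]

def bytes_in(prefix, binary=False):
    """Number of bytes in one unit, using SI (base 1000) or binary (base 1024) prefixes."""
    power = PREFIXES.index(prefix) + 1
    base = 1024 if binary else 1000
    return base ** power

values_per_byte = 2 ** 8                         # a byte has 8 bits
zettabyte_si = bytes_in("zetta")                 # 1000^7 = 10^21
zettabyte_binary = bytes_in("zetta", binary=True)  # 1024^7 = 2^70

print(values_per_byte)
print(f"{zettabyte_si:,}")
print(f"{zettabyte_binary / 1e21:.2f} x 10^21")
```

The gap between the two conventions grows with each prefix: about 2.4% at the kilobyte level and about 18% at the zettabyte level.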

Moore’s law
Internet of Events

https://evobsession.com/tesla-has-done-something-no-other-automaker-has-assumed-the-mantle-of-moores-law/

Internet of Events (IoE)

Internet of Events (IoE)

• One of the main challenges of today’s organizations is to extract information and value from data stored in their information systems.
• IoE: Event data are generated from a variety of sources connected to the Internet:
o Internet of Content (IoC)
▪ all information created by humans to increase knowledge on particular subjects.
▪ includes traditional web pages, articles, encyclopedias like Wikipedia, YouTube, e-books, newsfeeds, etc.
o Internet of People (IoP)
▪ all data related to social interaction.
▪ includes e-mail, Facebook, Twitter, forums, LinkedIn, etc.
o Internet of Things (IoT)
▪ all physical objects connected to the network.
▪ includes all things that have a unique id and a presence in an Internet-like structure.
o Internet of Locations (IoL)
▪ all data that have a geographical or geospatial dimension.
▪ With the uptake of mobile devices (e.g., smartphones), more and more events have location or movement attributes.

Internet of Events (IoE)


Internet of Events (IoE)

Data does not have to be "Big" to be challenging!
Data analytics questions are everywhere!
Need for data scientists!

Data Science

• Data science is an interdisciplinary field aiming to turn data into real value.
• Data may be structured or unstructured, big or small, static or streaming.
• Value may be provided in the form of predictions, automated decisions, models learned from data, or any type of data visualization delivering insights.
• Data science includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructures, various types of mining and learning, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business aspects.

Data Scientist

A data scientist can answer a variety of data-driven questions.
• (Reporting) What happened?
• (Diagnosis) Why did it happen?
• (Prediction) What will happen?
• (Recommendation) What is the best that can happen?

Data Scientist

• Example from healthcare:
o Why do patients have to wait so long?
o Do doctors follow the guidelines?
o Can we predict waiting times?
o How much staff is needed tomorrow?
o How can we reduce costs?

• Example from Philips X-ray machines:
o How are X-ray machines really used?
o Why and when do X-ray machines malfunction?
o Which components should be replaced?
o Can we predict that the machine will break down next week?
o Which parts need to be improved?

Coursera course: Process Mining: Data Science in Action

Data Science Ingredients (figure; surviving labels: Process, Business, Intelligence)

Data Science Ingredients

• Statistics is the origin of data science. The discipline is typically split into
o descriptive statistics (to summarize sample data using notions like mean, standard deviation, and frequency) and
o inferential statistics (using sample data to estimate characteristics of all data or to test a hypothesis).
• Algorithms are crucial in any approach analyzing data.
o When data sets get larger, the complexity of the algorithms becomes a primary concern.
o Examples are the Apriori algorithm for finding frequent item sets, the MapReduce approach for parallelizing algorithms, and the PageRank algorithm used by Google search.
• Data mining can be defined as “the analysis of (often large) data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner”.
o The input data are typically given as a table and the output may be rules, clusters, tree structures, graphs, equations, patterns, etc.
o It builds on statistics, databases, and algorithms.
o Compared to statistics, the focus is on scalability and practical applications.
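
To make the split between descriptive and inferential statistics concrete, the sketch below first summarizes a sample and then uses it to estimate a property of the whole population. The sample values are invented for illustration, and the interval uses a normal approximation (z = 1.96), which is an assumption that is only rough for a sample this small.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample: waiting times in minutes (illustrative values only).
sample = [12, 15, 15, 18, 20, 22, 15, 30]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)   # sample standard deviation
frequency = Counter(sample)        # how often each value occurs

# Inferential statistics: estimate the mean of the whole population.
# Rough 95% interval via the normal approximation; a t-distribution
# would be more appropriate for so few observations.
standard_error = stdev / math.sqrt(len(sample))
ci_low = mean - 1.96 * standard_error
ci_high = mean + 1.96 * standard_error

print(f"mean={mean:.3f}, stdev={stdev:.3f}, most common={frequency.most_common(1)}")
print(f"approx. 95% interval for the population mean: [{ci_low:.2f}, {ci_high:.2f}]")
```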
Data Science Ingredients

• Machine learning (ML)
o It is concerned with the question of how to construct computer programs that automatically improve with experience.
o The difference between data mining and machine learning is equivocal (unclear/ambiguous).
o The field of machine learning emerged from within Artificial Intelligence (AI) with techniques such as neural networks.
o We use the term ML to refer to algorithms that give computers the capability to learn without being explicitly programmed (“learning from experience”).
o To learn and adapt, a model is built from input data (rather than using fixed routines). The evolving model is used to make data-driven predictions or decisions.
• Process mining adds the process perspective to machine learning and data mining.
o Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically).
o Event data are related to explicit process models, e.g., Petri nets or BPMN models.
o For example, process models are discovered from event data or event data are replayed on models to analyze compliance and performance.
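
As a first impression of what "discovering a process model from event data" can mean, the sketch below builds a directly-follows graph: it counts how often one activity is immediately followed by another within the same case. This is a deliberately simple stand-in, not the Petri net or BPMN discovery the slide refers to, and the event log and activity names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity), already ordered by time per case.
event_log = [
    ("c1", "register"), ("c1", "check"), ("c1", "pay"),
    ("c2", "register"), ("c2", "pay"),
    ("c3", "register"), ("c3", "check"), ("c3", "pay"),
]

def traces_from_log(log):
    """Group events by case to obtain one activity sequence (trace) per case."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)
    return dict(traces)

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

traces = traces_from_log(event_log)
dfg = directly_follows(traces)
for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
```

The counts already hint at the observed behavior: most cases pass through "check", while one case goes straight from "register" to "pay".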
Data Science Ingredients

• Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends.
o To generate predictions, existing mining and learning approaches are applied in a business context.
o Predictive analytics is related to business analytics and business intelligence.
• Databases are used to store data.
o Database technology is one of the cornerstones of data science.
o Database management systems (DBMSs) serve two purposes:
▪ (i) structuring data so that they can be managed easily and
▪ (ii) providing scalability and reliable performance.
o Using database technology, application programmers do not need to worry about data storage.
o Until recently, relational databases and SQL (Structured Query Language) were the norm.
o Due to the growing volume of data, massively distributed databases and so-called NoSQL databases emerged.
o Moreover, in-memory computing (cf. SAP HANA) can be used to answer questions in real-time.
o Related is OLAP (Online Analytical Processing), where data are stored in multidimensional cubes facilitating analysis from different points of view.
• Distributed systems provide the infrastructure to conduct analysis.
o A distributed system is composed of interacting components that coordinate their actions to achieve a common goal.
o Cloud, grid, and utility computing rely on distributed systems.
o Some analysis tasks are too large or too complex to be performed on a single computer.
o Such tasks can be split into many smaller tasks that can be performed concurrently on different computing nodes.
o Scalability may be realized by sharing and/or extending the set of computing nodes.
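
The idea of splitting one analysis task into smaller tasks that run concurrently can be sketched on a single machine. In the example below, worker threads stand in for computing nodes, which is an assumption made purely for illustration; the event data and helper names are hypothetical. Each worker counts activities in its own chunk, and the partial results are merged afterwards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_activities(chunk):
    """Partial task: count activity names within one chunk of events."""
    return Counter(chunk)

def merge(counters):
    """Combine the partial counts into one overall result."""
    total = Counter()
    for partial in counters:
        total.update(partial)
    return total

# Hypothetical data: the activity names of a series of events.
events = ["register", "check", "pay", "register", "pay", "check", "register"]
chunks = [events[i:i + 3] for i in range(0, len(events), 3)]

# Threads stand in for separate computing nodes in this sketch.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_activities, chunks))

totals = merge(partial_counts)
print(totals)
```

Because counting is independent per chunk and merging is a simple sum, adding more workers (or nodes) does not change the result, only how the work is spread.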
Data Science Ingredients

• Visualization & visual analytics are key elements of data science.
o In the end, people need to interpret the results and guide analysis.
o Automated learning and mining techniques can be used to extract knowledge from data.
o However, if there are many “unknown unknowns” (things we don’t know we don’t know), analysis heavily relies on human judgment and direct interaction with the data.
o The perception capabilities of the human cognitive system can be exploited by using the right visualizations.
o Visual analytics combines automated analysis techniques with interactive visualizations for effective understanding, reasoning and decision making on the basis of very large and complex data sets.

• Business models & marketing
o Data science is about turning data into value, including business value.
o The market capitalization of Facebook in November 2015 was approximately US $300 billion while having approximately 1500 million monthly active users. Hence, the average value of a Facebook user was US $200.
o At the same time, the average value of a Twitter user was US $55 (market capitalization of approximately US $17 billion with 307 million users). Via the website www.twalue.com one can even compute the value of a particular Twitter account. In November 2015, the author’s Twitter account (@wvdaalst) was estimated to have a value of US $1002.98. These numbers illustrate the economic value of data and the success of young companies based on new business models.
o Examples of data-driven companies that are radically changing the hotel, taxi, and trading business:
▪ Airbnb (helping people to list, find and rent lodging),
▪ Uber (connecting travelers and drivers who use their own cars), and
▪ Alibaba (an online business-to-business trading platform).
o Marketing is also becoming more data-driven.
o Data scientists should understand how business considerations are driving the analysis of new types of data.
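
The per-user values quoted on this slide follow from dividing market capitalization by the number of users. The short check below uses only the figures given above; the function name `value_per_user` is an illustrative choice.

```python
def value_per_user(market_cap_usd, users):
    """Average market value per user in US dollars."""
    return market_cap_usd / users

# Figures as quoted on the slide (November 2015, approximate).
facebook = value_per_user(300e9, 1500e6)
twitter = value_per_user(17e9, 307e6)

print(f"Facebook: US ${facebook:.0f} per user")
print(f"Twitter:  US ${twitter:.0f} per user")
```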
Data Science Ingredients

• Behavioral/social science
o Most data are (indirectly) generated by people, and analysis results are often used to influence people (e.g., guiding the customer to a product or encouraging a manager to eliminate waste).
o Behavioral science is the systematic analysis and investigation of human behavior.
o Social sciences study the processes of a social system and the relationships among individuals within a society.
o To interpret the results of various types of analytics, it is important to understand human behavior and the social context in which humans and organizations operate.
o Moreover, analysis results often trigger questions related to coaching and positively influencing people.

• Privacy, security, law, and ethics are key ingredients to protect individuals and organizations from “bad” data science practices.
o Privacy refers to the ability to seclude sensitive information.
▪ Privacy often depends on security mechanisms which aim to ensure the confidentiality, integrity and availability of data.
▪ Data should be accurate and stored safely, not allowing for unauthorized access.
o Privacy and security need to be considered carefully in all data science applications.
▪ Individuals need to be able to trust the way data are stored and transmitted.
▪ Next to concrete privacy and security breaches, there may be ethical notions related to “good” and “bad” conduct.
o Not all types of analysis possible are morally defensible.
▪ For example, mining techniques may favor particular groups (e.g., a decision tree may reveal that it is better to give insurance to middle-aged white males rather than other groups).
▪ Moreover, due to a lack of sufficient data, minority groups may be wrongly classified.
▪ A data scientist should be aware of such problems and provide safeguards against “irresponsible” forms of data science.


Bridging the Gap between


Process Science and Data Science

Bridging the Gap between
Process Science and Data Science

• Stochastics provides a repertoire of techniques to analyze random processes.
o The behavior of a process or system is modeled using random variables in order to allow for analysis.
o Well-known approaches include Markov models, queueing networks/systems, and simulation.
o These can be used to analyze waiting times, reliability, utilization, etc. in the context of stochastic processes.
• Optimization techniques aim to provide a “best” alternative (e.g., cheapest or fastest) from a large or even infinite set of alternatives.
o For example, given a list of cities and the distances between each pair of cities, what is a shortest possible route that visits each city exactly once and returns to the origin city?
o Numerous optimization techniques have been developed to answer such questions as efficiently as possible.
o Well-known approaches include Linear Programming (LP), Integer Linear Programming (ILP), constraint satisfaction, and dynamic programming.
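
The route question above can be answered for a handful of cities by simply trying every ordering. The sketch below does exactly that; it is a brute-force enumeration, not one of the techniques named on the slide (LP, ILP, constraint satisfaction, dynamic programming), and the city names and distances are hypothetical. The number of orderings grows factorially, which is why more refined techniques are needed in practice.

```python
from itertools import permutations

# Hypothetical symmetric distances between four cities.
DIST = {
    ("A", "B"): 10, ("A", "C"): 15, ("A", "D"): 20,
    ("B", "C"): 35, ("B", "D"): 25, ("C", "D"): 30,
}

def distance(a, b):
    """Look up a distance regardless of the order of the two cities."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]

def shortest_tour(cities):
    """Try every round trip starting and ending at the first city."""
    origin, *rest = cities
    best_route, best_length = None, float("inf")
    for order in permutations(rest):
        route = (origin, *order, origin)
        length = sum(distance(a, b) for a, b in zip(route, route[1:]))
        if length < best_length:
            best_route, best_length = route, length
    return best_route, best_length

route, length = shortest_tour(["A", "B", "C", "D"])
print(route, length)
```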
Bridging the Gap between
Process Science and Data Science

• Operations management & research deals with the design, control and management of products, processes, services and supply chains.
o Operations Research (OR) tends to focus on the analysis of mathematical models.
o Operations Management (OM) is closer to industrial engineering and business administration.
• Business process management is the discipline that combines approaches for the design, execution, control, measurement and optimization of business processes.
o Business Process Management (BPM) efforts tend to put emphasis on explicit process models (e.g., Petri nets or BPMN models) that describe the control-flow and, optionally, other perspectives (organization, resources, data, functions, etc.).

Bridging the Gap between
Process Science and Data Science

• Process mining is also part of process science.
o Process mining techniques can be used to discover process models from event data.
o By replaying these data, bottlenecks and the effects of non-compliance can be unveiled.
o Compared to mainstream BPM approaches, the focus is not on process modeling, but on exploiting event data.
o Sometimes the terms Workflow Mining (WM), Business Process Intelligence (BPI), and Automated Business Process Discovery (ABPD) are used to refer to process-centric data-driven approaches.
• Business process improvement is an umbrella term for a variety of approaches aiming at process improvement.
o Examples are Total Quality Management (TQM), Kaizen, (Lean) Six Sigma, Theory of Constraints (TOC), and Business Process Reengineering (BPR).
o Note that most of the ingredients ultimately aim at process improvement, thus making the term business process improvement rather unspecific.
o One could argue that the whole of process science aims to improve processes.
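
Replaying event data on a model to reveal non-compliance can be illustrated with a very small reference model. In the sketch below the model is just a set of allowed start activities, end activities, and directly-follows steps. That representation is an assumption chosen for brevity (real conformance checking typically replays traces on Petri nets or similar models), and all names are hypothetical.

```python
# Hypothetical reference model: allowed start, end, and directly-follows steps.
MODEL = {
    "start": {"register"},
    "end": {"pay"},
    "follows": {("register", "check"), ("check", "pay")},
}

def deviations(trace, model):
    """Replay one trace step by step and list where it leaves the model."""
    if not trace:
        return ["empty trace"]
    found = []
    if trace[0] not in model["start"]:
        found.append(f"unexpected start: {trace[0]}")
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in model["follows"]:
            found.append(f"unexpected step: {a} -> {b}")
    if trace[-1] not in model["end"]:
        found.append(f"unexpected end: {trace[-1]}")
    return found

# Hypothetical observed behavior: one trace per case.
observed = {
    "c1": ["register", "check", "pay"],
    "c2": ["register", "pay"],  # skips the check
}
report = {case: deviations(trace, MODEL) for case, trace in observed.items()}
for case, problems in report.items():
    print(case, problems or "conforms")
```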
Bridging the Gap between
Process Science and Data Science

• Process automation & workflow management focuses on the development of information systems supporting operational business processes including the routing and distribution of work.
o Workflow Management (WFM) systems are model-driven, i.e., a process model suffices to configure the information system and run the process.
o As a result, a process can be changed by modifying the corresponding process model.
• Formal methods & concurrency theory build on the foundations of theoretical computer science, in particular logic calculi, formal languages, automata theory, and program semantics.
o Formal methods use a range of languages to describe processes.
o Examples are transition systems, Petri nets, process calculi such as CSP, CCS and π-calculus, temporal logics such as LTL and CTL, and statecharts.
o Model checkers such as SPIN can be used to verify logical properties such as the absence of deadlocks.
o Concurrency complicates analysis, but is also essential: in reality, parts of a process or system may be executing simultaneously and potentially interacting with each other.
o Petri nets were the first formalism to model and analyze concurrent processes.
o Many BPM, WFM, and process mining approaches build upon such formalisms.
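
To give a feel for what checking "the absence of deadlocks" involves, the sketch below plays the token game of a tiny Petri net and explores every reachable marking. It is a plain exhaustive search written for this example, not how SPIN or any other model checker is implemented. The net itself is hypothetical: two processes each need resources r1 and r2 but take them in opposite order.

```python
from collections import deque

# Each transition: (tokens consumed per place, tokens produced per place).
TRANSITIONS = {
    "p1_take_r1": ({"p1_idle": 1, "r1": 1}, {"p1_has_r1": 1}),
    "p1_take_r2": ({"p1_has_r1": 1, "r2": 1}, {"done1": 1, "r1": 1, "r2": 1}),
    "p2_take_r2": ({"p2_idle": 1, "r2": 1}, {"p2_has_r2": 1}),
    "p2_take_r1": ({"p2_has_r2": 1, "r1": 1}, {"done2": 1, "r1": 1, "r2": 1}),
}
INITIAL = {"p1_idle": 1, "p2_idle": 1, "r1": 1, "r2": 1}
FINAL = {"done1": 1, "done2": 1, "r1": 1, "r2": 1}

def freeze(marking):
    """Hashable, canonical form of a marking (places with zero tokens dropped)."""
    return tuple(sorted((p, n) for p, n in marking.items() if n > 0))

def enabled(marking, consume):
    return all(marking.get(p, 0) >= n for p, n in consume.items())

def fire(marking, consume, produce):
    new = dict(marking)
    for p, n in consume.items():
        new[p] -= n
    for p, n in produce.items():
        new[p] = new.get(p, 0) + n
    return new

def find_deadlocks(initial, final, transitions):
    """Breadth-first search over reachable markings; a deadlock is a
    non-final marking in which no transition is enabled."""
    seen = {freeze(initial)}
    queue = deque([initial])
    deadlocks = []
    while queue:
        marking = queue.popleft()
        successors = [fire(marking, c, p)
                      for c, p in transitions.values() if enabled(marking, c)]
        if not successors and freeze(marking) != freeze(final):
            deadlocks.append(freeze(marking))
        for nxt in successors:
            key = freeze(nxt)
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return deadlocks

deadlocks = find_deadlocks(INITIAL, FINAL, TRANSITIONS)
print(deadlocks)
```

The search finds the marking in which each process holds one resource and waits for the other, which is only reachable because the two processes can run concurrently.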
Process mining use cases

• What is the process that people really follow?
• Where are the bottlenecks in my process?
• Where do people (or machines) deviate from the expected or idealized process?
• What are the "highways" in my process?
• What factors are influencing a bottleneck?
• Can we predict problems (delay, deviation, risk, etc.) for running cases?
• Can we recommend countermeasures?
• How to redesign the process / organization / machine?
• …

Coursera course: Process Mining: Data Science in Action
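
The bottleneck question can be approached by measuring how much time passes between consecutive activities of a case. The sketch below averages these durations per pair of activities and reports the slowest handover. The timestamped log is hypothetical, and treating the largest average as "the bottleneck" is a simplifying assumption.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical timestamped log: (case id, activity, ISO timestamp).
log = [
    ("c1", "register", "2024-01-01T09:00"), ("c1", "check", "2024-01-01T09:10"),
    ("c1", "pay", "2024-01-01T12:10"),
    ("c2", "register", "2024-01-01T10:00"), ("c2", "check", "2024-01-01T10:20"),
    ("c2", "pay", "2024-01-01T11:20"),
]

def mean_waiting_minutes(events):
    """Average elapsed minutes between directly consecutive activities."""
    by_case = defaultdict(list)
    for case_id, activity, stamp in events:
        by_case[case_id].append((datetime.fromisoformat(stamp), activity))
    durations = defaultdict(list)
    for steps in by_case.values():
        steps.sort()  # order each case by timestamp
        for (t1, a), (t2, b) in zip(steps, steps[1:]):
            durations[(a, b)].append((t2 - t1).total_seconds() / 60)
    return {pair: sum(v) / len(v) for pair, v in durations.items()}

waits = mean_waiting_minutes(log)
bottleneck = max(waits, key=waits.get)
print(waits)
print("slowest handover:", bottleneck)
```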

Synonyms?

Data Mining
Machine Learning
Process Mining


End of lecture

Prepare and surf chapter 2


Any Questions?
Thank you ☺

i.helal@fci-cu.edu.eg
http://scholar.cu.edu.eg/?q=imanhelal/

Course Code: IS423
Course Name: Business Process Mining
