Course Code: IS423
Course Name: Business Process Mining
Presented by: Dr. Iman Helal
Before lecture
• Free Book: Book Link
• Access to Blackboard
o Course ID: 124487
o Course code: 212201.FCI.IS423
o Name: Business Process Mining
Chapter 1:
Data Science in Action
Data is GOLD
Data is the new OIL
• Note that here we used the International System of Units (SI) set of unit prefixes, also known as SI
prefixes, rather than binary prefixes. If we assume binary prefixes, then 1 Kilobyte is 2^10 = 1024
bytes, 1 Megabyte is 2^20 = 1048576 bytes, and 1 Zettabyte is 2^70 ≈ 1.18×10^21 bytes.
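The difference between SI and binary prefixes can be checked with a few lines of arithmetic. The sketch below is illustrative only; the helper names `si_bytes` and `binary_bytes` are made up for this example and do not come from the course material.

```python
# Compare SI (decimal) prefixes with binary prefixes for storage units.
# Helper names are hypothetical, chosen only for this illustration.
SI_EXPONENTS = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21,
}


def si_bytes(prefix):
    """Number of bytes under the SI interpretation (powers of 10)."""
    return 10 ** SI_EXPONENTS[prefix]


def binary_bytes(prefix):
    """Number of bytes under the binary interpretation (powers of 2).

    Each SI step of 10^3 corresponds to a binary step of 2^10.
    """
    return 2 ** (10 * SI_EXPONENTS[prefix] // 3)


for prefix in SI_EXPONENTS:
    ratio = binary_bytes(prefix) / si_bytes(prefix)
    print(f"{prefix:>5}: SI = {si_bytes(prefix):.3e}, "
          f"binary = {binary_bytes(prefix):.3e}, ratio = {ratio:.3f}")
```

The ratio column shows that the gap between the two conventions grows with the size of the unit, which is why the choice of prefix matters most for very large data volumes.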
Information Systems Department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, 12613, Egypt
Moore’s law
Internet of Events
Data does not have to be "Big" to be challenging!
• Data science is an interdisciplinary field aiming to turn data into real value.
• Data may be structured or unstructured, big or small, static or streaming.
• Value may be provided in the form of predictions, automated decisions,
models learned from data, or any type of data visualization delivering
insights.
• Data science includes data extraction, data preparation, data exploration,
Data Scientist
Data Science Ingredients
[Figure labels: Process, Business Intelligence]
• Statistics is the origin of data science. The discipline is typically split into
o descriptive statistics (to summarize sample data using notions like mean, standard deviation,
and frequency) and
o inferential statistics (using sample data to estimate characteristics of all data or to test a
hypothesis).
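The split between descriptive and inferential statistics can be illustrated with Python's standard library. This is a minimal sketch: the sample values are invented, and the confidence interval uses a normal approximation as a simplifying assumption (for a sample this small, a t-distribution would usually be preferred).

```python
import math
import statistics
from collections import Counter

# Hypothetical sample, e.g. processing times in minutes (invented values).
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 15.0, 12.0, 15.0]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)   # sample standard deviation
frequency = Counter(sample)        # how often each value occurs

# Inferential statistics: estimate a characteristic of the whole population.
# Approximate 95% confidence interval for the population mean
# (normal approximation, assumed here for simplicity).
standard_error = stdev / math.sqrt(len(sample))
z = statistics.NormalDist().inv_cdf(0.975)
confidence_interval = (mean - z * standard_error, mean + z * standard_error)

print("mean:", mean)
print("standard deviation:", stdev)
print("frequencies:", dict(frequency))
print("approx. 95% CI for the population mean:", confidence_interval)
```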
• Algorithms are crucial in any approach analyzing data.
o When data sets get larger, the complexity of the algorithms becomes a primary concern.
o Examples include the Apriori algorithm for finding frequent item sets, the MapReduce approach
for parallelizing algorithms, and the PageRank algorithm used by Google search.
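To make the Apriori idea concrete, here is a small, unoptimized sketch of level-wise frequent item set mining. The function name, the shopping baskets, and the support threshold are all invented for illustration; a production implementation would use more efficient candidate generation and counting.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent item sets.

    A simplified, level-wise sketch of Apriori (not optimized).
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count in how many transactions each candidate occurs.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-item sets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the search tractable as data sets grow: a candidate is counted only if every smaller subset of it is already known to be frequent.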
• Predictive analytics is the practice of extracting information from existing data sets in order to
determine patterns and predict future outcomes and trends.
o To generate predictions, existing mining and learning approaches are applied in a business context.
o Predictive analytics is related to business analytics and business intelligence.
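As a very small illustration of predicting a future outcome from existing data, the sketch below fits a straight-line trend by ordinary least squares and extrapolates one period ahead. The monthly order counts and the function names are invented for this example; real predictive analytics would typically use richer models and validation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical history: number of orders in months 1..6.
months = [1, 2, 3, 4, 5, 6]
orders = [102, 108, 121, 127, 141, 149]

slope, intercept = fit_line(months, orders)
forecast = predict(slope, intercept, 7)
print(f"trend: {slope:.2f} orders/month, forecast for month 7: {forecast:.1f}")
```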
• Databases are used to store data.
o Database technology is one of the cornerstones of data science.
o DBMSs serve two purposes:
▪ (i) structuring data so that they can be managed easily and
▪ (ii) providing scalability and reliable performance.
o Using database technology, application programmers do not need to worry about data storage.
o Until recently, relational databases and SQL (Structured Query Language) were the norm.
o Due to the growing volume of data, massively distributed databases and so-called NoSQL databases emerged.
o Moreover, in-memory computing (cf. SAP HANA) can be used to answer questions in real-time.
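The point that application programmers do not need to worry about data storage can be sketched with SQLite, which ships with Python. The table layout, column names, and rows below are invented for illustration, and SQLite's in-memory mode is used only for convenience; it is not meant to represent systems such as SAP HANA.

```python
import sqlite3


def summarize_events(rows):
    """Load (case_id, activity, cost) rows and aggregate them per activity with SQL."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
    try:
        conn.execute("CREATE TABLE events (case_id TEXT, activity TEXT, cost REAL)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT activity, COUNT(*), SUM(cost) "
            "FROM events GROUP BY activity ORDER BY activity"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical event data.
rows = [
    ("c1", "register", 10.0),
    ("c1", "pay", 5.0),
    ("c2", "register", 10.0),
]

for activity, count, total_cost in summarize_events(rows):
    print(activity, count, total_cost)
```

The query states what result is wanted (counts and total cost per activity), while the DBMS decides how the data are stored and scanned.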
• Behavioral/social science
o Most data are (indirectly) generated by people, and analysis results are often used to influence people (e.g., guiding the
customer to a product or encouraging a manager to eliminate waste).
o Behavioral science is the systematic analysis and investigation of human behavior.
o Social sciences study the processes of a social system and the relationships among individuals within a society.
o To interpret the results of various types of analytics, it is important to understand human behavior and the social
context in which humans and organizations operate.
o Moreover, analysis results often trigger questions related to coaching and positively influencing people.
• Privacy, security, law, and ethics are key ingredients to protect individuals and organizations from “bad”
data science practices.
o Privacy refers to the ability to seclude sensitive information.
▪ Privacy often depends on security mechanisms which aim to ensure the confidentiality, integrity and availability of
data.
• Operations management & research deals with the design, control and
management of products, processes, services and supply chains.
o Operations Research (OR) tends to focus on the analysis of mathematical models.
o Operations Management (OM) is closer to industrial engineering and business
administration.
• Business process management is the discipline that combines approaches
for the design, execution, control, measurement and optimization of
business processes.
Bridging the Gap between
Process Science and Data Science
Synonyms?
Data Mining
Machine Learning
Process Mining
End of lecture
Prepare and surf chapter 2
i.helal@fci-cu.edu.eg
Any Questions?
http://scholar.cu.edu.eg/?q=imanhelal/
Thank you ☺