Data Mining L1,2

Data mining is the process of extracting knowledge from large datasets, utilizing various tools and techniques for data analysis. It plays a crucial role in sectors like marketing, banking, and cybersecurity, helping organizations derive valuable insights and improve decision-making. The document outlines the KDD process, advantages, use cases, and functionalities of data mining, while also addressing its challenges and differences from machine learning.

Rishi Sharma

IIIT Surat
Data Mining
Data mining is the extraction of useful, previously unknown patterns from very large data sets,
typically held in large databases.

Data mining is the process of mining knowledge. To recognize meaningful patterns, the data
mining process relies on data compiled in the data warehousing stage.

The name is an analogy: just as gold mining extracts gold from rock or sand, data mining extracts knowledge from data.

Data Mining tools perform data analysis and may uncover important
data patterns, contributing greatly to business strategies, knowledge
bases, and scientific and medical research.
Data Mining Software & Tools
What Motivated Data Mining?
The database system industry has followed an evolutionary path through the
following functionalities: data collection and database creation,
data management, and advanced data analysis.
Importance of Data Mining
❖ Data mining is a growing industry. Many vendors, such as AWS, Oracle, Microsoft,
SAP, and SAS Institute, provide tools used for data mining.
❖ Data mining ensures that useful information can be derived from raw data and used to
benefit both the organization and its customers.
❖ Data mining helps in detecting fraud, filtering spam, managing risk, and
strengthening cybersecurity.
❖ In the marketing sector, it helps in forecasting customer behavior.
❖ In the banking sector, it can help in identifying fraudulent transactions.
Why is it important?

The most common application areas of data mining are:
❖ Market Analysis
❖ Detection of fraud
❖ Customer retention
❖ Control of Production
❖ Scientific exploration
Database Evaluation
Evolution of Database Technology

❖ 1960s: Data collection, database creation, IMS and network DBMS
❖ 1970s: Relational data model, relational DBMS implementation
❖ 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
and application-oriented DBMS (spatial, scientific, engineering, etc.)
❖ 1990s-2000s: Data mining and data warehousing, multimedia databases, and
Web databases
Origin of Data Mining

➔ Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
➔ Traditional techniques may be unsuitable due to:
◆ Enormity of data
◆ High dimensionality of data
◆ Heterogeneous, distributed nature of data
What is Data Mining?
Data mining refers to extracting or “mining” knowledge from large amounts of data.
It is also referred to as Knowledge Discovery in Databases (KDD).

It is a process of discovering interesting knowledge from large amounts of data
stored in databases, data warehouses, or other information repositories.
KDD Process in Data Mining

The term Knowledge Discovery in Databases, or KDD, refers to the broad
process of discovering knowledge in data and emphasizes the "high-level"
application of specific data mining methods.

It is of interest to researchers in machine learning, pattern recognition, databases,
statistics, artificial intelligence, knowledge acquisition for expert systems, and
data visualization.
Data Mining: KDD Process

Figure: KDD process

Steps of KDD Process
❖ Data Selection - Retrieve the data relevant to the analysis
❖ Data cleaning and pre-processing - Eliminate noisy and inconsistent information
❖ Data integration - Multiple data sources combined
❖ Data Transformation - Transform into a form suitable for data mining
❖ Data Mining - Extract data patterns using smart methods
❖ Evaluation of Pattern - Identify interesting patterns
❖ Knowledge representation - Present the mined knowledge to the user
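
The steps above can be sketched as a chain of small functions. This is only an illustrative sketch, not a standard implementation: the customer and order records, the function names, the age bands, and the spend threshold are all made up for the example, and the "mining" step is deliberately trivial.

```python
from statistics import mean

# Hypothetical raw sources; every name and value here is made up for illustration.
customers = [
    {"id": 1, "age": 34, "city": "Surat"},
    {"id": 2, "age": None, "city": "Pune"},   # noisy record: missing age
    {"id": 3, "age": 51, "city": "Surat"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]

def select(rows, fields):
    """Data selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in rows]

def clean(rows):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(customer_rows, order_rows):
    """Data integration: attach each customer's order amounts."""
    merged = []
    for c in customer_rows:
        amounts = [o["amount"] for o in order_rows if o["customer_id"] == c["id"]]
        merged.append({**c, "amounts": amounts})
    return merged

def transform(rows):
    """Data transformation: derive a mining-ready feature (total spend)."""
    return [{"id": r["id"], "age": r["age"], "total": sum(r["amounts"])} for r in rows]

def mine(rows):
    """Data mining: a toy 'pattern', the average spend per age band."""
    bands = {}
    for r in rows:
        band = "under_40" if r["age"] < 40 else "40_plus"
        bands.setdefault(band, []).append(r["total"])
    return {band: mean(totals) for band, totals in bands.items()}

def evaluate(found, min_spend):
    """Pattern evaluation: keep only patterns above an interestingness threshold."""
    return {k: v for k, v in found.items() if v >= min_spend}

selected = select(customers, ["id", "age"])
cleaned = clean(selected)
integrated = integrate(cleaned, orders)
transformed = transform(integrated)
patterns = mine(transformed)
interesting = evaluate(patterns, min_spend=100.0)

# Knowledge representation: present the mined knowledge to the user.
for band, avg in interesting.items():
    print(f"{band}: average spend {avg:.2f}")
```

In practice each stage is far heavier (cleaning alone often dominates the effort), but the order of the stages and the hand-off of data between them is the point of the sketch.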
Data mining and Business Intelligence
Architecture of a typical data mining system
Data Mining Primitives
Data mining primitives define a data mining task, which can be specified in the
form of a data mining query.
❖ Task Relevant Data
❖ Kinds of knowledge to be mined
❖ Background knowledge
❖ Interestingness measure
❖ Presentation and visualization of discovered patterns
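
One way to picture how the five primitives together specify a mining task is as fields of a single query object. The sketch below is purely hypothetical: `MiningQuery`, its field names, and the example values are invented for illustration and do not correspond to any real data mining query language.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical container for the five primitives; not a real query language."""
    task_relevant_data: dict                                   # tables, attributes, conditions
    knowledge_kind: str                                        # e.g. "association", "classification"
    background_knowledge: dict = field(default_factory=dict)   # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)        # thresholds such as support
    presentation: str = "table"                                # how results should be shown

query = MiningQuery(
    task_relevant_data={"table": "sales", "attributes": ["item", "city"]},
    knowledge_kind="association",
    background_knowledge={"city": {"Surat": "Gujarat", "Pune": "Maharashtra"}},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(query.knowledge_kind, query.interestingness)
```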
Data Mining: Confluence of multiple disciplines
Advantages of Data Mining

❖ Optimal product/service pricing: Using data mining to analyze the interplay of pricing variables, such as
demand, elasticity, distribution and brand perception, can help a business set prices that maximize profit.
❖ Better marketing: Data mining can help a company get more value out of their marketing campaigns by
segmenting customers with different behaviors, optimizing engagement by segment or providing insight to
aid development of personalized ad creative. The results of ad campaigns can often be demonstrated in
sales dashboards.
❖ Heightened employee productivity: Analyzing employee behavior patterns and viewing KPIs in HR
dashboards can lead to strategies for boosting employee engagement and productivity.
❖ Improved customer retention: Understanding customer behavior can improve customer relations, reducing
churn.
❖ Increased cost efficiency: Manufacturing costs, for example, could be lowered through many different
data mining analyses, from insights into supplier pricing behavior to better understanding customer buying
patterns.
❖ Higher product/service quality: Finding and fixing areas where quality falters can decrease product
returns.
Data Mining Use Cases

❖ Banking: Data mining is used to predict successful loan applicants as well as to detect fraud in credit
cards.
❖ Retail: Create effective advertisements based on past responses.
❖ Insurance: Predict probability and costs for future disasters, based on past hurricanes or tornadoes.
❖ Grocery stores: Analyze market baskets to find products usually bought together. Running a sales
promotion on one item can improve sales of the other item at its normal price.
❖ Manufacturing: Implement just-in-time fulfillment by predicting when new supplies should be ordered
or when equipment is likely to fail.
❖ Customer relationship management: Identify characteristics of customers who move to competitors,
then offer special deals to retain other customers with those same characteristics.
❖ Security: Intrusion detection techniques use data mining to identify anomalies that could be network
break-ins.
Data Mining Technology
❖ Classification: Assigns each data item to one of several categories or classes. For example, a loan
applicant can be assigned to a low, medium or high-risk category. Usually, the categories for the model
are predefined based on previous analysis of the data.
❖ Anomaly detection: A form of classification that uses machine learning to detect data that does not fit a
class. For example, anomaly detection is used to find fraudulent credit card charges.
❖ Clustering: Identifies groups of similar data. For example, clustering can be used to find customers with
similar buying habits.
❖ Association: Generates a probability of multiple events occurring together. One application is “market
basket analysis,” which discovers when two or more items are frequently bought together.
❖ Regression: Using a data set where values are known, regression techniques attempt to predict a value
based on multiple attributes. For example, regression could predict sales based on the advertising dollars,
month, website visits and other financial attributes.
❖ Neural networks: A form of artificial intelligence that mimics the human brain to find relationships in data.
Neural networks have multiple applications, for example, in predicting customer behavior.
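
As a small illustration of the association technique listed above, the sketch below counts item pairs in a handful of made-up shopping baskets and keeps rules that meet a minimum support (share of baskets containing both items) and confidence (share of baskets with the left-hand item that also contain the right-hand item). The baskets, the function name `pair_rules`, and the thresholds are assumptions chosen for the example; it handles pairs only and is not a full frequent-itemset algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that a rule can hold in one direction only: here every basket with jam also contains bread, but only half of the baskets with bread contain jam.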
Data Mining Process
❖ Define goal: Do you want to learn more about your customers? Do you want to cut manufacturing
costs? Do you want to increase revenue? Do you want to detect fraud? Clearly identify the desired
outcome of data mining implementation to get started.
❖ Gather the data: Data mining can answer all those questions, but each one requires a different set of
data. Often the data comes from multiple databases, for example, customers and orders.
❖ Cleanse the data: Once selected, the data usually needs to be cleansed, reformatted and validated.
❖ Get to know the data: Become familiar with the data by running basic statistical analyses and building
visual graphs and charts.
❖ Build a model: Model building is where the data mining process is most iterative. Analysts choose one
or more of the approaches listed under Data Mining Technology and apply them to the data
being mined.
❖ Validate the results: Whichever techniques are used, examine the results to validate that the findings
are accurate. If they are not, go back to the previous step and rebuild the model.
❖ Implement the model: Use the discoveries to fulfill your original business goal.
Drawbacks of Data Mining

❖ Data analytics tools are often complicated to use.

❖ It takes highly trained and skilled personnel to analyze data properly.
❖ It is also complicated to determine which tools should be used.
❖ There are many privacy concerns surrounding data mining.
❖ The information obtained through data mining may not be completely accurate.
Data Mining Vs. Machine Learning

❖ Data mining is analyzing datasets to find useful information. Machine learning
refers to algorithms that improve with the experience gained from data.
❖ The two fields grew up side by side rather than one after the other: the term
"machine learning" dates to 1959, while data mining and KDD took shape as a field
in the late 1980s and 1990s.
❖ Data mining works with large amounts of raw data. Machine learning works through
algorithms that learn from data.
❖ Data mining relies on human intervention: experts must be involved
in the data mining process. Machine learning was created to function without
much human intervention. Its algorithms learn from experience and improve
themselves.
Types of data mining tasks

❖ Data mining tasks can be classified into two categories:

➢ Descriptive data mining: It summarizes the general characteristics of the data at hand, without
building a model to predict new values. Examples are counts and averages.
➢ Predictive data mining: It predicts important business metrics from previously available
data, for instance by extrapolating a trend. An example is forecasting quarterly results.
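
The contrast between the two categories can be shown on one small series. In this sketch the quarterly revenue figures are invented, the descriptive part computes a count and an average, and the predictive part fits a straight line by ordinary least squares and extrapolates one quarter ahead. A linear trend is an assumption made for the example, not a claim about how real revenue behaves.

```python
from statistics import mean

# Hypothetical quarterly revenue figures (units are arbitrary).
revenue = [100.0, 110.0, 125.0, 130.0]

# Descriptive: summarise what is already in the data.
summary = {"count": len(revenue), "average": mean(revenue)}

def fit_line(ys):
    """Ordinary least-squares fit of y = intercept + slope * x for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope

# Predictive: extrapolate the fitted trend to the next quarter (x = 4).
intercept, slope = fit_line(revenue)
forecast = intercept + slope * len(revenue)

print(summary)
print(f"forecast for next quarter: {forecast:.1f}")
```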
Data Mining Functionality

❖ Classification: This functionality categorises data into different classes using a model trained
on labelled data sets. It is commonly used in applications like spam filtering or customer segmentation.
❖ Clustering: Similar to classification but without predefined classes, clustering groups a set
of objects so that objects in the same group are more similar to each other than those in
other groups.
❖ Regression: This method predicts continuous numeric values from known attributes,
which helps in forecasting sales figures and inventory requirements.
❖ Association Rules: This involves discovering interesting relations between variables in
large databases. For example, identifying products frequently bought together can help in
cross-selling strategies in retail.
❖ Anomaly Detection (Outlier/Change Detection): This functionality identifies unusual data
records, which can be helpful in fraud detection by spotting unusual transactions.
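
A minimal sketch of anomaly detection on transaction amounts follows. It uses one simple statistical rule, the modified z-score based on the median absolute deviation (MAD), with the commonly used constants 0.6745 and 3.5. The amounts and the function name `mad_outliers` are made up for the example, and real fraud detection systems use far richer features and models.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so this rule cannot rank anything as unusual.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [25, 30, 27, 31, 29, 26, 500]
print(mad_outliers(amounts))
```

The median is used instead of the mean because a single extreme value shifts the mean and inflates the standard deviation, which can hide the very record being looked for.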
Data Mining Functionality

❖ Sequential Patterns: This functionality identifies regular sequences or patterns in data where one
event leads to another. It is helpful in various applications, including web page analysis and
studying purchase patterns.
❖ Decision Trees: A decision tree is a model that uses a tree-like graph of decisions and their
possible consequences. It is used extensively in decision analysis to visually and explicitly
represent decisions and decision-making.
❖ Neural Networks: Inspired by the human brain, neural networks are a series of algorithms that
attempt to recognise underlying relationships in a data set through a process that mimics how the
human brain operates.
❖ Data Visualisation: Turning complex data sets into graphical representations that are easy to
understand and interpret. This functionality helps stakeholders make sense of complicated data
through visual storytelling.
❖ Text Mining: Utilising techniques to extract qualitative information from text data sources. This is
increasingly important as data comes as text as well as numbers, which calls for deeper analytics.
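
To make the sequential-patterns item above concrete, the sketch below counts which page-to-page transitions appear in at least a minimum number of browsing sessions. The sessions, page names, and the function `frequent_transitions` are hypothetical, and only directly adjacent steps are considered, which is much simpler than general sequential pattern mining.

```python
from collections import Counter

# Hypothetical web sessions: each list is the ordered pages one visitor viewed.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["search", "product", "cart"],
    ["home", "search", "product"],
]

def frequent_transitions(sessions, min_sessions=2):
    """Return page-to-page transitions seen in at least min_sessions sessions."""
    counts = Counter()
    for pages in sessions:
        # A set ensures each transition counts at most once per session.
        counts.update(set(zip(pages, pages[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_sessions}

for (src, dst), n in sorted(frequent_transitions(sessions).items()):
    print(f"{src} -> {dst}: {n} sessions")
```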
