Chapter 5

Data Mining

Amol D. Vibhute (PhD)

Assistant Professor

Email ID:- amol.vibhute@sicsr.ac.in

Roadmap of Chapter:
• Introduction
• What is data mining?
• KDD vs data mining
• Information extraction
• Data mining characteristics
• Issues and challenges in DM
• Application of DM

Tuesday, March 4, 2025 Dr. Amol 2

Introduction:
• Data Mining is the process of discovering patterns, trends, and insights from large datasets using techniques
from machine learning, statistics, and database systems. It helps in making data-driven decisions in
business, healthcare, finance, and many other fields.
• Key Features of Data Mining
– Extracts hidden patterns from raw data.
– Uses algorithms to find relationships in large datasets.
– Supports decision-making in various industries.
– Improves efficiency in predictive analytics & business intelligence (BI).


Cont.…
• Steps in Data Mining Process
• Data Collection & Preprocessing
– Gathering data from multiple sources (databases, IoT devices, logs).
– Cleaning and transforming data (handling missing values, normalization).
• Data Exploration & Transformation
– Identifying key attributes (feature selection).
– Removing noise & duplicates for accurate results.
• Applying Data Mining Techniques
– Classification – Predicting categories (Spam/Not Spam).
– Clustering – Grouping similar data (Customer Segmentation).
– Association Rule Mining – Finding patterns (Market Basket Analysis).
– Regression – Predicting continuous values (Stock Prices).
• Pattern Evaluation & Interpretation
– Extracting meaningful insights from discovered patterns.
• Deployment & Decision Making
– Using insights for fraud detection, customer analytics, healthcare, finance, etc.
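
The preprocessing steps listed above (removing duplicates, handling missing values, normalization) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the records, the column names, and the choice of mean imputation and min-max scaling are assumptions made for the example, not something the slides prescribe.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 30000},
    {"id": 2, "age": None, "income": 52000},
    {"id": 3, "age": 41, "income": 78000},
    {"id": 3, "age": 41, "income": 78000},  # exact duplicate of the row above
    {"id": 4, "age": 33, "income": None},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, column):
    """Replace missing values in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def min_max(rows, column):
    """Rescale `column` to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a constant column
    return [{**r, column: (r[column] - lo) / span} for r in rows]

clean = drop_duplicates(raw)
for col in ("age", "income"):
    clean = impute_mean(clean, col)
    clean = min_max(clean, col)

for row in clean:
    print(row)
```

Mean imputation is only one option; depending on the data, dropping the record or using the median may be more appropriate.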


Cont.…
• Key Data Mining Techniques
• Classification (Supervised Learning)
– Example: Email spam detection (Spam or Not Spam).
– Algorithms: Decision Trees, Naïve Bayes, Random Forest, SVM.
• Clustering (Unsupervised Learning)
– Example: Grouping customers by shopping behavior.
– Algorithms: K-Means, DBSCAN, Hierarchical Clustering.
• Association Rule Mining
– Example: "People who buy milk also buy bread" (Market Basket Analysis).
– Algorithm: Apriori, FP-Growth.
• Anomaly Detection
– Example: Fraud detection in credit card transactions.
– Techniques: Isolation Forest, One-Class SVM.
• Regression Analysis
– Example: Predicting house prices based on location, size, and amenities.
– Algorithms: Linear Regression, Decision Trees, Neural Networks.
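
As a small illustration of the association rule example above ("people who buy milk also buy bread"), the sketch below computes the usual support, confidence, and lift measures by brute force. It is not an implementation of Apriori or FP-Growth, and the transactions are invented for the example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"eggs"},
    {"bread", "butter"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

rule = ({"milk"}, {"bread"})
print("support   :", support(rule[0] | rule[1], transactions))
print("confidence:", confidence(*rule, transactions))
print("lift      :", lift(*rule, transactions))
```

A lift above 1 suggests that the two items appear together more often than they would if they were independent; algorithms such as Apriori exist to avoid enumerating every candidate itemset the way this sketch implicitly would.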


KDD:
• KDD (Knowledge Discovery in Databases) is the process of extracting useful, previously unknown, and potentially valuable information from large datasets.
• The KDD process is iterative: the steps below are usually repeated several times before accurate knowledge is extracted from the data.
– Data Cleaning
• Data cleaning is the removal of noisy and irrelevant data from the collection.
• Handling missing values.
• Smoothing noisy data, where noise is a random error or variance in a measured value.
• Using data discrepancy detection and data transformation tools.
– Data Integration
• Data integration combines heterogeneous data from multiple sources into a common store (a data warehouse). It is carried out with data migration tools, data synchronization tools, and the ETL (Extract-Transform-Load) process.


Cont.…
• Data Selection
– Data selection is the process of deciding which data are relevant to the analysis and retrieving them from the data collection. Methods such as neural networks, decision trees, Naïve Bayes, clustering, and regression can be used for this.

• Data Transformation
– Data transformation converts data into the form required by the mining procedure. It is a two-step process:

• Data mapping: assigning elements from the source base to the destination to capture transformations.
• Code generation: creating the actual transformation program.
• Data Mining
– Data mining applies techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model using classification or characterization.

• Pattern Evaluation
– Pattern evaluation identifies the truly interesting patterns that represent knowledge, based on given measures. It finds an interestingness score for each pattern and uses summarization and visualization to make the data understandable to the user.

• Knowledge Representation
– This involves presenting the results in a way that is meaningful and can be used to make decisions.
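
To make the sequence of KDD steps concrete, here is a toy pipeline in which each step is one small function. Everything in it is hypothetical: the two data sources, the field names, the 100.0 cut-off, and the minimum-support threshold are assumptions chosen only to keep the example short.

```python
from collections import Counter

# Two hypothetical sources that share a customer id.
orders = [
    {"cust": 1, "amount": 120.0},
    {"cust": 2, "amount": None},  # missing value, removed during cleaning
    {"cust": 3, "amount": 15.0},
    {"cust": 4, "amount": 300.0},
    {"cust": 5, "amount": 22.0},
]
regions = {1: "urban", 2: "rural", 3: "rural", 4: "urban", 5: "rural"}

def clean(rows):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in rows if r["amount"] is not None]

def integrate(rows, region_of):
    """Data integration: attach the region from the second source."""
    return [{**r, "region": region_of[r["cust"]]} for r in rows]

def select(rows):
    """Data selection: keep only the attributes relevant to the analysis."""
    return [(r["region"], r["amount"]) for r in rows]

def transform(pairs, cutoff=100.0):
    """Data transformation: map numeric amounts to 'high' / 'low'."""
    return [(region, "high" if amount >= cutoff else "low") for region, amount in pairs]

def mine(pairs):
    """Data mining: relative frequency of each (region, level) pattern."""
    counts = Counter(pairs)
    return {pattern: n / len(pairs) for pattern, n in counts.items()}

def evaluate(patterns, min_support=0.5):
    """Pattern evaluation: keep patterns that meet the support threshold."""
    return {p: s for p, s in patterns.items() if s >= min_support}

knowledge = evaluate(mine(transform(select(integrate(clean(orders), regions)))))

# Knowledge representation: present the surviving patterns readably.
for (region, level), share in sorted(knowledge.items()):
    print(f"{region} customers -> {level} spend ({share:.0%} of orders)")
```

In practice the steps are revisited iteratively rather than run once in a straight line, as the slides note.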


Cont.…
• Advantages of KDD
– Improves decision-making: KDD provides valuable insights and knowledge that can help organizations make better
decisions.
– Increased efficiency: KDD automates repetitive and time-consuming tasks and makes the data ready for analysis,
which saves time and money.
– Better customer service: KDD helps organizations gain a better understanding of their customers’ needs and
preferences, which can help them provide better customer service.
– Fraud detection: KDD can be used to detect fraudulent activities by identifying patterns and anomalies in the data
that may indicate fraud.
– Predictive modeling: KDD can be used to build predictive models that can forecast future trends and patterns.


Cont.…
• Disadvantages of KDD
– Privacy concerns: KDD can raise privacy concerns as it involves collecting and analyzing large amounts of data,
which can include sensitive information about individuals.
– Complexity: KDD can be a complex process that requires specialized skills and knowledge to implement and
interpret the results.
– Unintended consequences: KDD can lead to unintended consequences, such as bias or discrimination, if the data or
models are not properly understood or used.
– Data quality: The KDD process depends heavily on the quality of the data. If the data is not accurate or consistent, the results can be misleading.
– High cost: KDD can be an expensive process, requiring significant investments in hardware, software, and
personnel.
– Overfitting: KDD process can lead to overfitting, which is a common problem in machine learning where a model
learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model
on new unseen data.
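
The overfitting point above can be seen in a toy experiment. The sketch below is an assumption-laden illustration, not a benchmark: the "true" rule, the two flipped training labels, and the two models (a 1-nearest-neighbour classifier that memorizes the training set and a single-threshold rule) were all chosen by hand so the effect is easy to see.

```python
# True rule assumed for this toy example: label is 1 when x >= 5.
def true_label(x):
    return 1 if x >= 5 else 0

# Training set with two deliberately flipped (noisy) labels at x = 2 and x = 7.
train = [(x, true_label(x)) for x in range(10)]
train[2] = (2, 1)
train[7] = (7, 0)

# Unseen test points, labelled with the true rule.
test = [(x + 0.4, true_label(x + 0.4)) for x in range(10)]

def nearest_neighbour(x):
    """Memorizes the training data: returns the label of the closest point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def best_threshold():
    """Simple model: the single cut-off with the fewest training errors."""
    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in train)
    return min(range(11), key=errors)

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

t = best_threshold()

def threshold_model(x):
    return 1 if x >= t else 0

print("1-NN      train/test:", accuracy(nearest_neighbour, train), accuracy(nearest_neighbour, test))
print("threshold train/test:", accuracy(threshold_model, train), accuracy(threshold_model, test))
```

The memorizing model fits the noisy training labels perfectly but repeats the noise on new points, while the simpler model scores lower on the training set and higher on the unseen data.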
Difference between KDD and Data Mining:
• Definition
– KDD: A process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data.
– Data Mining: A process of extracting useful and valuable information or patterns from large data sets.
• Objective
– KDD: To find useful knowledge from data.
– Data Mining: To extract useful information from data.
• Techniques Used
– KDD: Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization.
– Data Mining: Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction.
• Output
– KDD: Structured information, such as rules and models, that can be used to make decisions or predictions.
– Data Mining: Patterns, associations, or insights that can be used to improve decision-making or understanding.
• Focus
– KDD: The discovery of useful knowledge, rather than simply finding patterns in data.
– Data Mining: The discovery of patterns or relationships in data.
• Role of domain expertise
– KDD: Domain expertise is important, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results.
– Data Mining: Domain expertise is less critical, as the algorithms are designed to identify patterns without relying on prior knowledge.


Data mining characteristics:
• Data mining is a method of gathering information in which all relevant information passes through some form of identification process.
• By the end of this process, the characteristics of the data mining process can be determined.
• 1. Increased quantities of data:
– Earlier, data mining systems depended on information supplied by clients and customers; today, any amount of information can be acquired without their help.
– This shift has also introduced a new problem: large quantities of work.
– With information technology, a large amount of information can be acquired without extra burden or trouble.

• 2. Provides incomplete data:

– Many people provide incomplete information about themselves in surveys conducted with the help of data mining systems.
– People often overlook the value of their information, which is one reason they leave it incomplete in such surveys.
– Moreover, these mining systems have changed people's perspective, and many now fear sharing their personal information.

• 3. Complicated data structure:

– In data mining, information is gathered and combined using information collection techniques, some of them manual and the rest technological. As a result, understanding and interpreting mined data can be more complicated than with other structures in information technology.


Issues and challenges in DM:
• Data Mining Issues:
– 1. Mining methodology and user interaction issues:
• i. Mining different kinds of knowledge in databases:
– Different users want different kinds of knowledge, obtained in different ways. Because each client wants a different kind of information, it is difficult to cover the vast range of data needed to meet every client's requirement.

• ii. Interactive mining of knowledge at multiple levels of abstraction:

– Interactive mining allows users to focus the search for patterns from different angles. The data mining process should be interactive because it is difficult to know in advance what can be discovered within a database.

• iii. Incorporation of background knowledge:

– Background knowledge is used to guide the discovery process and to express the discovered patterns.

• iv. Query languages and ad hoc mining:

– Relational query languages (such as SQL) allow users to pose ad hoc queries for data retrieval. A data mining query language should be well matched with the query language of the data warehouse.

• v. Handling noisy or incomplete data:

– In a large database, many attribute values will be incorrect, because of human error or instrument failure. Data cleaning and data analysis methods are used to handle noisy data.
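
For the point about query languages, the following sketch shows what an ad hoc retrieval query looks like in practice, using Python's built-in `sqlite3` module and an in-memory database. The `sales` table and its rows are hypothetical, and this is ordinary SQL retrieval only; it does not implement a data mining query language.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("asha", "milk", 2.5),
        ("asha", "bread", 1.5),
        ("ravi", "milk", 2.5),
        ("ravi", "eggs", 3.0),
        ("meera", "bread", 1.5),
    ],
)

# Ad hoc query: number of purchases and total spend per item, highest spend first.
rows = conn.execute(
    "SELECT item, COUNT(*) AS n, SUM(amount) AS total "
    "FROM sales GROUP BY item ORDER BY total DESC, item"
).fetchall()
conn.close()

for item, n, total in rows:
    print(f"{item}: {n} purchases, total {total}")
```

A mining query would go a step further than this aggregation, for example by asking for all item pairs whose joint frequency exceeds a threshold.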


Cont.…
• 2. Performance issues:
– i. Efficiency and scalability of data mining algorithms:
• To extract information effectively from the huge amount of data in databases, data mining algorithms must be efficient and scalable.

– ii. Parallel, distributed, and incremental mining algorithms:

• The huge size of many databases, the wide distribution of data, and the complexity of some data mining methods motivate the development of parallel and distributed data mining algorithms. Such algorithms divide the data into partitions, which are processed in parallel.

• 3. Issues relating to the diversity of database types:

– i. Handling of relational and complex types of data:
• Many kinds of data are stored in databases and data warehouses. No single system can mine all of them, so different data mining systems should be constructed for different kinds of data.

– ii. Mining information from heterogeneous databases and global information systems:
• Data is fetched from different sources over Local Area Networks (LAN) and Wide Area Networks (WAN). Discovering knowledge from these different sources of structured data is a great challenge for data mining.
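
The idea behind parallel mining described above, dividing the data into partitions and processing them in parallel, can be sketched as follows. The transactions are invented, item counting stands in for a real mining step, and a thread pool is used only to keep the example simple; CPU-bound workloads would more likely use separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical transactions to be mined.
transactions = [
    ["milk", "bread"], ["milk", "eggs"], ["bread", "butter"],
    ["milk", "bread", "eggs"], ["eggs"], ["milk", "butter"],
]

def partition(data, n_parts):
    """Split data into n_parts roughly equal, contiguous chunks."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_items(chunk):
    """Local mining step: item frequencies within one partition."""
    return Counter(item for basket in chunk for item in basket)

def parallel_item_counts(data, n_parts=3):
    """Count items per partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial = pool.map(count_items, partition(data, n_parts))
        return sum(partial, Counter())

totals = parallel_item_counts(transactions)
print(totals.most_common())

# Incremental mining: fold in a new batch without recounting the old data.
totals.update(count_items([["bread", "jam"]]))
```

Because the partial counts are simply added together, the merged result matches a single sequential pass, which is what makes this kind of computation easy to partition.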


Cont.…
• Major Challenges In Data Mining:
1. Security and Social Challenges:
• Decision-making strategies rely on collecting and sharing data, so they require considerable security. Private and sensitive information about individuals is gathered to build customer profiles and to understand customer behaviour patterns. Unauthorized access to this information and its confidential nature are becoming significant issues.

2. Noisy and Incomplete Data:

• Data mining is the process of obtaining information from huge volumes of data. Real-world data is noisy, incomplete, and heterogeneous, and data in huge quantities will often be unreliable or inaccurate. These problems may be caused by human error or by faults in the instruments that measure the data.

3. Distributed Data:
• Real-world data is usually stored on different platforms in distributed computing environments. It may be on the internet, on individual systems, or in databases. It is practically difficult to bring all the data into a centralized data repository, mainly for technical and organizational reasons.
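
One common way to reduce the noise mentioned under "Noisy and Incomplete Data" is smoothing by bin means: sort the values, split them into equal-frequency bins, and replace each value with the mean of its bin. The sketch below assumes a bin size of 3 and uses made-up price values; it is one cleaning technique among several, not the only way to handle noise.

```python
from statistics import mean

def smooth_by_bin_means(values, bin_size):
    """Sort values, split into equal-frequency bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        smoothed.extend([mean(current_bin)] * len(current_bin))
    return smoothed

# Hypothetical noisy price readings.
prices = [21, 4, 28, 15, 34, 8, 24, 21, 25]
print(smooth_by_bin_means(prices, bin_size=3))
```

Larger bins smooth more aggressively, at the cost of hiding genuine variation in the data.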


Cont.…
• Major Challenges In Data Mining:
4. Complex Data:
• Real-world data is truly heterogeneous. It may be multimedia data, including natural language text, time series, spatial data, temporal data, complex data, audio or video, images, etc. It is hard to handle these different types of data and extract the required information. In most cases, new tools and methodologies have to be developed to extract the relevant information.

5. Performance:
• The performance of a data mining system depends mainly on the efficiency of the techniques and algorithms used. If the techniques and algorithms are not well designed, the performance of the data mining process suffers.

6. Scalability and Efficiency of the Algorithms:

• Data mining algorithms should be scalable and efficient so that they can extract information from the tremendous amounts of data in a database.


Cont.…
• Major Challenges In Data Mining:
7. Improvement of Mining Algorithms:
• Factors such as the complexity of data mining approaches, the enormous size of databases, and the overall data flow motivate the development of parallel and distributed data mining algorithms.

8. Incorporation of Background Knowledge:

• If background knowledge can be incorporated, more accurate and reliable data mining solutions can be found. Predictive tasks can make more accurate predictions, while descriptive tasks can produce more useful findings. However, collecting and incorporating background knowledge is a complex process.

9. Ethics:
• Data mining raises ethical concerns related to the collection, use, and dissemination of data. The data may be used to
discriminate against certain groups, violate privacy rights, or perpetuate existing biases. Moreover, data mining algorithms
may not be transparent, making it challenging to detect biases or discrimination.


Application of DM:


Thank You !!!
