Data Mining L1,2

Data mining is the process of extracting knowledge from large datasets, utilizing various tools and techniques for data analysis. It plays a crucial role in sectors like marketing, banking, and cybersecurity, helping organizations derive valuable insights and improve decision-making. The document outlines the KDD process, advantages, use cases, and functionalities of data mining, while also addressing its challenges and differences from machine learning.

Uploaded by

xataje8102

Rishi Sharma

IIIT Surat
Data Mining
Data mining is the extraction of useful patterns and knowledge from enormous data sets,
typically held in large databases.

Data mining is the process of mining knowledge. To recognize meaningful patterns, the data
mining process relies on data compiled in the data warehousing stage.

An analogy: just as gold is mined from rock or sand, knowledge is mined from data.

Data Mining tools perform data analysis and may uncover important
data patterns, contributing greatly to business strategies, knowledge
bases, and scientific and medical research.
Data Mining Software & Tools
What Motivated Data Mining?
The database system industry has followed an evolutionary path in the
development of the following functionalities: data collection and database creation,
data management, and advanced data analysis.
Importance of Data Mining
❖ Data mining is a growing industry. Many vendors, such as AWS, Oracle, Microsoft,
SAP, and SAS Institute, provide tools used for data mining.
❖ Data mining ensures that useful information can be derived from raw data and used to
benefit both the organization and its customers.
❖ Data mining helps in detecting fraud, filtering spam, managing risk, and strengthening
cybersecurity.
❖ In the marketing sector, it helps in forecasting customer behavior.
❖ In the banking sector, it can help in identifying fraudulent transactions.
Why is it important?

The most common application areas of data mining are:
❖ Market Analysis
❖ Detection of fraud
❖ Customer retention
❖ Control of Production
❖ Scientific exploration
Database Evolution
Evolution of Database Technology

❖ 1960s: Data collection, database creation, IMS and network DBMS
❖ 1970s: Relational data model, relational DBMS implementation
❖ 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
and application-oriented DBMS (spatial, scientific, engineering, etc.)
❖ 1990s-2000s: Data mining and data warehousing, multimedia databases, and
Web databases
Origin of Data Mining

➔ Draws ideas from machine learning/AI, pattern
recognition, statistics, and database systems.
➔ Traditional techniques may be unsuitable due
to:
◆ Enormity of data
◆ High dimensionality of data
◆ Heterogeneous, distributed nature of data
What is Data Mining?
Data mining refers to extracting or “mining” knowledge from large amounts of data.
It is also referred to as Knowledge Discovery in Databases (KDD).

It is a process of discovering interesting knowledge from large amounts of data
stored in databases, data warehouses, or other information repositories.
KDD Process in Data Mining

The term Knowledge Discovery in Databases, or KDD, refers to the broad
process of discovering knowledge in data and emphasizes the "high-level"
application of specific data mining methods.

It is of interest to researchers in machine learning, pattern recognition, databases,
statistics, artificial intelligence, knowledge acquisition for expert systems, and data
visualization.
Data Mining: KDD Process

Figure: KDD process


Steps of KDD Process
❖ Data selection - Retrieve the data relevant to the analysis
❖ Data cleaning and pre-processing - Eliminate noisy and inconsistent information
❖ Data integration - Multiple data sources combined
❖ Data Transformation - Transform into a form suitable for data mining
❖ Data Mining - Extract data patterns using smart methods
❖ Evaluation of Pattern - Identify interesting patterns
❖ Knowledge representation - Present the mined knowledge to the user
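The steps of the KDD process can be sketched end to end on a toy example. This is a minimal illustration in Python and not a prescribed implementation: the records, the `kdd_pipeline` function, and the threshold of 100 are all assumptions made up for the sketch, and the "mining" step is only a per-segment average.

```python
from statistics import mean

# Hypothetical raw records from two sources (all values are invented).
orders = [
    {"customer": "a", "amount": 120.0},
    {"customer": "b", "amount": None},   # missing value, treated as noise
    {"customer": "c", "amount": 80.0},
    {"customer": "a", "amount": 200.0},
]
customers = {"a": "retail", "b": "retail", "c": "wholesale"}


def kdd_pipeline(orders, customers):
    # 1. Selection: keep only the fields relevant to the analysis.
    selected = [(o["customer"], o["amount"]) for o in orders]
    # 2. Cleaning: drop records with missing amounts.
    cleaned = [(c, a) for c, a in selected if a is not None]
    # 3. Integration: combine each order with the customer's segment.
    integrated = [(customers[c], a) for c, a in cleaned]
    # 4. Transformation: group amounts by segment.
    by_segment = {}
    for segment, amount in integrated:
        by_segment.setdefault(segment, []).append(amount)
    # 5. Data mining: here, simply the average order amount per segment.
    patterns = {s: mean(v) for s, v in by_segment.items()}
    # 6. Pattern evaluation: keep segments whose average exceeds a threshold.
    interesting = {s: m for s, m in patterns.items() if m > 100}
    # 7. Knowledge representation: format the result for the user.
    return [f"{s}: average order {m:.2f}" for s, m in sorted(interesting.items())]


report = kdd_pipeline(orders, customers)
```

In a real project each step would be far more involved, and the steps are usually revisited several times rather than run once in sequence.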
Data mining and Business Intelligence
Architecture of a typical data mining system
Data Mining Primitives
Data mining primitives define a data mining task, which can be specified in the
form of a data mining query.
❖ Task Relevant Data
❖ Kinds of knowledge to be mined
❖ Background knowledge
❖ Interestingness measure
❖ Presentation and visualization of discovered patterns
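One way to picture the five primitives is as the fields of a single query object. The sketch below is purely illustrative: `MiningQuery`, its field names, and the example values are assumptions chosen for this example, not the syntax of any actual data mining query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container with one field per data mining primitive."""
    task_relevant_data: str                 # which data to mine
    knowledge_kind: str                     # e.g. "association", "classification"
    background_knowledge: dict = field(default_factory=dict)  # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # thresholds for patterns
    presentation: str = "table"             # how results are shown


query = MiningQuery(
    task_relevant_data="orders placed in the last quarter",
    knowledge_kind="association",
    background_knowledge={"city": "country"},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
```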
Data Mining: Confluence of multiple disciplines
Advantages of Data Mining

❖ Optimal product/service pricing: Using data mining to analyze the interplay of pricing variables, such as
demand, elasticity, distribution and brand perception, can help a business set prices that maximize profit.
❖ Better marketing: Data mining can help a company get more value out of their marketing campaigns by
segmenting customers with different behaviors, optimizing engagement by segment or providing insight to
aid development of personalized ad creative. The results of ad campaigns can often be demonstrated in
sales dashboards.
❖ Heightened employee productivity: Analyzing employee behavior patterns and viewing KPIs in HR
dashboards can lead to strategies for boosting employee engagement and productivity.
❖ Improved customer retention: Understanding customer behavior can improve customer relations, reducing
churn.
❖ Increased cost efficiency: Manufacturing costs, for example, could be lowered through many different
data mining analyses, from insights into supplier pricing behavior to better understanding customer buying
patterns.
❖ Higher product/service quality: Finding and fixing areas where quality falters can decrease product
returns.
Data Mining Use Cases

❖ Banking: Data mining is used to predict successful loan applicants as well as to detect fraud in credit
cards.
❖ Retail: Create effective advertisements based on past responses.
❖ Insurance: Predict probability and costs for future disasters, based on past hurricanes or tornadoes.
❖ Grocery stores: Analyze market baskets to find products usually bought together. Running a sales
promotion on one item can improve sales of the other item at its normal price.
❖ Manufacturing: Implement just-in-time fulfillment by predicting when new supplies should be ordered
or when equipment is likely to fail.
❖ Customer relationship management: Identify characteristics of customers who move to competitors,
then offer special deals to retain other customers with those same characteristics.
❖ Security: Intrusion detection techniques use data mining to identify anomalies that could be network
break-ins.
Data Mining Technology
❖ Classification: Assigns each data item to one of several categories or classes. For example, a loan
applicant can be assigned to a low, medium or high-risk category. Usually, the categories for the model
are predefined based on previous analysis of the data.
❖ Anomaly detection: A form of classification that uses machine learning to detect data that does not fit a
class. For example, anomaly detection is used to find fraudulent credit card charges.
❖ Clustering: Identifies groups of similar data. For example, clustering can be used to find customers with
similar buying habits.
❖ Association: Generates a probability of multiple events occurring together. One application is “market
basket analysis,” which discovers when two or more items are frequently bought together.
❖ Regression: Using a data set where values are known, regression techniques attempt to predict a value
based on multiple attributes. For example, regression could predict sales based on the advertising dollars,
month, website visits and other financial attributes.
❖ Neural networks: A form of artificial intelligence that mimics the human brain to find relationships in data.
Neural networks have multiple applications, for example, in predicting customer behavior.
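The association technique described above is commonly quantified with two measures, support and confidence. The following is a minimal sketch assuming a handful of made-up baskets; the `support` and `confidence` helpers are written for this example and are not taken from any library.

```python
# Hypothetical shopping baskets (invented for illustration).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


s = support({"bread", "butter"}, baskets)        # 2 of 4 baskets
c = confidence({"bread"}, {"butter"}, baskets)   # 2 of the 3 bread baskets
```

Here bread and butter appear together in half of the baskets, and two thirds of the baskets containing bread also contain butter.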
Data Mining Process
❖ Define goal: Do you want to learn more about your customers? Do you want to cut manufacturing
costs? Do you want to increase revenue? Do you want to detect fraud? Clearly identify the desired
outcome of data mining implementation to get started.
❖ Gather the data: Data mining can answer all those questions, but each one requires a different set of
data. Often the data comes from multiple databases, for example, customers and orders.
❖ Cleanse the data: Once selected, the data usually needs to be cleansed, reformatted and validated.
❖ Get to know the data: Become familiar with the data by running basic statistical analyses and building
visual graphs and charts.
❖ Build a model: Model building is where the data mining process is most iterative. Analysts choose one
or more of the technology approaches described under Data Mining Technology and apply them to the data
being mined.
❖ Validate the results: Whichever techniques are used, examine the results to validate that the findings
are accurate. If they are not, go back to the previous step and rebuild the model.
❖ Implement the model: Use the discoveries to fulfill your original business goal.
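The "build a model" and "validate the results" steps can be illustrated with a deliberately simple model checked against held-out data. This is a sketch under stated assumptions: the transactions, the one-feature threshold model, and the acceptance bar of 0.9 are all hypothetical.

```python
from statistics import mean

# Hypothetical labeled data: (transaction amount, is_fraud). Values are invented.
train = [(20, 0), (35, 0), (50, 0), (400, 1), (520, 1), (610, 1)]
test = [(30, 0), (45, 0), (480, 1), (90, 1)]   # held out, not used for fitting


def build_model(rows):
    """Fit a one-feature threshold: the midpoint between the two class means."""
    normal = mean(x for x, y in rows if y == 0)
    fraud = mean(x for x, y in rows if y == 1)
    return (normal + fraud) / 2


def accuracy(threshold, rows):
    """Share of rows whose predicted label matches the true label."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


threshold = build_model(train)
score = accuracy(threshold, test)
needs_rebuild = score < 0.9   # hypothetical acceptance bar
```

With these numbers the model misses the small fraudulent transaction of 90, so the score falls below the bar and the analyst would return to model building.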
Drawbacks of Data Mining

❖ Data analytics tools are often complicated to use.
❖ It takes highly trained and skilled personnel to analyze data properly.
❖ It is also complicated to determine which tools should be used.
❖ There are many privacy concerns surrounding data mining.
❖ The information obtained through data mining may not be completely accurate.
Data Mining Vs. Machine Learning

❖ Data mining is the analysis of datasets to find useful information. Machine learning
refers to algorithms that improve with the experience gained
from data.
❖ Data mining emerged as a distinct field around the 1990s, drawing on ideas from
machine learning, statistics, and database systems. The two fields overlap heavily.
❖ Data mining works with large amounts of raw data to uncover patterns. Machine learning
centers on algorithms that learn from data.
❖ Data mining can only work with human intervention. Experts must be involved
in the data mining process. Machine learning was created to function without
much human intervention. Its algorithms learn from experience and improve
themselves.
Types of data mining tasks

❖ Data mining tasks can be classified into two categories:
➢ Descriptive data mining: It highlights common characteristics of the data at hand, without relying on
historical or previously collected data. Examples are count and average.
➢ Predictive data mining: It predicts important business metrics from previously available
information, often by assuming a linear trend in the data. An example is forecasting quarterly results.
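The contrast between the two categories can be shown on one small series. In this sketch the revenue figures are invented, the descriptive part computes a count and an average, and the predictive part fits an ordinary least-squares line and extrapolates one quarter ahead. A straight-line trend is an assumption of the example, not a general property of business data.

```python
from statistics import mean

# Hypothetical revenue for four consecutive quarters.
quarterly_revenue = [100.0, 110.0, 120.0, 130.0]

# Descriptive: summarize the data as it is.
count = len(quarterly_revenue)
average = mean(quarterly_revenue)

# Predictive: fit y = intercept + slope * x by least squares, then extrapolate.
xs = list(range(1, count + 1))
x_bar = mean(xs)
y_bar = average
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, quarterly_revenue))
    / sum((x - x_bar) ** 2 for x in xs)
)
intercept = y_bar - slope * x_bar
forecast_q5 = intercept + slope * (count + 1)
```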
Data Mining Functionality

❖ Classification: This functionality categorises data into different classes using a model trained
on labelled data sets. It is commonly used in applications like spam filtering or customer segmentation.
❖ Clustering: Similar to classification but without predefined classes, clustering groups a set
of objects so that objects in the same group are more similar to each other than those in
other groups.
❖ Regression: This method predicts continuous numeric values from known data,
which helps predict sales figures and inventory requirements.
❖ Association Rules: This involves discovering interesting relations between variables in
large databases. For example, identifying products frequently bought together can help in
cross-selling strategies in retail.
❖ Anomaly Detection (Outlier/Change Detection): This functionality identifies unusual data
records, which can be helpful in fraud detection by spotting unusual transactions.
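As a rough illustration of anomaly detection, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score rule). The amounts and the cutoff of 2 are assumptions for this example, and real fraud detection relies on much richer models.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts with one unusual record.
amounts = [25, 30, 28, 32, 27, 31, 29, 500]


def find_outliers(values, cutoff=2.0):
    """Return values whose distance from the mean exceeds cutoff standard deviations."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []   # all values identical, nothing stands out
    return [v for v in values if abs(v - centre) / spread > cutoff]


outliers = find_outliers(amounts)
```

A single extreme value inflates both the mean and the standard deviation, so with small samples this simple rule can miss anomalies; median-based measures are a common alternative.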
Data Mining Functionality

❖ Sequential Patterns: This functionality identifies regular sequences or patterns in data where one
event leads to another. It is helpful in various applications, including web page analysis and
studying purchase patterns.
❖ Decision Trees: A decision tree is a model that uses a tree-like graph of decisions and their
possible consequences. It is used extensively in decision analysis to visually and explicitly
represent decisions and decision-making.
❖ Neural Networks: Inspired by the human brain, neural networks are a series of algorithms that
attempt to recognise underlying relationships in a data set through a process that mimics how the
human brain operates.
❖ Data Visualisation: Turning complex data sets into graphical representations that are easy to
understand and interpret. This functionality helps stakeholders make sense of complicated data
through visual storytelling.
❖ Text Mining: Utilising techniques to extract qualitative information from text data sources. This is
increasingly important because data arrives as text as well as numbers, and text requires deeper analysis.
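The sequential-patterns functionality can be sketched for the web page analysis case by counting how often one page is followed directly by another. The sessions and the `transition_counts` helper are hypothetical, and dedicated sequential pattern mining algorithms also handle longer and non-adjacent sequences.

```python
from collections import Counter

# Hypothetical browsing sessions, each an ordered list of visited pages.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["search", "product", "cart", "checkout"],
]


def transition_counts(sessions):
    """Count each (page, next page) pair across all sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


counts = transition_counts(sessions)
top_pair, top_count = counts.most_common(1)[0]
```

In this toy data the most frequent step is from a product page to the cart, which occurs in all three sessions.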
