Data Mining Metrics

Himadri Barman

Data Mining has emerged at the confluence of artificial intelligence, statistics, and databases
as a technique for automatically discovering summary knowledge in large datasets. Data
mining first requires understanding the data available, developing questions to test, and
finally drawing conclusions from data analytic results. Metrics are parameters or
measures of quantitative assessment used for measurement or comparison in a given
context. A metric, for all practical purposes, is just a variable, and it needs to be clearly
defined. The number of metrics needs to be kept under control to ensure that the measuring
task is achievable. Because metrics are tied to a context, it is reasonable to expect that as
the context changes, the metrics would change. The literature has not defined data mining
metrics as such. They may be defined as a set of measurements that help in determining the
efficacy of a data mining method, technique, or algorithm. They support decisions such as
choosing the right data mining technique or algorithm.

Data mining comes in two forms. Directed data mining involves searching through historical
records to find patterns that explain a particular outcome and includes the tasks of
classification, estimation, prediction and profiling. Undirected data mining searches through
the same records for interesting patterns. It includes the tasks of clustering, finding
association rules, and description. Data mining models are the key to both. Each type of
model so designed will have its own metrics by which it can be assessed, but there may be
assessment tools that are independent of the type of model. In many cases, a single metric
may not be sufficient to evaluate a model. In such cases, we might have to look at multiple
metrics which can be used to validate one another and maximize the accuracy of the evaluation.
Choosing the right metrics for the assessment is of paramount importance.

Data mining metrics generally fall into the categories of accuracy, reliability, and usefulness.
Accuracy is a measure of how well the model correlates an outcome with the attributes in
the data that has been provided. There are various measures of accuracy, but all measures
of accuracy are dependent on the data that is used. In reality, values might be missing or
approximate, or the data might have been changed by multiple processes. Particularly in the
phase of exploration and development, we might decide to accept a certain amount of error
in the data, especially if the data is fairly uniform in its characteristics. For example, a model
that predicts sales for a particular store based on past sales can be strongly correlated and
very accurate, even if that store consistently used the wrong accounting method. Therefore,
measurements of accuracy must be balanced by assessments of reliability.

Reliability assesses the way that a data mining model performs on different data sets. A data
mining model is reliable if it generates the same type of predictions or finds the same
general kinds of patterns regardless of the test data that is supplied. For example, the model
that we generate for the store that used the wrong accounting method would not
generalize well to other stores, and therefore would not be reliable.
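
The idea of reliability can be illustrated with a small sketch that scores one fixed model on
several data sets and reports how much the accuracy varies. The rule `store_rule`, the
`footfall` attribute, and the two tiny data sets are hypothetical and exist only for
illustration; a large spread between data sets suggests a model that does not generalize.

```python
from statistics import mean, pstdev


def accuracy(model, rows):
    """Fraction of (features, label) rows that the model labels correctly."""
    hits = sum(1 for features, label in rows if model(features) == label)
    return hits / len(rows)


def reliability_report(model, datasets):
    """Score one model on several data sets and summarise the spread."""
    scores = {name: accuracy(model, rows) for name, rows in datasets.items()}
    values = list(scores.values())
    return scores, mean(values), pstdev(values)


# Hypothetical rule learned from a single store (an assumption for this sketch).
def store_rule(features):
    return "high" if features["footfall"] > 100 else "low"


# Hypothetical test data for two stores.
datasets = {
    "store_a": [({"footfall": 150}, "high"), ({"footfall": 80}, "low")],
    "store_b": [({"footfall": 150}, "low"), ({"footfall": 80}, "low")],
}

scores, average, spread = reliability_report(store_rule, datasets)
print(scores, average, spread)
```

Here the rule is perfect on the store it was derived from and only half right on the other,
which is the kind of gap a reliability assessment is meant to expose.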

Usefulness includes various metrics that tell us whether the model provides useful
information. For example, a data mining model that correlates store location with sales
might be both accurate and reliable, but might not be useful, because we cannot generalize
that result by adding more stores at the same location. Moreover, it does not answer the
fundamental business question of why certain locations have more sales. We might also find
that a model that appears successful is in fact meaningless, because it is based on
cross-correlations in the data.

Downloaded from http://himadri.cmsdu.org

Measuring the effectiveness or usefulness of a data mining approach is not always
straightforward. In fact, different metrics could be used for different techniques, and the
choice also depends on the level of interest. From an overall business or usefulness
perspective, a measure such as Return on Investment (ROI) could be used. ROI examines the
difference between what the data mining technique costs and what the savings or benefits
from its use are. Of course, this would be difficult to measure because the return is hard to
quantify. It could be measured as increased sales, reduced advertising expenditure, or both.
In a specific advertising campaign implemented via targeted catalog mailings, the percentage
of catalog recipients who make a purchase and the amount of purchase per recipient would
provide one means to measure the effectiveness of the mailings.
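
As a rough sketch of the arithmetic, the snippet below computes ROI in its common ratio form,
(benefit - cost) / cost, together with the two campaign measures mentioned above. The
function names and all figures are made up for illustration, and attributing the revenue to
the mailing is itself an assumption that is hard to verify in practice.

```python
def roi(benefit, cost):
    """Return on investment as a ratio: net gain divided by cost."""
    return (benefit - cost) / cost


def campaign_summary(mailed, buyers, revenue, cost):
    """Summarise a catalog mailing with simple effectiveness measures."""
    return {
        "response_rate": buyers / mailed,          # share of recipients who bought
        "revenue_per_recipient": revenue / mailed,  # average purchase per recipient
        "roi": roi(revenue, cost),
    }


# Hypothetical campaign figures (assumptions, not taken from any real campaign).
summary = campaign_summary(mailed=10_000, buyers=250, revenue=30_000.0, cost=12_000.0)
print(summary)
```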

Data mining approaches can also be measured from a computer science or database
perspective. Here it is assumed that business management has determined that a
particular data mining application should be built. Management will subsequently determine
the overall effectiveness of the approach using some ROI strategy (or a related one, such as
TCO: Total Cost of Ownership). The objective then is to compare different alternatives for
implementing a specific data mining task. The metrics used include the traditional metrics of
space and time based on complexity analysis. In some cases, such as accuracy in
classification, more specific metrics targeted to a data mining task may be used.

Evaluation metrics play a critical role in data mining. Metrics are used to guide the data
mining algorithms and to evaluate the results of data mining. For example, when using a
decision tree algorithm to solve a classification task, information gain may be used to guide
the construction of the decision tree while accuracy may be used to evaluate the
performance of the final tree.
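
The two roles can be sketched in a few lines: information gain, computed from Shannon
entropy, scores a candidate split while the tree is being built, and accuracy scores the
finished model. The four-row data set and its attribute names are invented for this
illustration, and no full decision tree is constructed here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute):
    """Entropy reduction obtained by splitting rows on one attribute."""
    labels = [label for _, label in rows]
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attribute], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical training rows: (attributes, class label).
rows = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "rain", "windy": False}, "yes"),
    ({"outlook": "rain", "windy": True}, "yes"),
]

gain_outlook = information_gain(rows, "outlook")  # splits the classes perfectly
gain_windy = information_gain(rows, "windy")      # tells us nothing about the class
score = accuracy(["no", "no", "yes", "no"], [label for _, label in rows])
print(gain_outlook, gain_windy, score)
```

A tree builder guided by information gain would choose `outlook` over `windy` at the root,
since it yields the larger gain on this data.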

Researchers in machine learning and statistics have developed a large number of rule
induction and decision tree construction algorithms for data mining. As a result, empirical
evaluation and justification have become important for the acceptance of newly
developed algorithms by researchers in the field. To provide a comprehensive evaluation,
a set of standard criteria is needed, such as induction time, size of induction results, time to
execute the induction results, and predictive accuracy. One algorithm may
perform better than others on one criterion, but poorly on other criteria.
With the same set of algorithms, we can also get different evaluation results from different
sets of databases. The question of why, and under which circumstances, one algorithm
(whether it is newly designed or an existing one) outperforms others becomes more
important than simply presenting empirical results from an arbitrarily selected set of
databases. Research on data mining metrics is based on the above-mentioned, widely
adopted criteria. These metrics also look into the characteristics of the data sets used for
experiments, such as the numbers of classes, attributes and examples, the distribution of
training examples in the example space, the level of noise and the mixture of continuous
and nominal values. The aim is to develop a meaningful set of metrics with well-documented
experimental results for different algorithms. These metrics can be used as a test bed for
comparing newly developed algorithms against existing ones. There is now a lot of interest in
developing and designing data mining metrics that can be used to make predictive models that
support systemic change.
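
A minimal sketch of such a test bed is shown below. It records the four criteria named above
(induction time, size of the induction result, execution time, and predictive accuracy) for
one learner on one data set. The `majority_learner`, the convention that a learner returns a
`(model, size)` pair, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from time import perf_counter


def majority_learner(train):
    """Toy learner: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return (lambda features: label), 1  # model, and its size (one rule)


def evaluate(learner, train, test):
    """Measure the four standard criteria for one learner on one data set."""
    start = perf_counter()
    model, size = learner(train)
    induction_time = perf_counter() - start

    start = perf_counter()
    predictions = [model(features) for features, _ in test]
    execution_time = perf_counter() - start

    correct = sum(p == lbl for p, (_, lbl) in zip(predictions, test))
    return {
        "induction_time_s": induction_time,
        "result_size": size,
        "execution_time_s": execution_time,
        "predictive_accuracy": correct / len(test),
    }


# Hypothetical data: (attributes, class label).
train = [((1,), "a"), ((2,), "a"), ((3,), "b")]
test = [((4,), "a"), ((5,), "b")]

report = evaluate(majority_learner, train, test)
print(report)
```

Running `evaluate` for several learners over several data sets would give the kind of
criterion-by-criterion comparison the paragraph describes.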

Data mining has now become specialized, with branches devoted to web data (web mining),
spatial data, and so on. With the explosion in web-generated data, web mining has found many
takers. There are many web mining metrics that can be tracked, such as website visitors,
pages served, indegree, or queries in a given time. Data mining on spatial data has become
important because huge volumes of spatial data are now available, holding a wealth of
valuable information. Distance metrics are used to find similar data objects, which helps in
developing robust algorithms for data mining functionalities such as classification and
clustering.
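
As an illustration, the sketch below implements two widely used distance metrics, Euclidean
and Manhattan, and uses one of them to find the most similar object to a query point. This
nearest-object lookup is the basic step behind k-nearest-neighbor classification and many
clustering methods. The sample points are arbitrary.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(query, points, distance=euclidean):
    """Return the point that is most similar to the query under the metric."""
    return min(points, key=lambda p: distance(query, p))


# Arbitrary sample points for illustration.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 1.0)]

d_euclid = euclidean(points[0], points[1])
d_manhattan = manhattan(points[0], points[1])
closest = nearest((5.0, 2.0), points)
print(d_euclid, d_manhattan, closest)
```

The choice of metric matters: two objects that are close under one metric may not be the
closest pair under another, so the metric should suit the data being mined.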

Many modern businesses are data driven. A great deal of effort is spent on using masses of
data to guide decisions at all levels. When data mining algorithms are transferred into the
business community, the technical metrics associated with the algorithms are also
transferred. Practitioners in the business world are then able to evaluate predictive business
models with the available technical metrics. Metrics are therefore at the core of these
efforts. Businesses thus focus on producing timely, correct and relevant metrics
that help them in their operations. A data mining metric should move in step with
improvement in the data mining operation it measures. Since a large number of data mining
metrics exist, they should be selected with caution. Data mining metrics need to
be clearly defined so as to avoid any ambiguity in interpretation. They should also be
flexible enough to meet changing needs and requirements. In conclusion, it is worth
mentioning that data mining metrics have become a niche field in which many top IT
consultants offer advice and suggestions. It can become a career for many!

References:
• Data Mining: Introductory and Advanced Topics by Margaret H. Dunham
• Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry and Gordon S. Linoff
• Fast Distance Metric Based Data Mining Techniques Using P-trees: k-Nearest-Neighbor Classification and k-Clustering, a thesis submitted by Md Abdul Maleq Khan
• CFP: A Special Issue of Informatica On Data Mining Metrics, a forward note by Dr. X D Wu
• Mining with Rarity: A Unifying Framework by Gary M. Weiss
• Knowledge Discovery and Data Mining: Challenges and Realities by Xingquan Zhu and Ian Davidson
• http://msdn.microsoft.com/en-us/library/ms174493.aspx accessed on September 10, 2012 at 0825 hrs
