Data Mining Task Primitives and Major Issues

Uploaded by

Chaitali Nagbhidkar


Data Mining Task Primitives

• We can specify a data mining task in the form of a data mining query.
• This query is input to the system.
• A data mining query is defined in terms of data mining primitives.
• These primitives allow us to communicate interactively with the data mining
system during discovery, to direct the mining process or to examine the findings
from different angles or depths.

• Designing a comprehensive data mining language is challenging
because data mining covers a wide spectrum of tasks, from data
characterization to evolution analysis.

• Each task has different requirements.

• The design of an effective data mining query language requires a deep
understanding of the power, limitations, and underlying mechanisms of
the various kinds of data mining tasks.

• This facilitates a data mining system's communication with other
information systems and its integration with the overall information
processing environment.
Task Primitives
The data mining primitives specify the following:
1. Set of task-relevant data to be mined.
2. Kind of knowledge to be mined.
3. Background knowledge to be used in the discovery process.
4. Interestingness measures and thresholds for pattern evaluation.
5. Representation for visualizing the discovered patterns.
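
One way to picture the five primitives is as the fields of a single task specification. The sketch below is only an illustration: the class name MiningTask, its field names, and the sample values are assumptions made for this example, not part of any real data mining system.

```python
from dataclasses import dataclass, field

# Kinds of knowledge named in these notes (labels chosen for this sketch).
KNOWLEDGE_KINDS = {
    "characterization", "discrimination", "association", "classification",
    "prediction", "clustering", "outlier analysis", "evolution analysis",
}

@dataclass
class MiningTask:
    """Hypothetical container holding one value per task primitive."""
    task_relevant_data: str                                    # primitive 1
    knowledge_kind: str                                        # primitive 2
    background_knowledge: dict = field(default_factory=dict)   # primitive 3
    interestingness: dict = field(default_factory=dict)        # primitive 4
    presentation: str = "rules"                                # primitive 5

    def __post_init__(self):
        # Reject a kind of knowledge that is not in the list above.
        if self.knowledge_kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.knowledge_kind!r}")

task = MiningTask(
    task_relevant_data="customers with income >= 40000",
    knowledge_kind="classification",
    background_knowledge={"age": "age_hierarchy"},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(task)
```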
1. The set of task-relevant data to be mined

• This specifies the portions of the database or the set of data in which the user is
interested.
• This includes the database attributes or data warehouse dimensions of interest
(the relevant attributes or dimensions).
• In a relational database, the set of task-relevant data can be collected via a
relational query involving operations like selection, projection, join, and
aggregation.
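
As a rough illustration of collecting task-relevant data with a relational query, the following sketch uses Python's built-in sqlite3 module on an in-memory database. The tables customer and purchase, their columns, and the sample rows are invented for this example.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, income INTEGER);
    CREATE TABLE purchase (cust_id INTEGER, item TEXT, price REAL);
    INSERT INTO customer VALUES (1, 34, 52000), (2, 45, 38000), (3, 29, 61000);
    INSERT INTO purchase VALUES
        (1, 'laptop', 900.0), (1, 'camera', 350.0),
        (2, 'phone', 600.0),
        (3, 'tv', 1200.0), (3, 'cable', 15.0);
""")

rows = conn.execute("""
    SELECT c.cust_id, c.age, SUM(p.price) AS total_spent   -- projection + aggregation
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id             -- join
    WHERE c.income >= 40000                                 -- selection
    GROUP BY c.cust_id, c.age
    ORDER BY c.cust_id
""").fetchall()
conn.close()

print(rows)  # one (cust_id, age, total_spent) tuple per selected customer
```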
2. The kind of knowledge to be mined

This specifies the data mining functions to be performed, such as:
• characterization,
• discrimination,
• association or correlation analysis,
• classification,
• prediction,
• clustering,
• outlier analysis, or
• evolution analysis.
3. The background knowledge to be used in the discovery process

• This knowledge about the domain to be mined is useful for guiding the
knowledge discovery process and evaluating the patterns found.
• Concept hierarchies are a popular form of background knowledge, which
allow data to be mined at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from low-level concepts
to higher-level, more general concepts.

Rolling Up - Generalization of data: Allows data to be viewed at more meaningful
and explicit abstractions, which makes it easier to understand. It also compresses
the data, so fewer input/output operations are required.

Drilling Down - Specialization of data: Concept values are replaced by lower-level
concepts.

Based on different user viewpoints, there may be more than one concept hierarchy
for a given attribute or dimension. An example of a concept hierarchy for the
attribute (or dimension) age is shown below. User beliefs regarding relationships
in the data are another form of background knowledge.
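
Rolling up along a concept hierarchy can be sketched in a few lines of Python. The three levels used here (raw age, age group, all) and the group boundaries at 40 and 60 are assumptions chosen for illustration; a real hierarchy would come from the domain expert or the schema.

```python
from collections import Counter

def to_age_group(age: int) -> str:
    """Map a raw age to a higher-level concept (boundaries are assumed)."""
    if age < 40:
        return "young"
    if age < 60:
        return "middle_aged"
    return "senior"

# Level 0 = raw value, level 1 = age group, level 2 = most general concept.
HIERARCHY = [lambda age: age, to_age_group, lambda age: "all"]

def roll_up(ages, level):
    """Count records after generalizing each age to the given level."""
    mapper = HIERARCHY[level]
    return Counter(mapper(a) for a in ages)

ages = [23, 35, 41, 58, 67, 35]
by_raw = roll_up(ages, 0)     # most detailed view (drilled down)
by_group = roll_up(ages, 1)   # rolled up one level
by_all = roll_up(ages, 2)     # rolled up to the top of the hierarchy
print(by_group)
```

Moving from level 1 back to level 0 corresponds to drilling down: the same records are shown with lower-level concept values.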
4. The interestingness measures and thresholds for pattern evaluation

• Different kinds of knowledge may have different interestingness measures. They
may be used to guide the mining process or, after discovery, to evaluate the
discovered patterns.
• For example, interestingness measures for association rules include support and
confidence. Rules whose support and confidence values are below user-
specified thresholds are considered uninteresting.
Interestingness measures include:
• Simplicity
• Certainty (Confidence)
• Utility (Support)
• Novelty: Novel patterns are those that contribute new information or increased
performance to the given pattern set, for example a data exception. Another
strategy for detecting novelty is to remove redundant patterns.
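
To make the support and confidence thresholds concrete, here is a minimal sketch that evaluates an association rule over a handful of transactions. The item names, the transactions, and the threshold values are made up for the example.

```python
transactions = [
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"computer", "software"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    return count(set(antecedent) | set(consequent), transactions) / base

def is_interesting(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is kept only if it meets both user-specified thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)

# Rule: computer => software
rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
kept = is_interesting({"computer"}, {"software"}, transactions, 0.5, 0.7)
dropped = is_interesting({"printer"}, {"software"}, transactions, 0.5, 0.7)
```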
5. The expected representation for visualizing the discovered patterns

• This refers to the form in which discovered patterns are to be displayed,
which may include rules, tables, cross tabs, charts, graphs, decision
trees, cubes, or other visual representations.
• Users must be able to specify the forms of presentation to be used for
displaying the discovered patterns.
• Some representation forms may be better suited than others for
particular kinds of knowledge.

For example, generalized relations and their corresponding cross tabs or
pie/bar charts are good for presenting characteristic descriptions, whereas
decision trees are common for classification.
Example of Data Mining Task Primitives

Suppose, as a marketing manager of AllElectronics, you would like to classify
customers based on their buying patterns. You are especially interested in
those customers whose salary is no less than $40,000 and who have bought
more than $1,000 worth of items, each of which is priced at no less than $100.

In particular, you are interested in the customer's age, income, the types of
items purchased, the purchase location, and where the items were made. You
would like to view the resulting classification in the form of rules.

This data mining query is expressed in DMQL (Data Mining Query Language)
as follows:

use database AllElectronics_db
use hierarchy location_hierarchy for T.branch, age_hierarchy for C.age
mine classification as promising_customers
in relevance to C.age, C.income, I.type, I.place_made, T.branch
from customer C, item I, transaction T
where I.item_ID = T.item_ID and C.cust_ID = T.cust_ID and C.income ≥ 40,000
and I.price ≥ 100
group by T.cust_ID
having sum(I.price) ≥ 1,000
display as rules
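
The data-selection part of this task (income of at least $40,000, items priced at no less than $100, and a per-customer purchase total around the $1,000 mark) can be mimicked in plain Python. This is a sketch only: the dictionaries, field names, and sample records are invented, and the total is tested with ≥ like the other thresholds.

```python
from collections import defaultdict

# Made-up sample data standing in for the customer, item, and transaction tables.
customers = {
    1: {"age": 34, "income": 52000},
    2: {"age": 51, "income": 30000},
    3: {"age": 45, "income": 75000},
}
items = {
    "A": {"type": "laptop", "price": 900, "place_made": "Japan"},
    "B": {"type": "camera", "price": 350, "place_made": "Germany"},
    "C": {"type": "cable", "price": 20, "place_made": "China"},
    "D": {"type": "tv", "price": 800, "place_made": "Korea"},
}
transactions = [
    {"cust_id": 1, "item_id": "A", "branch": "Vancouver"},
    {"cust_id": 1, "item_id": "B", "branch": "Vancouver"},
    {"cust_id": 1, "item_id": "C", "branch": "Toronto"},
    {"cust_id": 2, "item_id": "A", "branch": "Toronto"},
    {"cust_id": 2, "item_id": "D", "branch": "Toronto"},
    {"cust_id": 3, "item_id": "D", "branch": "Vancouver"},
    {"cust_id": 3, "item_id": "C", "branch": "Vancouver"},
]

def task_relevant_rows(min_income=40000, min_price=100, min_total=1000):
    per_customer = defaultdict(list)
    for t in transactions:
        c = customers[t["cust_id"]]   # join on customer ID
        i = items[t["item_id"]]       # join on item ID
        if c["income"] >= min_income and i["price"] >= min_price:   # row filter
            per_customer[t["cust_id"]].append({                     # group by customer
                "age": c["age"], "income": c["income"], "type": i["type"],
                "place_made": i["place_made"], "branch": t["branch"],
                "price": i["price"],
            })
    # Keep only customers whose qualifying purchases reach the total threshold.
    return {cid: rows for cid, rows in per_customer.items()
            if sum(r["price"] for r in rows) >= min_total}

selected = task_relevant_rows()
print(sorted(selected))
```

The rows that survive are what a classifier would then be trained on; the classification step itself is not shown.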
Data Mining Issues
1. Mining Methodology and User Interaction Issues
These include the following kinds of issues −

• Mining different kinds of knowledge in databases − Different users may be
interested in different kinds of knowledge. Therefore, data mining needs to
cover a broad range of knowledge discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction − The data
mining process needs to be interactive so that users can focus the search for
patterns, providing and refining data mining requests based on the returned
results.

• Incorporation of background knowledge − Background knowledge can be used to
guide the discovery process and to express the discovered patterns, not only
in concise terms but also at multiple levels of abstraction.
1. Mining Methodology and User Interaction Issues (continued)

• Data mining query languages and ad hoc data mining − A data mining query
language that allows the user to describe ad hoc mining tasks should be integrated
with a data warehouse query language and optimized for efficient and flexible data
mining.

• Presentation and visualization of data mining results − Once patterns are
discovered, they need to be expressed in high-level languages and visual
representations. These representations should be easily understandable.

• Handling noisy or incomplete data − Data cleaning methods are required to handle
noise and incomplete objects while mining data regularities. Without data cleaning
methods, the accuracy of the discovered patterns will be poor.

• Pattern evaluation − Many of the patterns discovered may be uninteresting because
they either represent common knowledge or lack novelty, so the patterns need to be
evaluated for interestingness.
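
As a small illustration of the data cleaning mentioned under "Handling noisy or incomplete data", the sketch below fills missing values with the median of the values that are present. Median imputation is just one simple choice made for this example; the function name and the sample incomes are assumptions.

```python
from statistics import median

def impute_missing(values):
    """Replace None entries with the median of the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute: every value is missing")
    fill = median(present)
    return [fill if v is None else v for v in values]

# Two incomes are missing and one (1000000) looks like noise.
incomes = [42000, None, 39000, 1000000, None, 45000]
cleaned = impute_missing(incomes)
print(cleaned)
```

The median is used here because a single noisy value such as 1000000 would pull a mean-based fill value far away from the typical income.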
2. Performance Issues

There can be performance-related issues such as the following −

• Efficiency and scalability of data mining algorithms − In order to effectively
extract information from the huge amounts of data in databases, data mining
algorithms must be efficient and scalable.
• Parallel, distributed, and incremental mining algorithms − Factors such as
the huge size of databases, the wide distribution of data, and the complexity of
data mining methods motivate the development of parallel and distributed data
mining algorithms. These algorithms divide the data into partitions, which are
processed in parallel. The results from the partitions are then merged.
Incremental algorithms incorporate database updates without mining the data
again from scratch.
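
The partition, process, and merge pattern, together with an incremental update, can be sketched with item counting as a stand-in for a real mining algorithm. The function names and sample transactions are assumptions, and ThreadPoolExecutor is used only to show the structure; it does not by itself give CPU-level parallelism in CPython.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Count, per item, how many transactions in this partition contain it."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def partitioned_counts(transactions, n_partitions=2):
    """Split the data, count each partition concurrently, then merge the results."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)   # merge step
    return total

def incremental_update(counts, new_transactions):
    """Fold newly arrived transactions into existing counts without a full rescan."""
    counts.update(count_items(new_transactions))
    return counts

data = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
counts = partitioned_counts(data)

new_data = [["a", "d"]]
updated = incremental_update(Counter(counts), new_data)   # copy, then update
print(updated)
```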
3. Diverse Data Types Issues

• Handling of relational and complex types of data − The
database may contain complex data objects, multimedia data
objects, spatial data (geographic or geospatial data), temporal
data (data that specifically refers to times or dates), etc. It is not
possible for one system to mine all these kinds of data.

• Mining information from heterogeneous databases and global
information systems − The data is available at different data
sources on a LAN or WAN. These data sources may be structured,
semi-structured, or unstructured. Therefore, mining
knowledge from them adds challenges to data mining.
