Data Mining Task Primitives and Major Issues
• We can specify a data mining task in the form of a data mining query.
• This query is input to the system.
• A data mining query is defined in terms of data mining primitives.
• These primitives allow us to communicate interactively with the data mining
system during discovery, either to direct the mining process or to examine the
findings.
1. The set of task-relevant data to be mined
• This specifies the portions of the database or the set of data in which the user is
interested.
• This includes the database attributes or data warehouse dimensions of interest
(the relevant attributes or dimensions).
• In a relational database, the set of task-relevant data can be collected via a
relational query involving operations like selection, projection, join, and
aggregation.
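As a minimal sketch of how such a relational query might collect task-relevant data, the example below uses Python's standard `sqlite3` module with an in-memory database. The `customer` and `purchase` tables, their columns, the sample rows, and the income threshold are all hypothetical and chosen only to show selection, projection, join, and aggregation in one query.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, income REAL);
    CREATE TABLE purchase (cust_id INTEGER, item_type TEXT, price REAL);
    INSERT INTO customer VALUES (1, 34, 52000), (2, 58, 31000), (3, 41, 75000);
    INSERT INTO purchase VALUES
        (1, 'tv', 600), (1, 'camera', 450),
        (2, 'phone', 300),
        (3, 'laptop', 1200), (3, 'tv', 700);
""")

# Projection: keep only the attributes of interest.
# Join: link each customer to their purchases.
# Selection: keep customers above an (assumed) income threshold.
# Aggregation: total spending per customer.
task_relevant = conn.execute("""
    SELECT c.cust_id, c.age, c.income, SUM(p.price) AS total_spent
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id
    WHERE c.income >= 40000
    GROUP BY c.cust_id, c.age, c.income
    ORDER BY c.cust_id
""").fetchall()
conn.close()

for row in task_relevant:
    print(row)
```

The resulting rows form the data set that would then be handed to the mining step; everything outside the query's scope is ignored.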
2. The kind of knowledge to be mined
3. Background knowledge
• Background knowledge is knowledge about the domain to be mined. It is useful
for guiding the knowledge discovery process and evaluating the patterns found.
• Concept hierarchies are a popular form of background knowledge; they
allow data to be mined at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from low-level concepts
to higher-level, more general concepts.
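The mapping from low-level to higher-level concepts can be sketched with plain dictionaries. The location hierarchy below (city, then region, then country), the `generalize` helper, and the sample sales list are hypothetical names and data used only for illustration; a real system would typically store hierarchies as schema or metadata.

```python
from collections import Counter

# Hypothetical location hierarchy: city -> region -> country.
CITY_TO_REGION = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
}
REGION_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
}


def generalize(city, level):
    """Map a city to the requested level of the hierarchy."""
    if level == "city":
        return city
    region = CITY_TO_REGION[city]
    if level == "region":
        return region
    if level == "country":
        return REGION_TO_COUNTRY[region]
    raise ValueError(f"unknown level: {level}")


# The same (made-up) sales records viewed at two levels of abstraction.
sales_cities = ["Vancouver", "Toronto", "Chicago", "Toronto"]
by_region = Counter(generalize(c, "region") for c in sales_cities)
by_country = Counter(generalize(c, "country") for c in sales_cities)
print(by_region)
print(by_country)
```

Counting at the country level merges Vancouver and Toronto into a single group, which is the kind of roll-up that lets patterns too sparse at the city level become visible.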
In particular, you are interested in the customer's age, income, the types of
items purchased, the purchase location, and where the items were made. You
would like to view the resulting classification in the form of rules.
This data mining query is expressed in DMQL (Data Mining Query Language)
as follows, where each line of the query has been enumerated to aid in our
discussion.
Data mining query languages and ad hoc data mining − A data mining query language
that allows the user to describe ad hoc mining tasks should be integrated with a data
warehouse query language and optimized for efficient and flexible data mining.
Presentation and visualization of data mining results − Once patterns are
discovered, they need to be expressed in high-level languages and visual
representations that are easy to understand.
Handling noisy or incomplete data − Data cleaning methods are required to handle
noise and incomplete objects when mining data regularities. Without them, the
accuracy of the discovered patterns will be poor.
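A small sketch of what a cleaning step might look like before mining is given below. The records, the plausible age range, and the choice of median imputation are assumptions made for illustration only; they are one simple option among many cleaning strategies, not a method prescribed by these notes.

```python
from statistics import median

# Hypothetical customer records: one has a missing income, one an impossible age.
records = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": None},    # incomplete: income is missing
    {"age": 250, "income": 61000},  # noisy: age is outside any plausible range
    {"age": 29, "income": 48000},
]


def clean(rows, min_age=0, max_age=120):
    """Drop rows with implausible ages, then fill missing incomes with the median."""
    valid = [r for r in rows if min_age <= r["age"] <= max_age]
    known = [r["income"] for r in valid if r["income"] is not None]
    # If no income is known at all, there is nothing to impute from.
    fill = median(known) if known else None
    return [
        {**r, "income": fill if r["income"] is None else r["income"]}
        for r in valid
    ]


cleaned = clean(records)
for row in cleaned:
    print(row)
```

Dropping the out-of-range row before computing the median keeps the noisy record from influencing the value used to fill the gap.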
Pattern evaluation − Discovered patterns need to be evaluated for interestingness,
because many of them represent common knowledge or lack novelty.
2. Performance Issues