
Data Mining Unit-III

The document discusses key concepts in data mining, focusing on concept description, frequent pattern mining, and association rules. It outlines the differences between concept description and OLAP, defines frequent patterns, and explains data generalization and summarization. Additionally, it covers attribute relevance, methods for class comparison, and the measurement of rule quality in association mining.
Copyright © All Rights Reserved

DATA MINING (UNIT-III)

1. What is concept description in data mining?

Ans. Concept description is a descriptive form of data mining. It describes a given set of data, such as frequent buyers or graduate candidates, in concise, summarized terms. It consists of characterization, which summarizes the class under study, and comparison, which contrasts it with one or more other classes. It is also known as class description when the concept to be described refers to a class of objects. These descriptions can be derived with the help of data characterization.
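To make characterization concrete, the sketch below summarizes a target class by the share of each attribute value among its records. It is a minimal Python illustration, not a standard routine; the function name `characterize`, the record layout (a list of dicts), and the sample student data are all hypothetical.

```python
from collections import Counter

def characterize(records, class_attr, target_value, attrs):
    """Summarize the target class: for each attribute, the fraction of
    target-class records taking each value (hypothetical helper)."""
    target = [r for r in records if r[class_attr] == target_value]
    if not target:
        return {}
    summary = {}
    for attr in attrs:
        counts = Counter(r[attr] for r in target)
        summary[attr] = {value: n / len(target) for value, n in counts.items()}
    return summary

# Hypothetical student records; the target class is "graduate".
students = [
    {"status": "graduate", "major": "Science", "gpa": "excellent"},
    {"status": "graduate", "major": "Science", "gpa": "good"},
    {"status": "graduate", "major": "Business", "gpa": "excellent"},
    {"status": "graduate", "major": "Science", "gpa": "excellent"},
    {"status": "undergraduate", "major": "Arts", "gpa": "good"},
]
summary = characterize(students, "status", "graduate", ["major", "gpa"])
```

Here three of the four graduate records are Science majors, so `summary["major"]["Science"]` is 0.75; the undergraduate record does not contribute to the description.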
2. State the difference between concept description and OLAP.
Ans. The main differences between concept description in large databases and OLAP tools are as follows.

(a) Data types
Concept description in large databases: The database attributes can be of several types, such as numeric, non-numeric, spatial, text, or image.
OLAP tools: Data warehouses and OLAP tools are based on a multidimensional data model that views the data as a data cube made up of dimensions (attributes) and measures, and they typically restrict dimensions to non-numeric data.

(b) Handling of complex data
Concept description in large databases: With aggregation, concept description in databases can handle complex attribute data types.
OLAP tools: OLAP offers a simplified model for data analysis because of the restrictions on the possible dimension and measure types.

(c) Degree of automation
Concept description in large databases: Concept description in data mining needs a more automated process that helps users decide which attributes should be included in the analysis and the degree to which the data should be generalized to produce an interesting summary.
OLAP tools: OLAP in data warehouses is a purely user-controlled process. The selection of dimensions and the application of OLAP operations, such as drill-down, roll-up, slicing, and dicing, are directed and controlled by the users, who may have to specify a long sequence of OLAP operations.

3. Define a frequent pattern in data mining. State the advantages and disadvantages.

Ans. A frequent pattern is a pattern (such as a set of items, a subsequence, or a substructure) that appears in a dataset with a frequency no lower than a user-specified minimum support threshold. Frequent pattern mining is the process of identifying such patterns or associations in a dataset. This is typically done by analyzing large datasets to find items or sets of items that frequently appear together.
There are several algorithms used for frequent pattern mining, including:
1. Apriori algorithm: This is one of the most commonly used algorithms for frequent pattern mining. It uses a level-wise, “bottom-up” approach in which frequent k-itemsets are used to generate candidate (k+1)-itemsets, and association rules are then generated from the frequent itemsets.
2. ECLAT algorithm: This algorithm uses a “depth-first search” approach on a vertical data format, in which each item is stored with the set of IDs of the transactions that contain it. It is particularly efficient for datasets with a large number of items.
3. FP-growth algorithm: This algorithm uses a “compression” technique: it compresses the database into a frequent-pattern tree (FP-tree) and mines frequent patterns from it without generating candidate itemsets. It is particularly efficient for datasets with a large number of transactions.

Frequent pattern mining has many applications, such as market basket analysis, recommender systems, fraud detection, and many more.
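The level-wise idea behind Apriori can be sketched as follows. This is a minimal, unoptimized Python version written for illustration rather than a reference implementation; the function name `apriori`, the use of an absolute transaction count for `min_support`, and the sample transactions are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.
    min_support is an absolute number of transactions (an assumption)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Scan the transactions to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
freq = apriori(transactions, min_support=3)
```

With `min_support=3`, the frequent itemsets are the four items bread, milk, diaper, and beer plus the pairs {bread, milk}, {bread, diaper}, {milk, diaper}, and {diaper, beer}. The candidate {bread, milk, diaper} survives pruning but appears in only two transactions, so the search stops at level 3.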

Advantages:

1. It can find useful information that is not visible through simple data browsing.
2. It can find interesting associations and correlations among data items.

Disadvantages:

1. It can generate a very large number of patterns.
2. With high dimensionality, the number of patterns can be very large, making the results difficult to interpret.

4. What are association and correlation in data mining?

Ans. Association is a very general relationship: one variable provides information about another. Correlation is more specific: two variables are correlated when they display an increasing or decreasing trend. For example, in an increasing trend, observing that X > μX makes it more likely that Y > μY, where μX and μY are the means of X and Y.
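The increasing or decreasing trend can be quantified with Pearson's correlation coefficient, which is positive for an increasing trend and negative for a decreasing one. Below is a minimal Python sketch; the function name `pearson` and the sample values are illustrative assumptions, and the function assumes neither sequence is constant (otherwise the denominator is zero).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.
    Assumes neither sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance term: large and positive when X and Y sit above their means together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

increasing = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # close to +1
decreasing = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # close to -1
```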

5. What is data generalization and summarization?

Ans. Data generalization is the process that abstracts a large set of task-relevant data in a database from a low conceptual level to higher ones. It summarizes the general features of objects in a target class and produces what are called characteristic rules.
There are two basic approaches to data generalization:
1. Data cube approach:
• It is also known as the OLAP approach.
• It is an efficient approach for summarizing historical data, such as past sales.
• Computations are performed in advance and the results are stored in the data cube; that is, aggregation is done off-line before an OLAP or data mining query is submitted for processing.
2. Attribute-oriented induction:
• It is an online, query-oriented, generalization-based approach to data analysis.
• It collects the task-relevant data with a query and then generalizes it by removing attributes or replacing attribute values with higher-level concepts, merging identical tuples and accumulating their counts.
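Attribute-oriented induction can be sketched in a few lines: drop attributes that cannot be generalized, replace each remaining value with its parent in a concept hierarchy, and merge identical generalized tuples while counting them. The Python below performs a single generalization step only; the function name `aoi`, the dictionary-based hierarchies, and the sample data are hypothetical, and a full AOI run would repeat generalization until attribute thresholds are satisfied.

```python
from collections import Counter

def aoi(rows, hierarchies, drop=()):
    """One step of attribute-oriented induction (hypothetical helper).

    rows: list of dicts; hierarchies: {attribute: {low_value: parent_value}};
    drop: attributes removed because they have no useful generalization.
    Returns a Counter mapping each generalized tuple to the number of rows merged into it.
    """
    merged = Counter()
    for row in rows:
        generalized = tuple(
            # Climb one level in the hierarchy if the value has a parent.
            (attr, hierarchies.get(attr, {}).get(value, value))
            for attr, value in sorted(row.items())
            if attr not in drop
        )
        merged[generalized] += 1
    return merged

hierarchies = {
    "major": {"Physics": "Science", "Chemistry": "Science", "History": "Arts"},
    "city": {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"},
}
students = [
    {"name": "Ann", "major": "Physics", "city": "Vancouver"},
    {"name": "Bob", "major": "Chemistry", "city": "Toronto"},
    {"name": "Cy", "major": "History", "city": "Seattle"},
]
prime = aoi(students, hierarchies, drop=("name",))
```

After removing `name` and generalizing, the first two students collapse into one tuple (Canada, Science) with a count of 2, which is the kind of summarized relation characteristic rules are read from.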

6. What is attribute relevance? State the reasons for attribute relevance analysis.

Ans. The basic idea behind attribute relevance analysis is to compute a measure that quantifies the relevance of an attribute with respect to a given class or concept. Such measures include information gain, ambiguity, and the correlation coefficient.
Attribute relevance analysis for concept description is performed as follows:
Data collection: Collect data for both the target class and the contrasting class by query processing.
Preliminary relevance analysis using conservative AOI: This step identifies a set of dimensions and attributes to which the selected relevance measure is to be applied.
Remove: Remove irrelevant and weakly relevant attributes using the selected relevance measure.
Generate the concept description using AOI: Perform AOI using a less conservative set of attribute generalization thresholds.
The reasons for attribute relevance analysis are as follows:
• It helps decide which dimensions should be included.
• It allows a high level of generalization to be reached.
• It reduces the number of attributes, which makes the resulting patterns easier to read.
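Information gain, one of the relevance measures named above, is the entropy of the class labels minus the expected entropy after splitting the data on an attribute. In this minimal Python sketch the function names, the row format, and the sample data are assumptions; in the Remove step, an attribute whose gain falls below some chosen threshold would be treated as weakly relevant.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, class_attr="class"):
    """Class entropy minus the weighted entropy of the subsets produced by attr."""
    labels = [r[class_attr] for r in rows]
    expected = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[class_attr] for r in rows if r[attr] == value]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - expected

# Hypothetical data: target class "graduate" vs. contrasting class "undergraduate".
rows = [
    {"major": "Science", "gender": "M", "class": "graduate"},
    {"major": "Science", "gender": "F", "class": "graduate"},
    {"major": "Arts", "gender": "M", "class": "undergraduate"},
    {"major": "Arts", "gender": "F", "class": "undergraduate"},
]
gain_major = information_gain(rows, "major")
gain_gender = information_gain(rows, "gender")
```

In this toy data `major` separates the two classes perfectly (gain of 1 bit), while `gender` carries no information about the class (gain of 0), so `gender` would be a candidate for removal.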

7. What are the methods for class comparison?

Ans. Class comparison involves the following steps:
• Data collection: The set of relevant records in the database is collected by query processing and partitioned into a target class and one or more contrasting classes.
• Dimension relevance analysis: If there are many dimensions, dimension relevance analysis should be performed on these classes to select only the highly relevant dimensions for further analysis.
• Synchronous generalization: Generalization is performed on the target class up to the level controlled by a user- or expert-specified dimension threshold, which results in a prime target class relation. The concepts in the contrasting classes are generalized to the same level, giving the prime contrasting class relations.
• Presentation of the derived comparison: The resulting class comparison description can be visualized in the form of tables, graphs, and rules. This presentation usually includes a “contrasting” measure, such as count%, that reflects the comparison between the target and contrasting classes.
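One common contrasting measure for the presentation step is the discriminant weight (d-weight): for a generalized tuple, the fraction of its occurrences that fall in the target class. The minimal Python sketch below assumes the target and contrasting classes have already been generalized to the same level; the function name and the sample prime relations are hypothetical.

```python
from collections import Counter

def d_weights(target, contrast):
    """d-weight of each generalized tuple in the target class:
    count in target / (count in target + count in contrasting class)."""
    t_counts, c_counts = Counter(target), Counter(contrast)
    return {q: t_counts[q] / (t_counts[q] + c_counts[q]) for q in t_counts}

# Hypothetical prime relations after synchronous generalization: (major, country).
graduates = [("Science", "Canada")] * 3 + [("Arts", "Canada")]
undergraduates = [("Science", "Canada")] + [("Arts", "Canada")] * 3
weights = d_weights(graduates, undergraduates)
```

A d-weight of 0.75 for (Science, Canada) says that three quarters of all such students belong to the graduate class, which is the kind of contrast a comparison table or rule would report.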

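8. State the basic concept of scalable frequent itemset mining methods.

Ans. Frequent itemsets are sets of items whose support (the number or fraction of transactions that contain them) meets a minimum support threshold. They are a fundamental concept in association rule mining, a data mining technique for discovering relationships between items in a dataset; the goal of association rule mining is to identify items that frequently occur together. Scalable frequent itemset mining methods rely on the Apriori (downward-closure) property: every nonempty subset of a frequent itemset must also be frequent, so any candidate with an infrequent subset can be pruned. The main methods are Apriori (candidate generation and test), FP-growth (pattern growth without candidate generation), and ECLAT (mining on a vertical data format).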
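The vertical-format approach can be sketched as a depth-first search over tid-sets, where the support of an itemset is the size of the intersection of its items' transaction-ID sets. This is a minimal Python illustration in the spirit of ECLAT, not a tuned implementation; the function name, the absolute `min_support` count, and the sample transactions are assumptions.

```python
def eclat(transactions, min_support):
    """Depth-first frequent itemset mining on a vertical layout.
    min_support is an absolute count. Returns {frozenset(itemset): support_count}."""
    # Vertical format: item -> set of IDs of the transactions that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Support of prefix + item is the size of the tid-set intersection.
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            if len(new_tids) >= min_support:
                itemset = prefix | {item}
                frequent[itemset] = len(new_tids)
                # Only later items may extend this itemset, so each set is visited once.
                extend(itemset, new_tids, candidates[i + 1:])

    extend(frozenset(), None, sorted(tidsets.items()))
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
freq = eclat(transactions, min_support=3)
```

On these sample transactions with a minimum support count of 3, it finds eight frequent itemsets (four single items and four pairs), the same ones a level-wise Apriori search would find, but without a separate database scan per level.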

9. What is an association rule? State the various kinds of association rules.

Ans. An association rule is an implication of the form A ⇒ B, where A and B are disjoint itemsets; its strength is usually measured by support and confidence. Association mining searches for frequent items in the dataset. Frequent pattern mining usually uncovers interesting associations and correlations between itemsets in transactional and relational databases. In short, it shows which items appear together in a transaction or relation.
There are various types of association rules in data mining:
1. Multi-relational association rules: Multi-Relation Association Rules (MRAR) are a class of association rules in which, unlike primitive, simple, and even multi-relational association rules (which are usually extracted from multi-relational databases), each rule item consists of one entity but several relationships. These relationships represent indirect relationships between entities.
2. Generalized association rules: Generalized association rule extraction is a powerful tool for getting a rough idea of interesting patterns hidden in data. However, since patterns are extracted at each level of abstraction, the mined rule sets may be too large to be used effectively for decision-making.
3. Quantitative association rules: Quantitative association rules are a special type of association rule. Unlike general association rules, where both the left and right sides of the rule must be categorical (nominal or discrete) attributes, at least one attribute (on the left or right side) of a quantitative association rule must be numeric.
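Rules of the form A ⇒ B are usually derived from frequent itemsets: for each frequent itemset and each nonempty proper subset A of it, support(A ⇒ B) = P(A ∪ B) and confidence(A ⇒ B) = P(B | A) = count(A ∪ B) / count(A). The minimal Python sketch below assumes the frequent itemsets and their support counts are already known; the function name, the tuple layout of a rule, and the sample counts are hypothetical.

```python
from itertools import combinations

def generate_rules(frequent, n_transactions, min_conf):
    """Return (antecedent, consequent, support, confidence) tuples for rules
    that meet min_conf. `frequent` maps frozenset itemsets to support counts and
    must contain every subset of each itemset (true for a complete frequent set)."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                confidence = count / frequent[a]
                if confidence >= min_conf:
                    rules.append((a, itemset - a, count / n_transactions, confidence))
    return rules

# Hypothetical support counts from 5 transactions.
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"bread", "milk"}): 3,
}
rules = generate_rules(frequent, n_transactions=5, min_conf=0.7)
```

Both bread ⇒ milk and milk ⇒ bread have support 3/5 = 0.6 and confidence 3/4 = 0.75, so both pass a 70% confidence threshold.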

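10. Explain how association mining leads to correlation analysis.

Ans. Most association rule mining algorithms employ a support-confidence framework. Even with minimum support and confidence thresholds, many of the rules generated are still uninteresting to users, especially when mining at low support thresholds.

Strong Rules Are Not Necessarily Interesting

Whether or not a rule is interesting can be assessed either subjectively or objectively. Ultimately, only the user can judge whether a given rule is interesting, and this judgment, being subjective, may differ from one user to another. However, objective interestingness measures, based on the statistics “behind” the data, can be used as one step toward weeding out uninteresting rules before they are presented to the user. For this reason, the support-confidence framework is often augmented with a correlation measure such as lift, where lift(A, B) = P(A ∪ B) / (P(A) P(B)): a value greater than 1 indicates that A and B are positively correlated, a value less than 1 indicates negative correlation, and a value of 1 indicates independence.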
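The following Python sketch computes support, confidence, and lift for a rule A ⇒ B from transaction counts. The function name and the counts (10,000 transactions, 6,000 containing A, 7,500 containing B, 4,000 containing both) are hypothetical; they are chosen to show a rule that looks strong under the support-confidence framework but has a lift below 1.

```python
def support_confidence_lift(n, n_a, n_b, n_ab):
    """n: total transactions; n_a, n_b: transactions containing A or B;
    n_ab: transactions containing both A and B."""
    support = n_ab / n                      # P(A ∪ B)
    confidence = n_ab / n_a                 # P(B | A)
    lift = support / ((n_a / n) * (n_b / n))  # P(A ∪ B) / (P(A) P(B))
    return support, confidence, lift

support, confidence, lift = support_confidence_lift(10_000, 6_000, 7_500, 4_000)
```

Here the rule has 40% support and about 67% confidence, which may pass typical thresholds, yet its lift is about 0.89. Because the lift is below 1, A and B are negatively correlated: buying A actually makes B slightly less likely than its overall 75% rate.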

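11. How can the quality of rules be measured?

Ans. The quality of association rules is commonly measured with objective measures such as support, confidence, and lift. Association rules can in turn be used to measure the data quality of transactions in three steps: 1) extract all association rules; 2) select the association rules that are compatible with the transaction; 3) add up the confidence factors of the compatible rules and use the total as the data-quality criterion for the transaction.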
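The three steps can be sketched in Python for a single transaction. This is a minimal illustration under an explicit assumption: a rule is treated as “compatible” with a transaction when the transaction contains both the rule's antecedent and its consequent. The function name, the rule tuples, and the confidence values are hypothetical.

```python
def transaction_quality(transaction, rules):
    """Sum the confidence of every rule compatible with the transaction.

    rules: (antecedent, consequent, confidence) tuples with frozenset itemsets.
    Assumption: a rule is compatible when the transaction contains both its
    antecedent and its consequent.
    """
    items = set(transaction)
    return sum(conf for antecedent, consequent, conf in rules
               if antecedent <= items and consequent <= items)

# Step 1 (assumed already done): the extracted rules with their confidences.
rules = [
    (frozenset({"bread"}), frozenset({"milk"}), 0.75),
    (frozenset({"diaper"}), frozenset({"beer"}), 0.75),
]
score_partial = transaction_quality({"bread", "milk", "eggs"}, rules)
score_full = transaction_quality({"bread", "milk", "diaper", "beer"}, rules)
score_none = transaction_quality({"eggs"}, rules)
```

Under this reading, a transaction that agrees with more high-confidence rules receives a higher score, and one that matches no rule scores 0.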
