DATA MINING (UNIT-I)

1. What is Data Mining?


Ans. Data mining is the process of extracting knowledge or insights from large
amounts of data using various statistical and computational techniques. The data
can be structured, semi-structured or unstructured, and can be stored in various
forms such as databases, data warehouses, and data lakes.
The primary goal of data mining is to discover hidden patterns and relationships in
the data that can be used to make informed decisions or predictions.

2. What Motivated Data Mining? Why Is It Important?


Ans. Data mining has attracted a great deal of attention in the information industry
and in society as a whole in recent years, due to the wide availability of huge
amounts of data and the imminent need to turn such data into useful information and
knowledge. The information and knowledge gained can be used for applications
ranging from market analysis, fraud detection, and customer retention to production
control and scientific exploration.

3. What are the functionalities in data mining?


Ans. There are various data mining functionalities which are as follows −
 Data characterization − It is a summarization of the general
characteristics or features of a target class of data. The data corresponding
to the user-specified class is typically collected by a database query. The
output of data characterization can be presented in multiple forms, such as
pie charts, bar charts, curves, multidimensional data cubes, and cross-tabulations.
 Data discrimination − It is a comparison of the general characteristics
of target class data objects with the general characteristics of objects
from one or a set of contrasting classes. The target and contrasting
classes can be specified by the user, and the corresponding data
objects are retrieved through database queries.
 Association Analysis − It analyses the sets of items that frequently
occur together in a transactional dataset. Two parameters are used
for determining association rules −
o Support, which identifies how frequently an itemset occurs in the
database (the fraction of transactions that contain it).
o Confidence, which is the conditional probability that an item
occurs in a transaction given that another item occurs in it.
 Classification − Classification is the process of finding a model
that describes and distinguishes data classes or concepts, so that the
model can be used to predict the class of objects whose class label
is unknown.
 Evolution analysis − It describes and models regularities or trends
for objects whose behaviour changes over time.
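The two association-rule measures can be made concrete with a small Python sketch. It assumes a made-up list of five shopping-basket transactions, and the helper names `support`, `confidence` and `rules` are hypothetical. The rule search simply brute-forces one-item => one-item rules; it is not the Apriori algorithm or any library's API.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent):
    support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    base = support(antecedent, transactions)
    return support(both, transactions) / base if base else 0.0


def rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force search for one-item => one-item rules passing both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                found.append((lhs, rhs, s, c))
    return found


if __name__ == "__main__":
    for lhs, rhs, s, c in rules(TRANSACTIONS):
        print(f"{lhs} => {rhs}: support={s:.2f}, confidence={c:.2f}")
```

For example, bread => milk has support 3/5 = 0.6 and confidence 0.6 / 0.8, i.e. about 0.75: milk appears in three of the four baskets that contain bread.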
4. What are the classifications of data mining systems?
Ans. Data mining systems can be classified as follows:

Classification Based on the Mined Databases: A data mining system can
be classified based on the types of databases that have been mined. A
database system can be further segmented based on distinct principles,
such as data models, types of data, etc., which further assist in classifying
a data mining system.

Classification Based on the Type of Knowledge Mined: A data mining
system categorized based on the kind of knowledge mined may have the
following functionalities:

1. Characterization
2. Discrimination
3. Association and Correlation Analysis
4. Classification
5. Prediction
6. Outlier Analysis
7. Evolution Analysis

Classification Based on the Techniques Utilized: A data mining system
can also be classified based on the type of techniques that are being
incorporated. These techniques can be assessed based on the degree of
user interaction involved or the methods of analysis employed.

Classification Based on the Applications Adapted: Data mining systems
classified based on the applications they are adapted to include the following:

1. Finance
2. Telecommunications
3. DNA
4. Stock Markets
5. E-mail

5. How can a data mining system be integrated with a database or data warehouse system?


Ans. The list of Integration Schemes is as follows −
 No Coupling − In this scheme, the data mining system does not utilize
any of the database or data warehouse functions. It fetches the data
from a particular source and processes that data using some data
mining algorithms. The data mining result is stored in another file.
 Loose Coupling − In this scheme, the data mining system may use
some of the functions of database and data warehouse system. It
fetches the data from the data repository managed by these systems
and performs data mining on that data. It then stores the mining result
either in a file or in a designated place in a database or in a data
warehouse.
 Semi-tight Coupling − In this scheme, the data mining system is
linked with a database or a data warehouse system and, in addition,
efficient implementations of a few data mining primitives can be
provided in the database.
 Tight coupling − In this coupling scheme, the data mining system is
smoothly integrated into the database or data warehouse system. The
data mining subsystem is treated as one functional component of an
information system.
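As a rough illustration of the difference between loose and semi-tight coupling, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `sales` and `item_counts` tables and the function names are hypothetical, and simple item counting stands in for a real mining algorithm. In the loose-coupling version the data leaves the database, is mined in Python, and the result is written back to a table; in the semi-tight version a counting primitive is pushed into SQL with `GROUP BY`, which only approximates what that scheme describes.

```python
import sqlite3
from collections import Counter


def build_demo_db():
    """Create an in-memory database holding a hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (txn_id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"), (3, "milk")],
    )
    return conn


def loose_coupling_counts(conn):
    """Loose coupling: fetch raw rows with a query, mine them outside the
    database (here, a simple item count), then store the result in a table."""
    items = [item for (item,) in conn.execute("SELECT item FROM sales")]
    counts = Counter(items)
    conn.execute("CREATE TABLE IF NOT EXISTS item_counts (item TEXT PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO item_counts VALUES (?, ?)", counts.items())
    conn.commit()
    return dict(counts)


def semi_tight_counts(conn):
    """Semi-tight coupling (approximated): the counting primitive runs inside
    the database engine via GROUP BY, so only aggregated rows leave it."""
    return dict(conn.execute("SELECT item, COUNT(*) FROM sales GROUP BY item"))


if __name__ == "__main__":
    conn = build_demo_db()
    print(loose_coupling_counts(conn))
    print(semi_tight_counts(conn))
```

Both functions produce the same counts; the difference is where the work happens and how much raw data crosses the boundary between the database and the mining code.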

6. What are the issues in data mining?

Ans. The major issues in data mining are as follows −
i) Mining Methodology and User Interaction − It refers to the following kinds of
issues:
 Mining different kinds of knowledge in databases − Different users may be
interested in different kinds of knowledge. Therefore it is necessary for data
mining to cover a broad range of knowledge discovery tasks.
 Interactive mining of knowledge at multiple levels of abstraction − The
data mining process needs to be interactive because it allows users to focus
the search for patterns, providing and refining data mining requests based on
the returned results.
 Incorporation of background knowledge − Background knowledge can be used
to guide the discovery process and to express the discovered patterns.
 Pattern evaluation − Many of the discovered patterns may be uninteresting
to a given user because they represent common knowledge or lack novelty, so
the patterns must be evaluated for interestingness.

ii) Performance Issues − Performance-related issues include the following −

 Efficiency and scalability of data mining algorithms − In order to
effectively extract information from huge amounts of data in
databases, data mining algorithms must be efficient and scalable.
 Parallel, distributed, and incremental mining algorithms − Factors
such as the huge size of databases, the wide distribution of data, and the
complexity of data mining methods motivate the development of parallel
and distributed data mining algorithms. Incremental algorithms incorporate
database updates without having to mine the entire data again from scratch.
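A minimal sketch of the incremental idea, assuming only single-item support is tracked: each new batch of transactions is folded into running counts instead of re-mining all the data. The class `IncrementalItemCounter` is hypothetical; real incremental miners must also maintain candidate itemsets, which is considerably more involved.

```python
from collections import Counter


class IncrementalItemCounter:
    """Maintains single-item support counts as batches of transactions arrive,
    so earlier data never has to be re-scanned (hypothetical helper class)."""

    def __init__(self):
        self.n_transactions = 0
        self.item_counts = Counter()

    def add_batch(self, batch):
        """Fold a new batch of transactions into the running counts."""
        for txn in batch:
            self.n_transactions += 1
            self.item_counts.update(set(txn))  # each item counted once per transaction

    def support(self, item):
        """Fraction of all transactions seen so far that contain `item`."""
        if self.n_transactions == 0:
            return 0.0
        return self.item_counts[item] / self.n_transactions

    def frequent_items(self, min_support):
        """Items whose support meets the threshold, in alphabetical order."""
        return sorted(i for i in self.item_counts if self.support(i) >= min_support)


if __name__ == "__main__":
    counter = IncrementalItemCounter()
    counter.add_batch([{"bread", "milk"}, {"bread"}])  # initial load
    counter.add_batch([{"milk"}, {"eggs", "milk"}])    # later update, no re-scan
    print(counter.frequent_items(0.5))
```

Because counting is additive, feeding the two batches one after the other gives the same counts as a single pass over all four transactions, which is what makes this kind of update cheap.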

iii) Diverse Data Types Issues −

 Handling of relational and complex types of data − The database may
contain complex data objects, multimedia data objects, spatial data, temporal
data, etc. It is not possible for one system to mine all these kinds of data.
 Mining information from heterogeneous databases and global
information systems − The data is available at different data sources on
a LAN or WAN. These data sources may be structured, semi-structured or
unstructured.

6. Explain the KDD process.


Ans. KDD (Knowledge Discovery in Databases) is a process that involves
the extraction of useful, previously unknown, and potentially valuable
information from large datasets. The KDD process in data mining typically
involves the following steps:
 Selection: Select a relevant subset of the data for analysis.
 Pre-processing: Clean and transform the data to make it ready for
analysis. This may include tasks such as data normalization, missing
value handling, and data integration.
 Transformation: Transform the data into a format suitable for data
mining, such as a matrix or a graph.
 Data Mining: Apply data mining techniques and algorithms to the data
to extract useful information and insights.
 Interpretation: Interpret the results and extract knowledge from the
data. This may include tasks such as visualizing the results, evaluating
the quality of the discovered patterns and identifying relationships and
associations among the data.
 Evaluation: Evaluate the results to ensure that the extracted
knowledge is useful, accurate, and meaningful.
 Deployment: Use the discovered knowledge to solve the business
problem and make decisions.
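The steps above can be traced end to end in a small Python sketch. Everything in it is an illustrative assumption: the `RAW` customer records are invented, mean imputation and min-max scaling stand in for pre-processing, and a hand-written k-means with k = 2 plays the data mining step. The function names are hypothetical and not taken from any library.

```python
from statistics import mean

# Hypothetical raw customer records (invented for illustration); None marks
# a missing value that pre-processing has to handle.
RAW = [
    {"id": 1, "region": "north", "age": 25, "spend": 200.0},
    {"id": 2, "region": "north", "age": None, "spend": 220.0},
    {"id": 3, "region": "south", "age": 40, "spend": 50.0},
    {"id": 4, "region": "north", "age": 52, "spend": 900.0},
    {"id": 5, "region": "north", "age": 47, "spend": 950.0},
]


def select(records, region):
    """Selection: keep the relevant subset and only the attributes to analyse."""
    return [{"age": r["age"], "spend": r["spend"]} for r in records if r["region"] == region]


def preprocess(rows):
    """Pre-processing: fill missing ages with the mean age, then min-max normalise."""
    fill = mean(r["age"] for r in rows if r["age"] is not None)
    cleaned = [{**r, "age": fill if r["age"] is None else r["age"]} for r in rows]
    bounds = {col: (min(r[col] for r in cleaned), max(r[col] for r in cleaned))
              for col in ("age", "spend")}
    scaled = [{col: (r[col] - lo) / (hi - lo) if hi > lo else 0.0
               for col, (lo, hi) in bounds.items()} for r in cleaned]
    return cleaned, scaled


def transform(scaled):
    """Transformation: turn records into a numeric feature matrix."""
    return [[r["age"], r["spend"]] for r in scaled]


def two_means(matrix, iterations=10):
    """Data mining: plain k-means with k = 2, seeded with the first and last
    rows (a simplification; library implementations use smarter seeding)."""
    centroids = [list(matrix[0]), list(matrix[-1])]
    labels = [0] * len(matrix)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min((0, 1), key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])))
                  for row in matrix]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [row for row, lab in zip(matrix, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def summarise(cleaned, labels):
    """Interpretation/evaluation: describe each cluster in the original units."""
    return {c: {"size": labels.count(c),
                "mean_spend": mean(r["spend"] for r, lab in zip(cleaned, labels) if lab == c)}
            for c in sorted(set(labels))}


def run_kdd(records, region="north"):
    """Chain the steps; deployment (acting on the result) is left to the caller."""
    cleaned, scaled = preprocess(select(records, region))
    labels = two_means(transform(scaled))
    return labels, summarise(cleaned, labels)


if __name__ == "__main__":
    labels, summary = run_kdd(RAW)
    print(labels)
    print(summary)
```

On this data the sketch separates the two low-spend customers from the two high-spend ones, and `summarise` reports the mean spend of each cluster in the original units, which is where interpretation and evaluation begin. Deployment, acting on the segments, is deliberately left out.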

7. State the advantages and disadvantages of the KDD process.

Ans. Advantages of KDD:

1. Improved decision-making: KDD provides valuable insights and
knowledge that can help organizations make better decisions.
2. Increased efficiency: KDD automates repetitive and time-consuming
tasks and makes the data ready for analysis, which saves time and money.
3. Better customer service: KDD helps organizations gain a better
understanding of their customers’ needs and preferences, which
can help them provide better customer service.
4. Fraud detection: KDD can be used to detect fraudulent activities
by identifying patterns and anomalies in the data that may
indicate fraud.
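Anomaly-based fraud detection can be illustrated with one of the simplest outlier tests. This Python sketch flags values whose z-score (distance from the mean, measured in population standard deviations) exceeds a threshold. The transaction amounts, the function name `flag_anomalies` and the threshold of 2.0 are all assumptions for illustration; real fraud detection systems use far richer models.

```python
from statistics import mean, pstdev


def flag_anomalies(amounts, threshold=2.0):
    """Return the indices of amounts lying more than `threshold` population
    standard deviations from the mean (a basic z-score outlier test)."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:  # all values identical: nothing stands out
        return []
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical card transaction amounts; the last one is unusually large.
    amounts = [20, 22, 19, 25, 21, 23, 20, 500]
    print(flag_anomalies(amounts))
```

The outlier itself inflates the mean and standard deviation, so with very few values a z-score test can miss anomalies; it is a starting point rather than a dependable detector.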
Disadvantages of KDD:

1. Privacy concerns: KDD can raise privacy concerns as it involves
collecting and analyzing large amounts of data, which can
include sensitive information about individuals.
2. Complexity: KDD can be a complex process that requires
specialized skills and knowledge to implement and interpret the
results.
3. Data quality: The KDD process depends heavily on the quality of the data;
if the data is inaccurate or inconsistent, the results can be misleading.
4. High cost: KDD can be an expensive process, requiring significant
investments in hardware, software, and personnel.

8. State the differences between KDD and data mining.

Ans. Difference Between KDD and Data Mining

Definition
 KDD − A process of identifying valid, novel, potentially useful, and
ultimately understandable patterns and relationships in data.
 Data Mining − A process of extracting useful and valuable information
or patterns from large data sets.

Objective
 KDD − To find useful knowledge from data.
 Data Mining − To extract useful information from data.

Techniques Used
 KDD − Data cleaning, data integration, data selection, data
transformation, data mining, pattern evaluation, and knowledge
representation and visualization.
 Data Mining − Association rules, classification, clustering,
regression, decision trees, neural networks, and dimensionality reduction.

Output
 KDD − Structured information, such as rules and models, that can be
used to make decisions or predictions.
 Data Mining − Patterns, associations, or insights that can be used to
improve decision-making or understanding.
