Chapter - 5 - Data Mining
Data Mining
• Data Transformation
– Data Transformation is defined as the process of converting data into the form required by the mining procedure. It is a two-step process:
• Data Mapping: Assigning elements from the source base to the destination to capture transformations.
• Code generation: Creation of the actual transformation program.
• Data Mining
– Data mining is defined as the application of techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and decides the purpose of the model using classification or characterization.
• Pattern Evaluation
– Pattern Evaluation is defined as identifying the truly interesting patterns representing knowledge, based on given measures. It finds the interestingness score of each pattern and uses summarization and visualization to make the data understandable to the user.
• Knowledge Representation
– This involves presenting the results in a way that is meaningful and can be used to make decisions.
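The steps above can be sketched end to end on a toy example. This is a minimal illustration, not a prescribed method: the records, field names, and thresholds are hypothetical, counting co-occurring item pairs stands in for the mining step, and support and confidence are assumed as the interestingness measures (the text does not name specific ones).

```python
from collections import Counter
from itertools import combinations

# Hypothetical source records; field names are assumptions for illustration.
source_rows = [
    {"cust_id": "1", "items_bought": "Bread;Milk"},
    {"cust_id": "2", "items_bought": "bread;butter"},
    {"cust_id": "3", "items_bought": "Milk;Bread;Butter"},
    {"cust_id": "4", "items_bought": "milk"},
]

# Data mapping: source field -> (destination field, conversion function).
FIELD_MAP = {
    "cust_id": ("customer", int),
    "items_bought": (
        "items",
        lambda s: frozenset(i.strip().lower() for i in s.split(";")),
    ),
}


def transform(row):
    """Transformation program: apply the mapping to one source record."""
    return {dest: convert(row[src]) for src, (dest, convert) in FIELD_MAP.items()}


def mine_pairs(transactions):
    """Mining step: count transactions containing each item and each item pair."""
    item_counts, pair_counts = Counter(), Counter()
    for items in transactions:
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    return item_counts, pair_counts


def evaluate(item_counts, pair_counts, n, min_support=0.5, min_confidence=0.6):
    """Pattern evaluation: score rules lhs -> rhs and keep the interesting ones."""
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs transactions with rhs
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


transactions = [transform(r)["items"] for r in source_rows]
item_counts, pair_counts = mine_pairs(transactions)
rules = evaluate(item_counts, pair_counts, len(transactions))

# Knowledge representation: present the surviving rules in a readable form.
for lhs, rhs, support, confidence in sorted(rules):
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Keeping the mapping in a table (`FIELD_MAP`) separate from the code that applies it mirrors the two-step split between data mapping and code generation.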
KDD and Data Mining compared:
• Definition
– KDD: KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data.
– Data Mining: Data Mining refers to a process of extracting useful and valuable information or patterns from large data sets.
• Objective
– KDD: To find useful knowledge from data.
– Data Mining: To extract useful information from data.
• Output
– KDD: Structured information, such as rules and models, that can be used to make decisions or predictions.
– Data Mining: Patterns, associations, or insights that can be used to improve decision-making or understanding.
– ii. Mining information from heterogeneous databases and global information systems:
• Data is fetched from different data sources on a Local Area Network (LAN) or Wide Area Network (WAN). Discovering knowledge from such different sources, whose data may be structured differently, is a great challenge for data mining.
3. Distributed Data:
• Real-world data is usually stored on various platforms in distributed computing environments. It may be on the internet, on individual systems, or in databases. It is practically difficult to bring all the data into a centralized data repository, mainly because of technical and organizational reasons.
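One common way around this constraint is to leave the raw data where it is and combine only small per-site summaries. The sketch below illustrates that idea under stated assumptions: the site names and transactions are hypothetical, and item support is used as a simple example of a statistic that can be merged exactly from local counts.

```python
from collections import Counter

# Hypothetical data held at two separate sites; it is never pooled in raw form.
site_data = {
    "web_store": [["bread", "milk"], ["milk"]],
    "branch_db": [["bread"], ["bread", "butter"], ["milk", "butter"]],
}


def local_summary(transactions):
    """Runs at one site: item counts plus the number of local transactions."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # count each item once per transaction
    return counts, len(transactions)


def merge_summaries(summaries):
    """Runs at the coordinator: add up the per-site counts and sizes."""
    total_counts, total_n = Counter(), 0
    for counts, n in summaries:
        total_counts += counts
        total_n += n
    return total_counts, total_n


summaries = [local_summary(t) for t in site_data.values()]
global_counts, global_n = merge_summaries(summaries)

# Global support of each item, computed without moving any raw records.
global_support = {item: c / global_n for item, c in global_counts.items()}
for item in sorted(global_support):
    print(f"{item}: support={global_support[item]:.2f}")
```

Only the counters and transaction totals cross site boundaries here, which is why the approach works for additive statistics such as counts; measures that cannot be rebuilt from local summaries would need a different scheme.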
5. Performance:
• The performance of a data mining system depends mainly on the efficiency of the techniques and algorithms used. If the techniques and algorithms are not well designed, the performance of the data mining process suffers.
9. Ethics:
• Data mining raises ethical concerns related to the collection, use, and dissemination of data. The data may be used to
discriminate against certain groups, violate privacy rights, or perpetuate existing biases. Moreover, data mining algorithms
may not be transparent, making it challenging to detect biases or discrimination.