
Ramy Mahmoud 52117

The document discusses the necessity and evolution of data mining as a crucial tool for analyzing vast amounts of data in today's world. It defines data mining, outlines its processes, and describes the types of data that can be mined, emphasizing its interdisciplinary nature. Additionally, it highlights the role of database systems in managing and accessing data for mining purposes.

Uploaded by khaledamgad002
Copyright © All Rights Reserved

Khaled amgad amin 52109

Group c

Why Data Mining?


“Necessity, who is the mother of invention.” – Plato

We live in a world where vast amounts of data are collected daily, and analyzing such data is an important need. Section 1.1.1 looks at how data mining can meet this need by providing tools to discover knowledge from data. In Section 1.1.2, we observe how data mining can be viewed as a result of the natural evolution of information technology.

Data Mining as the Evolution of Information Technology

Data mining can be viewed as a result of the natural evolution of information technology. The database and data management industry has evolved through the development of several critical functionalities (Figure 1.1): data collection and database creation; data management (including data storage and retrieval, and database transaction processing); and advanced data analysis (involving data warehousing and data mining). The early development of data collection and database creation mechanisms served as a prerequisite for the later development of effective mechanisms for data storage and retrieval, as well as query and transaction processing. Today, many database systems offer query and transaction processing as standard practice, so advanced data analysis has naturally become the next step.

What Is Data Mining?


It is no surprise that data mining, as a truly interdisciplinary subject, can be defined in many different ways. Even the term data mining does not fully capture all the major components involved. To refer to the mining of gold from rocks or sand, we say gold mining instead of rock or sand mining. Analogously, data mining should have been more appropriately named “knowledge mining from data,” which is unfortunately somewhat long. However, the shorter term knowledge mining may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets in a great deal of raw material (Figure 1.3). Thus, such a misnomer, carrying both “data” and “mining,” became a popular choice. In addition, many other terms have a meaning similar to data mining, for example, knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
Many people treat data mining as a synonym for another popularly used term, knowledge discovery from data, or KDD, while others view data mining as merely an essential step in the process of knowledge discovery. The knowledge discovery process is shown in Figure 1.4 as an iterative sequence of the following steps:

1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)
5. Data mining (an essential process where intelligent methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on interestingness measures; see Section 1.4.6)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)

A popular trend in the information industry is to perform data cleaning and data integration as a preprocessing step, where the resulting data are stored in a data warehouse. Sometimes data transformation and consolidation are performed before the data selection process, particularly in the case of data warehousing. Data reduction may also be performed to obtain a smaller representation of the original data without sacrificing its integrity.

What Kinds of Data Can Be Mined?


As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target application. The most basic forms of data for mining applications are database data (Section 1.3.1), data warehouse data (Section 1.3.2), and transactional data (Section 1.3.3). The concepts and techniques presented in this book focus on such data. Data mining can also be applied to other forms of data (e.g., data streams, ordered/sequence data, graph or networked data, spatial data, text data, multimedia data, and the WWW). We present an overview of such data in Section 1.3.4. Techniques for mining these kinds of data are briefly introduced in Chapter 13; in-depth treatment is considered an advanced topic. Data mining will certainly continue to embrace new data types as they emerge.

Database Data
A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data. The software programs provide mechanisms for defining database structures and data storage; for specifying and managing concurrent, shared, or distributed data access; and for ensuring the consistency and security of the stored information despite system crashes or attempts at unauthorized access. A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values. A semantic data model, such as an entity-relationship (ER) data model, is often constructed for relational databases. An ER data model represents the database as a set of entities and their relationships.
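The relational vocabulary above (table, attribute, tuple, unique key) can be seen in a few lines of Python using the standard-library `sqlite3` module. This is a minimal sketch: the `customer` table and its columns are a hypothetical, simplified schema chosen for illustration, and SQLite stands in for any DBMS.

```python
import sqlite3

# In-memory database; the schema is a simplified, hypothetical
# AllElectronics-style customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        cust_id  TEXT PRIMARY KEY,   -- unique key identifying each tuple
        name     TEXT NOT NULL,
        age      INTEGER,
        category TEXT
    )
    """
)

# Each inserted row is one tuple described by its attribute values.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [("C1", "Smith", 35, "A"), ("C2", "Lee", 28, "B")],
)

# The DBMS enforces the key: a second tuple with key 'C1' is rejected.
try:
    conn.execute("INSERT INTO customer VALUES ('C1', 'Jones', 40, 'A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Attribute (column) names come from the cursor description.
cur = conn.execute("SELECT * FROM customer ORDER BY cust_id")
attributes = [d[0] for d in cur.description]
rows = cur.fetchall()
print(attributes)          # ['cust_id', 'name', 'age', 'category']
print(rows)                # [('C1', 'Smith', 35, 'A'), ('C2', 'Lee', 28, 'B')]
print(duplicate_rejected)  # True
```

The rejected insert shows one of the DBMS guarantees the text lists: the software, not the application, keeps the stored data consistent with the declared key.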

Relational data can be accessed by database queries written in a relational query language (e.g., SQL) or with the assistance of graphical user interfaces. A given query is transformed into a set of relational operations, such as join, selection, and projection, and is then optimized for efficient processing. A query allows retrieval of specified subsets of the data. Suppose that your job is to analyze the AllElectronics data. Through the use of relational queries, you can ask things like, “Show me a list of all items that were sold in the last quarter.” Relational languages also use aggregate functions such as sum, avg (average), count, max (maximum), and min (minimum). Using aggregates allows you to ask: “Show me the total sales of the last month, grouped by branch,” or “How many sales transactions occurred in the month of December?” or “Which salesperson had the highest sales?”
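The AllElectronics questions above map naturally onto SQL with aggregate functions. The sketch below runs them against a tiny in-memory SQLite table; the `sales` table, its columns, and its four rows are invented for illustration, and "last quarter" and "last month" are assumed to mean Q4 and December 2023. In each query the `WHERE` clause plays the role of selection and the column list the role of projection.

```python
import sqlite3

# Hypothetical, simplified AllElectronics sales table; the table name,
# columns, and rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        trans_id    INTEGER PRIMARY KEY,
        item        TEXT,
        branch      TEXT,
        salesperson TEXT,
        sale_date   TEXT,    -- ISO 8601 date string
        amount      REAL
    )
    """
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "computer", "Vancouver", "Ann", "2023-11-20", 1200.0),
        (2, "printer", "Vancouver", "Ann", "2023-12-03", 300.0),
        (3, "webcam", "Toronto", "Bob", "2023-12-15", 80.0),
        (4, "computer", "Toronto", "Cara", "2023-12-28", 1100.0),
    ],
)

# "Show me a list of all items that were sold in the last quarter." (assumed Q4 2023)
items_q4 = [r[0] for r in conn.execute(
    "SELECT DISTINCT item FROM sales "
    "WHERE sale_date BETWEEN '2023-10-01' AND '2023-12-31' ORDER BY item"
)]

# "Show me the total sales of the last month, grouped by branch." (assumed December 2023)
dec_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales "
    "WHERE strftime('%Y-%m', sale_date) = '2023-12' "
    "GROUP BY branch ORDER BY branch"
).fetchall()

# "How many sales transactions occurred in the month of December?"
dec_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE strftime('%m', sale_date) = '12'"
).fetchone()[0]

# "Which salesperson had the highest sales?"
top_seller = conn.execute(
    "SELECT salesperson, SUM(amount) AS total FROM sales "
    "GROUP BY salesperson ORDER BY total DESC LIMIT 1"
).fetchone()

print(items_q4)       # ['computer', 'printer', 'webcam']
print(dec_by_branch)  # [('Toronto', 1180.0), ('Vancouver', 300.0)]
print(dec_count)      # 3
print(top_seller)     # ('Ann', 1500.0)
```

Dates are stored as ISO strings so that both string comparison (`BETWEEN`) and SQLite's `strftime` work on them; a production schema would more likely use a dedicated date type.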

Reference:
Jiawei Han (University of Illinois at Urbana–Champaign), Micheline Kamber, and Jian Pei (Simon Fraser University). Data Mining: Concepts and Techniques.
