Best Chapter 1 DM
Outline
Brief description of data mining
Data warehousing, data mining and database technology
Online Transaction Processing and Online Analytical Processing
Database?
A database is a collection of records.
Examples: the data collected, maintained and used in an airline reservation system; a personal address book kept in a Word document.
A database is a model of the structure of reality.
A database supports queries and updates that model the processes of reality, i.e., the use of the database reflects those processes.
For example, in a bank database, customer transactions (credits or debits) are recorded as updates to the database.
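The bank example can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names, customer and amounts are all assumptions, not part of any real banking schema.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (customer TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES ('alice', 100.0)")

def apply_transaction(conn, customer, amount):
    """Update: credit (amount > 0) or debit (amount < 0) one account."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE customer = ?",
            (amount, customer),
        )

def get_balance(conn, customer):
    """Query: read the current balance of one customer."""
    row = conn.execute(
        "SELECT balance FROM account WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0]

apply_transaction(conn, "alice", 50.0)   # credit
apply_transaction(conn, "alice", -30.0)  # debit
balance = get_balance(conn, "alice")
print(balance)
```

Each real-world event (a credit or a debit) becomes one update, and a query reads back the state that those updates have produced.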
Data Warehouse?
A data warehouse is a database that is maintained separately from an operational database.
It is a dedicated database system used mainly for decision making.
It covers a much longer time horizon than a transactional system.
It collects data from multiple databases and processes it uniformly (clean data).
E.g., a university warehouse (contains the student database and the staff database).
Data Mining?
Data mining refers to extracting, or mining, knowledge from large amounts of data (for example, a data warehouse).
Analogy: mining gold from rocks or sand.
E.g., from a university warehouse we can extract information about staff salaries over a particular period of time and predict how much they will be next year.
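The salary example can be sketched as a simple trend fit. The figures below are made up for illustration, and a straight-line least-squares fit is only one assumed way of making such a prediction; the slides do not name a method.

```python
# Hypothetical yearly staff salary totals (made-up illustration data).
salary_by_year = {2019: 100.0, 2020: 110.0, 2021: 120.0, 2022: 130.0}

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sorted(salary_by_year.items()))
next_year = max(salary_by_year) + 1
predicted = slope * next_year + intercept
print(next_year, round(predicted, 2))
```

Extracting the historical figures is the "mining" part; fitting a trend and extrapolating one year ahead is the predictive part.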
Why Data Mining?
Difference between operational DB and data warehouses
Major task
  Operational DB: performs online transaction and query processing. These systems are called Online Transaction Processing (OLTP) systems.
  Data warehouse: performs data analysis and decision making. These systems are known as Online Analytical Processing (OLAP) systems.
Orientation and users
  Operational DB: customer (user) oriented; used for query processing and transactions by clients and IT professionals.
  Data warehouse: market (system) oriented; used for data analysis by knowledge workers, including managers, executives and analysts.
Data
  Operational DB: manages current data.
  Data warehouse: manages large amounts of historical data.
Design
  Operational DB: adopts an ER model and an application-oriented database design.
  Data warehouse: adopts either a star or snowflake model and a subject-oriented database design.
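The contrast between the two workloads can be sketched with sqlite3. For brevity this sketch runs both kinds of statement against a single hypothetical sales table; a real warehouse would be a separate system, and the table, columns and values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, customer TEXT, amount REAL)")

# OLTP-style work: small writes, each recording one current transaction.
with conn:
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(2022, "alice", 20.0), (2022, "bob", 30.0), (2023, "alice", 50.0)],
    )

# OLAP-style work: a read-only summary over historical data for analysts.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(totals)
```

The inserts touch one row at a time, while the analytical query scans and aggregates the whole history.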
Architecture: Typical Data Mining System
[Figure: layered architecture, top to bottom: Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; with a Knowledge Base alongside]
Data mining: the core of the knowledge discovery process.
[Figure: KDD flow: Databases -> Data Cleaning and Data Integration -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Steps of a KDD Process
1. Data cleaning (remove noise and fill in missing data).
2. Data integration (combine multiple data sources).
3. Data selection (retrieve the data relevant to the analysis task from the database).
4. Data transformation (transform the data into a form appropriate for mining).
5. Data mining (apply intelligent techniques to extract interesting patterns).
6. Pattern evaluation (identify the truly interesting patterns with the help of interestingness measures).
7. Knowledge presentation (use visualization and knowledge representation techniques to present the mined knowledge to the user).
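The seven steps can be sketched as a chain of small functions. Everything here is an illustrative assumption: the two sources, the field names, the salary band threshold, and the use of simple frequency counting as the "mining" technique with a minimum support as the interestingness measure.

```python
from collections import Counter

# Two hypothetical sources; None marks a missing value.
source_a = [{"dept": "CS", "salary": 50.0}, {"dept": "CS", "salary": None}]
source_b = [{"dept": "Math", "salary": 40.0}, {"dept": "CS", "salary": 70.0}]

def clean(records):
    """Step 1: fill missing salaries with the mean of the known ones."""
    known = [r["salary"] for r in records if r["salary"] is not None]
    mean = sum(known) / len(known)
    return [{**r, "salary": mean if r["salary"] is None else r["salary"]}
            for r in records]

def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for src in sources for r in src]

def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records, threshold=50.0):
    """Step 4: turn the numeric salary into a band suitable for counting."""
    return [(r["dept"], "high" if r["salary"] >= threshold else "low")
            for r in records]

def mine(items):
    """Step 5: count how often each (dept, band) combination occurs."""
    return Counter(items)

def evaluate(counts, total, min_support=0.5):
    """Step 6: keep patterns whose support reaches the minimum."""
    return {p: c / total for p, c in counts.items() if c / total >= min_support}

def present(patterns):
    """Step 7: format the surviving patterns for the user."""
    return [f"{dept} staff are mostly {band} earners (support {s:.2f})"
            for (dept, band), s in patterns.items()]

data = integrate(clean(source_a), clean(source_b))
items = transform(select(data, ["dept", "salary"]))
patterns = evaluate(mine(items), total=len(items))
for line in present(patterns):
    print(line)
```

Each function corresponds to one numbered step, so a step can be swapped for a more realistic technique without changing the others.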
Data Mining Taxonomy
[Figure: data mining at the centre, drawing on database technology, statistics, machine learning, visualization, pattern recognition, algorithms, and other disciplines]
Steps in Data Mining Process
problem definition
data collection and enhancement
modeling strategies
training, validation, and testing of models
analyzing results
modeling iterations
implementing results.
Data Collection and Enhancement
Define Data Sources
Join and De-normalize Data
Enrich data (add some data)
Transform data (some aggregation, etc.)
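The four activities above can be sketched on a toy example. The student and enrolment records, the faculty lookup used for enrichment, and the choice of an average as the aggregation are all hypothetical.

```python
# Define data sources: two hypothetical normalized collections.
students = {1: {"name": "Abel", "dept": "CS"}, 2: {"name": "Sara", "dept": "Math"}}
enrolments = [
    {"student_id": 1, "course": "DM", "grade": 80},
    {"student_id": 1, "course": "DB", "grade": 90},
    {"student_id": 2, "course": "DM", "grade": 70},
]

# Join and de-normalize: copy student attributes onto every enrolment row.
flat = [{**e, **students[e["student_id"]]} for e in enrolments]

# Enrich: add data from an extra (assumed) lookup table.
faculty_of = {"CS": "Informatics", "Math": "Science"}
for row in flat:
    row["faculty"] = faculty_of[row["dept"]]

# Transform: aggregate to one value per student (average grade).
grades = {}
for row in flat:
    grades.setdefault(row["name"], []).append(row["grade"])
avg_grade = {name: sum(g) / len(g) for name, g in grades.items()}
print(avg_grade)
```

After the join, every row carries all the attributes a mining algorithm needs, at the cost of repeating the student data on each row.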
Modeling strategies
Data mining strategies fall into two broad categories: supervised learning and unsupervised learning.
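The difference between the two categories can be sketched on one-dimensional toy data. The data and the two techniques chosen here (1-nearest-neighbour for supervised learning, a two-cluster k-means for unsupervised learning) are illustrative assumptions; the slides do not name specific algorithms.

```python
# Supervised: labelled examples (hours studied -> outcome) are given.
labelled = [(1.0, "fail"), (2.0, "fail"), (8.0, "pass"), (9.0, "pass")]

def nearest_neighbour(x, examples):
    """Predict the label of the closest labelled example (1-NN)."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]

prediction = nearest_neighbour(7.0, labelled)

# Unsupervised: only raw values, no labels; group them into two clusters.
values = [1.0, 2.0, 8.0, 9.0]

def two_means(values, iterations=10):
    """One-dimensional k-means with k = 2 and a deterministic start."""
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            idx = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[idx].append(v)
        # Keep the old centre if a cluster happens to be empty.
        centres = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centres)]
    return centres, groups

centres, groups = two_means(values)
print(prediction, centres, groups)
```

The supervised function needs the "pass"/"fail" labels to make a prediction, while the clustering function discovers the two groups from the values alone.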
Training, Validation, and Testing of Models
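A minimal sketch of splitting a dataset into training, validation and test sets. The 60/20/20 proportions, the fixed seed and the function name are assumptions chosen for illustration only.

```python
import random

def split_dataset(records, train=0.6, validation=0.2, seed=0):
    """Shuffle the records and split them; what remains after the
    training and validation parts becomes the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(10))
print(len(train_set), len(val_set), len(test_set))
```

The training set is used to fit a model, the validation set to compare candidate models, and the test set to estimate performance on data the model has not seen.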