Data Mining: Concepts and Techniques
— Chapter 3 —
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
July 26, 2024 Data Mining: Concepts and Techniques 1
Chapter 3: Data Warehousing and
OLAP Technology: An Overview
records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding
[Figure: lattice of cuboids (partial). 3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier). 4-D (base) cuboid: (time, item, location, supplier).]
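[Figure: schema diagram (partial). Fact table columns visible: branch_key, location_key; measures: units_sold, dollars_sold, avg_sales. branch table: branch_key, branch_name, branch_type. location table: location_key, street, city_key. city table: city_key, city, state_or_province, country.]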
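The tables above, where location points to a separate city table through city_key, resemble a snowflake layout. As a rough sketch of how such a layout might be queried, the following uses Python's built-in sqlite3 module. The table and column names are taken from the schema above; the in-memory database, the reduced fact table named sales, and all sample rows are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database; the table layout follows the column names in the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    city_key INTEGER PRIMARY KEY,
    city TEXT, state_or_province TEXT, country TEXT
);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street TEXT,
    city_key INTEGER REFERENCES city(city_key)
);
CREATE TABLE branch (
    branch_key INTEGER PRIMARY KEY,
    branch_name TEXT, branch_type TEXT
);
-- Hypothetical, reduced fact table: only two of its keys and two measures.
CREATE TABLE sales (
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)", [
    (1, "Vancouver", "British Columbia", "Canada"),
    (2, "Chicago", "Illinois", "USA"),
])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)", [
    (10, "1 Main St", 1), (11, "2 Oak Ave", 1), (12, "3 Lake Dr", 2),
])
conn.execute("INSERT INTO branch VALUES (100, 'B1', 'retail')")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (100, 10, 5, 500.0), (100, 11, 3, 300.0), (100, 12, 2, 250.0),
])

# Reaching country requires two joins because city is split out of location.
rows = conn.execute("""
    SELECT c.country, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales s
    JOIN location l ON s.location_key = l.location_key
    JOIN city c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(rows)
```

The extra join through city is the price of normalizing the location dimension; a star layout would keep city and country directly in the location table.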
<dimension_name_first_time> in cube
<cube_name_first_time>
[Figure: concept hierarchy diagrams (partial). Visible level labels: all, Office, Day, Month.]
A Sample Data Cube
[Figure: 3-D data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]
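The "sum" cells of the sample cube can be read as aggregates over one or more dimensions. A minimal sketch, assuming the cube is stored as a dictionary keyed by (quarter, product, country); the sales figures and the helper roll_up are hypothetical and only illustrate the idea:

```python
from collections import defaultdict

# Hypothetical sales figures: (quarter, product, country) -> dollars sold.
cells = {
    ("1Qtr", "TV", "U.S.A."): 100,
    ("2Qtr", "TV", "U.S.A."): 120,
    ("3Qtr", "TV", "U.S.A."): 90,
    ("4Qtr", "TV", "U.S.A."): 150,
    ("1Qtr", "PC", "Canada"): 80,
    ("2Qtr", "VCR", "Mexico"): 40,
}

def roll_up(cells, keep):
    """Sum over every dimension whose position is not listed in `keep`.

    Positions: 0 = date, 1 = product, 2 = country.
    """
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)

# Summing out the date dimension gives the "total annual sales" cells.
by_product_country = roll_up(cells, keep=(1, 2))
print(by_product_country[("TV", "U.S.A.")])

# Summing out every dimension gives the single "all" cell.
grand_total = roll_up(cells, keep=())
print(grand_total[()])
```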
[Figure: cuboids for the sample cube (partial). 0-D (apex) cuboid: all. 1-D cuboids: product, date, country. 3-D (base) cuboid: (product, date, country).]
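The cuboids above are the subsets of the dimension set, from the empty set (apex) to the full set (base). A small sketch of that enumeration, assuming no concept hierarchies on the dimensions; the function name cuboid_lattice is illustrative only:

```python
from itertools import combinations

dimensions = ("product", "date", "country")

def cuboid_lattice(dims):
    """Return {k: list of k-dimensional cuboids} for k = 0 .. len(dims)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(dimensions)
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# With n dimensions and no hierarchies there are 2**n cuboids in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print(total)
```

The 2-D level, which lists (product, date), (product, country), and (date, country), sits between the 1-D cuboids and the base cuboid.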
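[Figure: star-net query model (partial). Radial lines carry abstraction levels, e.g. Time (ANNUALLY, QTRLY, DAILY) and Product (PRODUCT ITEM, PRODUCT GROUP, PRODUCT LINE). Other visible labels: ORDER, TRUCK, CITY, COUNTRY, REGION, SALES PERSON, DISTRICT, DIVISION, Location, Promotion, Organization.]
Each circle is called a footprint.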
[Figure: data warehouse architecture. Visible labels: operational DBs and other sources; extract, transform, load, refresh; monitor and integrator; metadata; data warehouse and data marts; OLAP server (serve); analysis, query, reports, data mining.]
materialized
Data Warehouse Back-End Tools and Utilities
Data extraction
get data from multiple, heterogeneous, and external sources
Data cleaning
detect errors in the data and rectify them when possible
Data transformation
convert data from legacy or host format to warehouse format
Load
sort, summarize, consolidate, compute views, check
warehouse
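The back-end steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two in-memory sources, the field names (id, amount, city), and the cleaning rule (drop records whose amount cannot be parsed, normalize city spelling) are all hypothetical, and the load step covers only sorting and summarizing.

```python
# Hypothetical source records in a "legacy" all-strings format.
source_a = [
    {"id": "1", "amount": "10.50", "city": " chicago "},
    {"id": "2", "amount": "n/a", "city": "Urbana"},
]
source_b = [{"id": "3", "amount": "7.25", "city": "URBANA"}]

def extract(*sources):
    """Data extraction: pull records from several sources."""
    for source in sources:
        yield from source

def clean(records):
    """Data cleaning: detect unparseable amounts and drop those records."""
    for record in records:
        try:
            float(record["amount"])
        except ValueError:
            continue
        yield record

def transform(records):
    """Data transformation: convert strings to typed warehouse fields."""
    for record in records:
        yield {
            "id": int(record["id"]),
            "amount": float(record["amount"]),
            "city": record["city"].strip().title(),
        }

def load(records):
    """Load: sort the rows and compute a per-city summary."""
    rows = sorted(records, key=lambda r: r["id"])
    summary = {}
    for row in rows:
        summary[row["city"]] = summary.get(row["city"], 0.0) + row["amount"]
    return rows, summary

rows, summary = load(transform(clean(extract(source_a, source_b))))
print(rows)
print(summary)
```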
Metadata Repository
Metadata is the data defining warehouse objects. It stores:
Description of the structure of the data warehouse
schema, view, dimensions, hierarchies, derived data definitions, data mart locations and contents
Operational metadata
data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
The algorithms used for summarization
The mapping from operational environment to the data warehouse
Data related to system performance
warehouse schema, view and derived data definitions
Business data
business terms and definitions, ownership of data, charging policies
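One way to picture a metadata entry is as a small record attached to each warehouse object. The sketch below is an assumption, not a prescribed repository design: the class TableMetadata, its fields, and the sample values are hypothetical, chosen to mirror a few of the categories listed above (lineage, currency, summarization algorithm, source mapping, ownership).

```python
from dataclasses import dataclass, field
from enum import Enum

class Currency(Enum):
    """Currency of data, using the three states named in the list above."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"

@dataclass
class TableMetadata:
    """Hypothetical metadata entry for one warehouse table."""
    name: str
    source_system: str                            # mapping from the operational environment
    lineage: list = field(default_factory=list)   # transformation path, in order
    currency: Currency = Currency.ACTIVE
    summarization: str = ""                       # algorithm used for summarization
    owner: str = ""                               # business metadata: ownership of data

    def record_step(self, step):
        """Append one migration or transformation step to the lineage."""
        self.lineage.append(step)

meta = TableMetadata(
    name="sales_fact",
    source_system="orders_db",
    summarization="sum by day",
    owner="finance",
)
meta.record_step("extracted from orders_db")
meta.record_step("converted to warehouse format")
meta.currency = Currency.ARCHIVED
print(meta)
```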
OLAP Server Architectures
Relational OLAP (ROLAP)
Use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware
Include optimization of the DBMS back end, implementation of aggregation navigation logic, and additional tools and services
Greater scalability
Multidimensional OLAP (MOLAP)
Sparse array-based multidimensional storage engine
Fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP) (e.g., Microsoft SQL Server)
Flexibility, e.g., low level: relational, high level: array
Specialized SQL servers (e.g., Red Brick)
Specialized support for SQL queries over star/snowflake schemas
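To make the ROLAP/MOLAP contrast concrete, the sketch below answers the same question ("total sales of TV") in both styles. It is a simplification under stated assumptions: the fact rows are hypothetical, the relational side uses Python's built-in sqlite3 module, and the multidimensional side uses a small dense nested list with one pre-computed sum column rather than a real sparse storage engine.

```python
import sqlite3

# Hypothetical fact rows: (product, quarter, dollars sold).
facts = [("TV", "1Qtr", 100), ("TV", "2Qtr", 120), ("PC", "1Qtr", 80)]

# ROLAP style: facts live in a relational table and are aggregated at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, dollars INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap_total = conn.execute(
    "SELECT SUM(dollars) FROM sales WHERE product = ?", ("TV",)
).fetchone()[0]

# MOLAP style: facts live in an array addressed by dimension positions,
# with an extra last column holding the pre-computed sum over quarters.
products = ["TV", "PC"]
quarters = ["1Qtr", "2Qtr"]
p_idx = {p: i for i, p in enumerate(products)}
q_idx = {q: i for i, q in enumerate(quarters)}
cube = [[0] * (len(quarters) + 1) for _ in products]
for product, quarter, dollars in facts:
    cube[p_idx[product]][q_idx[quarter]] += dollars
    cube[p_idx[product]][-1] += dollars  # maintain the summary cell

# The summary is a direct array lookup, not a scan of the facts.
molap_total = cube[p_idx["TV"]][-1]

print(rolap_total, molap_total)
```

The relational version scales with ordinary DBMS machinery, while the array version trades storage for direct indexing into pre-computed summaries.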
and product
A join index on city maintains for each
data warehouses
ODBC, OLEDB, Web accessing, service facilities,
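[Figure: layered architecture (partial). Layer 1: data repository (databases, data warehouse) populated through data cleaning and data integration, accessed through a database API with filtering. Layer 2: MDDB with metadata, fed through filtering and integration.]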
Summary: Data Warehouse and OLAP Technology