An Introduction To Data Warehousing
Data Warehousing
Adil Siddiqui
Adil.siddiqui@tcs.com
Course Roadmap
Data Warehousing - An Overview
Characteristics of a Data Warehouse
Evolution of Data Warehousing
Need for Data Warehouse
Data Warehouse and Data Mart
OLTP Vs Data Warehouse
Operational Data Store
OLTP Vs ODS Vs DWS
Data Warehouse Architecture
Objectives
At the end of this lesson, you will know:
Data from multiple sources is integrated for a subject.
Identical queries will give same results at different times.
Supports analysis requiring historical data.
Characteristics of a Data Warehouse: Subject-Oriented
Operational: Leads, Quotes, Prospects, Orders
Data Warehouse: Customers, Products, Regions, Time
Operational: insert, delete, replace, change
Data Warehouse: load, read only access
Snapshot data
Time horizon: 5-10 years
Key has an element of time
Data warehouse stores historical data
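The points above (snapshot data, a key with an element of time, load and read only access) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the table name customer_balance_snapshot, its columns, and the helper functions are hypothetical and are not taken from the course material. It uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3


def build_snapshot_store():
    """Create an in-memory table whose key includes a time element."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_balance_snapshot ("
        " customer_id INTEGER NOT NULL,"
        " snapshot_date TEXT NOT NULL,"  # ISO dates sort correctly as text
        " balance REAL NOT NULL,"
        " PRIMARY KEY (customer_id, snapshot_date))"
    )
    return conn


def load_snapshot(conn, snapshot_date, balances):
    """Load one snapshot; existing rows are never updated or deleted."""
    rows = [(cid, snapshot_date, bal) for cid, bal in balances.items()]
    with conn:  # commits the batch as one transaction
        conn.executemany(
            "INSERT INTO customer_balance_snapshot VALUES (?, ?, ?)", rows
        )


def balance_as_of(conn, customer_id, as_of_date):
    """Read-only query: latest snapshot on or before as_of_date."""
    row = conn.execute(
        "SELECT balance FROM customer_balance_snapshot"
        " WHERE customer_id = ? AND snapshot_date <= ?"
        " ORDER BY snapshot_date DESC LIMIT 1",
        (customer_id, as_of_date),
    ).fetchone()
    return None if row is None else row[0]


# Hypothetical month-end snapshots
conn = build_snapshot_store()
load_snapshot(conn, "2001-01-31", {1: 100.0, 2: 250.0})
load_snapshot(conn, "2001-02-28", {1: 80.0, 2: 300.0})
```

Because earlier snapshots are only added to and never changed, a call such as balance_as_of(conn, 1, "2001-02-15") returns the same answer whenever it is run, which is the behaviour described on the slide.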
Alternate Definitions
A collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is relevant to some moment of time
- Imhoff
Data Warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End user oriented data access and reporting tools let users get at the data for decision support.
- Babcock
Evolution of Data Warehousing
1960 - 1985: MIS Era
Unfriendly
Slow
Dependent on IS programmers
Inflexible
Analysis limited to defined reports
Focus on Reporting
Evolution of Data Warehousing
1985 - 1990: Querying Era
Ad hoc, unstructured access to corporate data
SQL as interface not scalable
Cannot handle complex analysis
Evolution of Data Warehousing
1990 - 20xx: Analysis Era
Trend Analysis
What If?
Moving Averages
Cross Dimensional Comparisons
Statistical profiles
Automated pattern and rule discovery
Focus on Online Analysis
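As an example of the kind of analysis listed above, a moving average smooths a series so that the trend is easier to see. The sketch below is a minimal illustration in Python; the function name moving_average and the sample monthly_sales figures are assumptions made for this example, not data from the course.

```python
def moving_average(values, window):
    """Return the simple moving average over each full window of values."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that has just left the window
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Hypothetical monthly sales figures
monthly_sales = [10, 20, 30, 40, 50]
trend = moving_average(monthly_sales, 3)
print(trend)  # [20.0, 30.0, 40.0]
```

A window of 3 over five months gives three averages, one for each complete three-month span.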
Need for Data Warehouse
analyze information
Consolidation of disparate information sources
Strategic advantage over competitors
Faster time-to-market for products and services
Replacement of older, less-responsive decision support systems
Reduction in demand on IS to generate reports
OLTP Vs Warehouse
Operational System: Transaction Processing; Time Sensitive; Operator View; Organized by transactions (Order, Input, Inventory); Volatile Data; Not Flexible
Data Warehouse: Query Processing; History Oriented; Managerial View; Flexible
Capacity Planning
[Figure: Processing Power plotted against Time of day]
Processing load peaks during the beginning and end of day
Examples Of Some Applications
Manufacturers, Retailers, Customers
Target Marketing
Market Segmentation
Budgeting
Credit Rating Agencies
Financial Reporting and Consolidation
Churn Analysis
Profitability Management
Event tracking
Do we need a separate database?
OLTP and data warehousing require two very
Data Marts
Enterprise wide data warehousing projects have a
Data Marts
Subject or Application Oriented
Business View of Warehouse
Data Warehouse and Data Mart
Scope: department, Business Process Oriented (Data Marts)
Data Perspective: Historical Detailed Data, Some summary (Data Warehouse); Detailed (some history), Summarized (Data Marts)
Subject: Multiple subject areas (Data Warehouse); Single Partial subject (Data Marts)
Data Sources: Many, Operational/External (Data Warehouse); Few, Operational, external data (Data Marts)
Implement Time Frame: Multiple stage implementation (Data Warehouse)
Characteristics: Flexible, extensible, Durable/Strategic, Data orientation (Data Warehouse); Restrictive, non extensible, Short life/tactical, Project orientation (Data Marts)
Expensive
Relatively cheap
Change management is difficult
Technical challenges in building large databases
Cleansing, transformation, modeling techniques may be incompatible
[Figure: systems A, B and C; ODS; Data Warehouse; Operational and DSS]
operational systems.
The ODS contains current valued and near current valued data.
The ODS contains almost exclusively all detail data.
The ODS requires a full function, update, record oriented environment.
Different kinds of Information Needs
Current
Recent
Historical
OLTP Vs ODS Vs Data Warehouse
Audience: Operating Personnel (OLTP); Analysts (ODS); Managers and analysts (Data Warehouse)
Individual records, transaction or analysis driven (ODS); Set of records, analysis driven (Data Warehouse)
Current and near-current (ODS); Historical (Data Warehouse)
Data Structure: Detailed (OLTP); Detailed and lightly summarized (ODS); Detailed and Summarized (Data Warehouse)
Data organization: Functional (OLTP); Subject-oriented (ODS); Subject-oriented (Data Warehouse)
Data redundancy: Non-redundant within system, unmanaged redundancy among systems (OLTP); Somewhat redundant with operational databases (ODS); Managed redundancy (Data Warehouse)
Data update: Field by field (OLTP); Field by field (ODS); Controlled batch (Data Warehouse)
Database size: Moderate (OLTP); Moderate (ODS)
Development Methodology: Requirements driven, structured (OLTP); Data driven, somewhat evolutionary (ODS); Data driven, evolutionary (Data Warehouse)
Philosophy: Support day-to-day operation (OLTP)
Multi-tiered Data Warehouse without ODS
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Warehouse; Metadata; Middleware/API; EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]
Multi-tiered Data Warehouse with ODS
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate); ODS; Data Preparation (Select, Extract, Transform, Load, Maintain); Data Warehouse; Metadata]
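The data preparation steps named in the architecture diagrams (Extract, Transform, Integrate, Load) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular tool: the two source systems, their field names, and the function names are all hypothetical, and the "warehouse" is just a Python list.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems with different conventions
ORDERS_SYSTEM_A = [
    {"cust": "C1", "amount_usd": 120.0},
    {"cust": "C2", "amount_usd": 80.0},
]
ORDERS_SYSTEM_B = [
    {"customer_id": "c1", "amount_cents": 3000},
]


def extract():
    """Extract: pull the selected records from each source system."""
    return ORDERS_SYSTEM_A, ORDERS_SYSTEM_B


def transform(rows_a, rows_b):
    """Transform: convert both sources to one naming and unit convention."""
    unified = []
    for row in rows_a:
        unified.append(
            {"customer_id": row["cust"].upper(), "amount": row["amount_usd"]}
        )
    for row in rows_b:
        unified.append(
            {
                "customer_id": row["customer_id"].upper(),
                "amount": row["amount_cents"] / 100,
            }
        )
    return unified


def integrate(rows):
    """Integrate: combine records from all sources by subject (customer)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


def load(warehouse, load_date, totals):
    """Load: append a dated batch; earlier batches are left untouched."""
    for customer_id, amount in sorted(totals.items()):
        warehouse.append(
            {
                "load_date": load_date,
                "customer_id": customer_id,
                "total_amount": amount,
            }
        )


warehouse = []
load(warehouse, "2001-01-31", integrate(transform(*extract())))
```

After the run, the list holds one row per customer for the load date, with customer C1 combining the amounts from both source systems.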
Reference Book
Principles of Data Warehouse Design
Data Warehouses and OLAP: Concepts, Architectures and Solutions
Thank You