Lecture 1
Introduction
Email: saba.ghani@nu.edu.pk
Class honor code
Google classroom code
otnelmt
PREREQUISITES
DATABASE
Books
Reference books
W. H. Inmon, Building the Data Warehouse
(Third Edition), John Wiley & Sons Inc., NY.
Objective
Understand the desperate need for strategic
information
Recognize the information crisis at every enterprise
Distinguish between operational and informational
systems
“Drowning in data and starving for
information”
NEEDS
Operational databases (online transaction
processing systems, or OLTP) are not
suitable for data analysis
Contain current and detailed data
Do not include historical data
Perform poorly for complex queries due to
normalization
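A minimal sketch of why normalization makes analytical queries expensive, assuming a hypothetical four-table order schema (the table names, column names, and sample rows are illustrative and not from the lecture). A single analytical question, revenue by city and product category, has to join every table:

```python
import sqlite3

def build_oltp(conn):
    # Hypothetical normalized OLTP schema: each fact is stored exactly once.
    conn.executescript("""
        CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE products    (product_id INTEGER PRIMARY KEY, category TEXT, price REAL);
        CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
        CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Lahore"), (2, "Karachi")])
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                     [(10, "Grocery", 2.0), (11, "Toys", 5.0)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(100, 1, "2024-01-05"), (101, 2, "2024-01-09")])
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                     [(100, 10, 3), (100, 11, 1), (101, 10, 2)])

def revenue_by_city_and_category(conn):
    # One analytical question touches all four tables through three joins.
    sql = """
        SELECT c.city, p.category, SUM(oi.quantity * p.price) AS revenue
        FROM order_items oi
        JOIN orders o    ON o.order_id = oi.order_id
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN products p  ON p.product_id = oi.product_id
        GROUP BY c.city, p.category
        ORDER BY c.city, p.category
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
build_oltp(conn)
result = revenue_by_city_and_category(conn)
print(result)
```

On a toy data set the joins are cheap; the point is only that the number of joins grows with the degree of normalization, which is what hurts on large operational tables.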
Problems with the Naturally Evolving Architecture
The naturally evolving architecture
presents many challenges, such as:
• Data credibility
• Productivity
• Inability to transform data into information
Data credibility
1. No time basis of data
2. The algorithmic differential of data
3. The levels of extraction
4. The problem of external data
5. No common source of data from the
beginning
Productivity
To produce a corporate report, many files
and layouts of data must be analyzed
The report-generation program should be
simple to write, but retrieving the data for the
report is tedious
The programs must cross every technology that the company uses
Data to Information
Data Integration
Historical data
The systems found in the naturally evolving architecture
are simply inadequate for supporting information needs.
NEEDS
Organizations store data in databases
but never use this data for business
improvement
Managers need information to
Formulate the business strategies
Establish goals
Set objectives
Monitor results
Historical overview: Crisis of
Credibility
List of all items that were sold last
month?
Intelligent Enterprise
1. Which items sell together? Which items to
stock?
2. Retain the present customer base
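The first question above, which items sell together, can be sketched as a count of item pairs that appear in the same transaction. This is only an illustration: the `pair_counts` helper and the sample baskets are assumptions, not material from the lecture.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair one canonical order
        # and counts it at most once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical transactions, one list of items per sale.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

top_pair, top_count = pair_counts(baskets).most_common(1)[0]
print(top_pair, top_count)
```

An operational system can answer "what was sold last month?" from its current tables; a question like this one needs all transactions over a long period, which is the kind of data a warehouse keeps.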
OPERATIONAL vs. INFORMATIONAL
Transaction System
Management Information System (MIS)
Could be typed sheets (NOT transaction system)
Ad-Hoc access
Does not have a fixed access pattern.
Queries not known in advance.
Difficult to write SQL in advance.
Knowledge workers
Typically NOT IT literate (Executives, Analysts, Managers).
NOT clerical workers.
Decision makers
What is a Data Warehouse?
An Alternative Viewpoint
“A DW is a
subject-oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in
organizational decision making.”
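Two of the properties in this definition, time-varying and non-volatile, can be illustrated with a small sketch. The class names `OperationalStore`, `WarehouseTable`, and `BalanceSnapshot` are hypothetical and exist only for this example: the operational store overwrites the current value, while the warehouse table appends dated rows and never changes them.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class BalanceSnapshot:
    customer_id: int
    balance: float
    as_of: date  # time-varying: every row carries the date it describes

class OperationalStore:
    """Keeps only the current value; an update overwrites the old one."""
    def __init__(self):
        self.current = {}

    def update(self, customer_id, balance):
        self.current[customer_id] = balance

class WarehouseTable:
    """Non-volatile: rows are appended, never updated or deleted."""
    def __init__(self):
        self._rows = []

    def load(self, snapshot):
        self._rows.append(snapshot)

    def balance_as_of(self, customer_id, when):
        # Latest snapshot for this customer taken on or before `when`.
        candidates = [r for r in self._rows
                      if r.customer_id == customer_id and r.as_of <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.as_of).balance

oltp, warehouse = OperationalStore(), WarehouseTable()
oltp.update(1, 500.0)
warehouse.load(BalanceSnapshot(1, 500.0, date(2023, 1, 31)))
oltp.update(1, 120.0)  # the old balance is gone from the operational store
warehouse.load(BalanceSnapshot(1, 120.0, date(2024, 1, 31)))

print(oltp.current[1])                                  # current value only
print(warehouse.balance_as_of(1, date(2023, 6, 30)))    # historical value
```

Subject orientation and integration are not shown here; they concern how data from several source systems is organized and reconciled before it is loaded.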
Types of data warehouse
Financial
Telecommunication
Insurance
Human Resource
Global
Exploratory
How is it Different?
OLTP systems don't keep history; for example, you cannot get a balance statement
more than a year old.
Customer retention.
How much history?
Depends on:
Industry.
Insurance companies want to do actuarial analysis, using historical data to
predict risk: 7 years.
Is a data warehouse a
complete repository of data?
Usually (but not always) periodic or batch updates rather than real-time.
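A periodic batch update can be sketched as a job that copies one day's transactions from the operational source into the warehouse in a single step. This is an assumption-laden illustration: the `run_batch_load` function, the `sales` and `sales_history` tables, and the sample rows are all hypothetical.

```python
import sqlite3

def run_batch_load(source, warehouse, batch_date):
    """Copy one day's sales from the operational source into the warehouse."""
    rows = source.execute(
        "SELECT sale_id, item, amount, sale_date FROM sales WHERE sale_date = ?",
        (batch_date,),
    ).fetchall()
    with warehouse:  # commit all rows of the batch together, or none of them
        warehouse.executemany(
            "INSERT INTO sales_history (sale_id, item, amount, sale_date) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)

# Hypothetical operational source and warehouse, both in memory.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (sale_id INTEGER, item TEXT, amount REAL, sale_date TEXT)")
source.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "bread", 2.0, "2024-01-01"),
    (2, "milk", 1.5, "2024-01-01"),
    (3, "eggs", 3.0, "2024-01-02"),
])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_history (sale_id INTEGER, item TEXT, amount REAL, sale_date TEXT)")

loaded = run_batch_load(source, warehouse, "2024-01-01")
in_warehouse = warehouse.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
print(loaded, in_warehouse)
```

The sale dated 2024-01-02 stays invisible to warehouse queries until the next batch runs, which is the trade-off against real-time updates.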
Data Warehouse Vs. OLTP
Comparison of Response Times
Putting the pieces together
Why is this hard?
High-level Implementation Steps
Phase-I
1. Determine Users' Needs
2. Determine DBMS Server Platform
3. Determine Hardware Platform
4. Information & Data Modeling
5. Construct Metadata Repository
Phase-II
6. Data Acquisition & Cleansing
7. Data Transform, Transport & Populate
8. Determine Middleware Connectivity
9. Prototyping, Querying & Reporting
10. Data Mining
11. Online Analytical Processing (OLAP)
Phase-III
12. Deployment & System Management