0% found this document useful (0 votes)
22 views43 pages

Lecture 1

The document provides an introduction to data warehousing. It discusses that operational databases are not suitable for analysis due to lack of historical data and poor performance for complex queries. It also discusses the needs for strategic information and characteristics of such information. The key differences between operational and decision support systems are described. The document defines a data warehouse as a subject-oriented, integrated collection of historical and summarized data used for analysis and decision making.

Uploaded by

f200190
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views43 pages

Lecture 1

The document provides an introduction to data warehousing. It discusses that operational databases are not suitable for analysis due to lack of historical data and poor performance for complex queries. It also discusses the needs for strategic information and characteristics of such information. The key differences between operational and decision support systems are described. The document defines a data warehouse as a subject-oriented, integrated collection of historical and summarized data used for analysis and decision making.

Uploaded by

f200190
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 43

DataWare House

Introduction
Introduction

 Lecturer: Saba Ghani


 Since July, 2019

 Department of Computer Science


 MSCS-FAST NUCES
 BSCS-LCWU Lahore

 Email: saba.ghani@nu.edu.pk
Class honor code
Google classroom code

otnelmt
PREREQUISITES

 DATA BASE

 This course requires that you should be proficient with


Data Bases concepts

5
Books

 Reference books
 W. H. Inmon, Building the Data Warehouse
(Third Edition), John Wiley & Sons Inc., NY.

 R. Kimball, The Data Warehouse Toolkit


(Second Edition), John Wiley & Sons Inc., NY.

 Paulraj Ponniah, Data Warehousing Fundamentals,


John Wiley & Sons Inc., NY.

6
Objective
 Understand the desperate need for strategic
information
 Recognize the information crisis at every enterprise
 Distinguish between operational and informational
systems
 “Drowning in data and starving for
information”

 Knowledge is power, Intelligence is


absolute power!

8
.

9
NEEDS
 Operational databases (online transaction
processing systems or OLTP), are not
suitable for data analysis
 Contain current and detailed data
 Do not include historical data
 Perform poorly for complex queries due to
normalization
Problems with the
Naturally Evolving
Architecture
 The naturally evolving architecture
presents many challenges, such as:
• Data credibility
• Productivity
• Inability to transform data into information
Data credibility
1. No time basis of data
2. The algorithmic
differential of data
3. The levels of extraction
4. The problem of external data
5. No common source of data from the
beginning
Productivity
 To produce a corporate report, many files
and layouts of data must be analyzed
 Report-generation program should be
simple to write, retrieving the data for the
report is tedious
cross every technology that the company uses
Data to Information
 Data Integration
 Historical data
 The systems found in the naturally evolving architecture
are simply inadequate for supporting information needs.
NEEDS
 Organization stored Data in database
 Never use of this data for business
improvement
 Manager need information to
Formulate the business strategies
Establish goals
Set objectives
Monitor results
Historical overview: Crisis of
Credibility

 Businesses demand Intelligence (BI).


 Complex questions from integrated
data.
 “Intelligent Enterprise

16
Historical overview: Crisis of
Credibility
List of all items that were sold last
month?

List of all items purchased by David?

The total sales of the last month of


each branch?

How many sales transactions


occurred during the month of
January?

17
Intelligent Enterprise
1. Which items sell together? Which items to
stock?
2. Retain the present customer base

3. Improve product quality levels in top five


product groups.
4. Gain market share by 10% in the next 3 years
5. Bring three new products to market in 2 years
6. Increase sales by 15% in the North East
Division
Needs

 For business growth needs


 In Depth information knowledge of their company’s
operations
 Review and monitor key performance indicators
 Factor affect one another
 How business factors change over time
 compare their company’s performance
Strategic information
 Strategic information - Types of information needed
to make decisions in the formulation and execution of
business strategies
 Organization needs to extract these information for
business growth
 Its important for
 continued health and business survival
 Critical business decisions depend on the availability of
proper strategic information in an enterprise.
Characteristics
Characteristics of strategic information are
INTEGRATED:
Single, enterprise-wide view
DATA INTEGRITY
Information must be accurate and must conform to business rules.
ACCESSIBLE
Easily accessible, and responsive for analysis.
CREDIBLE
Business factor must have one and only one value
TIMELY
Information must be available within the stipulated time frame.
The Information Crisis
Two facts are
1.organizations have lots of data
2.Information technology resources and systems are not effective at
turning all that data into useful strategic information
Information double every 18 month
Information crisis
the available data is not readily usable for strategic decision making
An enterprise can have many types of incompatible structures and systems
All information needed to be integrate from all systems for proper decision
making
Data needed for strategic decision making must be in a format suitable for
easy analysis
OPERATIONAL VERSUS DECISION-SUPPORT
SYSTEMS

 Making the Wheels of Business Turn


• Operational systems are online transaction
processing (OLTP) systems used to run the
day-to-day business
• bread-and-butter systems.

 Watching the Wheels of Business Turn


• watch how the business runs, and make
strategic decisions to improve the business

 Different Scope, Different Purposes


• strategic information need to build informational systems that are
different from the operational systems
OPERATIONAL VERSUS DECISION-SUPPORT
SYSTEMS

OPERATIONAL INFORMATIONAL

Data Content Current values Archived, derived,


summarized
Data Structure Optimized for Optimized for complex
transactions queries

Access Frequency High Medium to low

Access Type Read, update, delete Read

Usage Predictable, repetitive Ad hoc, random,


heuristic
Response Time Sub-seconds Several seconds to min

Users Large number Relatively small number


OPERATIONAL VERSUS DECISION-SUPPORT
SYSTEMS

 A complete repository of historical corporate data extracted from


transaction systems that is available for ad-hoc access by
knowledge workers
What is a Data Warehouse?

Transaction System
 Management Information System (MIS)
 Could be typed sheets (NOT transaction system)

Ad-Hoc access
 Dose not have a certain access pattern.
 Queries not known in advance.
 Difficult to write SQL in advance.

Knowledge workers
 Typically NOT IT literate (Executives, Analysts, Managers).
 NOT clerical workers.
 Decision makers

26
What is a Data Warehouse?
An Alternative Viewpoint

“A DW is a
 subject-oriented,
 integrated,
 time-varying,
 non-volatile
collection of data that is used primarily in
organizational decision making.”

27
What is a Data Warehouse ?

 It is a blend of many technologies, the basic concept being:

 Take all data from different operational systems.


 If necessary, add relevant data from industry.
 Transform all data and bring into a uniform format.
 Integrate all data as a single entity.
 Store data in a format supporting easy access for decision support.
 Create performance enhancing indices.
 Implement performance enhancement joins.
 Run ad-hoc queries with low selectivity.

28
What is a Data Warehouse ?

29
Types of data warehouse

 Financial
 Telecommunication
 Insurance
 Human Resource
 Global
 Exploratory

30
How is it Different?

 Decision making is Ad-Hoc

31
How is it Different?

 Combines operational and historical data.

 Don’t do data entry into a DWH


 OLTP or ERP are the source systems.

 OLTP systems don’t keep history, cant get balance statement more than a year
old.

 DWH keep historical data, even of bygone customers. Why?

 In the context of bank, want to know why the customer left?

 What were the events that led to his/her leaving? Why?

 Customer retention.

32
How is it Different?

33
How much history?

 Depends on:

 Industry.

 Cost of storing historical data.

 Economic value of historical data.

 Industries and history


 Telecomm calls are much more as compared to bank transactions- 18 months.

 Retailers interested in analyzing yearly seasonal patterns- 65 weeks.

 Insurance companies want to do actuary analysis, use the historical data in order to
predict risk- 7 years.

Hence, NOT a complete repository of data

34
How much history?

Economic value of data


Vs.
Storage cost

Data Warehouse a
complete repository of data?

35
How is it Different?

 Usually (but not always) periodic or batch updates rather than real-time.

 The boundary is blurring for active data warehousing.

 For an ATM, if update not in real-time, then lot of real trouble.

 DWH is for strategic decision making based on historical


data. Wont hurt if transactions of last one hour/day are absent.

 Rate of update depends on:


 volume of data,
 nature of business,
 cost of keeping historical data,
 benefit of keeping historical data.

36
How is it Different?

 Does not follows the traditional development model

37
Data Warehouse Vs. OLTP

38
Data Warehouse Vs. OLTP

39
Comparison of Response Times

 On-line analytical processing (OLAP) queries must be executed in a


small number of seconds.
 Often requires denormalization and/or sampling.

 Complex query scripts and large list selections can generally be


executed in a small number of minutes.

 Sophisticated clustering algorithms (e.g., data mining) can


generally be executed in a small number of hours (even for
hundreds of thousands of customers).

40
Putting the pieces together

41
Why is this hard?

 Data sources are unstructured & heterogeneous.


 Requirements are always changing.
 Most computer scientist trained on OTLP systems, those
concepts not valid for DSS.
 Performance impacts are often non-linear O(n) Vs. O(nlog_n) e.g.
scanning vs. indexing.
 Complex computer/database architectures.
 Rapidly changing product characteristics.

42
High-level Implementation Steps

 Phase-I
1. Determine Users' Needs
2. Determine DBMS Server Platform
3. Determine Hardware Platform
4. Information & Data Modeling
5. Construct Metadata Repository

 Phase-II
6. Data Acquisition & Cleansing
7. Data Transform, Transport & Populate
8. Determine Middleware Connectivity
9. Prototyping, Querying & Reporting
10. Data Mining
11. On Line Analytical Processing (OLAP)

 Phase-III
12. Deployment & System Management

43

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy