Introduction to Data Warehousing
Robert Andrews, DB2 for i Center of Excellence
The Agenda
Background
Turning DATA into INFORMATION
Client Independence
Web Based
Reduced Software Maintenance
Application Integration
Reporting as a function of line-of-business apps
Portal interfaces
Query/Reporting
Online Analytics
Data Mining
Dashboards/Scorecards
DBMS
Source: The Data Warehousing Institute, Smart Companies in the 21st Century, July 2003
OS/EAI: Operating Systems/Enterprise Application Integration
Introduction to Data Warehousing © 2009 IBM Corporation
Normalized OLTP Database
Customer info -> C file
Order header -> O file
Order details -> D file
Item descriptions -> I file
Salesman info -> S file
Very good design: change information in only one place
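A small sketch of why the normalized design changes information in only one place. It assumes hypothetical table and column names modeled on the C, O, D, I, and S files and uses SQLite as a stand-in for DB2 for i; the point is that the customer's city lives in one row of C, so one UPDATE is visible from every order.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for DB2 for i.
SCHEMA = """
CREATE TABLE c_file (custno INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE s_file (salesno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE i_file (itemno INTEGER PRIMARY KEY, descr TEXT, price REAL);
CREATE TABLE o_file (orderno INTEGER PRIMARY KEY,
                     custno INTEGER REFERENCES c_file(custno),
                     salesno INTEGER REFERENCES s_file(salesno));
CREATE TABLE d_file (orderno INTEGER REFERENCES o_file(orderno),
                     lineno INTEGER,
                     itemno INTEGER REFERENCES i_file(itemno),
                     qty INTEGER,
                     PRIMARY KEY (orderno, lineno));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO c_file VALUES (1, 'Acme Corp', 'Boston')")
conn.execute("INSERT INTO s_file VALUES (10, 'Pat Lee')")
# Two orders for the same customer; neither copies the customer's city.
conn.executemany("INSERT INTO o_file VALUES (?, ?, ?)", [(100, 1, 10), (101, 1, 10)])

# The city is stored once, in C, so a single UPDATE is enough...
conn.execute("UPDATE c_file SET city = 'Chicago' WHERE custno = 1")

# ...and every order sees the new value through the join.
cities = [city for (city,) in conn.execute(
    "SELECT c.city FROM o_file o JOIN c_file c ON c.custno = o.custno "
    "ORDER BY o.orderno")]
print(cities)  # ['Chicago', 'Chicago']
```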
[Figure: the D, C, O, I, and S files in DB2]
Follow a transaction
Another Question
Who are my best salesmen?
Who are they selling to?
What are they selling?
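These questions are hard on the OLTP design because each one touches all five files. The sketch below uses hypothetical sample data, with SQLite standing in for DB2 for i, to show the five-way join (D to O to C and S, plus D to I for the price) that even the simplest answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the five files (SQLite stands in for DB2 for i).
conn.executescript("""
CREATE TABLE s_file (salesno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE c_file (custno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE i_file (itemno INTEGER PRIMARY KEY, descr TEXT, price REAL);
CREATE TABLE o_file (orderno INTEGER PRIMARY KEY, custno INTEGER, salesno INTEGER);
CREATE TABLE d_file (orderno INTEGER, itemno INTEGER, qty INTEGER);
INSERT INTO s_file VALUES (1, 'Pat'), (2, 'Lee');
INSERT INTO c_file VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO i_file VALUES (1, 'Umbrella', 10.0), (2, 'Raincoat', 50.0);
INSERT INTO o_file VALUES (100, 1, 1), (101, 2, 2), (102, 2, 1);
INSERT INTO d_file VALUES (100, 1, 3), (100, 2, 1), (101, 1, 2), (102, 2, 2);
""")

# Who sells what to whom: D -> O -> C and S, plus D -> I for the price.
JOIN_ALL_FIVE = """
FROM d_file d
JOIN o_file o ON o.orderno = d.orderno
JOIN c_file c ON c.custno = o.custno
JOIN s_file s ON s.salesno = o.salesno
JOIN i_file i ON i.itemno = d.itemno
"""

detail = conn.execute(
    "SELECT s.name, c.name, i.descr, SUM(d.qty * i.price) AS revenue"
    + JOIN_ALL_FIVE
    + "GROUP BY s.name, c.name, i.descr ORDER BY revenue DESC").fetchall()

# Best salesmen: the same five-way join, rolled up to one row per salesman.
best = conn.execute(
    "SELECT s.name, SUM(d.qty * i.price) AS revenue"
    + JOIN_ALL_FIVE
    + "GROUP BY s.name ORDER BY revenue DESC").fetchall()
print(best)  # [('Pat', 180.0), ('Lee', 20.0)]
```

On a production OLTP database these joins run against live order tables, which is one motivation for moving analysis elsewhere.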
[Figure: data rekeyed between an ERP system, POS, Access, Excel spreadsheets, and other sources; each copy gives a different answer (1+1=2, 2+1=1.5, 1+1=3, 1+3=7)]
The most widespread technical problem reported by practitioners was slow query performance.
Source: The BI Survey 8, Nigel Pendse (a survey of over 2,000 companies that have implemented business intelligence applications)
Reactive
Leverage System i Navigator On Demand Performance Tools
Leverage the Query Governor (QQRYTIMLMT) with time governing or, as of V5R4, disk space governing
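A minimal sketch of the governing decision, assuming (as with the DB2 for i query governor) that the check compares the optimizer's estimates against the limits before the query starts. The `QueryEstimate` class and `governor_check` function are hypothetical illustrations, not an IBM API; `None` plays the role of "no limit".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryEstimate:
    """Hypothetical optimizer estimates for one query."""
    seconds: float
    temp_storage_mb: float


def governor_check(est: QueryEstimate,
                   time_limit_s: Optional[float] = None,
                   storage_limit_mb: Optional[float] = None) -> Optional[str]:
    """Return why a query should be stopped before it starts, or None to let it run.

    None for a limit means "no limit" (loosely like QQRYTIMLMT's *NOMAX).
    """
    if time_limit_s is not None and est.seconds > time_limit_s:
        return f"estimated run time {est.seconds:.0f}s exceeds {time_limit_s:.0f}s"
    if storage_limit_mb is not None and est.temp_storage_mb > storage_limit_mb:
        return (f"estimated temporary storage {est.temp_storage_mb:.0f} MB "
                f"exceeds {storage_limit_mb:.0f} MB")
    return None


print(governor_check(QueryEstimate(seconds=900, temp_storage_mb=50), time_limit_s=600))
```

Because the decision uses estimates, a runaway ad hoc query can be refused without consuming any resources.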
Production
Leverage H/A software to create an Operational Data Store (ODS) in near real time
Utilize the ODS as the source for ETL processes into the Data Warehouse
Combine with target-side remote journaling for ETL efficiencies
No impact to production databases
Utilize the mostly idle capacity of the H/A server for Data Warehouse workloads
Optionally mirror the Data Warehouse
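The near-real-time ODS idea boils down to replaying journaled changes onto a copy and remembering how far you got. This sketch assumes a hypothetical journal entry layout of (sequence number, operation, key, row image); real H/A products and DB2 remote journaling use their own formats, so only the replay-plus-high-water-mark pattern is meant to carry over.

```python
from typing import Dict, List, Tuple

# Hypothetical journal entry layout: (sequence number, operation, key, row image).
JournalEntry = Tuple[int, str, int, dict]


def apply_journal(ods: Dict[int, dict], entries: List[JournalEntry],
                  last_applied: int) -> int:
    """Replay journal entries newer than last_applied onto the ODS copy.

    Returns the new high-water mark so the next pass only picks up newer changes.
    """
    for seq, op, key, image in sorted(entries, key=lambda e: e[0]):
        if seq <= last_applied:
            continue  # already applied in an earlier pass
        if op in ("INSERT", "UPDATE"):
            ods[key] = dict(image)
        elif op == "DELETE":
            ods.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation {op!r}")
        last_applied = seq
    return last_applied


ods: Dict[int, dict] = {}
journal = [
    (1, "INSERT", 7, {"name": "Acme", "city": "Boston"}),
    (2, "UPDATE", 7, {"name": "Acme", "city": "Chicago"}),
    (3, "INSERT", 8, {"name": "Globex", "city": "Denver"}),
    (4, "DELETE", 8, {}),
]
mark = apply_journal(ods, journal, last_applied=0)
print(mark, ods)  # 4 {7: {'name': 'Acme', 'city': 'Chicago'}}

# A later pass: only entry 5 is new, so only it is applied.
journal.append((5, "INSERT", 9, {"name": "Initech", "city": "Austin"}))
mark = apply_journal(ods, journal, last_applied=mark)
```

ETL jobs then read from `ods` rather than from the production tables.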
Address Information
416 Columbus Ave #2, Boston, Massachusetts 02116
Four sixteen Columbus Avenue APT2, Boston, Mass 02116
416 Columbus Suite #2, Suffolk County 02116
4 New York Plaza Floor 23, Manhattan NY, 10036
Four NY Plaza, FL-23, New York New York, 10036
4 NY Plaza, LVL23, NYC 10036
125-A Washington, Los Angeles, CA 90066
125 Washington Unit A, LA, California, 90066
One-twenty-five Washington #A, Los Angeles Cnty 90066
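A first step in cleansing addresses like these is to find candidate duplicates. This sketch assumes a simple blocking key, the last standalone 5-digit number (taken to be the ZIP code), and groups the nine variants under it. That is only an illustration: a real ZIP holds many distinct addresses, so a cleansing tool would go on to compare house number, street, and unit within each group.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# The nine address strings above: three spellings of three addresses.
ADDRESSES = [
    "416 Columbus Ave #2, Boston, Massachusetts 02116",
    "Four sixteen Columbus Avenue APT2, Boston, Mass 02116",
    "416 Columbus Suite #2, Suffolk County 02116",
    "4 New York Plaza Floor 23, Manhattan NY, 10036",
    "Four NY Plaza, FL-23, New York New York, 10036",
    "4 NY Plaza, LVL23, NYC 10036",
    "125-A Washington, Los Angeles, CA 90066",
    "125 Washington Unit A, LA, California, 90066",
    "One-twenty-five Washington #A, Los Angeles Cnty 90066",
]

ZIP = re.compile(r"\b(\d{5})\b")


def blocking_key(address: str) -> Optional[str]:
    """Use the last standalone 5-digit number as the (assumed) ZIP code."""
    found = ZIP.findall(address)
    return found[-1] if found else None


def group_candidates(addresses: List[str]) -> Dict[Optional[str], List[str]]:
    """Group address strings that share a blocking key as candidate duplicates."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for address in addresses:
        groups[blocking_key(address)].append(address)
    return dict(groups)


for zip_code, variants in group_candidates(ADDRESSES).items():
    print(zip_code, len(variants))
# 02116 3
# 10036 3
# 90066 3
```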
[Figure: ODS -> Data Warehouse -> Data Marts (Mfg, Finance, Sales) -> OLAP Applications]
Tuning Aspects
A separate server/partition allows different tuning knobs to be turned
It may call for a different allocation of resources to manage this very different workload
Separation of Powers
Data Warehouse team versus Operational Systems team
Separate decisions on OS or resource upgrades
E.T.L.
Extract data from somewhere
(may be MANY sources)
Transform it somehow
(may be simple or extensive)
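A toy extract-transform-load pass, to make the three steps concrete. The two source layouts (a US system with MM/DD/YYYY dates and dollar strings, a Canadian system with ISO dates and amounts in cents) are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical extracts from two source systems with different conventions.
US_ORDERS = [{"order": "A1", "date": "11/03/2009", "amount": "125.50"}]
CA_ORDERS = [{"ord_no": 77, "ord_date": "2009-11-04", "amount_cents": 9900}]


def extract():
    """Extract: yield (source name, raw record) pairs from every source."""
    for rec in US_ORDERS:
        yield "US", rec
    for rec in CA_ORDERS:
        yield "CA", rec


def transform(source, rec):
    """Transform: map each source layout onto one row (order_id, ISO date, amount)."""
    if source == "US":
        when = datetime.strptime(rec["date"], "%m/%d/%Y").date()
        return (f"US-{rec['order']}", when.isoformat(), float(rec["amount"]))
    if source == "CA":
        return (f"CA-{rec['ord_no']}", rec["ord_date"], rec["amount_cents"] / 100)
    raise ValueError(f"unknown source {source}")


def load(conn, rows):
    """Load: write the conformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(order_id TEXT PRIMARY KEY, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, [transform(src, rec) for src, rec in extract()])
rows = conn.execute("SELECT * FROM sales_fact ORDER BY order_id").fetchall()
print(rows)  # [('CA-77', '2009-11-04', 99.0), ('US-A1', '2009-11-03', 125.5)]
```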
Customer File - Data Warehouse
CUSTNUMBER (PK) | CUSTNAME | REGION | OLDNUM
1 | John Smith | US | 1001
2 | Mary Jones | US | 1002
3 | Chris Anderson | US | 1003
4 | David Perry | US | 1004
5 | Harry Potter | CANADA | 1001
6 | Jeremy Carr | CANADA | 1002
7 | Penny Hayes | CANADA | 1003
8 | Debbie Thornton | CANADA | 1004
PK: CUSTNUMBER; secondary index: REGION, OLDNUM
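The customer file shows why the warehouse assigns its own key: OLDNUM 1001 means John Smith in the US system and Harry Potter in the Canadian one. This sketch assigns a surrogate CUSTNUMBER per (REGION, OLDNUM) pair and keeps a dictionary as a stand-in for the secondary index; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Source rows from the customer file: (name, region, old customer number).
SOURCE_CUSTOMERS = [
    ("John Smith", "US", 1001), ("Mary Jones", "US", 1002),
    ("Chris Anderson", "US", 1003), ("David Perry", "US", 1004),
    ("Harry Potter", "CANADA", 1001), ("Jeremy Carr", "CANADA", 1002),
    ("Penny Hayes", "CANADA", 1003), ("Debbie Thornton", "CANADA", 1004),
]


def assign_surrogate_keys(rows):
    """Give every (region, oldnum) pair its own warehouse CUSTNUMBER.

    Returns the dimension rows plus a lookup on (REGION, OLDNUM) that plays the
    role of the secondary index when later source records must be matched.
    """
    lookup: Dict[Tuple[str, int], int] = {}
    dimension: List[Tuple[int, str, str, int]] = []
    for name, region, oldnum in rows:
        natural_key = (region, oldnum)
        if natural_key not in lookup:
            lookup[natural_key] = len(lookup) + 1  # next surrogate key
            dimension.append((lookup[natural_key], name, region, oldnum))
    return dimension, lookup


dimension, lookup = assign_surrogate_keys(SOURCE_CUSTOMERS)
# A Canadian order for old customer 1001 resolves to warehouse customer 5.
print(lookup[("CANADA", 1001)])  # 5
```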
[Figure: star schema; keylists built from Store_Dim and Date_Dim (and Item_Dim, per the query) probe the FACT table's Storekey, Itemkey, and Datekey columns, which carry Sales and Quantity]
Show me the date, weather, and quantity/revenue from sales of umbrellas, raingear, and hats in our Florida stores in November, and order by store, item, date, then weather
SELECT s.store, i.item, d.date, d.weather, SUM(f.sales), SUM(f.quantity)
FROM item_dim i, store_dim s, date_dim d, fact_table f
WHERE f.itemkey IN (...keylist...)
  AND f.storekey IN (...keylist...)
  AND f.datekey IN (...keylist...)
  AND f.itemkey = i.itemkey
  AND f.storekey = s.storekey
  AND f.datekey = d.datekey
GROUP BY s.store, i.item, d.date, d.weather
ORDER BY s.store, i.item, d.date, d.weather
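The keylist idea can be tried end to end in two phases: first apply each dimension's local predicates to get a list of keys, then probe the fact table with those lists and join back for descriptive columns. The schema, sample rows, and the 2008 dates are hypothetical, and SQLite stands in for DB2 for i, whose star join support performs this kind of work inside the optimizer rather than in application code.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
# Hypothetical star schema and sample rows (SQLite stands in for DB2 for i).
conn.executescript("""
CREATE TABLE store_dim (storekey INTEGER PRIMARY KEY, store TEXT, state TEXT);
CREATE TABLE item_dim  (itemkey  INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE date_dim  (datekey  INTEGER PRIMARY KEY, date TEXT, month TEXT, weather TEXT);
CREATE TABLE fact_table (storekey INTEGER, itemkey INTEGER, datekey INTEGER,
                         sales REAL, quantity INTEGER);
INSERT INTO store_dim VALUES (1, 'Miami', 'FL'), (2, 'Boston', 'MA');
INSERT INTO item_dim  VALUES (1, 'umbrella'), (2, 'hat'), (3, 'sunscreen');
INSERT INTO date_dim  VALUES (1, '2008-11-03', '2008-11', 'rain'),
                             (2, '2008-12-01', '2008-12', 'sun');
INSERT INTO fact_table VALUES (1, 1, 1, 40.0, 4), (1, 1, 1, 20.0, 2),
                              (1, 3, 1, 9.0, 1), (2, 1, 1, 15.0, 1),
                              (1, 2, 2, 12.0, 1);
""")


def keylist(sql: str) -> List[int]:
    """Phase 1: apply the local predicates to one dimension and collect its keys."""
    return [key for (key,) in conn.execute(sql)]


def marks(keys: List[int]) -> str:
    """One '?' placeholder per key, for a parameterized IN list."""
    return ", ".join(["?"] * len(keys))


item_keys = keylist("SELECT itemkey FROM item_dim WHERE item IN ('umbrella', 'raingear', 'hat')")
store_keys = keylist("SELECT storekey FROM store_dim WHERE state = 'FL'")
date_keys = keylist("SELECT datekey FROM date_dim WHERE month = '2008-11'")

# Phase 2: probe the fact table with the keylists, then join back for descriptions.
result = conn.execute(f"""
    SELECT s.store, i.item, d.date, d.weather, SUM(f.sales), SUM(f.quantity)
    FROM fact_table f
    JOIN store_dim s ON s.storekey = f.storekey
    JOIN item_dim  i ON i.itemkey  = f.itemkey
    JOIN date_dim  d ON d.datekey  = f.datekey
    WHERE f.itemkey IN ({marks(item_keys)})
      AND f.storekey IN ({marks(store_keys)})
      AND f.datekey IN ({marks(date_keys)})
    GROUP BY s.store, i.item, d.date, d.weather
    ORDER BY s.store, i.item, d.date, d.weather
""", item_keys + store_keys + date_keys).fetchall()
print(result)  # [('Miami', 'umbrella', '2008-11-03', 'rain', 60.0, 6)]
```

The small dimension tables are filtered first, so the large fact table is only touched for rows whose keys survive all three lists.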
E.T.L.
But... there are two VITAL additional requirements:
Validate: bad data in is bad data out. What do you do with bad data?
Manage: how do you administer ETL jobs?
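One common answer to both questions, sketched with hypothetical field names and rules: validation routes bad rows to a reject list with a reason instead of loading them, and each job run produces an audit record an administrator can review. Real ETL tools provide richer reject handling and scheduling; this only shows the pattern.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def validate(rows: List[Row]) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Split incoming rows into rows that may be loaded and rejects with a reason."""
    good: List[Row] = []
    rejects: List[Tuple[Row, str]] = []
    for row in rows:
        qty = row.get("quantity")
        if row.get("custnumber") is None:
            rejects.append((row, "missing customer number"))
        elif not isinstance(qty, int) or qty <= 0:
            rejects.append((row, "quantity must be a positive integer"))
        else:
            good.append(row)
    return good, rejects


def run_job(name: str, rows: List[Row]) -> Dict[str, object]:
    """Validate one batch and return an audit record an administrator can review."""
    good, rejects = validate(rows)
    return {
        "job": name,
        "read": len(rows),
        "passed": len(good),
        "rejected": len(rejects),
        "reasons": [reason for _, reason in rejects],
        "status": "WARNING" if rejects else "OK",
    }


batch = [
    {"custnumber": 1, "quantity": 3},
    {"custnumber": None, "quantity": 1},
    {"custnumber": 2, "quantity": -5},
]
print(run_job("nightly_sales_load", batch))
```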
ETL Alternatives
Do it yourself
You almost always end up looking at tools later. If you do it yourself, consider using SQL!
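What "consider SQL" can look like: one set-based INSERT ... SELECT that trims, standardizes codes, and loads a warehouse table with a surrogate key, instead of row-by-row program logic. The staging and warehouse table names are hypothetical, and SQLite syntax (AUTOINCREMENT) stands in for DB2 for i, which would typically use an identity column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging table loaded as-is from a source system.
CREATE TABLE stg_customer (oldnum INTEGER, name TEXT, region_code TEXT);
INSERT INTO stg_customer VALUES (1001, '  john smith ', 'us'),
                                (1001, 'HARRY POTTER', 'ca');

-- Hypothetical warehouse table with a surrogate key.
CREATE TABLE dw_customer (
    custnumber INTEGER PRIMARY KEY AUTOINCREMENT,
    custname   TEXT,
    region     TEXT,
    oldnum     INTEGER,
    UNIQUE (region, oldnum));

-- One set-based statement does the transform and the load.
INSERT INTO dw_customer (custname, region, oldnum)
SELECT UPPER(TRIM(name)),
       CASE UPPER(region_code) WHEN 'US' THEN 'US'
                               WHEN 'CA' THEN 'CANADA'
                               ELSE 'UNKNOWN' END,
       oldnum
FROM stg_customer;
""")
rows = conn.execute(
    "SELECT custnumber, custname, region, oldnum FROM dw_customer ORDER BY custnumber"
).fetchall()
print(rows)
```

The UNIQUE constraint on (region, oldnum) lets the database itself reject a second load of the same source customer.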
[Figure: a DW staging area partition (DB2 for i, 0.25 CPUs) holds shipped logs and staged data or an ODS; an ETL tool moves the data into the DB2 DW partition (DB2 for i, 3.75 CPUs)]
BI tool approaches, compared by BI tool, application, and data load:
MOLAP: a cubing engine, with complex loading into the engine (e.g., ESSBASE, InfoManager); summary data with drill-through to detail
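A toy illustration of the MOLAP pattern: the load step pre-aggregates every combination of dimension members (including an "all" member), so summary questions become lookups, and drill-through goes back to the detail rows. The detail data, the `*` marker, and the function names are hypothetical; engines such as Essbase use their own storage structures.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows: (region, product, month, revenue)
DETAIL = [
    ("East", "Hats", "2009-01", 100.0),
    ("East", "Hats", "2009-02", 50.0),
    ("East", "Umbrellas", "2009-01", 80.0),
    ("West", "Hats", "2009-01", 30.0),
]

ALL = "*"  # marker meaning "all members of this dimension"


def build_cube(rows):
    """Loading step: pre-aggregate revenue for every member/ALL combination."""
    cube = defaultdict(float)
    for region, prod, month, revenue in rows:
        for key in product((region, ALL), (prod, ALL), (month, ALL)):
            cube[key] += revenue
    return dict(cube)


def drill_through(rows, region=ALL, prod=ALL, month=ALL):
    """Return the detail rows behind one cube cell."""
    return [r for r in rows
            if region in (ALL, r[0]) and prod in (ALL, r[1]) and month in (ALL, r[2])]


cube = build_cube(DETAIL)
print(cube[("East", ALL, ALL)])                      # 230.0, a summary lookup
print(drill_through(DETAIL, region="East", prod="Hats"))  # the two East/Hats rows
```

The trade-off the slide points at is visible here: answers are fast because the work moved into the (potentially complex) load.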
[Chart: query performance on POWER5+, POWER6 (V5R4), and POWER6 (V6R1), showing improvements of 57% and 79%]
Database Parallelism*
Real-time statistics
Materialized Query Tables
Star Join
Query Rewrite
Encoded Vector Indexing
Remote Journaling (Trickle Feed)
Single Level Storage
Autonomic Indexes
Index Advisor
[Figure: an Encoded Vector Index stores a vector of codes (1, 13, 12, 28, 2, 17, 38, 2, 26, 33, ...), one code per table row (Row 1, Row 2, ...)]
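A toy model of the encoded vector index idea: a symbol table maps each distinct key value to a small code (here also tracking a row count), and the vector holds one code per row in row order. It is only loosely modeled on the DB2 for i EVI; codes here are assigned in order of first appearance, and the function names and data are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def build_evi(column: Sequence[str]) -> Tuple[Dict[str, List[int]], List[int]]:
    """Build a toy encoded vector index over one column.

    symbol_table maps each distinct value to [code, row count]; vector holds one
    code per table row, in row order (Row 1, Row 2, ...).
    """
    symbol_table: Dict[str, List[int]] = {}
    vector: List[int] = []
    for value in column:
        if value not in symbol_table:
            symbol_table[value] = [len(symbol_table) + 1, 0]  # next unused code
        symbol_table[value][1] += 1
        vector.append(symbol_table[value][0])
    return symbol_table, vector


def rows_matching(symbol_table: Dict[str, List[int]], vector: List[int],
                  wanted: Iterable[str]) -> List[int]:
    """Return 1-based row numbers whose value is in `wanted`, scanning only the vector."""
    codes = {symbol_table[v][0] for v in wanted if v in symbol_table}
    return [row for row, code in enumerate(vector, start=1) if code in codes]


STATES = ["MN", "IA", "MN", "WI", "IA", "MN"]  # hypothetical column values
symbol_table, vector = build_evi(STATES)
print(vector)                                             # [1, 2, 1, 3, 2, 1]
print(rows_matching(symbol_table, vector, {"IA", "WI"}))  # [2, 4, 5]
print(symbol_table["MN"][1])                              # 3, counted without reading rows
```

Because each vector entry is a small code rather than the full key, the scan stays compact even for wide key columns.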
Query/400 (5722-QU1)
Web Enable Query/400 Reports
Additional Features
Run-time user enablement
Active Reports (disconnected analysis)
Online Analytical Processing
Requires metadata provided with Developer Workbench
Developer Workbench
IT tool for metadata
http://www.ibm.com/systems/i/db2/webquery
Intelligent bursting
Ex: Regional Sales Report
Delivery Destinations
E-mail
Printer
Save the reports for later viewing
Notify Function
Send notification when report is complete or fails
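A sketch of what intelligent bursting plus notification amounts to: one report run is split by a burst field (here region), each section goes to its own destination, unrouted sections are saved for later viewing, and a notification reports success or failure. The report rows, destination strings, and callback signatures are hypothetical, not the DB2 Web Query configuration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical report rows: (region, store, revenue)
REPORT_ROWS = [("East", "Boston", 120.0), ("West", "Los Angeles", 95.0),
               ("East", "New York", 200.0)]

# Hypothetical routing table: region -> delivery destination
DESTINATIONS = {"East": "email:east-sales@example.com", "West": "printer:LA_PRT01"}


def burst_and_deliver(rows, destinations: Dict[str, str],
                      deliver: Callable[[str, list], None],
                      notify: Callable[[str], None]) -> None:
    """Split one report into per-region sections and deliver each to its destination."""
    sections: Dict[str, List[tuple]] = defaultdict(list)
    for row in rows:
        sections[row[0]].append(row)
    try:
        for region, section in sections.items():
            # Regions without a routing entry are saved for later viewing.
            deliver(destinations.get(region, "archive:saved-reports"), section)
    except Exception as exc:
        notify(f"Regional Sales Report failed: {exc}")
        raise
    notify(f"Regional Sales Report complete: {len(sections)} sections delivered")


sent: List[tuple] = []
messages: List[str] = []
burst_and_deliver(REPORT_ROWS, DESTINATIONS,
                  lambda dest, section: sent.append((dest, len(section))),
                  messages.append)
print(sent)      # [('email:east-sales@example.com', 2), ('printer:LA_PRT01', 1)]
print(messages)  # ['Regional Sales Report complete: 2 sections delivered']
```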
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market. Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.
* All other products may be trademarks or registered trademarks of their respective companies.
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice.
Contact your IBM representative or Business Partner for the most current pricing in your geography.