
Introduction to Data Warehousing

Robert Andrews robert.andrews@us.ibm.com DB2 for i Center of Excellence

© 2009 IBM Corporation

STG Technical Conferences 2009

The Agenda
Background
Turning DATA into INFORMATION

Architectures/Strategies to get you there
DB2 for i Enablers


Today's Reporting Requirements

Remove Dependency on IT
- Ease IT backlog of reporting requests
- Reduce Report Maintenance
- Empower End Users

Client Independence
- Web Based
- Reduced Software Maintenance

Multiple Viewing Options
- Dashboards/Scorecards
- Spreadsheet Integration
- Board Room Quality PDF

Automated Report Distribution
- E-mail Distribution

Application Integration
- Reporting as a function of Line of Business apps
- Portal interfaces


What is Business Intelligence?

REPORTING: What happened? (Query/Reporting)
ANALYSIS: Why did it happen? (OnLine Analytics)
PREDICT: What will happen? (Data Mining)
MONITOR: What just happened? (Dashboards/Scorecards)

Historical Data (Data Warehouses/Marts)


Trending/OLAP

Real-Time Data (OS/EAI)


Business Performance Management

Data Mining (Predictive Analytics)

DBMS

Source: The Data Warehousing Institute, Smart Companies in the 21st Century, July 2003
OS/EAI: Operational Systems/Enterprise Application Integration

Normalized OLTP Data Base
Customer info -> C file
Order header file -> O file
Order details -> D file
Item descriptions -> I file
Salesman info -> S file
Very good design: change information only in one place

DB2
D C O I S


Follow a transaction

DB2
D C O I S

Update customer information
Take an order
Record a payment

OLTP usually works with small pieces of the DB


But Ask A Simple Question

DB2
D C O I S

Who are my best customers?


Must go through the entire customer file

Another Question

DB2
D C O I S

Who are my best Salesmen?
Who are they selling to?
What are they selling?
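The contrast between the OLTP transaction and these questions can be sketched in a few lines. This is only an illustration: SQLite (through Python's `sqlite3`) stands in for DB2 for i, and every table name, column name, and data value is hypothetical. The five tables mirror the C, O, D, I and S files from the normalized design.

```python
import sqlite3

# SQLite stands in for DB2 for i; all names and rows below are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer  (custno INTEGER PRIMARY KEY, custname TEXT);           -- C file
CREATE TABLE salesman  (slsno INTEGER PRIMARY KEY, slsname TEXT);             -- S file
CREATE TABLE item      (itemno INTEGER PRIMARY KEY, descr TEXT, price REAL);  -- I file
CREATE TABLE order_hdr (orderno INTEGER PRIMARY KEY, custno INTEGER, slsno INTEGER);  -- O file
CREATE TABLE order_dtl (orderno INTEGER, itemno INTEGER, qty INTEGER);        -- D file

INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt Co');
INSERT INTO salesman VALUES (10, 'Lee'), (11, 'Kim');
INSERT INTO item VALUES (100, 'Widget', 5.0), (101, 'Gadget', 20.0);
INSERT INTO order_hdr VALUES (1000, 1, 10), (1001, 2, 11), (1002, 1, 11);
INSERT INTO order_dtl VALUES (1000, 100, 10), (1001, 101, 1), (1002, 101, 3);
""")

# OLTP: one row, reached through its key.
con.execute("UPDATE customer SET custname = 'Acme Corp' WHERE custno = 1")

# "Who are my best customers?" has to read and join C, O, D and I.
best_customers = con.execute("""
    SELECT c.custname, SUM(d.qty * i.price) AS revenue
    FROM customer c
    JOIN order_hdr o ON o.custno = c.custno
    JOIN order_dtl d ON d.orderno = o.orderno
    JOIN item i      ON i.itemno = d.itemno
    GROUP BY c.custname
    ORDER BY revenue DESC
""").fetchall()

# "Best salesmen, who they sell to, what they sell" joins all five files.
sales_by_rep = con.execute("""
    SELECT s.slsname, c.custname, i.descr, SUM(d.qty * i.price) AS revenue
    FROM salesman s
    JOIN order_hdr o ON o.slsno = s.slsno
    JOIN customer c  ON c.custno = o.custno
    JOIN order_dtl d ON d.orderno = o.orderno
    JOIN item i      ON i.itemno = d.itemno
    GROUP BY s.slsname, c.custname, i.descr
    ORDER BY revenue DESC
""").fetchall()
```

The update touches a single row, while both analytic queries read every order and every detail line. That difference in access pattern is the point the slides are building toward.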

Are you in Spreadsheet or I/T Purgatory?


[Figure: source systems (ERP System, POS, Access, Spreadsheets, Other Sources) feed reports that are downloaded, uploaded, rekeyed, or cut and pasted between Excel workbooks. Each path reports a different answer: 1+1=2, 1+1=3, 2+1=1.5, 1+3=7.]

The most widespread technical problem reported by practitioners was slow query performance.
Survey of over 2000 companies that have implemented Business Intelligence Applications
The BI Survey 8, Nigel Pendse


Managing the Querying of Production Data


Shield report authors and end users from complexities of the database
- Leverage a META DATA oriented Query Tool (ex: DB2 Web Query)
- Define data relationships, standardize/simplify data meanings

Optimize the environment
- Ensure a PROACTIVE or REACTIVE indexing strategy is in place
  - Proactive: read Indexing and Statistics White Paper at: http://www03.ibm.com/servers/enable/site/bi/strategy/index.html
  - Reactive: leverage System i Navigator On Demand Performance Tools
- Get to (at a minimum) V5R4

Minimize Impact on Production Systems
- Isolate query workloads through dedicated subsystems/pools for Query jobs
  - Be wary of autotuner impact on queries
- Leverage Query Governor (QQRYTIMLMT) with time or disk space (V5R4) governing

Get Some Assistance
- IBM Lab Services SQL/Query Performance Assessment service: ibm.com/systems/i/editions/services.htm
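The indexing point above can be illustrated with a small before-and-after check of the access plan. This is a sketch under assumptions: it uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3`, not the DB2 for i tooling the slide names, and the `orders` table and `orders_custno_ix` index are hypothetical.

```python
import sqlite3

# SQLite stands in for DB2 for i; table and index names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (orderno INTEGER PRIMARY KEY, custno INTEGER, amount REAL)")
con.executemany("INSERT INTO orders (custno, amount) VALUES (?, ?)",
                [(n % 50, float(n)) for n in range(1000)])

def plan(sql):
    """Return the optimizer's plan text for a statement (SQLite-specific)."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column holds the description

query = "SELECT SUM(amount) FROM orders WHERE custno = 7"

plan_before = plan(query)    # no index on custno yet, so the table is scanned
con.execute("CREATE INDEX orders_custno_ix ON orders (custno)")
plan_after = plan(query)     # the plan now names the new index

print(plan_before)
print(plan_after)
```

Creating the index before users complain is the proactive path. Reading plans and advice after a slow query shows up is the reactive one.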


Isolating Production Systems with Logical Replication H/A Solution


DB2 Mirrored Image ODS Data Warehouse

Production

H/A Backup Queries against Production Databases

I/T Optimization through Combined H/A and BI Server

Queries against Data Warehouse/Marts

Leverage H/A software to create Operational Data Store (ODS) in near real time
Utilize ODS as the source for ETL processes into the Data Warehouse
Combine with target side remote journaling for ETL efficiencies
No impact to Production Databases
Utilize mostly idle capacity of H/A Server for Data Warehouse Workloads
Optionally mirror Data Warehouse


Common data Challenges


Data errors
- failed joins
- invalid dates
- missing values

Hidden meanings and conditional rules
- 2nd character of column X means ..
- if column Y = S, value Z must be multiplied by -1
- If record type is 1, there must be a matching record in table B. If type is 2, there may be a record. If type is 3 there should not be a record.
- For data older than 2/11/2003, column X will be blank but it must be a valid value from then on.
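Rules like these usually live only in application code, so a transformation step has to restate them explicitly. The sketch below encodes three of the rules above. The field names (`key`, `rec_type`, `x`, `y`, `z`, `txn_date`), the set of valid X codes, and the reading of 2/11/2003 as month/day/year are all assumptions made for illustration.

```python
from datetime import date

CUTOVER = date(2003, 2, 11)   # "2/11/2003" read as month/day/year (assumption)
VALID_X = {"A", "B", "C"}     # hypothetical set of valid codes for column X

def check_row(row, keys_in_table_b):
    """Apply the conditional rules to one source row.

    Returns (clean_row, problems): clean_row has Z sign-corrected and
    problems lists every rule the row violates.
    """
    problems = []
    clean = dict(row)

    # Rule: if column Y = 'S', value Z must be multiplied by -1.
    if row["y"] == "S":
        clean["z"] = -row["z"]

    # Rule: type 1 needs a match in table B, type 2 may have one, type 3 must not.
    has_match = row["key"] in keys_in_table_b
    if row["rec_type"] == 1 and not has_match:
        problems.append("type 1 without matching record in B")
    if row["rec_type"] == 3 and has_match:
        problems.append("type 3 with unexpected record in B")

    # Rule: X is blank in older data but must be valid from the cutover on.
    if row["txn_date"] >= CUTOVER and row["x"] not in VALID_X:
        problems.append("invalid X on or after cutover")

    return clean, problems

old_row = {"key": 1, "rec_type": 1, "y": "S", "z": 10.0, "x": "",
           "txn_date": date(2002, 1, 1)}
clean, problems = check_row(old_row, keys_in_table_b={1})
```

Here `old_row` passes every rule and comes back with Z negated. A blank X is acceptable only because the row predates the cutover.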


Common data Challenges


The same, but different
- multiple instances of same table, with duplicate key values

Customer File - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry

Customer File - Canada
CUSTNO  CUSTNAME
1001    Harry Potter
1002    Jeremy Carr
1003    Penny Hayes
1004    Debbie Thornton

- or different versions of same entity: incompatible data types, duplicates

Customer File - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry

Customer File - Canada
CUSTID  CUSTNAM
AA234   Julie Johnson
AA235   Fred Hunter
AB670   John Smith
BD309   Alan Jordan


Common data Challenges


Source 1
Personal Name        Address Information
Bob Christiansan     416 Columbus Ave #2, Boston, Massachusetts 02116
Kate A. Roberts      4 New York Plaza Floor 23, Manhattan NY, 10036
James Trenton        125-A Washington, Los Angeles, CA 90066

Source 2
Personal Name        Address Information
Robert Christiansen  Four sixteen Columbus Avenue APT2, Boston, Mass 02116
Katherine Roberts    Four NY Plaza, FL-23, New York New York, 10036
Trenton, James       125 Washington Unit A, LA, California, 90066

Source 3
Personal Name        Address Information
R.J. Christensen     416 Columbus Suite #2, Suffolk County 02116
Mrs. K. Roberts      4 NY Plaza, LVL23, NYC 10036
Mr & Mrs J.Trenton   One-twenty-five Washington #A, Los Angeles Cnty 90066

Unlimited formats, structures & attributes



The Enterprise Data Warehouse Architecture


Operational System(s)
Data Propagation Extraction, Transformation and Loading
Data Staging Area

Cleansed, Transformed Data

Tactical/operational decision support

ODS

Data Warehouse

Data Mart: Mfg
Data Mart: Finance
Data Mart: Sales

OLAP Applications

PC or Browser Web Visualization Products


Reasons you may choose a data warehouse


Manage larger (Terabyte?) volumes of data

Add data from sources other than production systems
- Ex: purchased demographic data
- Non IBM i databases

Clean/Transform the data
- An ODS does not solve a lot of data issues

Tuning Aspects
- Separate server/partition allows for different tuning knobs to be turned
- May be a different allocation of resources to manage this very different workload

Separation of Powers
- Data Warehouse Team versus Operational Systems Team
- Separate Decisions: OS or resource upgrades

Single Version of the Truth



E.T.L.
Extract data from somewhere
(may be MANY sources)

Transform it somehow
(may be simple or extensive)

Load it somewhere else


(and load it FAST)
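The three steps can be sketched end to end in a few lines. This is a minimal illustration, not a recommended tool: the two inline CSV sources, the `customer_stage` table, and the upper-casing transform are hypothetical, and SQLite stands in for the target database. The "load it FAST" idea shows up as one bulk insert inside one transaction.

```python
import csv
import io
import sqlite3

# Extract: two hypothetical sources with different layouts (inline CSV for the sketch).
US_CSV = "CUSTNO,CUSTNAME\n1001,John Smith\n1002,Mary Jones\n"
CA_CSV = "CUSTID;CUSTNAM\nAA234;Julie Johnson\nAA235;Fred Hunter\n"

def extract(text, delimiter):
    """Read one source into a list of dicts keyed by its own column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform(rows, region, id_col, name_col):
    """Map a source layout onto the single target layout (region, id, name)."""
    return [(region, r[id_col].strip(), r[name_col].strip().upper()) for r in rows]

def load(con, rows):
    """Bulk insert in one transaction rather than committing row by row."""
    with con:
        con.executemany(
            "INSERT INTO customer_stage (region, source_id, custname) VALUES (?, ?, ?)",
            rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_stage (region TEXT, source_id TEXT, custname TEXT)")

rows = (transform(extract(US_CSV, ","), "US", "CUSTNO", "CUSTNAME")
        + transform(extract(CA_CSV, ";"), "CANADA", "CUSTID", "CUSTNAM"))
load(con, rows)
loaded = con.execute("SELECT COUNT(*) FROM customer_stage").fetchone()[0]
```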


Transformation Example: Surrogate Keys


Customer File - US
CUSTNO  CUSTNAME
1001    John Smith
1002    Mary Jones
1003    Chris Anderson
1004    David Perry

Customer File - Canada
CUSTNO  CUSTNAME
1001    Harry Potter
1002    Jeremy Carr
1003    Penny Hayes
1004    Debbie Thornton

Surrogate key is a sequential number with no correlation to replaced value(s)

Customer File - Data Warehouse
CUSTNUMBER  CUSTNAME         REGION  OLDNUM
1           John Smith       US      1001
2           Mary Jones       US      1002
3           Chris Anderson   US      1003
4           David Perry      US      1004
5           Harry Potter     CANADA  1001
6           Jeremy Carr      CANADA  1002
7           Penny Hayes      CANADA  1003
8           Debbie Thornton  CANADA  1004

PK: CUSTNUMBER
Secondary Index: REGION, OLDNUM
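The merged customer file above can be reproduced with a generated key and a lookup index. This is a sketch that assumes SQLite in place of DB2 for i. The table name `customer_dw`, the index name, and the choice of (REGION, OLDNUM) as the secondary index columns are assumptions, not taken from the slide.

```python
import sqlite3

# SQLite stands in for DB2 for i; table and index names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_dw (
    custnumber INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (PK)
    custname   TEXT NOT NULL,
    region     TEXT NOT NULL,
    oldnum     INTEGER NOT NULL                    -- key from the source system
);
-- Secondary index so ETL can translate (region, oldnum) into the surrogate key.
CREATE UNIQUE INDEX customer_dw_old_ix ON customer_dw (region, oldnum);
""")

us = [(1001, "John Smith"), (1002, "Mary Jones"),
      (1003, "Chris Anderson"), (1004, "David Perry")]
canada = [(1001, "Harry Potter"), (1002, "Jeremy Carr"),
          (1003, "Penny Hayes"), (1004, "Debbie Thornton")]

for region, source in (("US", us), ("CANADA", canada)):
    con.executemany(
        "INSERT INTO customer_dw (custname, region, oldnum) VALUES (?, ?, ?)",
        [(name, region, custno) for custno, name in source])

def surrogate_for(region, oldnum):
    """Look up the warehouse key for a source-system key, or None if unknown."""
    row = con.execute(
        "SELECT custnumber FROM customer_dw WHERE region = ? AND oldnum = ?",
        (region, oldnum)).fetchone()
    return row[0] if row else None
```

Customer 1001 from each country now has its own warehouse key. Fact rows loaded later would call something like `surrogate_for` to replace the source key.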


Transformation Example: Star Schema


DIMENSIONS
Item_Dim (Itemkey), filtered to a keylist
Store_Dim (Storekey), filtered to a keylist
Date_Dim (Datekey), filtered to a keylist

FACT
Fact_Table (Itemkey, Storekey, Datekey, Sales, Quantity)

Show me the date, weather, and quantity/revenue from sales of umbrellas, raingear, and hats in our Florida stores in November, and order by store, item, date, then weather

select store, item, date, weather, sum(sales), sum(quantity)
from item_dim, store_dim, date_dim, fact_table
where item_dim.itemkey in (...keylist...)
  and store_dim.storekey in (...keylist...)
  and date_dim.datekey in (...keylist...)
  and fact_table.itemkey = item_dim.itemkey
  and fact_table.storekey = store_dim.storekey
  and fact_table.datekey = date_dim.datekey
group by store, item, date, weather
order by store, item, date, weather
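A runnable version of the umbrella question may make the star join easier to follow. This is a sketch under assumptions: SQLite stands in for DB2 for i, the sample rows are invented, and the dimension attributes (`state`, `monthno`, `caldate`) are hypothetical columns added so the filters can be written directly instead of through keylists.

```python
import sqlite3

# SQLite stands in for DB2 for i; sample data and extra columns are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item_dim  (itemkey  INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE store_dim (storekey INTEGER PRIMARY KEY, store TEXT, state TEXT);
CREATE TABLE date_dim  (datekey  INTEGER PRIMARY KEY, caldate TEXT,
                        monthno INTEGER, weather TEXT);
CREATE TABLE fact_table (itemkey INTEGER, storekey INTEGER, datekey INTEGER,
                         sales REAL, quantity INTEGER);

INSERT INTO item_dim  VALUES (1, 'Umbrella'), (2, 'Hat'), (3, 'Sunscreen');
INSERT INTO store_dim VALUES (1, 'Miami', 'FL'), (2, 'Tampa', 'FL'), (3, 'Austin', 'TX');
INSERT INTO date_dim  VALUES (1, '2008-11-03', 11, 'Rain'),
                             (2, '2008-11-04', 11, 'Sun'),
                             (3, '2008-12-01', 12, 'Rain');
INSERT INTO fact_table VALUES
    (1, 1, 1, 40.0, 4), (1, 1, 1, 20.0, 2), (2, 2, 2, 15.0, 1),
    (3, 1, 1,  9.0, 1), (1, 3, 1, 30.0, 3), (1, 1, 3, 10.0, 1);
""")

# Filter each small dimension, then join the survivors to the large fact table.
rows = con.execute("""
    SELECT s.store, i.item, d.caldate, d.weather,
           SUM(f.sales) AS sales, SUM(f.quantity) AS quantity
    FROM fact_table f
    JOIN item_dim  i ON i.itemkey  = f.itemkey
    JOIN store_dim s ON s.storekey = f.storekey
    JOIN date_dim  d ON d.datekey  = f.datekey
    WHERE i.item IN ('Umbrella', 'Raingear', 'Hat')
      AND s.state = 'FL'
      AND d.monthno = 11
    GROUP BY s.store, i.item, d.caldate, d.weather
    ORDER BY s.store, i.item, d.caldate, d.weather
""").fetchall()
```

Only the fact rows that match all three filtered dimensions contribute to the sums. The Texas store, the December date, and the sunscreen sale are excluded.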


E.T.L.
But... there are two VITAL additional requirements

Validate
- bad data in is bad data out
- what do you do with bad data?

Manage
- how do you administer ETL jobs?
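One common answer to "what do you do with bad data?" is to route failing rows to a reject list with reasons, and to keep run counts so the job can be administered. The sketch below assumes hypothetical field names (`custno`, `order_date`) and an ISO date format. It is one possible shape for such a step, not a prescribed design.

```python
from datetime import datetime

def validate(row):
    """Return the reasons a row is bad; an empty list means it passed."""
    reasons = []
    if not row.get("custno"):
        reasons.append("missing custno")
    try:
        datetime.strptime(row.get("order_date") or "", "%Y-%m-%d")
    except ValueError:
        reasons.append("invalid date")
    return reasons

def run_step(rows):
    """Split input into rows to load and rejects kept for review, plus run counts."""
    good, rejects = [], []
    for row in rows:
        reasons = validate(row)
        if reasons:
            rejects.append({"row": row, "reasons": reasons})
        else:
            good.append(row)
    stats = {"read": len(rows), "loaded": len(good), "rejected": len(rejects)}
    return good, rejects, stats

batch = [
    {"custno": "1001", "order_date": "2009-03-01"},
    {"custno": "",     "order_date": "2009-03-02"},   # missing value
    {"custno": "1003", "order_date": "2009-02-30"},   # invalid date
]
good, rejects, stats = run_step(batch)
```

Keeping the rejects and the counts, rather than silently dropping rows, gives whoever manages the ETL jobs something to monitor and reconcile.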

ETL Alternatives
Do it yourself
- You almost always end up looking at tools later
- If you do, consider use of SQL!

ETL lite: IBM i based
- Information Builders Data Migrator: www.ibi.com
- Coglin Mill Rodin DB2 Web Query Edition: www.coglinmill.com
- Talend Open Source: www.talend.com

High End (AIX Partition on Power Systems)
- IBM InfoSphere Information Server


DB2 for i DW Near Real Time Architecture


ERP
IBM i LPAR 4 CPUs

DW Staging Area
DB2 for i .25 CPUs
Shipped Logs Staged Data Or ODS

DB2 DW
DB2 for i 3.75 CPUs

Remote Journaling Data Mirror

ETL Tool

Remote Journaling during normal business processing hours
- Trickle Feed Staging Area/ODS
- Eliminate EXTRACTION impact on production systems
- No Charge Feature of IBM i
- Requires Program (e.g., DataMirror) to read data from journal receivers
- Can add SQL logic to remove unwanted fields, change datatypes,

Virtualization Engine Technologies
- Optimize resources for supporting production and daytime data warehouse queries
- High speed data transfers over Virtual Ethernet
- Common Backup and other Shared I/O
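The trickle-feed idea, replaying shipped changes into a staging copy instead of extracting from production, can be sketched generically. Everything here is invented for illustration: the entry layout `(sequence, operation, key, after-image)` is not the real journal entry format, and a plain dict plays the role of the ODS table.

```python
# Each entry mimics a journal record: (sequence, operation, key, after-image).
# The layout is invented for this sketch; real journal entries look different.
journal = [
    (1, "INSERT", 1001, {"custname": "John Smith", "city": "Boston"}),
    (2, "INSERT", 1002, {"custname": "Mary Jones", "city": "Austin"}),
    (3, "UPDATE", 1001, {"custname": "John Smith", "city": "Chicago"}),
    (4, "DELETE", 1002, None),
]

def apply_entries(ods, entries, last_applied):
    """Replay entries newer than last_applied into the ODS copy, in order.

    Returns the sequence number of the last entry applied, which the caller
    stores so the next run resumes where this one stopped.
    """
    for seq, op, key, image in sorted(entries, key=lambda e: e[0]):
        if seq <= last_applied:
            continue                  # already applied in an earlier run
        if op == "DELETE":
            ods.pop(key, None)
        else:                         # INSERT and UPDATE both store the after-image
            ods[key] = dict(image)
        last_applied = seq
    return last_applied

ods = {}
checkpoint = apply_entries(ods, journal[:2], last_applied=0)       # first trickle
checkpoint = apply_entries(ods, journal, last_applied=checkpoint)  # later trickle
```

The second call sees all four entries but skips the two already applied. Production is never queried in either run.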

On Line Analytical Processing (OLAP)


OLAP is INTERACTIVE and ITERATIVE
- Query is usually batch, list oriented result sets

Accessing business data with numerous dimensions
- 'anything' by 'anything' by 'anything' analysis
- data can be easily analyzed from many different viewpoints
- data is modeled to the business
- summaries and aggregations are calculated
- data is viewed across, down and through the various dimensions

Helps answer business questions
- How are my different departments performing?
- Is this pattern the same every year?
- Can we look at the information another way?
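The 'anything' by 'anything' idea amounts to summing a measure over whichever dimensions the user picks, then picking again. The sketch below uses a handful of hypothetical sales facts and two small helpers, `rollup` and `slice_rows`, which are names made up for this example.

```python
from collections import defaultdict

# Hypothetical sales facts; every non-measure field is a dimension.
facts = [
    {"dept": "Toys",  "region": "East", "year": 2008, "sales": 100},
    {"dept": "Toys",  "region": "West", "year": 2008, "sales": 150},
    {"dept": "Tools", "region": "East", "year": 2008, "sales": 200},
    {"dept": "Toys",  "region": "East", "year": 2009, "sales": 120},
    {"dept": "Tools", "region": "West", "year": 2009, "sales": 180},
]

def rollup(rows, dims, measure="sales"):
    """Sum a measure by any combination of dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_rows(rows, **fixed):
    """Keep only rows matching fixed dimension values, e.g. region='East'."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

by_dept = rollup(facts, ["dept"])                  # how are departments performing?
by_dept_year = rollup(facts, ["dept", "year"])     # is the pattern the same every year?
east_by_year = rollup(slice_rows(facts, region="East"), ["year"])  # another way to look
```

Each call answers one of the business questions above from the same facts. An OLAP tool would typically precompute or cache such summaries so that switching views stays interactive.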


OLAP is uniquely suited to handle applications such as:


Budgeting
Planning
Forecasting
Business Modeling
Financial Consolidation
Sales & Performance Analysis
Customer & Product Profitability


BI Tool Application

BI Tool

What is the right OLAP Technology?

Data Load

SQL 3 SQL 2 SQL 1 Relational Data


Attributes compared: # of users, engine, architecture, via metadata, Examples, speed, data strategy

ROLAP: Few; Query Optimization; DBMS Backend; complex SQL; Meta Data Layer; DB2 Web Query (OLAP option); Will vary; Summary or Detail

MOLAP: Many; Cubing Engine; Depends; complex loading in engine; ESSBASE, InfoManager; speed of thought; Summary with drill through to detail

DB2 for i Enablers for Data Warehousing


POWER6 Processors
SQL Query Engine (SQE)
- Self Learning, Self Adapting
Database Parallelism*
Real time statistics
Materialized Query Tables
Star Join Query Rewrite
Encoded Vector Indexing
Remote Journaling (Trickle Feed)
Single Level Storage
Autonomic Indexes
Index Advisor

[Chart: benchmark results for 2w i520, 2w 520, 4w i570, 4w 570 and 8w i570 across POWER5+, POWER6 (V5R4) and POWER6 (V6R1); vertical axis 0 to 140,000; callouts: 57% Improvement, 79% Improvement]

*See detailed certified benchmark results at http://www.sap.com/solutions/benchmark/bid_results.htm


BI Acceleration with Encoded Vector Indexing


Indexing technology that can significantly improve performance, especially for star schema
- 10% to 30% faster index builds
- 1/3 to 1/16 the size
- 1/2 the time for index scans
- 1/3 the time for bit map generation

Symbol Table
Key Value  Code  First Row  Last Row  Count
Arizona    1     1          80005     5000
Arkansas   2     5          99760     7300
......
Virginia   37    1222       30111     340
Wyoming    38    7          83000     2760

Vector (one code per row: Row 1, Row 2, ....)
1 13 12 28 2 17 38 2 26 33
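The symbol table and vector above can be mimicked with two small Python structures. This is a simplified teaching model only, not how DB2 for i implements EVIs. The function names and the six-row sample column are made up for the example.

```python
def build_evi(column):
    """Build a simplified encoded vector index over one column.

    symbol_table maps each distinct key value to its code, first row,
    last row and count; vector holds one code per row. Rows count from 1.
    """
    symbol_table = {}
    for code, value in enumerate(sorted(set(column)), start=1):
        symbol_table[value] = {"code": code, "first_row": None,
                               "last_row": None, "count": 0}
    vector = []
    for row_number, value in enumerate(column, start=1):
        entry = symbol_table[value]
        if entry["first_row"] is None:
            entry["first_row"] = row_number
        entry["last_row"] = row_number
        entry["count"] += 1
        vector.append(entry["code"])
    return symbol_table, vector

def rows_matching(symbol_table, vector, values):
    """Return row numbers whose key is in values, scanning only the compact vector."""
    codes = {symbol_table[v]["code"] for v in values if v in symbol_table}
    return [n for n, code in enumerate(vector, start=1) if code in codes]

states = ["Virginia", "Arizona", "Wyoming", "Arizona", "Arkansas", "Virginia"]
symbols, vector = build_evi(states)
matches = rows_matching(symbols, vector, ["Arizona", "Wyoming"])
```

In this model a count by state is answered from the symbol table alone. A selection such as `matches` compares small integer codes instead of full key values.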

EVIs now part of Index Advice!!!



IBM DB2 Web Query for System i Powered By Information Builders


Base Program Product Includes:
- IBM i Web Reporting Server
- Several Web Based authoring tools: RA, GA, Power Painter

Query/400 (5722-QU1)
- Web Enable Query/400 Reports

BASE PRODUCT OFFERED AS NO CHARGE UPGRADE FROM QU1
- Does not include Software Maintenance

Additional Features
- Run Time User Enablement
- Active Reports (Disconnected Analysis)
- On Line Analytical Processing
  - Requires Meta Data provided with Developer Workbench

DB2 Web Query Report Broker
- Automated Report Execution and Distribution

DB2 Web Query SDK
- Web Services to integrate reporting functions into applications/portals

Developer Workbench
- IT Tool for meta data

http://www.ibm.com/systems/i/db2/webquery

DB2 Web Query Report Broker 5733-QU3


Automated Delivery Of Information
- On Scheduled Basis Through Admin GUI: Daily, Weekly, Specific Days, exclude rules
- On Event Basis: Some customization required

Intelligent bursting
- Ex: Regional Sales Report

Additional output formats for batch reporting
- (HTML, PDF, Excel, Active HTML)

Delivery Destinations
- E-mail
- Printer
- Save the reports for later viewing

Notify Function
- Send notification when report is complete or fails

Requires DB2 Web Query BASE Product to be installed



New in 2009: Microsoft Integration


Spreadsheet Client
- Improve the experience for Excel Users
- Excel Plug In
- Embed queries in Excel templates

SQL Server Adapter
- Extend the reach of DB2 Web Query
- Support pulling data from multiple SQL Server databases with a single adapter


Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market. Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:


*, AS/400, e business(logo), DBE, ESCO, eServer, FICON, IBM, IBM (logo), iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA, WebSphere, xSeries, z/OS, zSeries, z/VM, System i, System i5, System p, System p5, System x, System z, System z9, BladeCenter

The following are trademarks or registered trademarks of other companies.


Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

* All other products may be trademarks or registered trademarks of their respective companies. Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices subject to change without notice. 
Contact your IBM representative or Business Partner for the most current pricing in your geography.

