
Introduction to Data Warehousing

Duration: 45 minutes (approx.)
Abhishek Ranjan
The Roadmap …
• What is a Data Warehouse?
• What is Data Warehousing?
• What is OLAP?
• The need for Data Warehousing
• Data Warehousing - Application Areas
• OLAP vs. OLTP
• Some key Data Warehousing concepts
• Data Warehouse Architecture
• Some vendors in the market
The need for Data Warehousing

Please consider the following queries:

• What were the sales volumes by region and product category for the last year?
• How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
• Which orders should we fill to maximize revenues?
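The first of these queries is the kind a warehouse schema is built to answer. Here is a minimal sketch that assumes a hypothetical star schema. The table names `sales_fact`, `store_dim`, `product_dim` and `period_dim`, and every row, are invented for illustration. Python's built-in `sqlite3` module answers the query with a single join-and-group statement:

```python
import sqlite3

# Hypothetical miniature star schema: one fact table keyed to three dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE period_dim  (period_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact  (store_key INTEGER, product_key INTEGER,
                          period_key INTEGER, units INTEGER);
INSERT INTO store_dim   VALUES (1, 'North'), (2, 'South');
INSERT INTO product_dim VALUES (10, 'Laptops'), (11, 'Printers');
INSERT INTO period_dim  VALUES (100, 2022), (101, 2023);
INSERT INTO sales_fact  VALUES
  (1, 10, 101, 5), (1, 11, 101, 2), (2, 10, 101, 7),
  (2, 10, 100, 9), (1, 10, 101, 3);
""")

def sales_by_region_and_category(year):
    """Sales volume per (region, product category) for one year."""
    return conn.execute("""
        SELECT s.region, p.category, SUM(f.units) AS units
        FROM sales_fact f
        JOIN store_dim   s ON s.store_key   = f.store_key
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN period_dim  t ON t.period_key  = f.period_key
        WHERE t.year = ?
        GROUP BY s.region, p.category
        ORDER BY s.region, p.category
    """, (year,)).fetchall()

print(sales_by_region_and_category(2023))
# [('North', 'Laptops', 8), ('North', 'Printers', 2), ('South', 'Laptops', 7)]
```

The year is passed in as a parameter. "Last year" would then be the current year minus one.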
The need for DW (continued…)

• End users can use a single data model and query language
• Simpler, optimized system
• Safe and reliable information storage for as long as necessary
• Examples:
    • Ability to monitor sales activity in order to make fast, informed decisions
    • Faster product delivery using the inventory and manufacturing elements of the data warehouse
    • Means to predict and understand business trends to make better business decisions
Data Warehousing - Application Areas

• Sales and marketing data analysis
• Customer analysis
• Inventory and product tracking
• Risk assessment and profitability analysis
• Human resources management
• Order tracking and performance analysis
• Research laboratories
OLAP vs. OLTP

Parameter                        | OLAP                           | OLTP
-------------------------------- | ------------------------------ | -------------------
How much data is affected?       | Groups of information          | Individual records
What is my response time?        | Seconds to minutes             | Seconds
User interaction                 | Throughout the entire database | Transaction only
How is data characterized?       | Summary data                   | Detailed data
Data access mode                 | Any way you want               | Predefined
The database is configured for   | Queries                        | Transaction updates
The database is optimized for    | Analysis                       | Bulk transactions
Data maintenance required        | Minimal updates                | Updated frequently
Database normalization           | Denormalized                   | Highly normalized
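The following small sketch makes the contrast in the table concrete. It uses a single hypothetical `orders` table with invented names and data. The OLTP-style function updates one record in a short, predefined transaction. The OLAP-style function summarizes groups of records. In practice the two workloads would typically run against separate, differently designed databases.

```python
import sqlite3

# One hypothetical orders table, used in both styles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
             "amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 'North', 120.0, 'open'),
    (2, 'North', 80.0, 'shipped'),
    (3, 'South', 200.0, 'open'),
])
conn.commit()

def ship_order(order_id):
    """OLTP style: a predefined transaction that updates one individual record."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (order_id,))

def revenue_by_region():
    """OLAP style: a read-only summary over groups of records."""
    return dict(conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"))

ship_order(3)
print(conn.execute("SELECT status FROM orders WHERE order_id = 3").fetchone())  # ('shipped',)
print(revenue_by_region())  # {'North': 200.0, 'South': 200.0}
```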
Let’s talk about Keywords…
• Cube & Hypercube
• Dimension
• Hierarchies in a dimension
• Roll up & Drill down
• Granularity
• Metadata
• Data Scrubbing and Cleansing
• Aggregations
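Roll-up and drill-down both move along a dimension hierarchy. Rolling up aggregates detail to a coarser level, for example from month to quarter to year. Drilling down expands one member into its children at the next finer level. The granularity is the finest level actually stored, which is the month in this sketch. The example assumes a hypothetical Year > Quarter > Month hierarchy with invented figures:

```python
from collections import defaultdict

# Hypothetical facts at the finest grain (month): (year, quarter, month) -> units.
monthly = {
    (2023, 'Q1', 'Jan'): 10, (2023, 'Q1', 'Feb'): 12, (2023, 'Q1', 'Mar'): 8,
    (2023, 'Q2', 'Apr'): 15, (2023, 'Q2', 'May'): 5,
    (2024, 'Q1', 'Jan'): 20,
}

def roll_up(facts, depth):
    """Aggregate to the first `depth` levels of the Year > Quarter > Month hierarchy."""
    totals = defaultdict(int)
    for key, units in facts.items():
        totals[key[:depth]] += units
    return dict(totals)

def drill_down(facts, member):
    """Expand one member (e.g. year 2023) into its children one level down."""
    children = roll_up(facts, len(member) + 1)
    return {key: units for key, units in children.items() if key[:len(member)] == member}

print(roll_up(monthly, 1))            # {(2023,): 50, (2024,): 20}
print(drill_down(monthly, (2023,)))   # {(2023, 'Q1'): 30, (2023, 'Q2'): 20}
```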
Keywords (contd.)

• Operational Data Store (ODS)
• Sparsity
• Fact
• Data staging
• Star Schema and Snowflake Schema
• Slice and dice
• MOLAP, ROLAP, HOLAP

Operational Data Store (ODS):
• Is a central repository of transaction data from legacy systems
• Contains cleaned, transformed and integrated data
• Provides data for data warehouses
• Allows time-critical detailed queries
• Holds current data
• Is refreshed on a store-and-forward basis every night
• Serves day-to-day decision making
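Slicing fixes one dimension of a cube to a single member, which leaves a lower-dimensional view. Dicing selects a sub-cube by restricting several dimensions at once. The sketch below uses a hypothetical three-dimensional cube (product, region, quarter) with invented values. It also illustrates sparsity, because only the cells that actually hold data are stored:

```python
# Hypothetical cube stored sparsely: only non-empty cells are kept.
cube = {
    ('Laptop', 'North', 'Q1'): 5, ('Laptop', 'South', 'Q1'): 7,
    ('Laptop', 'North', 'Q2'): 4, ('Printer', 'North', 'Q1'): 2,
    ('Printer', 'South', 'Q2'): 6,
}
DIMS = ('product', 'region', 'quarter')

def slice_cube(cube, dim, member):
    """Slice: fix one dimension to a single member and drop that dimension."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}

def dice_cube(cube, **allowed):
    """Dice: keep the sub-cube whose members lie in the given sets per dimension."""
    idx = {DIMS.index(d): set(members) for d, members in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in ms for i, ms in idx.items())}

print(slice_cube(cube, 'quarter', 'Q1'))
# {('Laptop', 'North'): 5, ('Laptop', 'South'): 7, ('Printer', 'North'): 2}
print(dice_cube(cube, product=['Laptop'], quarter=['Q1']))
# {('Laptop', 'North', 'Q1'): 5, ('Laptop', 'South', 'Q1'): 7}

# Sparsity: compare stored cells with all possible member combinations.
possible = 1
for i in range(len(DIMS)):
    possible *= len({k[i] for k in cube})
print(f"{len(cube)} of {possible} possible cells hold data")  # 5 of 8 possible cells hold data
```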
Multidimensional OLAP Architecture

[Figure: Sales and Revenue and Financial star-schema data warehouses → conversion to multi-dimensional array → transportation → loading of multi-dimensional array → OLAP servers holding MDDB cubes → business analysis]
Relational OLAP Architecture
MOLAP & ROLAP

• MOLAP servers store data in multidimensional, array-like data structures
• ROLAP servers use metadata to map star-schema relational databases into multidimensional views
• The choice between them depends on the application's needs
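The difference between the two server types fits in a few lines. In this illustration every name is hypothetical, including the `METADATA` map and the column names. The MOLAP side keeps the facts in a dense array indexed by dimension position. The ROLAP side keeps relational rows and uses metadata to translate a multidimensional request into SQL at query time:

```python
import sqlite3

# Hypothetical facts: (product, region, units) for a 2 x 2 cube.
products = ['Laptop', 'Printer']
regions = ['North', 'South']
rows = [('Laptop', 'North', 5), ('Laptop', 'South', 7), ('Printer', 'North', 2)]

# MOLAP-style: a dense 2-D array addressed by dimension positions.
# Cells with no data still occupy a slot (holding 0 here).
array = [[0] * len(regions) for _ in products]
for product, region, units in rows:
    array[products.index(product)][regions.index(region)] = units

def molap_cell(product, region):
    return array[products.index(product)][regions.index(region)]

# ROLAP-style: the same facts as relational rows; hypothetical metadata maps
# logical dimensions to physical columns, and aggregation happens at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (prod_col TEXT, reg_col TEXT, units_col INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
METADATA = {'product': 'prod_col', 'region': 'reg_col', 'measure': 'units_col'}

def rolap_cell(**members):
    """Sum the measure over all rows matching the given dimension members."""
    where = " AND ".join(f"{METADATA[dim]} = ?" for dim in members) or "1 = 1"
    sql = f"SELECT COALESCE(SUM({METADATA['measure']}), 0) FROM sales WHERE {where}"
    return conn.execute(sql, tuple(members.values())).fetchone()[0]

print(molap_cell('Laptop', 'South'), rolap_cell(product='Laptop', region='South'))  # 7 7
print(rolap_cell(product='Laptop'))  # 12 (region left open, so it is summed over)
```

The array answers a cell lookup by position but reserves space for empty cells. The relational form stores only existing rows and aggregates on demand. This is one reason the choice depends on the need.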
Data Warehouse Architecture
Aspects of DW Architecture
• Data consistency architecture
    • Choice of data sources, dimensions, business rules
• Reporting data store and staging data store architecture
    • Choice of location for keeping the data to be reported on or staged
• Data modeling architecture
    • Choice of data model (normalized, denormalized, etc.)
• Tool architecture
    • Choice of tools for reporting, analysis, etc.
• Security architecture
    • Deciding what security to provide, and at what levels
OLAP Vendors

• MicroStrategy
• Hyperion Software
• SAS
• IBM
• Oracle
• Platinum Technology
• Brio Software
• Seagate
Some References…

Some of the pictures/diagrams in this presentation have been reproduced from the sources listed below.

• [Kim] R. Kimball (1996). The Data Warehouse Toolkit. John Wiley, NY.
• http://www.dwinfocenter.org
• www.olapcouncil.org
• www.orafaq.org
• http://192.168.121.14/kmhome/DisplayOHPage.asp?/asp/advsearch/advsearchstart.asp
• http://www.olapreport.com
Thank You!
Star Schema

[Figure: star schema with one fact table joined to three dimension tables]
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price

• A single fact table, with detail and summary data
• Fact table primary key has only one key column per dimension
• Each key is generated
• Each dimension is a single table, highly denormalized
Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins, low maintenance, very simple metadata
Drawbacks: Summary data in the fact table yields poorer performance for summary levels; huge dimension tables are a problem
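Here is a trimmed SQLite sketch of the schema above. It keeps only the store dimension and the fact table. The column names are adapted from the figure and the rows are invented. The sketch shows why the "Level" column matters when summary rows share the fact table with detail rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized store dimension: the whole store > district > region hierarchy
-- sits in one table; 'level' marks rows that stand for summary members.
CREATE TABLE store_dim (
    store_key     INTEGER PRIMARY KEY,  -- generated surrogate key
    store_desc    TEXT, city TEXT, state TEXT,
    district_id   INTEGER, district_desc TEXT,
    region_id     INTEGER, region_desc TEXT,
    level         TEXT                  -- 'Store' (detail) or 'District' (summary)
);
-- Single fact table holding detail and summary rows; its primary key has
-- exactly one key column per dimension (product/period dimensions omitted).
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim,
    product_key INTEGER,
    period_key  INTEGER,
    dollars REAL, units INTEGER, price REAL,
    PRIMARY KEY (store_key, product_key, period_key)
);
INSERT INTO store_dim VALUES
  (1, 'Store A', 'Austin', 'TX', 7, 'District 7', 3, 'South', 'Store'),
  (2, 'Store B', 'Waco',   'TX', 7, 'District 7', 3, 'South', 'Store'),
  (9, 'District 7 total', NULL, NULL, 7, 'District 7', 3, 'South', 'District');
INSERT INTO sales_fact VALUES
  (1, 10, 100, 500.0, 5, 100.0),
  (2, 10, 100, 300.0, 3, 100.0),
  (9, 10, 100, 800.0, 8, 100.0);  -- pre-aggregated district summary row
""")

def district_dollars(level):
    """Total dollars for district 7, reading only rows at the requested level."""
    return conn.execute("""
        SELECT SUM(f.dollars) FROM sales_fact f
        JOIN store_dim s ON s.store_key = f.store_key
        WHERE s.district_id = 7 AND s.level = ?""", (level,)).fetchone()[0]

# Ignoring the level indicator double-counts: detail and summary are both summed.
naive = conn.execute("SELECT SUM(dollars) FROM sales_fact").fetchone()[0]
print(district_dollars('Store'), district_dollars('District'), naive)  # 800.0 800.0 1600.0
```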
Fact constellation Schema

[Figure: fact constellation schema with a detail fact table, separate district and region aggregate fact tables, and three dimension tables]
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Fact constellation Schema (contd…)

[Figure: the same fact constellation schema as on the previous slide]

In the Fact Constellations, aggregate tables are created separately from the detail; therefore it is impossible to pick up, for example, Store detail when querying the District Fact Table.

Major Advantage: No need for the “Level” indicator in the dimension tables, since no aggregated data is stored with lower-level detail

Disadvantage: Dimension tables are still very large in some cases, which can slow performance; the front end must be able to detect the existence of aggregate facts, which requires more extensive metadata
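In a fact constellation the front end has to know which aggregate tables exist. The following minimal sketch assumes a hypothetical `AGGREGATES` metadata map and invented rows. It routes each query to the fact table stored at the requested level. The district table carries no store key, so store detail cannot be recovered from it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail and aggregate facts live in separate tables (price column omitted).
CREATE TABLE store_fact    (store_key INTEGER,   product_key INTEGER, period_key INTEGER,
                            dollars REAL, units INTEGER);
CREATE TABLE district_fact (district_id INTEGER, product_key INTEGER, period_key INTEGER,
                            dollars REAL, units INTEGER);
CREATE TABLE region_fact   (region_id INTEGER,   product_key INTEGER, period_key INTEGER,
                            dollars REAL, units INTEGER);
INSERT INTO store_fact    VALUES (1, 10, 100, 500.0, 5), (2, 10, 100, 300.0, 3);
INSERT INTO district_fact VALUES (7, 10, 100, 800.0, 8);
INSERT INTO region_fact   VALUES (3, 10, 100, 800.0, 8);
""")

# Hypothetical aggregate-navigation metadata: level -> (fact table, key column).
AGGREGATES = {
    'store':    ('store_fact', 'store_key'),
    'district': ('district_fact', 'district_id'),
    'region':   ('region_fact', 'region_id'),
}

def dollars_at(level, member_id):
    """Send the query straight to the fact table stored at this level."""
    table, key = AGGREGATES[level]  # identifiers come from trusted metadata only
    sql = f"SELECT SUM(dollars) FROM {table} WHERE {key} = ?"
    return conn.execute(sql, (member_id,)).fetchone()[0]

# The district aggregate has no store key, so store detail cannot be read from it.
district_cols = [col[1] for col in conn.execute("PRAGMA table_info(district_fact)")]
print(dollars_at('district', 7), dollars_at('store', 1), 'store_key' in district_cols)
# 800.0 500.0 False
```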
Snowflake Schema

[Figure: snowflake schema with the store dimension decomposed into store, district and region tables, and one fact table per level]
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District table: District_ID, District Desc., Region_ID
Region table: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Snowflake Schema (contd…)

[Figure: the same snowflake schema as on the previous slide]

• No “level” in dimension tables
• Dimension tables are normalized by decomposing at the attribute level
• Each dimension table has one key for each level of the dimension's hierarchy
• The lowest-level key joins the dimension table to both the fact table and the lower-level attribute table
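Here is a trimmed SQLite sketch of the snowflaked store dimension. The table names and rows are hypothetical, and the product and time dimensions and the aggregate fact tables are omitted. Reporting by region requires walking the hierarchy, with one join per level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Store dimension normalized by hierarchy level: one table and one key per level.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_desc TEXT, regional_mgr TEXT);
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT,
                       region_id INTEGER REFERENCES region);
CREATE TABLE store    (store_key INTEGER PRIMARY KEY, store_desc TEXT, city TEXT,
                       state TEXT, district_id INTEGER REFERENCES district);
CREATE TABLE store_fact (store_key INTEGER REFERENCES store, product_key INTEGER,
                         period_key INTEGER, dollars REAL, units INTEGER);
INSERT INTO region     VALUES (3, 'South', 'Manager 1');
INSERT INTO district   VALUES (7, 'District 7', 3), (8, 'District 8', 3);
INSERT INTO store      VALUES (1, 'Store A', 'Austin', 'TX', 7),
                              (2, 'Store B', 'Houston', 'TX', 8);
INSERT INTO store_fact VALUES (1, 10, 100, 500.0, 5), (2, 10, 100, 300.0, 3);
""")

def dollars_by_region():
    # store_key (the lowest-level key) joins the fact table to the store table;
    # each higher level of the hierarchy costs one more join.
    return conn.execute("""
        SELECT r.region_desc, SUM(f.dollars)
        FROM store_fact f
        JOIN store    s ON s.store_key   = f.store_key
        JOIN district d ON d.district_id = s.district_id
        JOIN region   r ON r.region_id   = d.region_id
        GROUP BY r.region_desc""").fetchall()

print(dollars_by_region())  # [('South', 800.0)]
```

The star schema uses a single denormalized store dimension. This design avoids repeating district and region attributes on every store row, at the cost of extra joins.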

Data Warehouse Definition
Bill Inmon defined a data warehouse as:

"A (data) warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."

Ralph Kimball defined a data warehouse in his book "The Data Warehouse Toolkit" as:

"a copy of transaction data specifically structured for query and analysis".

OLAP - definition

• On-line analytical processing (OLAP) is an element of decision support systems (DSS)
• The use of computers to analyze an organization's data

The FASMI Test

Fast Analysis of Shared Multidimensional Information

Fast
• Response time of 1-2 seconds for simpler reports
• Very few reports take more than 20 seconds
• Pre-calculations
• On-the-fly calculations
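The two techniques listed under Fast trade load-time work for query-time work. In this small sketch with invented facts, the pre-calculated total is computed once and then looked up. The on-the-fly version scans the detail rows each time. Both must return the same answer:

```python
from collections import Counter

# Hypothetical detail facts: (region, quarter, units).
facts = [('North', 'Q1', 5), ('North', 'Q2', 4), ('South', 'Q1', 7)]

# Pre-calculation: aggregate once (e.g. at load time); queries become lookups.
precomputed = Counter()
for region, _quarter, units in facts:
    precomputed[region] += units

def on_the_fly(region):
    """On-the-fly calculation: scan the detail rows every time the question is asked."""
    return sum(units for r, _quarter, units in facts if r == region)

print(precomputed['North'], on_the_fly('North'))  # 9 9
```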
OLAP definition (continued…)

Analysis

Users are provided with specific features for analyzing data in their own way.

Some of the features are:

• Time series analysis
• Ad hoc multidimensional structural changes
• Currency translation
OLAP definition (continued…)

Shared

The system implements all the security features required for confidentiality, possibly down to the cell level.

Some considerations:

• Not all OLAP applications are read-only
• Security requirements vary from application to application
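Cell-level security means that two users querying the same cube may see different cells. This minimal sketch assumes hypothetical user names and permission rules, written as Python predicates over a cell's coordinates:

```python
# Hypothetical cube cells keyed by (region, measure).
cells = {
    ('North', 'revenue'): 200.0, ('North', 'cost'): 150.0,
    ('South', 'revenue'): 300.0, ('South', 'cost'): 260.0,
}

# Hypothetical per-user rules: each decides whether one cell may be seen.
PERMISSIONS = {
    'north_manager': lambda region, measure: region == 'North',
    'sales_viewer': lambda region, measure: measure == 'revenue',
}

def visible_cells(user):
    """Return only the cells this user is allowed to see."""
    allowed = PERMISSIONS.get(user, lambda region, measure: False)  # deny by default
    return {cell: value for cell, value in cells.items() if allowed(*cell)}

print(visible_cells('sales_viewer'))  # {('North', 'revenue'): 200.0, ('South', 'revenue'): 300.0}
```

Unknown users receive a rule that denies everything, so access is granted only where it has been explicitly defined.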
OLAP definition (continued…)

Multidimensional

The system must provide:

• A multidimensional conceptual view of data
• Full support for multiple hierarchies
OLAP definition (continued…)

Information

All the data and derived information needed must be available, wherever it resides and however much of it is relevant for the application.

Some of the points to be considered:

• Data duplication
• Disk space utilization
• Performance
Dimensions constitute a Cube

[Figure: dimensions forming a cube; one element is labeled "Cell"]
Data Staging

It comprises the following sub-processes:

• Extraction
• Transformation
• Loading & Indexing
• Data Quality Assurance Checking (Data Cleansing & Data Scrubbing)
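The staging steps above can be sketched as a tiny pipeline. The sketch rests on several assumptions. The CSV text stands in for an operational source. The cleansing rules are hypothetical: trim and title-case region codes, and reject rows with a missing region or a non-numeric amount. SQLite plays the role of the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system; CSV text stands in for the source.
SOURCE = """order_id,region,amount
1, north ,120.50
2,South,80
3,,45
4,SOUTH,not-a-number
"""

def extract(text):
    """Extraction: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation plus cleansing/scrubbing: standardize codes, convert types, reject bad rows."""
    clean, rejected = [], []
    for row in rows:
        region = row['region'].strip().title()   # ' north ' -> 'North', 'SOUTH' -> 'South'
        try:
            amount = float(row['amount'])
        except ValueError:
            rejected.append(row)                 # non-numeric amount
            continue
        if not region:
            rejected.append(row)                 # missing region code
            continue
        clean.append((int(row['order_id']), region, amount))
    return clean, rejected

def load(conn, rows):
    """Loading & indexing into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales(region)")
    conn.commit()

raw = extract(SOURCE)
clean, rejected = transform(raw)
conn = sqlite3.connect(":memory:")
load(conn, clean)

# Quality assurance check: every extracted row is either loaded or explicitly rejected.
assert len(clean) + len(rejected) == len(raw)
print(clean, len(rejected))  # [(1, 'North', 120.5), (2, 'South', 80.0)] 2
```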

Data Warehousing

Data Warehousing & OLAP

Two parts of the same process

[Figure: data flow among external data sources, operational systems, the DW (enterprise-class data), the OLAP analysis server (calculations & representations), department-class systems, end-user analysts and the Internet. Arrows indicate data flow direction.]

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy