Introduction To Data Warehousing
Duration: 45 minutes (approx.)
Abhishek Ranjan
The Roadmap …
What is Data Warehouse?
What is Data Warehousing?
What is OLAP?
The need for Data Warehousing
Data Warehousing - Application Areas
OLAP vs. OLTP
Some key Data Warehousing concepts
Data Warehouse Architecture
Some vendors in the market
The need for Data Warehousing
[Figure: Transportation; Loading of Multi-Dimensional Array; OLAP/MDDB Servers; Cubes; Business Analysis]
Relational OLAP Architecture
MOLAP & ROLAP
Some vendors in the market
MicroStrategy
Hyperion Software
SAS
IBM
Oracle
Platinum Technology
Brio Software
Seagate
Some References…
Some of the pictures/diagrams have been reproduced in this presentation from the sources mentioned below.
Star Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level
• A single fact table, with detail and summary data
• Fact table primary key has only one key column per dimension
• Each key is generated
• Each dimension is a single table, highly denormalized
Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins, low maintenance, very simple metadata
Drawbacks: Summary data in the fact table yields poorer performance for summary levels; huge dimension tables are a problem
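To make the star layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The lower-case table and column names are adapted from the slide's diagram; the `sales_fact` name and all sample rows are hypothetical. It shows the fact table's composite primary key (one generated key per dimension) and a region roll-up that needs no table beyond the denormalized store dimension.

```python
import sqlite3

# Star schema sketch: one fact table keyed by one surrogate key per dimension,
# plus denormalized dimension tables. Names adapted from the slide; the
# sales_fact name and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY,               -- generated key
    store_description TEXT, city TEXT, state TEXT,
    district_id INTEGER, district_desc TEXT,     -- hierarchy kept in the same table
    region_id INTEGER, region_desc TEXT, regional_mgr TEXT
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    product_desc TEXT, brand TEXT, color TEXT, size TEXT, manufacturer TEXT
);
CREATE TABLE time_dim (
    period_key INTEGER PRIMARY KEY,
    period_desc TEXT, year INTEGER, quarter INTEGER, month INTEGER, day INTEGER
);
CREATE TABLE sales_fact (
    store_key INTEGER REFERENCES store_dim,
    product_key INTEGER REFERENCES product_dim,
    period_key INTEGER REFERENCES time_dim,
    dollars REAL, units INTEGER, price REAL,
    PRIMARY KEY (store_key, product_key, period_key)  -- one key column per dimension
);
""")
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, "Store A", "Austin", "TX", 10, "Central TX", 100, "South", "Lee"),
    (2, "Store B", "Dallas", "TX", 11, "North TX", 100, "South", "Lee"),
])
conn.execute("INSERT INTO product_dim VALUES (1, 'Cola 12oz', 'FizzCo', 'Red', '12oz', 'FizzCo Inc')")
conn.execute("INSERT INTO time_dim VALUES (1, '2024-01-15', 2024, 1, 1, 15)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 50.0, 25, 2.0),
    (2, 1, 1, 30.0, 15, 2.0),
])

# Grouping by region only needs the store dimension join: the region
# attributes live directly in the denormalized store table.
rows = conn.execute("""
SELECT s.region_desc, t.year, SUM(f.dollars), SUM(f.units)
FROM sales_fact f
JOIN store_dim s ON s.store_key = f.store_key
JOIN time_dim t ON t.period_key = f.period_key
GROUP BY s.region_desc, t.year
""").fetchall()
print(rows)
```

The price of this simplicity is visible in `store_dim`: district and region descriptions are repeated on every store row, which is why very large dimension tables are listed as a drawback.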
Fact Constellation Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Fact Constellation Schema (contd…)
[Figure: the fact constellation diagram from the previous slide, repeated]
• In the Fact Constellations, aggregate tables are created separately from the detail; therefore, it is impossible to pick up, for example, Store detail when querying the District Fact Table.
Major Advantage: No need for the “Level” indicator in the dimension tables, since no aggregated data is stored with lower-level detail
Disadvantage: Dimension tables are still very large in some cases, which can slow performance; the front end must be able to detect the existence of aggregate facts, which requires more extensive metadata
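The trade-off above can be sketched with sqlite3. This is an illustration under assumptions, not any vendor's aggregate navigator: `district_fact` is loaded separately from the detail table, and a hypothetical `AGGREGATE_METADATA` dictionary with a hypothetical `choose_fact_table` helper stand in for the extra metadata the front end needs to find aggregate facts. Table names loosely follow the slide; the data is invented.

```python
import sqlite3

# Fact constellation sketch: detail facts and a separately stored District
# aggregate share the product and period keys. Data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE sales_fact (store_key INTEGER, product_key INTEGER, period_key INTEGER,
                         dollars REAL, units INTEGER);
CREATE TABLE district_fact (district_id INTEGER, product_key INTEGER, period_key INTEGER,
                            dollars REAL, units INTEGER);
""")
conn.executemany("INSERT INTO store_dim VALUES (?, ?)", [(1, 10), (2, 10), (3, 11)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 50.0, 25), (2, 1, 1, 30.0, 15), (3, 1, 1, 20.0, 10),
])

# The aggregate is built at load time in its own table, so no "Level" column
# is needed: detail rows and summary rows never share a table.
conn.execute("""
INSERT INTO district_fact
SELECT s.district_id, f.product_key, f.period_key, SUM(f.dollars), SUM(f.units)
FROM sales_fact f JOIN store_dim s ON s.store_key = f.store_key
GROUP BY s.district_id, f.product_key, f.period_key
""")

# Hypothetical metadata the front end consults to find aggregate tables.
AGGREGATE_METADATA = {"district": "district_fact"}

def choose_fact_table(level: str) -> str:
    """Route a query to an aggregate table if one exists, else to the detail."""
    return AGGREGATE_METADATA.get(level, "sales_fact")

table = choose_fact_table("district")
# The table name comes from trusted metadata, never from user input.
rows = conn.execute(
    f"SELECT district_id, SUM(dollars) FROM {table} "
    "GROUP BY district_id ORDER BY district_id"
).fetchall()
print(table, rows)
```

Because `district_fact` holds no `store_key`, store detail simply cannot be reached from it, matching the note on the previous slide.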
Snowflake Schema
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District: District_ID, District Desc., Region_ID
Region: Region_ID, Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY, Dollars, Units, Price
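A minimal sqlite3 sketch of snowflaking, assuming the district and region attributes are moved out of the store dimension into their own tables, as the separate district and region tables on the slide suggest. Only the store fact table is modelled, and the data is hypothetical. Compared with a denormalized star, a region-level total now needs two extra joins.

```python
import sqlite3

# Snowflaked store dimension: store -> district -> region.
# Names follow the slide; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_desc TEXT, regional_mgr TEXT);
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT,
                       region_id INTEGER REFERENCES region);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_description TEXT,
                        city TEXT, state TEXT, district_id INTEGER REFERENCES district);
CREATE TABLE store_fact (store_key INTEGER, product_key INTEGER, period_key INTEGER,
                         dollars REAL, units INTEGER, price REAL);
""")
conn.execute("INSERT INTO region VALUES (100, 'South', 'Lee')")
conn.executemany("INSERT INTO district VALUES (?, ?, ?)",
                 [(10, "Central TX", 100), (11, "North TX", 100)])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, "Store A", "Austin", "TX", 10), (2, "Store B", "Dallas", "TX", 11)])
conn.executemany("INSERT INTO store_fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 50.0, 25, 2.0), (2, 1, 1, 30.0, 15, 2.0)])

# Each normalized hierarchy level adds one join on the way up to region.
rows = conn.execute("""
SELECT r.region_desc, SUM(f.dollars)
FROM store_fact f
JOIN store_dim s ON s.store_key = f.store_key
JOIN district d ON d.district_id = s.district_id
JOIN region r ON r.region_id = d.region_id
GROUP BY r.region_desc
""").fetchall()
print(rows)
```

Region descriptions are now stored once instead of on every store row, which shrinks the dimension at the cost of the extra joins.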
Snowflake Schema (contd…)
Data Warehouse Definition
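Bill Inmon defined a data warehouse as: a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.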
OLAP - definition
Fast
• Response time for simple reports: 1-2 sec
• Very few reports take more than 20 sec
• Pre-calculations
• On-the-fly calculations
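The two calculation strategies listed above can be contrasted in a few lines of Python. This is a sketch with hypothetical sales rows: `on_the_fly_total` scans the detail at query time, while `precalculated_total` answers from totals computed in advance for every combination of a specific value or ALL (represented as `None`). Both function names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows: (region, product, month, dollars).
SALES = [
    ("South", "Cola", "Jan", 50.0),
    ("South", "Cola", "Feb", 30.0),
    ("North", "Cola", "Jan", 20.0),
    ("North", "Chips", "Jan", 10.0),
]

def on_the_fly_total(region=None, prod=None, month=None):
    """Scan all detail rows at query time; None means 'all values'."""
    return sum(d for r, p, m, d in SALES
               if region in (None, r) and prod in (None, p) and month in (None, m))

def precalculate():
    """Add each row to every (value or ALL) combination of its three dimensions."""
    totals = defaultdict(float)
    for r, p, m, d in SALES:
        for key in product((r, None), (p, None), (m, None)):
            totals[key] += d
    return totals

PRECALC = precalculate()

def precalculated_total(region=None, prod=None, month=None):
    """Answer with a single dictionary lookup instead of a scan."""
    return PRECALC.get((region, prod, month), 0.0)

print(on_the_fly_total(month="Jan"), precalculated_total(month="Jan"))
```

Pre-calculation trades load time and storage (here 2^3 = 8 totals are updated per detail row) for lookups whose cost does not grow with the number of detail rows; on-the-fly calculation avoids that storage but pays a scan on every query.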
OLAP definition (continued…)
Analysis
Shared
Multidimensional
Information
• Data duplication
• Performance
Dimensions constitute a Cube
[Figure: dimensions forming a cube, with an individual cell marked]
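A cube can be sketched as a mapping from one member of each dimension to a cell value. The dimensions (product, region, month) and the figures below are hypothetical, and the dictionary is only a teaching device; MOLAP engines typically use specialized sparse storage instead.

```python
from typing import Dict, Tuple

# A cell address is one member per dimension: (product, region, month).
Cell = Tuple[str, str, str]

# Hypothetical sparse cube: only non-empty cells are stored.
cube: Dict[Cell, float] = {
    ("Cola", "South", "Jan"): 50.0,
    ("Cola", "South", "Feb"): 30.0,
    ("Cola", "North", "Jan"): 20.0,
    ("Chips", "North", "Jan"): 10.0,
}

def cell(prod: str, region: str, month: str) -> float:
    """Read a single cell; cells that were never stored read as 0."""
    return cube.get((prod, region, month), 0.0)

def slice_month(month: str) -> Dict[Tuple[str, str], float]:
    """Fix the month dimension to obtain a 2-D product x region slice."""
    return {(p, r): v for (p, r, m), v in cube.items() if m == month}

def roll_up_region() -> Dict[str, float]:
    """Aggregate away product and month, leaving one total per region."""
    totals: Dict[str, float] = {}
    for (_, r, _), v in cube.items():
        totals[r] = totals.get(r, 0.0) + v
    return totals

print(cell("Cola", "South", "Jan"), roll_up_region())
```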
Data Staging
Data Warehousing
Data Warehousing & OLAP
Two parts of the same process
[Figure: Source, External Data, Internet, DW, OLAP (Calculations & Representations), End User, Analysts, Dept., Class. Arrows indicate data flow direction.]
ABDOP Boundaries