CHP 02 Data Warehouse Architecture
Dr.Sunil Khilari
Chap-02 -Data Warehousing & Online
Analytical Processing
1. Introduction to data warehousing, need for a data warehouse (DW),
operational database versus DW.
2. Data warehouse life cycle, building a data warehouse, data
warehousing components, data warehousing architecture,
DW models.
3. Extraction, Transformation & Loading, metadata repository,
feature selection & creation.
4. Multi-dimensional data modeling: star schema, snowflake
schema & fact constellation schema, Online Analytical
Processing, categorization of OLAP tools, data cubes &
operations on cubes.
5. Design and usage of Data Warehouse (at least one system
diagram)
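2.1 Introduction to data warehousing
The concept of data warehousing dates back to the late 1980s, when
IBM researchers Barry Devlin and Paul Murphy developed the
"business data warehouse". In essence, the data warehousing concept was
intended to provide an architectural model for the flow of data from
operational systems to decision support environments.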
Definition of Data warehouse
A data warehouse is a central repository for all or significant parts of the data
that an enterprise's various business systems collect. It usually contains
historical data derived from transaction data, but it can also include data from other
sources.
Definition of Data warehouse (cont.)
1. Subject-Oriented:
A data warehouse is defined by subject matter, such as sales, rather than by
application; this makes the data warehouse subject-oriented.
E.g. a bank's warehouse would be organized by customer, deposit, interest rate and loan.
Other typical subjects are purchase, sales and inventory.
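2. Integrated:
While constructing the data warehouse, multiple heterogeneous sources, such
as relational databases, flat files and OLTP files, are utilized, and the data collected
from them is integrated.
3. Non-Volatile:
Once data is loaded into the data warehouse it is not normally updated or deleted;
it is read for analysis, while new data is added periodically.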
4. Time-Variant:
All data in the data warehouse is identified with a particular time period.
Data in the data warehouse is collected from the corporate data archives and
could be 3 to 10 years old or even older. The data provides a historical perspective
and is used for comparisons, trends and forecasting.
Data Warehousing Architecture
Data warehouses and their architectures vary depending upon the specifics of
an organization's situation. Three common architectures are:
Data Warehouse Architecture (Basic)
Data Warehouse Architecture (with a Staging Area)
Data Warehouse Architecture (with a Staging Area and Data Marts)
[Figure labels: inflow, upflow, downflow, outflow, meta flow, archive/backup]
Staging area: operational data needs to be cleaned and processed before
it is put into the warehouse. A staging area simplifies building
summaries and general warehouse management.
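As a minimal sketch of what a staging area does, the following Python snippet cleans a few raw operational records before loading them into a warehouse table. The record layout, the `clean` helper and the in-memory lists are assumptions made for illustration, not part of any specific product.

```python
# Hypothetical raw records as they might arrive from operational systems.
raw_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "120.50"},
    {"order_id": "2", "customer": "BOB", "amount": "n/a"},       # bad amount
    {"order_id": "1", "customer": "Alice", "amount": "120.50"},  # duplicate id
    {"order_id": "3", "customer": "carol", "amount": "75"},
]


def clean(record):
    """Return a cleaned copy of the record, or None if it cannot be repaired."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # reject rows whose amount is not numeric
    return {
        "order_id": int(record["order_id"]),
        "customer": record["customer"].strip().title(),
        "amount": amount,
    }


# Staging area: clean every record and drop duplicates by order_id.
staging = {}
for record in raw_orders:
    cleaned = clean(record)
    if cleaned is not None and cleaned["order_id"] not in staging:
        staging[cleaned["order_id"]] = cleaned

# Load: only cleaned, de-duplicated rows reach the warehouse table.
warehouse_orders = list(staging.values())

for row in warehouse_orders:
    print(row)
```

Keeping the cleaning step separate from the load means rejected rows never touch the warehouse table.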
Information flows of a data warehouse
Upflow: the processes associated with adding value to the data in the
warehouse through summarizing, packaging and distributing the data.
Data Warehouse ETL
[Figure: sources (RDBMS1, RDBMS2, HTML1, XML1) feed an ETL pipeline whose outputs are loaded into the data warehouse]
Data warehousing Design
Dimensionality modeling:
“A logical design technique that aims to present the data in a
standard, intuitive form that allows for high-performance access”
“The fact table contains business facts or measures and foreign keys
which refer to candidate keys (normally primary keys) in the dimension
tables.”
Fact Tables
In data warehousing, a fact table consists of the
measurements, metrics or facts of a business process. It is
located at the center of a star schema or a snowflake schema
surrounded by dimension tables. Where multiple fact tables
are used, these are arranged as a fact constellation schema. A
fact table typically has two types of columns: those that
contain facts and those that are foreign keys to dimension
tables. The primary key of a fact table is usually a composite
key that is made up of all of its foreign keys. Fact tables
contain the content of the data warehouse and store different
types of measures: additive, non-additive, and semi-additive
measures.
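The three kinds of measures differ in how they may be summed. The sketch below uses made-up rows (an assumption for illustration): `units_sold` is additive, `stock_on_hand` is semi-additive (it can be summed across stores but not across time), and `unit_price` is non-additive.

```python
# Hypothetical fact rows: (day, store, stock_on_hand, units_sold, unit_price)
rows = [
    ("2024-01-01", "A", 100, 2, 10.0),
    ("2024-01-01", "B", 50, 1, 20.0),
    ("2024-01-02", "A", 120, 3, 10.0),
    ("2024-01-02", "B", 50, 4, 20.0),
]

# Additive: units_sold can be summed across every dimension.
total_units = sum(r[3] for r in rows)

# Semi-additive: stock can be summed across stores for a single day ...
stock_day2 = sum(r[2] for r in rows if r[0] == "2024-01-02")
# ... but summing it across days would double count, so for one store
# we take the value of the latest day instead.
stock_a_latest = max((r for r in rows if r[1] == "A"), key=lambda r: r[0])[2]

# Non-additive: unit_price cannot be summed; derive an average from
# additive parts instead (revenue divided by units).
revenue = sum(r[3] * r[4] for r in rows)
avg_price = revenue / total_units

print(total_units, stock_day2, stock_a_latest, avg_price)
```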
Fact Table (Cont..)
Fact tables provide the (usually) additive values that act as independent
variables by which dimensional attributes are analyzed. Fact tables are often
defined by their grain. The grain of a fact table represents the most atomic
level by which the facts may be defined. The grain of a SALES fact table
might be stated as "Sales volume by Day by Product by Store". Each record
in this fact table is therefore uniquely defined by a day, product and store.
Other dimensions might be members of this fact table (such as
location/region) but these add nothing to the uniqueness of the fact records.
These "affiliate dimensions" allow for additional slices of the independent
facts but generally provide insights at a higher level of aggregation.
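A minimal sketch of the grain "Sales volume by Day by Product by Store": the transaction rows and the `fact_sales` dictionary below are assumptions made for illustration. Individual transactions are summed up to the grain, so each (day, product, store) key occurs exactly once.

```python
from collections import defaultdict

# Hypothetical point-of-sale transactions: (day, product, store, units)
transactions = [
    ("2024-03-01", "tv", "pune", 2),
    ("2024-03-01", "tv", "pune", 1),
    ("2024-03-01", "radio", "pune", 5),
    ("2024-03-02", "tv", "mumbai", 4),
]

# Aggregate to the grain: one fact row per (day, product, store).
fact_sales = defaultdict(int)
for day, product, store, units in transactions:
    fact_sales[(day, product, store)] += units

for key, units in sorted(fact_sales.items()):
    print(key, units)
```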
Dimensional data modeling of Data Warehouses
Star schema
Snowflake schema
Fact constellations
Star schema: “A fact table in the middle connected
to a set of dimension tables”
A star schema is a logical structure that has a fact table containing factual data in the
center, surrounded by dimension tables containing reference data. The primary
key of the fact table is made up of two or more foreign keys. This
characteristic ‘star-like’ structure is called a star schema or star join.
The star schema is the simplest data warehouse schema. It is called a star
schema because the diagram resembles a star, with points radiating from a
center. The center of the star consists of one or more fact tables and the points
of the star are the dimension tables, as shown in the figure. It contains:
A large central table (fact table)
A set of smaller attendant tables (dimension tables), one for each dimension
Star schema
[Figure: star schema. Fact table measures: additive, semi-additive, non-additive. Dimension tables: descriptive, textual values.]
Star Schema for Sales
[Figure: star schema for sales; the fact table is surrounded by its dimension tables]
Snowflake schema
[Figure: snowflake schema; the fact table holds the measures]
Fact constellations
Fact constellations: multiple fact tables share
dimension tables. The schema can be viewed as a collection of stars
and is therefore called a galaxy schema or fact
constellation.
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
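The star schema example above can be sketched with Python's built-in `sqlite3` module. The column names follow the example; the `_dim` table-name suffix, the reduced set of two dimensions, the sample rows and the query are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       brand TEXT, type TEXT);
-- Fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (time_key, item_key)
);
INSERT INTO time_dim VALUES (1, 5, 1, 'Q1', 2004), (2, 9, 4, 'Q2', 2004);
INSERT INTO item_dim VALUES (1, 'TV-100', 'Acme', 'TV'),
                            (2, 'Radio-7', 'Acme', 'Radio');
INSERT INTO sales_fact VALUES (1, 1, 3, 900.0), (1, 2, 5, 250.0),
                              (2, 1, 2, 600.0);
""")

# A typical star join: join the fact table to its dimensions, then group
# by descriptive attributes taken from the dimension tables.
query = """
SELECT t.quarter, i.type, SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN time_dim AS t ON f.time_key = t.time_key
JOIN item_dim AS i ON f.item_key = i.item_key
GROUP BY t.quarter, i.type
ORDER BY t.quarter, i.type
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The composite primary key on the fact table mirrors the point that the fact table's key is made up of its foreign keys.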
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province, country
Example of Fact Constellation
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
Sales Fact Table: time_key, item_key, branch_key
Shipping Fact Table: time_key, item_key, shipper_key, from_location
OLAP Technology & Data cubes
OLAP can be a valuable and rewarding business tool. Aside from producing reports, OLAP
analysis can help an organization evaluate balanced scorecard targets.
OLAP Technology
To obtain answers, such as the ones above, from a data model, OLAP cubes are created.
OLAP cubes are not strictly cuboids; "cube" is the name given to the process of
linking data from the different dimensions. The cubes can be developed along
business units such as sales or marketing, or a giant cube can be formed with all the
dimensions.
OLAP Technology for Data Mining
OLAP tools enable users to interactively analyze multidimensional data from multiple
perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up),
drill-down, and slicing and dicing.
Consolidation aggregates data along one or more dimensions.
For example, all sales offices are rolled up to the sales department or sales division
to anticipate sales trends. In contrast, drill-down is a technique that allows users
to navigate through the details. For instance, users can view the sales by individual
products that make up a region’s sales.
Slicing and dicing is a feature whereby users can take out (slicing) a specific set of
data of the OLAP cube and view (dicing) the slices from different viewpoints.
OLAP Server Architectures: MOLAP, ROLAP, HOLAP
Operational DBMS vs. Data Warehouse: OLTP vs. OLAP
                     OLTP                          OLAP
users                clerk, IT professional        knowledge worker
function             day to day operations         decision support
DB design            application-oriented          subject-oriented
data                 current, up-to-date,          historical,
                     detailed, flat relational,    summarized, multidimensional,
                     isolated                      integrated, consolidated
usage                repetitive                    ad-hoc
access               read/write,                   lots of scans
                     index/hash on prim. key
unit of work         short, simple transaction     complex query
# records accessed   tens                          millions
# users              thousands                     hundreds
DB size              100MB-GB                      100GB-TB
metric               transaction throughput        query throughput, response
4D Data cube Example
A data cube, such as sales, allows data to be modeled and viewed
in multiple dimensions.
Suppose AllElectronics creates a sales data warehouse with
respect to the dimensions:
Time
Item
Location
Supplier
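One simple way to picture such a cube, sketched here under the assumption of a plain Python dictionary with made-up values, is a mapping from one coordinate per dimension (time, item, location, supplier) to a measure. The `view_by` helper is hypothetical.

```python
# Hypothetical 4-D sales cube: (time, item, location, supplier) -> dollars sold
cube = {
    ("Q1", "TV", "Pune", "S1"): 600,
    ("Q1", "TV", "Pune", "S2"): 300,
    ("Q1", "Phone", "Mumbai", "S1"): 400,
    ("Q2", "TV", "Mumbai", "S2"): 500,
}

DIMENSIONS = ("time", "item", "location", "supplier")


def view_by(cube, dimension):
    """Total the measure for each member of one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = {}
    for coords, value in cube.items():
        totals[coords[axis]] = totals.get(coords[axis], 0) + value
    return totals


print(view_by(cube, "time"))
print(view_by(cube, "supplier"))
```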
Relational Database Model
Multidimensional Database Model
[Figure: two cubes. SALES with dimensions Customer, Store, Time, Product; FINANCE with dimensions Store, Time, GL_Line.]
Two dimensions
Three dimensions
Operations on cubes
Conceiving data as a cube with hierarchical dimensions leads
to conceptually straightforward operations to facilitate
analysis. Aligning the data content with a familiar
visualization enhances analyst learning and productivity. The
user-initiated process of navigating by calling for
page displays interactively, through the
specification of slices via rotations and drill down is
sometimes called "slice and dice". Common
operations include slice and dice, drill down, roll up,
and pivot.
OLAP slicing
Slice is the act of picking a rectangular subset of a cube by choosing a single value
for one of its dimensions, creating a new cube with one fewer dimension.
The picture shows a slicing operation: the sales figures of all sales regions and all
product categories of the company in the year 2004 are "sliced" out of the data cube.
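A minimal sketch of slicing, assuming a cube stored as a Python dictionary keyed by (year, region, category) with made-up sales figures. The `slice_cube` helper is hypothetical: it fixes one dimension to a single value and returns a cube with one fewer dimension.

```python
# Hypothetical 3-D cube: (year, region, category) -> sales
cube = {
    (2003, "North", "TV"): 10,
    (2004, "North", "TV"): 12,
    (2004, "South", "TV"): 7,
    (2004, "South", "Radio"): 3,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


# Slice out the year 2004: the result is keyed by (region, category) only.
sales_2004 = slice_cube(cube, axis=0, value=2004)
print(sales_2004)
```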
OLAP dicing
Dice: the dice operation produces a subcube by allowing the analyst to pick
specific values of multiple dimensions.
The picture shows a dicing operation: the new cube shows the sales figures of a
limited number of product categories, while the time and region dimensions cover the
same range as before.
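A minimal sketch of dicing, again assuming a dictionary-based cube with made-up values. Unlike a slice, the hypothetical `dice` helper keeps every dimension and only restricts the allowed values on the chosen axes.

```python
# Hypothetical 3-D cube: (year, region, category) -> sales
cube = {
    (2003, "North", "TV"): 10,
    (2004, "North", "Phone"): 12,
    (2004, "South", "TV"): 7,
    (2004, "South", "Radio"): 3,
}


def dice(cube, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set.

    `allowed` maps an axis index to the set of values kept on that axis.
    """
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


# Restrict the category dimension (axis 2) to two categories; the year and
# region dimensions keep their full range.
subcube = dice(cube, {2: {"TV", "Radio"}})
print(subcube)
```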
OLAP Drill down & up
Drill Down/Drill Up:
The two basic hierarchical operations when displaying data at multiple levels of aggregation are
the "drill-down" and "roll-up" operations. Drill-down refers to the process of viewing data at a
level of increased detail, while roll-up refers to the process of viewing data with decreasing detail.
Drill-down moves from a higher-level summary to a lower-level summary or detailed data, or introduces new
dimensions.
The picture shows a drill-down operation: the analyst moves from the summary
category "TV" to see the sales figures for the individual products.
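The two operations can be sketched on a small category/product hierarchy. The product names and sales values below are made up, and the `roll_up` and `drill_down` helpers are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# Hypothetical detailed data: (category, product) -> sales
detail = {
    ("TV", "LED 32in"): 40,
    ("TV", "LED 50in"): 25,
    ("Radio", "Pocket FM"): 10,
}


def roll_up(detail):
    """Aggregate product-level sales to the category level."""
    summary = defaultdict(int)
    for (category, _product), sales in detail.items():
        summary[category] += sales
    return dict(summary)


def drill_down(detail, category):
    """Show the individual products behind one summary category."""
    return {
        product: sales
        for (cat, product), sales in detail.items()
        if cat == category
    }


print(roll_up(detail))
print(drill_down(detail, "TV"))
```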
OLAP pivoting
Pivot allows an analyst to rotate the cube in space to see its various faces. For example, cities
could be arranged vertically and products horizontally while viewing data for a particular quarter.
Pivoting could replace products with time periods to see data across time for a single product.
The picture shows a pivoting operation: The whole cube is rotated, giving
another perspective on the data.
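For a two-dimensional view, pivoting amounts to swapping the row and column axes. The sketch below assumes a nested dictionary with cities as rows and products as columns for one quarter; the values and the `pivot` helper are made up for illustration.

```python
# Hypothetical 2-D view for one quarter: rows are cities, columns are products.
view = {
    "Pune": {"TV": 5, "Radio": 2},
    "Mumbai": {"TV": 8, "Radio": 1},
}


def pivot(view):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_label, columns in view.items():
        for col_label, value in columns.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


by_product = pivot(view)
print(by_product)
```

Pivoting twice returns the original orientation, which is a quick sanity check that no cell was lost.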
Roll-up and Drill Down
Hierarchy, listed from high-level aggregation to low-level details:
Sales Channel
Region
Country
State
Location Address
Sales Representative
Typical OLAP Operations
Slice and dice:
Project and select
Pivot (rotate):
Reorient the cube for visualization, e.g. 3D to a series of 2D planes.
Other operations
What are drill up, drill down, drill by and drill through?
Drill up: one level up in the hierarchy
Drill down: one level down in the hierarchy
Drill by: direct selection of a level in the hierarchy
Drill through: drill data from one hierarchy to another hierarchy
Drill across: involving (across) more than one fact table
Browsing a Data Cube
Visualization
OLAP capabilities
Interactive manipulation
Example of Cube
The Complete Decision Support System
Data Marts
Multi-Tiered Architecture
[Figure: multi-tiered architecture. Tiers: Data Sources, Data Storage, OLAP Engine, Front-End Tools. Layers: Source layer, Transformation layer, Presentation layer. Components: other sources, Monitor & Integrator, Metadata, Data Marts, OLAP Server, Query/Reporting.]
Data Warehouse Usage
Three kinds of data warehouse applications
Information processing
supports querying, basic statistical analysis, and reporting
using crosstabs, tables, charts and graphs
Analytical processing
multidimensional analysis of data warehouse data
supports basic OLAP operations, slice-dice, drilling, pivoting
Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.
Benefits from data warehousing
Time Savings
For data suppliers and for users
More and better information
More cost-effective decision-making
Improvement of business processes
Support for the accomplishment of strategic business
objectives.
Better enterprise intelligence
Information system reengineering
Disadvantages of DW
Summary
Data warehouse
A subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making
process
A multi-dimensional model of a data warehouse
Star schema, snowflake schema, fact constellations
A data cube consists of dimensions & measures
OLAP operations: drilling, rolling, slicing, dicing and pivoting
OLAP servers: ROLAP, MOLAP, HOLAP
Further development of data cube technology
Discovery-driven and multi-feature cubes
From OLAP to OLAM (on-line analytical mining)
Data preprocessing and major task
Questions