DMDW-MDM L8,9

The document discusses the multidimensional data model used in data warehouses and OLAP tools, emphasizing the structure of data cubes defined by dimensions and facts. It outlines various schemas such as Star, Snowflake, and Fact Constellation, detailing their characteristics and use cases. Additionally, it covers OLAP operations, data warehouse architecture, and the benefits and challenges of a three-tier data warehouse system.

Uploaded by

xataje8102

Multidimensional Data Model

❖ Data warehouses and OLAP tools are based on a multidimensional data
model. This model views data in the form of a data cube.
❖ Data Cube allows data to be modeled and viewed in multiple dimensions. It is
defined by dimensions and facts.
❖ Dimensions are the perspectives or entities with respect to which an
organization wants to keep records.
❖ Example:
➢ A sales data warehouse keeps records of the store's sales with respect to the dimensions
time, item, branch, and location.
Basic terms: Multidimensional Data Model
❖ Facts are numerical measures.
❖ Each dimension may have a table associated with it, called a dimension
table.
❖ Dimension tables can be specified by users / experts / automatically
generated and adjusted based on data distributions.
❖ The fact table contains the names of the facts, or measures, as well as keys
to each of the related dimension tables.
❖ The 0-D cuboid, which holds the highest level of summarization, is called the
apex cuboid.
❖ The cuboid that holds the lowest level of summarization is called the base
cuboid.
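The relationship between the apex cuboid and the base cuboid can be illustrated by listing every cuboid of the four-dimensional sales example. This is a minimal sketch, assuming each cuboid is identified by the subset of dimensions it groups by; the function name `enumerate_cuboids` is hypothetical.

```python
from itertools import combinations

def enumerate_cuboids(dimensions):
    """Return every cuboid as a tuple of the dimensions it groups by."""
    cuboids = []
    for k in range(len(dimensions) + 1):
        cuboids.extend(combinations(dimensions, k))
    return cuboids

dimensions = ("time", "item", "branch", "location")
cuboids = enumerate_cuboids(dimensions)

apex_cuboid = cuboids[0]    # 0-D: no dimensions kept, a single grand total
base_cuboid = cuboids[-1]   # 4-D: all dimensions kept, the most detailed data
print(len(cuboids), apex_cuboid, base_cuboid)
```

With n dimensions there are 2^n such cuboids, which is why a four-dimensional cube yields 16 of them.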
Schemas for Multidimensional Databases
❖ Data warehouse schema is a description, represented by objects such as
tables and indexes, of how data relates logically within a data warehouse.
❖ A multidimensional model for a data warehouse can exist in the form of:
➢ Star Schemas
➢ Snowflake Schemas
➢ Fact Constellation Schemas
Star Schema
❖ The star schema is historically one of the most straightforward data
warehouse designs.
❖ A star schema follows distinct design rules: it permits only one central
fact table, with a handful of dimension tables (one per dimension) joined to it.
❖ A star schema uses denormalized dimension tables.
❖ Denormalization deliberately introduces redundancy into the dimension tables
so long as it improves query performance.
Characteristics of the Star Schema

❖ Star schemas use denormalized dimension tables, which enables quick query
responses
❖ The primary key in each dimension table is joined to a foreign key in the
fact table
❖ Each dimension in the star schema maps to one dimension table
❖ Dimension tables within a star schema are not connected directly to one another
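The characteristics above can be sketched with Python's built-in `sqlite3` module. This is an illustrative layout only: the table names (`dim_item`, `dim_branch`, `fact_sales`) and the sample rows are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item   (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE dim_branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    item_key     INTEGER REFERENCES dim_item(item_key),
    branch_key   INTEGER REFERENCES dim_branch(branch_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
INSERT INTO dim_item   VALUES (1, 'laptop', 'BrandA'), (2, 'phone', 'BrandB');
INSERT INTO dim_branch VALUES (10, 'North'), (20, 'South');
INSERT INTO fact_sales VALUES (1, 10, 1000.0, 2), (1, 20, 500.0, 1), (2, 10, 300.0, 3);
""")

# Every dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
    SELECT i.item_name, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_key = f.item_key
    GROUP BY i.item_name
    ORDER BY i.item_name
""").fetchall()
print(star_rows)
```

Note that the two dimension tables never reference each other; they meet only through the fact table.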
Snowflake Schema

❖ Snowflake Schema is a data warehouse schema that encompasses a logical
arrangement of dimension tables.
❖ This data warehouse schema builds on the star schema by adding
sub-dimension tables that relate to the first-order dimension tables joined to the
fact table.
❖ In the snowflake schema approach, a primary key in a sub-dimension table
relates to a foreign key within the higher-order dimension table.
❖ Snowflake schema creates normalized dimension tables.
❖ The purpose of normalization is to eliminate redundant data and so reduce
storage overhead.
Characteristics of the Snowflake Schema

❖ Snowflake schemas are permitted to have dimension tables joined to other
dimension tables
❖ Snowflake schemas have a single fact table
❖ Snowflake schemas create normalized dimension tables
❖ The normalized schema reduces the disk space required for running and
managing the data warehouse
❖ Snowflake schemas offer an easier way to implement and maintain a dimension,
since each attribute value is stored only once
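A snowflake layout can be sketched with `sqlite3` by moving supplier attributes out of the item dimension into a sub-dimension table. The table names and sample rows are hypothetical and chosen only to illustrate the extra join that normalization introduces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: supplier attributes are moved out of the item dimension.
CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE dim_item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key)
);
CREATE TABLE fact_sales (
    item_key     INTEGER REFERENCES dim_item(item_key),
    dollars_sold REAL
);
INSERT INTO dim_supplier VALUES (100, 'wholesale'), (200, 'direct');
INSERT INTO dim_item     VALUES (1, 'laptop', 100), (2, 'phone', 100), (3, 'desk', 200);
INSERT INTO fact_sales   VALUES (1, 1000.0), (2, 300.0), (3, 250.0);
""")

# Reaching supplier_type takes two joins: fact -> item -> supplier.
snowflake_rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_item     AS i ON i.item_key = f.item_key
    JOIN dim_supplier AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(snowflake_rows)
```

The value 'wholesale' is stored once in `dim_supplier` rather than once per item, which is the redundancy saving the slides describe, paid for with an additional join at query time.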
Fact Constellation Schemas

❖ Fact Constellation Schema is also known as a Galaxy Schema.
❖ Fact Constellation Schema uses multiple fact tables connected with shared
normalized dimension tables.
❖ Fact Constellation Schema can be thought of as a collection of star schemas
that are interlinked through shared dimension tables and normalized to avoid
redundancy or inconsistency of data.
Characteristics of the Fact Constellation Schemas

Fact Constellation Schema:
❖ Is multidimensional, which makes it a strong design choice for complex
database systems
❖ Reduces redundancy to near zero as a result of normalization
❖ Is known for high data quality and accuracy, and lends itself to effective
reporting and analytics
Question: The schema contains a central fact table for sales that contains keys to
each of the four dimensions, along with two measures: dollars sold and units sold.
To minimize the size of the fact table, dimension identifiers (such as time key and
item key) are system-generated identifiers.
Star Schema DMQL
define cube sales_star [time, item, branch, location]:
    dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier_type)
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city, province_or_state,
    country)
Snowflake Schema DMQL
define cube sales_snowflake [time, item, branch, location]:
    dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier (supplier_key,
    supplier_type))
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city (city_key, city,
    province_or_state, country))
Galaxy Schema DMQL
define cube sales [time, item, branch, location]:
    dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier_type)
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city, province_or_state, country)
define cube shipping [time, item, shipper, from_location, to_location]:
    dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as (shipper_key, shipper_name, location as location in cube sales,
    shipper_type)
define dimension from_location as location in cube sales
define dimension to_location as location in cube sales
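The galaxy definition above shares the item dimension between the sales and shipping cubes. A rough relational sketch of that sharing, using `sqlite3` with hypothetical table names and sample rows, might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables reference the same item dimension.
CREATE TABLE fact_sales    (item_key INTEGER REFERENCES dim_item(item_key), dollars_sold REAL);
CREATE TABLE fact_shipping (item_key INTEGER REFERENCES dim_item(item_key), dollars_cost REAL);
INSERT INTO dim_item      VALUES (1, 'laptop'), (2, 'phone');
INSERT INTO fact_sales    VALUES (1, 1000.0), (1, 500.0), (2, 300.0);
INSERT INTO fact_shipping VALUES (1, 40.0), (2, 10.0), (2, 5.0);
""")

# Aggregate each fact table separately, then combine through the shared dimension.
galaxy_rows = conn.execute("""
    SELECT i.item_name, s.total_sold, h.total_cost
    FROM dim_item AS i
    JOIN (SELECT item_key, SUM(dollars_sold) AS total_sold
          FROM fact_sales GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(dollars_cost) AS total_cost
          FROM fact_shipping GROUP BY item_key) AS h ON h.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(galaxy_rows)
```

Each fact table is aggregated before the join so that rows from one fact table do not multiply rows from the other.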
Concept Hierarchies
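❖ A concept hierarchy is a sequence of mappings from a set of low-level concepts to
higher-level, more general concepts.
❖ Concept hierarchies for the dimensions location and time:
➢ Location: city values such as Vancouver, Toronto, New York, and Chicago map to a
province or state, which in turn maps to a country.
➢ Time: day maps to month, month to quarter, and quarter to year.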
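A concept hierarchy for location can be sketched as a chain of lookup tables, one per level. The dictionary and function names below are hypothetical; the city values are the ones listed above.

```python
# Each level maps a lower-level concept to the next higher-level concept.
city_to_region = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "New York": "New York",
    "Chicago": "Illinois",
}
region_to_country = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "New York": "USA",
    "Illinois": "USA",
}

def generalize(city):
    """Climb the location hierarchy: city -> province_or_state -> country."""
    region = city_to_region[city]
    return city, region, region_to_country[region]

print(generalize("Toronto"))
```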
OLAP Process
OLAP Operations: Drill-down

The drill-down operation allows a user to zoom in on the data cube, that is, to move
from less detailed data to more detailed data. It can be implemented either by
stepping down a concept hierarchy for a dimension or by adding
dimensions to the hypercube.
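Stepping down a concept hierarchy can be sketched as re-aggregating the same base data at a finer level. The sample rows and the helper `aggregate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical base data: (quarter, month, dollars_sold).
sales = [
    ("Q1", "Jan", 100), ("Q1", "Feb", 150), ("Q1", "Mar", 200),
    ("Q2", "Apr", 120), ("Q2", "May", 130),
]

def aggregate(rows, level):
    """Sum dollars_sold at a hierarchy level (0 = quarter, 1 = month)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[: level + 1]] += row[2]
    return dict(totals)

by_quarter = aggregate(sales, level=0)   # summarized view
by_month = aggregate(sales, level=1)     # drill-down: one level more detail
print(by_quarter)
print(by_month)
```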
Roll-up

It is the opposite of the drill-down operation and is also known as a drill-up or
aggregation operation. It performs aggregation on a data cube and makes the data
less detailed. It can be performed either by climbing up a concept hierarchy for a
dimension or by removing one or more dimensions (dimension reduction).
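Roll-up can be sketched in two commonly described forms: climbing a concept hierarchy (here city to country) and dropping a dimension altogether. The sample cells and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cells: (city, item, dollars_sold).
cells = [
    ("Vancouver", "laptop", 100), ("Toronto", "laptop", 200),
    ("Chicago", "laptop", 300), ("Chicago", "phone", 50),
]
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

# Roll-up by climbing the location hierarchy from city to country.
by_country_item = defaultdict(int)
for city, item, dollars in cells:
    by_country_item[(city_to_country[city], item)] += dollars

# Roll-up by dimension reduction: drop the item dimension entirely.
by_country = defaultdict(int)
for (country, _item), dollars in by_country_item.items():
    by_country[country] += dollars

print(dict(by_country_item))
print(dict(by_country))
```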
Dice
The dice operation is used to generate a new sub-cube from the existing hypercube. It
performs a selection on two or more dimensions of the hypercube to generate the
new sub-cube.
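A dice can be sketched as a filter that constrains two dimensions at once. The cube contents and the function name `dice` are hypothetical.

```python
# Hypothetical cells keyed by (quarter, city, item) with dollars_sold values.
cube = {
    ("Q1", "Toronto", "laptop"): 200,
    ("Q1", "Chicago", "phone"): 50,
    ("Q2", "Toronto", "phone"): 80,
    ("Q3", "Vancouver", "laptop"): 120,
}

def dice(cube, quarters, cities):
    """Keep cells whose quarter AND city both fall in the selected sets."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in quarters and key[1] in cities
    }

sub_cube = dice(cube, quarters={"Q1", "Q2"}, cities={"Toronto", "Vancouver"})
print(sub_cube)
```

The result keeps all three dimensions; only the range of values along time and location has shrunk.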
Slice

The slice operation performs a selection on a single dimension of the given cube (for
example, fixing the time dimension to one quarter) to generate a new sub-cube. It
presents the information from another point of view.
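A slice can be sketched as fixing one dimension to a single value, which leaves a sub-cube with one dimension fewer. The cube contents and the function name `slice_cube` are hypothetical.

```python
# Hypothetical cells keyed by (quarter, city, item) with dollars_sold values.
cube = {
    ("Q1", "Toronto", "laptop"): 200,
    ("Q1", "Chicago", "phone"): 50,
    ("Q2", "Toronto", "phone"): 80,
}

def slice_cube(cube, quarter):
    """Fix the time dimension to one quarter and drop it from the key."""
    return {
        (city, item): value
        for (q, city, item), value in cube.items()
        if q == quarter
    }

q1_slice = slice_cube(cube, "Q1")
print(q1_slice)
```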
Pivot
It is used to provide an alternate view of the data available to the users. It is also
known as Rotate operation as it rotates the cube’s orientation to view the data
from different perspectives.
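For a two-dimensional view, the rotation can be sketched as swapping the row and column axes. The sample values and the function name `pivot` are hypothetical.

```python
# Hypothetical 2-D view: rows are cities, columns are items.
items = ["laptop", "phone"]
cities = ["Toronto", "Chicago"]
view = [
    [200, 80],   # Toronto
    [300, 50],   # Chicago
]

def pivot(matrix):
    """Swap the row and column axes without changing any cell value."""
    return [list(column) for column in zip(*matrix)]

pivoted = pivot(view)   # rows are now items, columns are cities
print(pivoted)
```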
Starnet Query Model
❖ The querying of multidimensional databases can be based on a starnet
model.
❖ A starnet model consists of radial lines emanating from a central point, where
each line represents a concept hierarchy for a dimension.
❖ Each abstraction level in the hierarchy is called a footprint.
Data Warehouse Architecture
Principles of Data Warehousing
Single-Tier Data Warehouse Architecture

❖ The single-tier architecture of a data warehouse can be viewed as a combination of
three layers: the physical source layer, the virtual data warehouse, and the
analysis layer, which can host reporting or OLAP tools.
❖ Only the source layer is physical. The purpose of this design is to minimize the
amount of data stored, which in turn removes data redundancy.
❖ The primary drawback of the single-tier architecture is that it has no component
that separates analytical processing from transactional processing.
Two-Tier Data Warehouse Architecture

❖ The two-tier architecture removes the drawback of the single-tier architecture by
introducing a separation between the layers, so that analytical and transactional
processing no longer share a single component.
❖ The two-tier architecture of the data warehouse comprises the following two tiers:
➢ Data Tier
➢ Client Tier
Two-Tier Data Warehouse Architecture: Data Tier

❖ The Data Tier in the two-tier architecture of the data warehouse can be defined as the
layer where actual data is stored after various ETL processes are used to load data into
the database or the data warehouse.
❖ The staging area where the ETL processes are used in the Data tier helps you ensure
that all data loaded into the warehouse is cleansed and in the appropriate format.

❖ The Data Tier consists of the following three layers:
➢ The Source Layer
➢ The Data Staging Layer
➢ The Data Warehouse Layer
Three-Tier Data Warehouse Architecture

❖ The three-tier design builds a data warehouse by including the required:
➢ Data Warehouse Schema Model,
➢ OLAP server type, and
➢ front-end tools for Reporting or Analysis purposes.
❖ The three different tiers here are termed as:
➢ Top-Tier
➢ Middle-Tier
➢ Bottom-Tier
Three-Tier Data Warehouse Architecture: Bottom-Tier

❖ The Bottom Tier in the three-tier architecture of a data warehouse consists of the Data
Repository.
❖ Data Repository is the storage space for the data extracted from various data sources,
which undergoes a series of activities as a part of the ETL process. ETL stands for
Extract, Transform and Load.
❖ The extracted data are then cleaned to remove duplicate or junk records carried over
from the source storage units.
❖ The next step is to transform all the data into a single storage format.
❖ The final step of ETL is to load the data into the repository.
❖ ETL tools used are:
➢ Informatica
➢ Microsoft SSIS
➢ SnapLogic
➢ Confluent
➢ Apache Kafka
➢ Alooma
➢ Ab Initio
➢ IBM InfoSphere
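The extract, clean, transform, and load steps described for the bottom tier can be sketched in plain Python with `sqlite3` standing in for the repository. The source rows, the cleaning rules, and the function names are assumptions for illustration; the tools listed above each have their own interfaces.

```python
import sqlite3

# Extract: hypothetical rows pulled from two source systems with different formats.
source_a = [{"item": "Laptop ", "dollars": "1000.50"}, {"item": "phone", "dollars": "300"}]
source_b = [{"item": "PHONE", "dollars": "300"}, {"item": "", "dollars": "n/a"}]

def transform(rows):
    """Clean and standardize rows: drop junk, normalize names, remove repeats."""
    seen, cleaned = set(), []
    for row in rows:
        name = row["item"].strip().lower()
        try:
            dollars = float(row["dollars"])
        except ValueError:
            continue                      # junk value: skip the row
        if not name or (name, dollars) in seen:
            continue                      # empty or repeated record
        seen.add((name, dollars))
        cleaned.append((name, dollars))
    return cleaned

def load(conn, rows):
    """Load the cleaned rows into the repository table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, dollars_sold REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(source_a + source_b))
loaded = conn.execute("SELECT item, dollars_sold FROM sales ORDER BY item").fetchall()
print(loaded)
```

Treating two rows with the same item and amount as a repeat is a simplification; a real pipeline would normally deduplicate on a record identifier.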
Three-Tier Data Warehouse Architecture: Middle Tier

❖ The Middle tier here is the tier with the OLAP servers.
❖ There are three types of OLAP server models:
➢ ROLAP: Relational online analytical processing performs dynamic multidimensional analysis of
data stored in a relational database, instead of redesigning the relational database into a
multidimensional database.
➢ MOLAP: Multidimensional online analytical processing stores and indexes data directly in a
multidimensional database system.
➢ HOLAP: Hybrid online analytical processing is a hybrid of both relational and multidimensional online
analytical processing models.
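The difference between the relational and multidimensional models can be sketched by answering the same question two ways. This is a simplified illustration with hypothetical data: the ROLAP side aggregates with SQL at query time, while the MOLAP side addresses a dense array by dimension position.

```python
import sqlite3

facts = [("Q1", "laptop", 100.0), ("Q1", "phone", 50.0), ("Q2", "laptop", 70.0)]

# ROLAP-style: facts stay in a relational table; aggregates come from SQL at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (quarter TEXT, item TEXT, dollars_sold REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", facts)
rolap_total = conn.execute(
    "SELECT SUM(dollars_sold) FROM fact_sales WHERE quarter = ?", ("Q1",)
).fetchone()[0]

# MOLAP-style: facts are stored in a dense array addressed by dimension positions.
quarters = ["Q1", "Q2"]
items = ["laptop", "phone"]
array = [[0.0 for _ in items] for _ in quarters]
for quarter, item, dollars in facts:
    array[quarters.index(quarter)][items.index(item)] += dollars
molap_total = sum(array[quarters.index("Q1")])

print(rolap_total, molap_total)
```

A HOLAP server would combine the two, for instance keeping detailed rows relational and summaries in arrays.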
Three-Tier Data Warehouse Architecture: Top Tier

❖ The Top Tier is the front-end layer, that is, the user interface that allows the user to connect
with the database systems.
❖ This user interface is usually a tool or an API call, which is used to fetch the required data for
Reporting, Analysis, and Data Mining purposes.
❖ If the Top Tier relies on a poorly suited front-end tool, the whole Data Warehouse Architecture can
fail to deliver value to its users.
❖ Top Tier tools used are:
➢ IBM Cognos
➢ Microsoft BI Platform
➢ SAP Business Objects Web
➢ Pentaho
➢ Crystal Reports
➢ SAP BW
➢ SAS Business Intelligence
Different Data Warehouse Models

❖ Enterprise Warehouse
➢ A centralised system integrating data from all functions.
➢ Supports extensive queries and analysis for the entire organisation.
➢ Example: A large retail chain tracking inventory, sales, and customer trends across all
stores.
❖ Data Mart
➢ A smaller, department-specific subset of the warehouse.
➢ Designed for quick access and targeted analysis.
➢ Example: A finance team analysing quarterly budgets and expenditures.
❖ Virtual Warehouse
➢ Provides on-demand views of operational data.
➢ Focuses on quick access rather than permanent storage.
➢ Example: An e-commerce company generating real-time reports on daily sales
performance
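The idea of a virtual warehouse as a set of on-demand views can be sketched with a SQL view over an operational table. The table, view, and sample dates below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational table that receives new orders continuously.
CREATE TABLE orders (order_day TEXT, amount REAL);
INSERT INTO orders VALUES ('2024-01-01', 20.0), ('2024-01-01', 30.0), ('2024-01-02', 15.0);
-- The view stores no data; it is evaluated against orders on demand.
CREATE VIEW daily_sales AS
    SELECT order_day, SUM(amount) AS total FROM orders GROUP BY order_day;
""")

before = conn.execute("SELECT * FROM daily_sales ORDER BY order_day").fetchall()
conn.execute("INSERT INTO orders VALUES ('2024-01-02', 5.0)")
after = conn.execute("SELECT * FROM daily_sales ORDER BY order_day").fetchall()
print(before)
print(after)
```

Because nothing is materialized, the report reflects the new order immediately, at the cost of running the aggregation on the operational system each time.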
Benefits of Three-Tier Data Warehouse System

❖ Scalability:
➢ Handles growing data volumes as each tier can be scaled separately.
➢ Supports an increasing number of users.
❖ Separation of Concerns:
➢ Keeps transactional and analytical processing distinct.
➢ Ensures faster analysis without affecting daily operations.
❖ Improved Query Performance:
➢ Prepares data in advance so that queries run quickly.
➢ Shortens the time needed to deliver insights.
❖ Flexibility:
➢ Adapts to new data sources easily.
➢ Integrates with modern tools for enhanced analysis.
❖ Data Quality Assurance:
➢ Cleanses and standardizes data before storage.
➢ Reduces errors and ensures reliability.
Challenges of Three-Tier Data Warehouse System

❖ Complexity: The top-down approach can be complex and time-consuming. To mitigate
this, it is essential to have a clear project plan and experienced personnel.
❖ Data Integration Issues: Integrating data from disparate sources can be challenging.
Using robust ETL tools and data integration techniques can help overcome these
challenges.
❖ Scalability: Ensuring the scalability of the enterprise data warehouse (EDW) is crucial.
This can be achieved by using scalable hardware and software architectures.
