Unit 3 - Notes

Uploaded by

pr1197
UNIT 3 – NOTES

A Data Warehouse is a relational database management system (RDBMS) built to meet the
requirements of decision support rather than those of transaction processing systems. It can
be loosely described as any centralized data repository that can be queried for business
benefit. It is a database that stores information oriented toward satisfying decision-making
requests. Data warehousing is a group of decision support technologies aimed at enabling the
knowledge worker (executive, manager, and analyst) to make better and faster decisions. Data
warehousing thus provides architectures and tools for business executives to systematically
organize, understand, and use their information to make strategic decisions.
A Data Warehouse environment contains an extraction, transformation, and loading (ETL)
solution, an online analytical processing (OLAP) engine, client analysis tools, and other
applications that handle the process of gathering information and delivering it to business
users.

What is a Data Warehouse?

A Data Warehouse (DW) is a relational database that is designed for query and analysis rather
than transaction processing. It includes historical data derived from transaction data from
one or more sources. A Data Warehouse provides integrated, enterprise-wide, historical data
and focuses on supporting decision-makers in data modeling and analysis. Data warehouses and
their architectures vary depending upon the specifics of an organization's situation.

Three common architectures are:

o Data Warehouse Architecture: Basic
o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

Characteristics of Data Warehouse

1. Subject-Oriented

A data warehouse focuses on the modeling and analysis of data for decision-makers. Therefore,
data warehouses typically provide a concise and straightforward view around a particular
subject, such as customer, product, or sales, rather than the organization's ongoing
operations as a whole. This is done by excluding data that are not useful concerning the
subject and including all data needed by the users to understand the subject.

2. Integrated

A data warehouse integrates various heterogeneous data sources such as RDBMSs, flat files, and
online transaction records. Data cleaning and integration must be performed during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among
different data sources.

3. Time-Variant

Historical information is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, 12 months, or even earlier from a data warehouse. This contrasts with a
transaction system, where often only the most current data is kept.

4. Non-Volatile
The data warehouse is a physically separate data store, populated with data transformed from
the source operational RDBMS. Operational updates of data do not occur in the data
warehouse, i.e., update, insert, and delete operations are not performed. It usually requires
only two procedures in data accessing: initial loading of data and access to data. Therefore,
the DW does not require transaction processing, recovery, and concurrency capabilities,
which allows for a substantial speedup of data retrieval. Non-volatile means that once data
has entered the warehouse, it should not change.

Goals of Data Warehousing

o To help reporting as well as analysis
o Maintain the organization's historical information
o Be the foundation for decision making.

Need for Data Warehouse

Data Warehouse is needed for the following reasons:

1. Business users: Business users require a data warehouse to view summarized data
from the past. Since these people are non-technical, the data may be presented to them
in an elementary form.
2. Store historical data: A Data Warehouse is required to store time-variant data from
the past. This data can then be used for various purposes.
3. Make strategic decisions: Some strategies may depend upon the data in the
data warehouse. So, the data warehouse contributes to making strategic decisions.
4. For data consistency and quality: By bringing data from different sources to a
common place, the organization can effectively bring uniformity and consistency to
the data.
5. Quick response time: A data warehouse has to be ready for somewhat unexpected loads
and types of queries, which demands a significant degree of flexibility and quick
response time.

Benefits of Data Warehouse

1. Understand business trends and make better forecasting decisions.
2. Data Warehouses are designed to perform well with enormous amounts of data.
3. The structure of data warehouses is easier for end-users to navigate,
understand, and query.
4. Queries that would be complex in many normalized databases can be easier to build
and maintain in data warehouses.
5. Data warehousing is an efficient method to manage demand for lots of information
from lots of users.
6. Data warehousing provides the capability to analyze a large amount of historical data.

Data Warehouse Architecture: Basic

Operational System

An operational system is a term used in data warehousing to refer to a system that is used
to process the day-to-day transactions of an organization.

Flat Files

A Flat file system is a system of files in which transactional data is stored, and every file in
the system must have a different name.

Meta Data

A set of data that defines and gives information about other data.

Metadata is used in a Data Warehouse for a variety of purposes, including:

Metadata summarizes necessary information about data, which can make finding and working
with particular instances of data easier. For example, author, date created, date
modified, and file size are examples of very basic document metadata.

Metadata is used to direct a query to the most appropriate data source.

Lightly and highly summarized data

This area of the data warehouse stores all the predefined lightly and highly summarized
(aggregated) data generated by the warehouse manager.

The goal of the summarized information is to speed up query performance. The
summarized data is updated continuously as new information is loaded into the warehouse.
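
As a rough illustration of lightly versus highly summarized data, the sketch below rolls a few hypothetical detail rows up to daily totals per product and then to monthly totals. The row layout, the sample values, and the function names are assumptions made for this example and do not come from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical detail rows: (date "YYYY-MM-DD", product, units_sold)
DETAIL_ROWS = [
    ("2024-01-03", "widget", 5),
    ("2024-01-03", "widget", 2),
    ("2024-01-17", "gadget", 4),
    ("2024-02-01", "widget", 1),
]


def summarize(rows, key_func):
    """Sum units_sold for each key produced by key_func."""
    totals = defaultdict(int)
    for date, product, units in rows:
        totals[key_func(date, product)] += units
    return dict(totals)


def lightly_summarized(rows):
    # One total per (day, product): close to the detail level.
    return summarize(rows, lambda date, product: (date, product))


def highly_summarized(rows):
    # One total per month across all products: far fewer rows to scan.
    return summarize(rows, lambda date, product: date[:7])
```

A query that only needs monthly totals can read the highly summarized result instead of scanning every detail row, which is where the speedup comes from.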

End-User access Tools


The principal purpose of a data warehouse is to provide information to business managers
for strategic decision-making. These users interact with the warehouse using end-user
access tools.

Examples of end-user access tools include:

o Reporting and Query Tools
o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools

Data Warehouse Architecture: With Staging Area

We must clean and process operational data before putting it into the warehouse. We
can do this programmatically, although most data warehouses use a staging area (a place
where data is processed before entering the warehouse).

A staging area simplifies data cleansing and consolidation for operational data coming
from multiple source systems, especially for enterprise data warehouses where all relevant
data of an enterprise is consolidated.

The Data Warehouse Staging Area is a temporary location to which records from source
systems are copied.
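
The flow through a staging area can be sketched with Python's built-in sqlite3 module. This is only a minimal sketch: the table names (staging_sales, warehouse_sales), the two columns, and the cleaning rules are all assumptions chosen for illustration, not a prescribed design.

```python
import sqlite3


def load_via_staging(source_rows):
    """Copy raw rows into a staging table, clean them, then load the warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_sales (customer TEXT, amount TEXT)")
    conn.execute("CREATE TABLE warehouse_sales (customer TEXT, amount REAL)")

    # 1. Copy source records into the staging area unchanged.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", source_rows)

    # 2. Clean inside staging: normalize names, drop rows without a usable amount.
    conn.execute("UPDATE staging_sales SET customer = UPPER(TRIM(customer))")
    conn.execute("DELETE FROM staging_sales WHERE amount IS NULL OR TRIM(amount) = ''")

    # 3. Load the cleaned rows into the warehouse with the proper type.
    conn.execute(
        "INSERT INTO warehouse_sales "
        "SELECT customer, CAST(amount AS REAL) FROM staging_sales"
    )

    # 4. The staging area is temporary, so empty it after the load.
    conn.execute("DELETE FROM staging_sales")
    return conn.execute(
        "SELECT customer, amount FROM warehouse_sales ORDER BY customer"
    ).fetchall()
```

Calling load_via_staging with rows from several source systems shows the point of the staging step: inconsistent or incomplete records are dealt with before they ever reach the warehouse table.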

Data Warehouse Architecture: With Staging Area and Data Marts


We may want to customize our warehouse's architecture for multiple groups within our
organization.

We can do this by adding data marts. A data mart is a segment of a data warehouse that
provides information for reporting and analysis on a section, unit, department, or operation
in the company, e.g., sales, payroll, production, etc.

The figure illustrates an example where purchasing, sales, and stocks are separated. In this
example, a financial analyst wants to analyze historical data for purchases and sales or mine
historical information to make predictions about customer behavior.

Properties of Data Warehouse Architectures

1. Separation: Analytical and transactional processing should be kept apart as much as
possible.

2. Scalability: Hardware and software architectures should be simple to upgrade as the data
volume that has to be managed and processed, and the number of user requirements that
have to be met, progressively increase.

3. Extensibility: The architecture should be able to accommodate new operations and
technologies without redesigning the whole system.

4. Security: Monitoring access is necessary because of the strategic data stored in data
warehouses.

5. Administerability: Data Warehouse management should not be complicated.

Data Warehouse Process Architecture


The process architecture defines an architecture in which the data from the data warehouse is
processed for a particular computation. Following are the two fundamental process
architectures:

1. Centralized Process Architecture

In this architecture, the data is collected into a single centralized store and processed
by a single machine with large capacity in terms of memory, processor, and storage.
Centralized process architecture evolved with transaction processing and is well suited for
small organizations with one location of service. It requires minimal resources from both
people and system perspectives. It is very successful when the collection and consumption of
data occur at the same location.

2. Distributed Process Architecture

In this architecture, information and its processing are distributed across data centers:
data is processed locally, and the results are then gathered into centralized storage.
Distributed architectures are used to overcome the limitations of centralized process
architectures, where all the information needs to be collected in one central location and
results are available only in that location.
There are several architectures of the distributed process:
Client-Server
In this architecture, the user does all the information collecting and presentation, while the
server does the processing and management of data.
Three-tier Architecture
With client-server architecture, the client machines need to be connected to a server machine,
which mandates finite states and introduces latency and overhead in terms of the records to be
carried between clients and servers. A three-tier architecture places a middle (application)
tier between the clients and the data server.
N-tier Architecture
The n-tier or multi-tier architecture is where clients, middleware, applications, and servers are
isolated into tiers.
Cluster Architecture
In this architecture, machines are connected in a network (through software or hardware) so
that they work together to process information or compute requirements in parallel. Each
machine in a cluster is assigned a function that is processed locally, and the result sets
are collected by a master server that returns them to the user.
Peer-to-Peer Architecture
This is a type of architecture where there are no dedicated servers and clients. Instead, all the
processing responsibilities are allocated among all machines, called peers. Each machine can
perform the function of a client or server or just process data.
Process managers are responsible for maintaining the flow of data both into and out of the
data warehouse. There are three different types of process managers −
● Load manager
● Warehouse manager
● Query manager
Data Warehouse Load Manager
Load manager performs the operations required to extract and load the data into the database.
The size and complexity of a load manager varies between specific solutions from one data
warehouse to another.
Load Manager Architecture
The load manager performs the following functions −
● Extract data from the source system.
● Fast load the extracted data into temporary data store.
● Perform simple transformations into structure similar to the one in the data
warehouse.

Extract Data from Source


The data is extracted from the operational databases or from external information providers.
Gateways are the application programs used to extract data. A gateway is supported by the
underlying DBMS and allows a client program to generate SQL to be executed at a server.
Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are examples of
gateways.
Fast Load
In order to minimize the total load window, the data needs to be loaded into the
warehouse in the fastest possible time.
● Transformations affect the speed of data processing.
● It is more effective to load the data into a relational database prior to applying
transformations and checks.
● Gateway technology is not suitable, since gateways are inefficient when large data
volumes are involved.
Simple Transformations
While loading, it may be required to perform simple transformations. After completing the
simple transformations, we can do complex checks. Suppose we are loading EPOS (electronic
point of sale) sales transactions; we need to perform the following checks −
● Strip out all the columns that are not required within the warehouse.
● Convert all the values to required data types.
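
The two checks above can be sketched in a few lines of Python. The column names (store_id, sale_date, amount) and the choice of converters are hypothetical; a real EPOS feed would define its own required columns and types.

```python
from datetime import date

# Hypothetical set of columns the warehouse keeps, with a converter for each.
REQUIRED_COLUMNS = {
    "store_id": int,
    "sale_date": date.fromisoformat,
    "amount": float,
}


def simple_transform(record):
    """Strip columns the warehouse does not need and convert the rest to the required types."""
    return {
        name: convert(record[name])
        for name, convert in REQUIRED_COLUMNS.items()
    }
```

For example, a raw record that also carries a till operator field comes out with only the three required columns, each in its target type. A record with a malformed value raises an exception, which a load manager would typically route to a reject file rather than to the warehouse.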
Warehouse Manager
The warehouse manager is responsible for the warehouse management process. It consists of
third-party system software, C programs, and shell scripts. The size and complexity of a
warehouse manager vary between specific solutions.
Warehouse Manager Architecture
A warehouse manager includes the following −
● The controlling process
● Stored procedures or C with SQL
● Backup/Recovery tool
● SQL scripts

Functions of Warehouse Manager


A warehouse manager performs the following functions −
● Analyzes the data to perform consistency and referential integrity checks.
● Creates indexes, business views, partition views against the base data.
● Generates new aggregations and updates the existing aggregations.
● Generates normalizations.
● Transforms and merges the source data of the temporary store into the
published data warehouse.
● Backs up the data in the data warehouse.
● Archives the data that has reached the end of its captured life.
Note − A warehouse Manager analyzes query profiles to determine whether the index and
aggregations are appropriate.
Query Manager
The query manager is responsible for directing the queries to suitable tables. By directing the
queries to appropriate tables, it speeds up the query request and response process. In addition,
the query manager is responsible for scheduling the execution of the queries posted by the
user.
Query Manager Architecture
A query manager includes the following components −
● Query redirection via C tool or RDBMS
● Stored procedures
● Query management tool
● Query scheduling via C tool or RDBMS
● Query scheduling via third-party software
Functions of Query Manager
● It presents the data to the user in a form they understand.
● It schedules the execution of the queries posted by the end-user.
● It stores query profiles to allow the warehouse manager to determine which
indexes and aggregations are appropriate
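
One way to picture query redirection is as a lookup that picks the smallest table able to answer a query. The sketch below is a simplified assumption of how such routing could work; the catalog contents, table names, and row counts are invented for illustration.

```python
# Hypothetical catalog: table name -> (columns available, approximate row count).
CATALOG = {
    "sales_detail": ({"day", "month", "product", "region", "units"}, 10_000_000),
    "sales_by_month_region": ({"month", "region", "units"}, 5_000),
    "sales_by_month": ({"month", "units"}, 120),
}


def route_query(needed_columns):
    """Return the smallest table that contains every column the query needs."""
    candidates = [
        (row_count, name)
        for name, (columns, row_count) in CATALOG.items()
        if set(needed_columns) <= columns
    ]
    if not candidates:
        raise ValueError("no table covers the requested columns")
    # min() compares row counts first, so the smallest suitable table wins.
    return min(candidates)[1]
```

A query that needs only month and units is sent to the small monthly aggregate, while a query that needs product falls back to the detail table.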

Datawarehouse Users
There are various types of data warehouse users, which are as follows −
● Statisticians − There are generally only a handful of sophisticated analysts
(statisticians and operations research types) in any organization. Though few,
they are some of the best users of the data warehouse; their work can
contribute to closed-loop systems that deeply influence the operations and
profitability of the organization.
These users must come to love the data warehouse. Usually that is not
difficult. These people are very self-sufficient and need only to be pointed
to the database and given some simple instructions about how to get to the
information and what times of the day are best for running large queries
to retrieve data to analyze using their sophisticated tools.
● Knowledge Workers − A relatively small number of analysts run most of the
new queries and analyses against the data warehouse. These are the
users who have the “Designer” or “Analyst” versions of user access tools.
After a few iterations, their queries and documents generally get published for
the benefit of the information consumers. Knowledge Workers are often
deeply engaged with the data warehouse design and place the highest
demands on the ongoing data warehouse operations team for training and
support.
● Information Consumers − Some users of the data warehouse are information
consumers, and they will probably never compose an ad hoc query of their own. They
use static or simple interactive documents that others have developed. It is
easy to forget about these users because they generally interact with
the data warehouse only through the work product of others.
● Executives − Executives are a special case of the Information Consumers
group. Few executives issue their own queries, but an executive’s slightest musing
can generate a flurry of activity among the other types of users. A wise data
warehouse designer will develop a polished digital dashboard for executives,
provided that it is easy and economical to do so. Generally, this should follow
other data warehouse work, but it never hurts to impress the bosses.
DATAWAREHOUSE OBJECTS
Fact
It is a collection of associated data items, consisting of measures and context data. It typically
represents business items or business transactions.
Dimensions
It is a collection of data that describes one business dimension. Dimensions provide the
contextual background for the facts, and they are the framework over which OLAP is
performed.
Measure
It is a numeric attribute of a fact, representing the performance or behavior of the business
relative to the dimensions.
Considering the relational context, there are two basic models which are used in dimensional
modeling:
o Star Model
o Snowflake Model

The star model is the underlying structure for a dimensional model. It has one large central
table (the fact table) and a set of smaller tables (dimensions) arranged in a radial design
around the central table. The snowflake model is the result of decomposing one or more of the
dimensions.
Fact Table
Fact tables are used to store facts or measures of the business. Facts are the numeric data
elements that are of interest to the company.
Characteristics of the Fact table
The fact table includes numerical values of what we measure. For example, a fact value of 20
might mean that 20 widgets have been sold.
Each fact table includes the keys to associated dimension tables. These are known as foreign
keys in the fact table.
Fact tables typically include a small number of columns.
Compared to dimension tables, fact tables have a large number of rows.
Dimension Table
Dimension tables establish the context of the facts. Dimensional tables store fields that
describe the facts.
Characteristics of the Dimension table
Dimension tables contain the details about the facts. This, for example, enables
business analysts to understand the data and their reports better.
The dimension tables include descriptive data about the numerical values in the fact table.
That is, they contain the attributes of the facts. For example, the dimension tables for a
marketing analysis function might include attributes such as time, marketing region, and
product type.
Since the records in a dimension table are denormalized, it usually has a large number of
columns. The dimension tables include significantly fewer rows of information than the fact
table.
The attributes in a dimension table are used as row and column headings in a document or
query results display.
Example: A store summary in a fact table can be viewed by city and state. An item summary
can be viewed by brand, color, etc. Customer information can be viewed by name and address.

Fact Table

Time ID   Product ID   Customer ID   Units Sold
4         17           2             1
8         21           3             2
8         4            1             1

In this example, the Customer ID column in the fact table is the foreign key that joins to
the dimension table. By following the link, we can see that row 2 of the fact table records
the fact that customer 3, Gaurav, bought two items on day 8.

Dimension Table

Customer ID   Name      Gender   Income   Education   Region
1             Rohan     Male     2        3           4
2             Sandeep   Male     3        5           1
3             Gaurav    Male     1        7           3
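
The link between the two tables can be reproduced with Python's built-in sqlite3 module. The sketch below loads the rows shown in the tables and follows the Customer ID foreign key with a join; the table names customer and sales_fact and the lowercase column names are assumptions made for the example.

```python
import sqlite3


def build_example():
    """Create the fact and dimension tables from the example and load their rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT, gender TEXT, income INTEGER, education INTEGER, region INTEGER
        );
        CREATE TABLE sales_fact (
            time_id INTEGER,
            product_id INTEGER,
            customer_id INTEGER REFERENCES customer(customer_id),
            units_sold INTEGER
        );
        INSERT INTO customer VALUES
            (1, 'Rohan', 'Male', 2, 3, 4),
            (2, 'Sandeep', 'Male', 3, 5, 1),
            (3, 'Gaurav', 'Male', 1, 7, 3);
        INSERT INTO sales_fact VALUES (4, 17, 2, 1), (8, 21, 3, 2), (8, 4, 1, 1);
    """)
    return conn


def purchases_on_day(conn, time_id):
    """Follow the customer_id foreign key from the fact table to the dimension table."""
    return conn.execute(
        "SELECT c.name, f.units_sold FROM sales_fact AS f "
        "JOIN customer AS c ON c.customer_id = f.customer_id "
        "WHERE f.time_id = ? ORDER BY c.name",
        (time_id,),
    ).fetchall()
```

Asking for day 8 returns Gaurav with two units and Rohan with one, matching the reading of the tables given in the text.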

Difference between OLAP and OLTP

OLAP stands for online analytical processing and OLTP for online transaction processing.

1. Definition
   OLAP: It is well known as an online database query management system.
   OLTP: It is well known as an online database modifying system.
2. Data source
   OLAP: Consists of historical data from various databases. In other words, different
   OLTP databases are used as data sources for OLAP.
   OLTP: Consists only of current operational data. In other words, the original data
   source is OLTP and its transactions.
3. Method used
   OLAP: It makes use of a data warehouse.
   OLTP: It makes use of a standard database management system (DBMS).
4. Application
   OLAP: It is subject-oriented. Used for data mining, analytics, decision-making, etc.
   OLTP: It is application-oriented. Used for business tasks.
5. Normalized
   OLAP: In an OLAP database, tables are not normalized.
   OLTP: In an OLTP database, tables are normalized (3NF).
6. Usage of data
   OLAP: The data is used in planning, problem-solving, and decision-making.
   OLTP: The data is used to perform day-to-day fundamental operations.
7. Task
   OLAP: It provides a multi-dimensional view of different business tasks.
   OLTP: It reveals a snapshot of present business tasks.
8. Purpose
   OLAP: It serves the purpose of extracting information for analysis and decision-making.
   OLTP: It serves the purpose of inserting, updating, and deleting information in the
   database.
9. Volume of data
   OLAP: A large amount of data is stored, typically in TB or PB.
   OLTP: The size of the data is relatively small because the historical data is archived,
   for example MB or GB.
10. Queries
   OLAP: Relatively slow, as the amount of data involved is large. Queries may take hours.
   OLTP: Very fast, as the queries operate on 5% of the data.
11. Update
   OLAP: The OLAP database is not often updated. As a result, data integrity is unaffected.
   OLTP: The data integrity constraint must be maintained in an OLTP database.
12. Backup and recovery
   OLAP: It only needs backup from time to time as compared to OLTP.
   OLTP: The backup and recovery process is maintained rigorously.
13. Processing time
   OLAP: The processing of complex queries can take a lengthy time.
   OLTP: It is comparatively fast in processing because of simple and straightforward
   queries.
14. Types of users
   OLAP: This data is generally used by senior management such as the CEO, MD, and GM.
   OLTP: This data is managed by clerks and managers.
15. Operations
   OLAP: Mostly read operations and rarely write operations.
   OLTP: Both read and write operations.
16. Updates
   OLAP: With lengthy, scheduled batch operations, data is refreshed on a regular basis.
   OLTP: The user initiates data updates, which are brief and quick.
17. Nature of audience
   OLAP: Process that is focused on the market.
   OLTP: Process that is focused on the customer.
18. Database design
   OLAP: Design with a focus on the subject.
   OLTP: Design that is focused on the application.
19. Productivity
   OLAP: Improves the efficiency of business analysts.
   OLTP: Enhances the user’s productivity.

SCHEMAS
What is Star Schema?
A star schema is the elementary form of a dimensional model, in which data are organized
into facts and dimensions. A fact is an event that is counted or measured, such as a sale or
login. A dimension includes reference data about the fact, such as date, item, or customer.
A star schema is a relational schema whose design represents a multidimensional data model.
The star schema is the simplest data warehouse schema. It is known as a star schema because
the entity-relationship diagram of this schema resembles a star, with points radiating from
a central table. The center of the schema consists of a large fact table, and the points of
the star are the dimension tables.
Fact Tables

A fact table is a table in a star schema that contains facts and is connected to dimensions.
A fact table has two types of columns: those that contain facts and those that are foreign
keys to the dimension tables. The primary key of a fact table is generally a composite key
made up of all of its foreign keys.
A fact table might contain either detail-level facts or facts that have been aggregated (fact
tables that contain aggregated facts are often called summary tables instead). A fact table
generally contains facts with the same level of aggregation.

Dimension Tables

A dimension is a structure usually composed of one or more hierarchies that categorize
data. If a dimension has no hierarchies and levels, it is called a flat dimension or list.
The primary key of each dimension table is part of the composite primary key of the fact
table. Dimensional attributes help to describe the dimensional value. They are generally
descriptive, textual values. Dimension tables are usually smaller in size than the fact table.
For example, fact tables store data about sales, while dimension tables store data about the
geographic region (markets, cities), clients, products, times, and channels.

Characteristics of Star Schema

The star schema is well suited to data warehouse database design because of the
following features:
o It creates a denormalized database that can quickly provide query responses.
o It provides a flexible design that can be changed easily or added to throughout the
development cycle, and as the database grows.
o It provides a parallel in design to how end-users typically think of and use the data.
o It reduces the complexity of metadata for both developers and end-users.

Advantages of Star Schema

Star schemas are easy for end-users and applications to understand and navigate. With a
well-designed schema, the user can quickly analyze large, multidimensional data sets.
The main advantages of star schemas in a decision-support environment are:
Query Performance
Because a star schema database has a limited number of tables and clear join paths, queries
run faster than they do against OLTP systems. Small single-table queries, frequently of a
dimension table, are almost instantaneous. Large join queries that involve multiple tables
take only seconds or minutes to run.
In a star schema database design, the dimensions are connected only through the central fact
table. When two dimension tables are used in a query, only one join path, intersecting the
fact table, exists between those two tables. This design feature enforces accurate and
consistent query results.
Load performance and administration
Structural simplicity also decreases the time required to load large batches of records into
a star schema database. By defining facts and dimensions and separating them into different
tables, the impact of a load operation is reduced. Dimension tables can be populated
once and occasionally refreshed. We can add new facts regularly and selectively by
appending records to a fact table.
Built-in referential integrity
A star schema has referential integrity built in when information is loaded. Referential
integrity is enforced because each record in a dimension table has a unique primary key, and
all keys in the fact table are legitimate foreign keys drawn from the dimension tables. A
record in the fact table that is not related correctly to a dimension cannot be given the
correct key value to be retrieved.
Easily Understood
A star schema is simple to understand and navigate, with dimensions joined only through the
fact table. These joins are more meaningful to the end-user because they represent the
fundamental relationships between parts of the underlying business. Users can also browse
dimension table attributes before constructing a query.

Disadvantage of Star Schema

There are some conditions that cannot be met by star schemas. For example, the relationship
between a user and a bank account cannot be described as a star schema because the
relationship between them is many-to-many.
Example: Suppose a star schema is composed of a fact table, SALES, and several dimension
tables connected to it for time, branch, item, and geographic locations.
The TIME table has columns for day, month, quarter, and year. The ITEM table has
columns for item_key, item_name, brand, type, and supplier_type. The BRANCH table has
columns for branch_key, branch_name, and branch_type. The LOCATION table has
columns of geographic data, including street, city, state, and country.

In this scenario, the SALES table contains only four columns with IDs from the dimension
tables, TIME, ITEM, BRANCH, and LOCATION, instead of four columns for time data, four
columns for ITEM data, three columns for BRANCH data, and four columns for LOCATION
data. Thus, the size of the fact table is significantly reduced. When we need to change an
item, we need only make a single change in the dimension table, instead of making many
changes in the fact table.
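
The SALES example can be sketched as SQLite tables from Python. This is a minimal sketch under stated assumptions: the tables are named with a _dim suffix, each dimension gets an integer surrogate key, the fact table carries a single assumed measure (units_sold), and the sample rows are invented.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus an assumed measure.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
);
"""


def build_star():
    """Create the star schema and load a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                     [(1, 5, 1, 1, 2023), (2, 9, 7, 3, 2024)])
    conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?, ?)",
                     [(1, "Pen", "Acme", "stationery", "wholesale"),
                      (2, "Ink", "Acme", "stationery", "wholesale"),
                      (3, "Lamp", "Brite", "home", "retail")])
    conn.execute("INSERT INTO branch_dim VALUES (1, 'Main', 'urban')")
    conn.execute("INSERT INTO location_dim VALUES (1, '1 High St', 'Pune', 'MH', 'India')")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 10), (1, 2, 1, 1, 5), (2, 1, 1, 1, 7), (2, 3, 1, 1, 2)])
    return conn


def units_by_brand_and_year(conn):
    """Join the fact table to two of its dimensions and aggregate the measure."""
    return conn.execute(
        "SELECT i.brand, t.year, SUM(s.units_sold) FROM sales AS s "
        "JOIN item_dim AS i ON i.item_key = s.item_key "
        "JOIN time_dim AS t ON t.time_key = s.time_key "
        "GROUP BY i.brand, t.year ORDER BY i.brand, t.year"
    ).fetchall()


def rename_brand(conn, item_key, new_brand):
    """Changing an item touches one dimension row; no fact rows are rewritten."""
    conn.execute("UPDATE item_dim SET brand = ? WHERE item_key = ?", (new_brand, item_key))
```

Because the fact rows hold only keys, rename_brand updates a single row in item_dim and every later query picks up the new brand through the join.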
What is Snowflake Schema?
A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if
one or more dimension tables do not connect directly to the fact table but must join through
other dimension tables."
The snowflake schema is an expansion of the star schema where each point of the star
explodes into more points. It is called a snowflake schema because the diagram of the
schema resembles a snowflake. Snowflaking is a method of normalizing the dimension
tables in a star schema. When we normalize all the dimension tables entirely, the resultant
structure resembles a snowflake with the fact table in the middle.
Snowflaking is used to improve the performance of specific queries. The schema is diagrammed
with each fact surrounded by its associated dimensions, and those dimensions are related to
other dimensions, branching out into a snowflake pattern.
The snowflake schema consists of one fact table which is linked to many dimension tables,
which can be linked to other dimension tables through many-to-one relationships. Tables in a
snowflake schema are generally normalized to the third normal form. Each dimension table
represents exactly one level in a hierarchy.
The following diagram shows a snowflake schema with two dimensions, each having three
levels. A snowflake schema can have any number of dimensions, and each dimension can
have any number of levels.

Example: The figure shows a snowflake schema with a Sales fact table and Store, Location,
Time, Product, Line, and Family dimension tables. The Market dimension has two dimension
tables, with Store as the primary dimension table and Location as the outrigger dimension
table. The Product dimension has three dimension tables, with Product as the primary
dimension table and the Line and Family tables as the outrigger dimension tables.
A star schema stores all attributes for a dimension in one denormalized table. This needs
more disk space than a more normalized snowflake schema. Snowflaking normalizes the
dimension by moving attributes with low cardinality into separate dimension tables that relate
to the core dimension table by using foreign keys. Snowflaking for the sole purpose of
minimizing disk space is not recommended, because it can adversely impact query
performance.
In a snowflake schema, tables are normalized to remove redundancy: dimension tables are
broken down into multiple dimension tables.
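
The Product, Line, and Family tables from the example can be sketched with sqlite3 to show what joining through an outrigger looks like. The column names, the table name product_line, and the sample rows are assumptions made for illustration, and the other dimensions are left out to keep the sketch short.

```python
import sqlite3


def build_snowflake():
    """Create a Sales fact table with a snowflaked Product dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE family (
            family_id INTEGER PRIMARY KEY, family_name TEXT);
        CREATE TABLE product_line (
            line_id INTEGER PRIMARY KEY, line_name TEXT,
            family_id INTEGER REFERENCES family(family_id));
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY, product_name TEXT,
            line_id INTEGER REFERENCES product_line(line_id));
        CREATE TABLE sales (
            product_id INTEGER REFERENCES product(product_id),
            units_sold INTEGER);
        INSERT INTO family VALUES (1, 'Electronics'), (2, 'Furniture');
        INSERT INTO product_line VALUES (10, 'Phones', 1), (11, 'Laptops', 1), (20, 'Chairs', 2);
        INSERT INTO product VALUES (100, 'Phone A', 10), (101, 'Laptop B', 11), (200, 'Chair C', 20);
        INSERT INTO sales VALUES (100, 3), (101, 2), (200, 5), (100, 1);
    """)
    return conn


def units_by_family(conn):
    """Reach the Family outrigger by joining through Product and Line."""
    return conn.execute(
        "SELECT fa.family_name, SUM(s.units_sold) FROM sales AS s "
        "JOIN product AS p ON p.product_id = s.product_id "
        "JOIN product_line AS l ON l.line_id = p.line_id "
        "JOIN family AS fa ON fa.family_id = l.family_id "
        "GROUP BY fa.family_name ORDER BY fa.family_name"
    ).fetchall()
```

The family name is stored once per family rather than once per product, which is the redundancy saving, and the price is the two extra joins needed to reach it from the fact table.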
The figure shows a simple STAR schema for sales in a manufacturing company. The sales fact
table includes quantity, price, and other relevant metrics. SALESREP, CUSTOMER,
PRODUCT, and TIME are the dimension tables.
The STAR schema for sales, as shown above, contains only five tables, whereas the
normalized version extends to eleven tables. Notice that in the snowflake
schema, the attributes with low cardinality in each original dimension table are removed to
form separate tables. These new tables are connected back to the original dimension table
through artificial keys.

A snowflake schema is designed for flexible querying across more complex dimensions and
relationships. It is suitable for many-to-many and one-to-many relationships between
dimension levels.

Advantage of Snowflake Schema

1. The primary advantage of the snowflake schema is the improvement in query
performance for some queries, due to minimized disk storage requirements and joins
against smaller lookup tables.
2. It provides greater scalability in the interrelationship between dimension levels and
components.
3. No redundancy, so it is easier to maintain.

Disadvantage of Snowflake Schema

1. The primary disadvantage of the snowflake schema is the additional maintenance
effort required due to the increasing number of lookup tables.
2. Queries are more complex and hence more difficult to understand.
3. More tables mean more joins, and so more query execution time.

What is Fact Constellation Schema?


A fact constellation means two or more fact tables sharing one or more dimensions. It is also
called a Galaxy schema.
A Fact Constellation Schema describes the logical structure of a data warehouse or data mart.
It can be designed with a collection of de-normalized fact tables and shared, conformed
dimension tables.
A Fact Constellation Schema is a sophisticated database design in which it is difficult to
summarize information. It can be implemented between aggregate fact tables or by
decomposing a complex fact table into independent simple fact tables.
Example: A fact constellation schema is shown in the figure below.

This schema defines two fact tables, sales and shipping. Sales are analyzed along four
dimensions, namely time, item, branch, and location. The schema contains a fact table for
sales that includes keys to each of the four dimensions, along with two measures: Rupee_sold
and units_sold. The shipping table has five dimensions, or keys (item_key, time_key,
shipper_key, from_location, and to_location), and two measures: Rupee_cost and
units_shipped.
The primary disadvantage of the fact constellation schema is that it is a more challenging
design because many variants for specific kinds of aggregation must be considered and
selected.
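
A cut-down version of this constellation can be sketched with sqlite3. To stay short, the sketch keeps only the two shared dimensions (item and time) and one measure per fact table; the _dim table names and the sample rows are assumptions, while units_sold and units_shipped follow the measures named above.

```python
import sqlite3


def build_constellation():
    """Create two fact tables, sales and shipping, that share item and time dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE sales (
            item_key INTEGER REFERENCES item_dim(item_key),
            time_key INTEGER REFERENCES time_dim(time_key),
            units_sold INTEGER);
        CREATE TABLE shipping (
            item_key INTEGER REFERENCES item_dim(item_key),
            time_key INTEGER REFERENCES time_dim(time_key),
            units_shipped INTEGER);
        INSERT INTO item_dim VALUES (1, 'Pen'), (2, 'Lamp');
        INSERT INTO time_dim VALUES (1, 2024);
        INSERT INTO sales VALUES (1, 1, 10), (1, 1, 5), (2, 1, 2);
        INSERT INTO shipping VALUES (1, 1, 12), (2, 1, 2), (2, 1, 1);
    """)
    return conn


def sold_vs_shipped(conn):
    """Compare measures from both fact tables through the shared item dimension.

    Each fact table is aggregated on its own so that rows from one fact table
    do not multiply rows from the other.
    """
    return conn.execute(
        "SELECT i.item_name, "
        "  (SELECT COALESCE(SUM(s.units_sold), 0) FROM sales AS s "
        "    WHERE s.item_key = i.item_key), "
        "  (SELECT COALESCE(SUM(sh.units_shipped), 0) FROM shipping AS sh "
        "    WHERE sh.item_key = i.item_key) "
        "FROM item_dim AS i ORDER BY i.item_name"
    ).fetchall()
```

Joining sales and shipping directly to each other would pair every sales row with every shipping row for the same item and inflate the totals, which is one concrete form of the aggregation difficulty mentioned above.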
