
Unit1 Dwbi

The document outlines the architecture and components of data warehousing and business intelligence, including the processes of data extraction, transformation, and loading (ETL), as well as the structure of data warehouses and data marts. It describes the three-tier architecture of data warehouses, the multidimensional data model, and various operations such as roll-up, drill-down, slice, and pivot for data analysis. Additionally, it explains different schema designs like star, snowflake, and fact constellation schemas used in data warehousing.

Uploaded by

22b81a6610
Copyright
© All Rights Reserved

Data Warehousing and Business Intelligence
UNIT-I
Data Warehouse Architecture
• External Sources: An external source is any source from which data is
collected, regardless of the type of data. The data can be structured,
semi-structured, or unstructured.
• Staging Area: Data extracted from the external sources does not follow a
single format, so it must be validated before it is loaded into the data
warehouse. An ETL tool is recommended for this purpose.
• E (Extract): Data is extracted from the external data sources.
• T (Transform): Data is transformed into the standard format.
• L (Load): Data is loaded into the data warehouse after it has been
transformed into the standard format.
• Data Warehouse: After cleansing, the data is stored in the data warehouse,
which acts as the central repository. It holds the metadata together with
the cleansed, integrated data, and the data marts are populated from it.
Note that in this top-down approach the data warehouse stores the data in
its purest form.

• Data Marts: A data mart is also part of the storage component. It stores
the information of one particular function of an organisation, handled by a
single authority. An organisation can have as many data marts as it has
functions. A data mart can also be described as a subset of the data stored
in the data warehouse.
• Data Mining: Data mining is the practice of analysing the large volumes of
data held in the data warehouse. Data mining algorithms are used to find
hidden patterns in the database or data warehouse.
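The extract, transform, and load steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the raw records, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records from external sources with inconsistent formats.
raw_records = [
    {"id": "1", "city": " vancouver ", "amount": "1,200.50"},
    {"id": "2", "city": "TORONTO", "amount": "830"},
    {"id": "3", "city": "Chicago", "amount": "not available"},  # fails validation
]

def extract():
    """E: pull records from the external source (here, an in-memory list)."""
    return list(raw_records)

def transform(records):
    """T: validate each record and convert it to one standard format."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"].replace(",", ""))
        except ValueError:
            continue  # reject rows that cannot be validated
        clean.append((int(rec["id"]), rec["city"].strip().title(), amount))
    return clean

def load(conn, rows):
    """L: write the standardized rows into the (assumed) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(warehouse, transform(extract()))
loaded = warehouse.execute("SELECT id, city, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

The third record is dropped in the transform step because its amount cannot be parsed, which mirrors the validation role of the staging area.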
Three-Tier Data Warehouse Architecture
• Data warehouses usually have a three-tier architecture that includes:
• Bottom Tier (Data Warehouse Server)
• Middle Tier (OLAP Server)
• Top Tier (Front-End Tools)
• The bottom tier consists of the data warehouse server, which is almost always
an RDBMS. It may include several specialized data marts and a metadata repository.
• Data from operational databases and external sources (such as user profile data
provided by external consultants) is extracted using application program
interfaces called gateways. A gateway is supported by the underlying DBMS and
allows client programs to generate SQL code to be executed at a server.
• The middle tier consists of an OLAP server for fast querying of the data
warehouse.
• The OLAP server is implemented using either:
• (1) A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS
that maps operations on multidimensional data to standard relational
operations.
• (2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose
server that directly implements multidimensional data and
operations.
• The top tier contains front-end tools for displaying results provided by
OLAP, as well as additional tools for data mining of the OLAP-generated data.
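To make the ROLAP and MOLAP distinction more concrete, here is a small sketch that answers the same request ("total sales by city") in both styles. The figures and names are hypothetical, SQLite stands in for the relational DBMS, and a nested dictionary stands in for a multidimensional store; real OLAP servers are far more elaborate.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales facts: (city, quarter, amount).
facts = [
    ("Vancouver", "Q1", 605.0),
    ("Vancouver", "Q2", 680.0),
    ("Toronto", "Q1", 818.0),
    ("Toronto", "Q2", 894.0),
]

# ROLAP style: the multidimensional request becomes a standard
# relational GROUP BY over a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap_totals = dict(conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city"))

# MOLAP style: cells live in an array-like structure addressed directly by
# dimension members, and aggregation walks those cells.
cube = defaultdict(dict)
for city, quarter, amount in facts:
    cube[city][quarter] = amount
molap_totals = {city: sum(cells.values()) for city, cells in cube.items()}

print(rolap_totals)
print(molap_totals)
```

Both approaches return the same totals; they differ in how the data is stored and how the operation is executed.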
• The metadata repository stores information that defines DW objects. It includes the
following information for the middle-tier and top-tier applications:
• A description of the DW structure, including the warehouse schema, dimensions,
hierarchies, data mart locations and contents, etc.
• Operational metadata, which usually describes the currency level of the stored data
(active, archived, or purged) and warehouse monitoring information (usage statistics,
error reports, audit trails, etc.).
• System performance data, which includes indices used to improve data access and
retrieval performance.
• Information about the mapping from operational databases, which covers the
source RDBMSs and their contents, cleaning and transformation rules, etc.
• Summarization algorithms, predefined queries, and reports.
• Business data, which includes business terms and definitions, ownership
information, etc.
Multidimensional Data Model
• The multidimensional data model is a method for organizing the data in
a database so that its contents are well arranged and easy to
assemble.
• It allows users to ask analytical questions about market or business
trends, unlike relational databases, which let users access data only
in the form of queries.
• It lets users receive answers to their requests quickly, because the
data can be created and examined comparatively fast.
• It represents data in the form of data cubes. Data cubes allow data to
be modeled and viewed from many dimensions and perspectives.
• It is defined by dimensions and facts and is represented by a fact
table.
• Facts are numerical measures. The fact table contains the names of the
facts (the measures) and keys to each of the related dimension tables.
• For example, a shop may create a sales data warehouse to keep
records of the store's sales for the dimensions time, item, and location.
• These dimensions allow the store to keep track of things such as the
monthly sales of items and the locations at which the items were
sold.
• Each dimension has a table related to it, called a dimension table,
which describes the dimension further.
• For example, a dimension table for item may contain the
attributes item_name, brand, and type.
• A multidimensional data model is organized around a central theme,
for example, sales. This theme is represented by a fact table. Facts are
numerical measures. The fact table contains the names of the facts (the
measures) and keys to each of the related dimension tables.
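The relationship between a fact table and its dimension tables can be sketched with plain Python structures. The attribute names item_name, brand, and type come from the text above; all keys, rows, and measure names (`units_sold`, `amount_sold`) are hypothetical values chosen for illustration.

```python
# Dimension tables: each surrogate key maps to descriptive attributes.
item_dim = {
    1: {"item_name": "Laptop X", "brand": "Acme", "type": "computer"},
    2: {"item_name": "Phone Y", "brand": "Acme", "type": "phone"},
}
time_dim = {10: {"month": "Jan", "quarter": "Q1"}, 11: {"month": "Apr", "quarter": "Q2"}}
location_dim = {100: {"city": "Vancouver"}, 101: {"city": "Toronto"}}

# Fact table: one row per (time, item, location) holding keys plus numeric measures.
sales_fact = [
    {"time_key": 10, "item_key": 1, "location_key": 100, "units_sold": 3, "amount_sold": 3000.0},
    {"time_key": 10, "item_key": 2, "location_key": 101, "units_sold": 5, "amount_sold": 2500.0},
    {"time_key": 11, "item_key": 1, "location_key": 101, "units_sold": 2, "amount_sold": 2000.0},
    {"time_key": 11, "item_key": 2, "location_key": 100, "units_sold": 4, "amount_sold": 2000.0},
]

def total_by(attribute, dim, key_name, measure="amount_sold"):
    """Sum a measure, grouped by one descriptive attribute of a dimension."""
    totals = {}
    for row in sales_fact:
        label = dim[row[key_name]][attribute]  # follow the key into the dimension table
        totals[label] = totals.get(label, 0) + row[measure]
    return totals

print(total_by("type", item_dim, "item_key"))
print(total_by("city", location_dim, "location_key"))
print(total_by("quarter", time_dim, "time_key"))
```

The fact rows carry only keys and measures; every descriptive label used in the output comes from a dimension table.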
Data Cube
• A data cube is formed when data is grouped or combined in multidimensional
matrices.
• The data cube method has a few alternative names or variants, such as
"multidimensional databases," "materialized views," and "OLAP (On-Line
Analytical Processing)."

• For example, a relation with the schema sales(part, supplier, customer,
sale-price) can be materialized into a set of eight views, as shown in the figure.
Here psc denotes a view consisting of aggregate function values (such as total
sales) computed by grouping on the three attributes part, supplier, and
customer; p denotes a view consisting of the corresponding aggregate function
values computed by grouping on part alone; and so on.
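The count of eight follows from taking every subset of the three grouping attributes (2^3 = 8). A minimal sketch, assuming a handful of made-up rows and using "none" as a label for the view that groups on no attribute (that label is an assumption, not taken from the text):

```python
from itertools import combinations

# Hypothetical rows of the relation sales(part, supplier, customer, sale_price).
sales = [
    ("bolt", "S1", "C1", 10.0),
    ("bolt", "S2", "C1", 12.0),
    ("nut", "S1", "C2", 5.0),
    ("nut", "S1", "C1", 6.0),
]
dims = ("part", "supplier", "customer")

def materialize(group_by):
    """Total sale_price for each combination of values of the group_by attributes."""
    idx = [dims.index(d) for d in group_by]
    view = {}
    for row in sales:
        key = tuple(row[i] for i in idx)
        view[key] = view.get(key, 0.0) + row[3]
    return view

# Every subset of {part, supplier, customer} yields one view: 2**3 = 8 in total.
views = {}
for k in range(len(dims) + 1):
    for group_by in combinations(dims, k):
        name = "".join(d[0] for d in group_by) or "none"
        views[name] = materialize(group_by)

print(sorted(views))
print(views["p"])
print(views["none"])
```

The view named psc keeps all three attributes, while the view named none collapses everything into a single grand total.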
• A data cube enables data to be modeled and viewed in multiple
dimensions. A multidimensional data model is organized around a
central theme, such as sales or transactions.
• A fact table represents this theme. Facts are numerical measures.
Thus, the fact table contains measures (such as Rs_sold) and keys to
each of the related dimension tables.
• Dimensions are the perspectives that define a data cube. Facts are
generally quantities, which are used for analyzing the relationships
between dimensions.
• Example: In the 2-D representation, we look at the AllElectronics
sales data for items sold per quarter in the city of Vancouver. The
measure displayed is dollars sold (in thousands).
• Now suppose we would like to view the sales data with a third
dimension.
• For example, suppose we would like to view the data according to
time and item, as well as location, for the cities Chicago, New York,
Toronto, and Vancouver.
• The measure displayed is again dollars sold (in thousands). These 3-D
data are shown in the table, where they are represented as a
series of 2-D tables.
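One way to picture "3-D data as a series of 2-D tables" is to group the cells by location and keep a time-by-item table for each city. The numbers below are invented for illustration and are not the values from the original table.

```python
# Hypothetical cells: (location, quarter, item) -> dollars sold (in thousands).
cells = {
    ("Vancouver", "Q1", "phone"): 14,
    ("Vancouver", "Q1", "computer"): 825,
    ("Vancouver", "Q2", "phone"): 31,
    ("Toronto", "Q1", "phone"): 43,
    ("Toronto", "Q2", "computer"): 952,
}

def as_2d_tables(cells):
    """Split 3-D cells into one 2-D table (quarter x item) per location."""
    tables = {}
    for (location, quarter, item), value in cells.items():
        tables.setdefault(location, {}).setdefault(quarter, {})[item] = value
    return tables

tables = as_2d_tables(cells)
for location, table in tables.items():
    print(location)
    for quarter, row in sorted(table.items()):
        print(" ", quarter, row)
```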
• The topmost 0-D cuboid, which holds the highest level of
summarization, is known as the apex cuboid.
• In this example, it is the total sales, or dollars sold, summarized
over all four dimensions.
• The lattice of cuboids forms a data cube. The figure shows the lattice of
cuboids making up a 4-D data cube for the dimensions time, item,
location, and supplier. Each cuboid represents a different degree of
summarization.
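The lattice for the four dimensions named above can be enumerated directly. This sketch ignores concept hierarchies within each dimension, so it counts one cuboid per subset of dimensions (2^4 = 16); variable names are illustrative.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# Group the cuboids by how many dimensions they keep (0-D up to 4-D).
lattice = {
    k: list(combinations(dimensions, k))
    for k in range(len(dimensions) + 1)
}

apex = lattice[0][0]                 # (): everything summarized into one total
base = lattice[len(dimensions)][0]   # all four dimensions kept, least summarized
total_cuboids = sum(len(level) for level in lattice.values())

for k, level in lattice.items():
    print(f"{k}-D cuboids ({len(level)}):", level)
print("total:", total_cuboids)
```

The 0-D level holds only the apex cuboid, and the 4-D level holds only the base cuboid with no summarization.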
• The figure shows a data cube for the sales of a shop.
• The cube contains the dimensions location, time,
and item, where location is aggregated with
respect to city values, time is aggregated with respect
to quarters, and item is aggregated with respect to
item types.
• The roll-up operation (also known as drill-up or
aggregation) performs aggregation on a data cube,
either by climbing up a concept hierarchy for a
dimension or by dimension reduction.
• Roll-up is like zooming out on the data cube. The figure
shows the result of a roll-up operation performed on
the dimension location.
• The hierarchy for location is defined as the order
street < city < province or state < country.
• The roll-up operation aggregates the data by
ascending the location hierarchy from the level of
city to the level of country.
• When a roll-up is performed by dimension reduction, one or more
dimensions are removed from the cube.
• For example, consider a sales data cube with two dimensions,
location and time.
• Roll-up may be performed by removing the time dimension,
resulting in an aggregation of the total sales by location, rather
than by location and by time.
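Both forms of roll-up described above can be sketched on a small two-dimensional cube. The city-to-country mapping reflects the cities mentioned in the text; the sales figures and function names are hypothetical.

```python
from collections import defaultdict

# Concept hierarchy for location: city -> country.
city_to_country = {
    "Vancouver": "Canada", "Toronto": "Canada",
    "Chicago": "USA", "New York": "USA",
}

# Hypothetical cube cells: (city, quarter) -> dollars sold (in thousands).
cube = {
    ("Vancouver", "Q1"): 605, ("Vancouver", "Q2"): 680,
    ("Toronto", "Q1"): 818, ("Toronto", "Q2"): 894,
    ("Chicago", "Q1"): 440, ("Chicago", "Q2"): 512,
    ("New York", "Q1"): 1087, ("New York", "Q2"): 978,
}

def roll_up_location(cube):
    """Roll-up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (city, quarter), value in cube.items():
        out[(city_to_country[city], quarter)] += value
    return dict(out)

def roll_up_drop_time(cube):
    """Roll-up by dimension reduction: remove the time dimension entirely."""
    out = defaultdict(int)
    for (city, _quarter), value in cube.items():
        out[city] += value
    return dict(out)

by_country = roll_up_location(cube)
by_city = roll_up_drop_time(cube)
print(by_country)
print(by_city)
```

The first function keeps both dimensions but coarsens location; the second keeps the city level but leaves a one-dimensional result.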
• Drill-Down
• The drill-down operation (also called roll-down) is the reverse
of roll-up. Drill-down is like zooming in on the data cube. It
navigates from less detailed data to more detailed data.
• Drill-down can be performed either by stepping down a concept
hierarchy for a dimension or by adding dimensions.
• In the figure, drill-down is performed by descending the time
hierarchy from the level of quarter to the more detailed level of month.
• Because a drill-down adds more detail to the given data, it can also
be performed by adding a new dimension to a cube.
• For example, a drill-down on the central cube of the figure can occur
by introducing an additional dimension, such as customer group.
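Drill-down along the time hierarchy can be sketched as follows. It assumes the detailed month-level data is available (drill-down cannot invent detail that was never stored); the figures and function names are hypothetical.

```python
# Hypothetical base data kept at month level: (quarter, month) -> dollars sold.
monthly = {
    ("Q1", "Jan"): 150, ("Q1", "Feb"): 100, ("Q1", "Mar"): 150,
    ("Q2", "Apr"): 200, ("Q2", "May"): 180, ("Q2", "Jun"): 120,
}

def quarter_view(monthly):
    """The summarized view a user starts from: totals per quarter."""
    view = {}
    for (quarter, _month), value in monthly.items():
        view[quarter] = view.get(quarter, 0) + value
    return view

def drill_down(monthly, quarter):
    """Step down the time hierarchy: show the months behind one quarter."""
    return {month: value for (q, month), value in monthly.items() if q == quarter}

summary = quarter_view(monthly)
q1_detail = drill_down(monthly, "Q1")
print(summary)
print(q1_detail)
```

The monthly figures for Q1 add up to the Q1 total in the summary, which is what makes drill-down the reverse of roll-up.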
• A slice is a subset of the cube corresponding to a single value for one
or more members of a dimension.
• For example, a slice operation is executed when the user makes a
selection on one dimension of a three-dimensional cube, resulting in a
two-dimensional slice.
• So, the slice operation performs a selection on one dimension of the
given cube, resulting in a subcube.
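A slice on a three-dimensional cube can be sketched by fixing one dimension to a single value and dropping it from the cell keys. The cube contents and the `slice_cube` helper are hypothetical.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> dollars sold (in thousands).
cube = {
    ("Q1", "phone", "Vancouver"): 14,
    ("Q1", "computer", "Vancouver"): 825,
    ("Q2", "phone", "Vancouver"): 31,
    ("Q1", "phone", "Toronto"): 43,
    ("Q2", "computer", "Toronto"): 952,
}

def slice_cube(cube, dim_index, value):
    """Select the cells where one dimension equals `value`, then drop that dimension."""
    return {
        key[:dim_index] + key[dim_index + 1:]: v
        for key, v in cube.items()
        if key[dim_index] == value
    }

q1_slice = slice_cube(cube, 0, "Q1")  # time = "Q1" leaves a 2-D (item, city) table
print(q1_slice)
```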
• Pivot
• The pivot operation is also called rotation. Pivot is a visualization
operation that rotates the data axes in view to provide an
alternative presentation of the data.
• It may involve swapping the rows and columns, or moving one of the
row dimensions into the column dimensions.
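The simplest pivot, swapping rows and columns of a 2-D view, can be sketched as below. The table values and the `pivot` helper are hypothetical; note that the data itself is unchanged, only its presentation.

```python
# Hypothetical 2-D view: rows are items, columns are quarters.
table = {
    "phone": {"Q1": 14, "Q2": 31},
    "computer": {"Q1": 825, "Q2": 952},
}

def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

rotated = pivot(table)
print(rotated)
```

Pivoting twice returns the original layout, which confirms that no values are added or lost.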
Schemas
• The star schema is well suited to data warehouse database
design because of the following features:
• It creates a denormalized database that can quickly provide query
responses.
• It provides a flexible design that can be changed or extended easily
throughout the development cycle and as the database grows.
• It parallels how end users typically think of and use the data.
• It reduces the complexity of metadata for both developers and end
users.
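A minimal star schema can be sketched with SQLite: one fact table in the centre, joined directly to each denormalized dimension table. Table names, column names, and rows are hypothetical and chosen only to illustrate the shape of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension tables: all attributes of a dimension in one table.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
-- Central fact table: keys to each dimension plus a numeric measure.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);
INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item_dim VALUES (1, 'Laptop X', 'Acme', 'computer'), (2, 'Phone Y', 'Acme', 'phone');
INSERT INTO sales_fact VALUES (1, 1, 3), (1, 2, 5), (2, 1, 2);
""")

# Each dimension is one join away from the fact table.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.units_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item_dim i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)
```

Because brand and type sit in the same item table, the query needs only one join per dimension, which is the source of the quick query responses mentioned above.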
• The snowflake schema consists of one fact table linked to
many dimension tables, which can in turn be linked to other dimension
tables through many-to-one relationships.
• Tables in a snowflake schema are generally normalized to the third
normal form. Each dimension table represents exactly one level in a
hierarchy.
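The normalization that distinguishes a snowflake schema can be sketched by moving the brand level of an item dimension into its own table. Again, SQLite is used for illustration and every table, column, and row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per hierarchy level: brand is split out of the item dimension.
CREATE TABLE brand_dim (brand_key INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT,
    brand_key INTEGER REFERENCES brand_dim(brand_key)  -- many items to one brand
);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);
INSERT INTO brand_dim VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO item_dim VALUES (1, 'Laptop X', 1), (2, 'Phone Y', 1), (3, 'Tablet Z', 2);
INSERT INTO sales_fact VALUES (1, 3), (2, 5), (3, 4);
""")

# Reaching the brand level now takes two joins: fact -> item -> brand.
rows = conn.execute("""
    SELECT b.brand_name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN item_dim i ON i.item_key = f.item_key
    JOIN brand_dim b ON b.brand_key = i.brand_key
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()
print(rows)
```

The brand name is stored once per brand rather than once per item, at the cost of an extra join in queries.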
• A fact constellation schema is a sophisticated database design in
which it is difficult to summarize information.
• A fact constellation schema can be implemented between aggregate fact
tables, or by decomposing a complex fact table into independent simple
fact tables.
• In the example, the schema contains a fact table for sales that includes
keys to each of its four dimensions, along with two measures: Rupee_sold
and units_sold.
• The shipping fact table has five dimensions, or keys (item_key, time_key,
shipper_key, from_location, and to_location) and two measures:
Rupee_cost and units_shipped.
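The two fact tables can be written down as simple descriptions to see which dimension keys they have in common. The shipping keys and all measure names come from the text; the four sales key names are an assumption, since the text does not list them.

```python
# Sales fact table. The four key names are assumed for illustration.
sales_fact = {
    "keys": ["time_key", "item_key", "branch_key", "location_key"],  # assumed names
    "measures": ["Rupee_sold", "units_sold"],
}

# Shipping fact table, with keys and measures as listed in the text.
shipping_fact = {
    "keys": ["item_key", "time_key", "shipper_key", "from_location", "to_location"],
    "measures": ["Rupee_cost", "units_shipped"],
}

# Dimension keys that appear under the same name in both fact tables.
shared = sorted(set(sales_fact["keys"]) & set(shipping_fact["keys"]))
print(shared)
```

This comparison matches keys by name only. In a typical design, from_location and to_location would also refer to a location dimension, so the real overlap between the two fact tables may be larger than this sketch reports.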
