OLAP Cube
OLAP Cube
OLAP Cube
An OLAP (Online analytical processing) cube is a data structure that allows fast
analysis of data.[1] It can also be defined as the capability of manipulating and analyzing
data from multiple perspectives. The arrangement of data into cubes overcomes a
limitation of relational databases. Relational databases are not well suited for near
instantaneous analysis and display of large amounts of data.[citation needed] Instead, they are
better suited for creating records from a series of transactions known as OLTP or On-
Line Transaction Processing.[2] Although many report-writing tools exist for relational
databases, these are slow when the whole database must be summarized.[citation needed]
Background
OLAP (online analytical processing) cubes can be thought of as extensions to the two-
dimensional array of a spreadsheet. For example a company might wish to analyze some
financial data by product, by time-period, by city, by type of revenue and cost, and by
comparing actual data with a budget. These additional methods of analyzing the data are
known as dimensions.[3] Because there can be more than three dimensions in an OLAP
system the term hypercube is sometimes used.
[edit] Functionality
The OLAP cube consists of numeric facts called measures which are categorized by
dimensions. The cube metadata (structure) may be created from a star schema or
snowflake schema of tables in a relational database. Measures are derived from the
records in the fact table and dimensions are derived from the dimension tables.
[edit] Pivot
A financial analyst might want to view or "pivot" the data in various ways, such as
displaying all the cities down the page and all the products across a page. This could be
for a specified period, version and type of expenditure. Having seen the data in this
particular way the analyst might then immediately wish to view it in another way. The
cube could effectively be re-oriented so that the data displayed now has periods across
the page and type of cost down the page. Because this re-orientation involves re-
summarizing very large amounts of data, this new view of the data has to be generated
efficiently to avoid wasting the analyst's time, i.e. within seconds, rather than the hours a
relational database and conventional report-writer might have taken.[4]
[edit] Hierarchy
For example May 2005 could be summarized into Second Quarter 2005 which in turn
would be summarized in the Year 2005. Similarly the cities could be summarized into
regions, countries and then global regions; products could be summarized into larger
categories; and cost headings could be grouped into types of expenditure. Conversely the
analyst could start at a highly summarized level, such as the total difference between the
actual results and the budget, and drill down into the cube to discover which locations,
products and periods had produced this difference.
The analyst can understand the meaning contained in the databases using multi-
dimensional analysis. By aligning the data content with the analyst's mental model, the
chances of confusion and erroneous interpretations are reduced. The analyst can navigate
through the database and screen for a particular subset of the data, changing the data's
orientations and defining analytical calculations.[6] The user-initiated process of
navigating by calling for page displays interactively, through the specification of slices
via rotations and drill down/up is sometimes called "slice and dice". Common operations
include slice and dice, drill down, roll up, and pivot.
Dice: The dice operation is a slice on more than two dimensions of a data cube (or more
than two consecutive slices).[7]
Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user
navigates among levels of data ranging from the most summarized (up) to the most
detailed (down).[6]
Roll-up: A roll-up involves computing all of the data relationships for one or more
dimensions. To do this, a computational relationship or formula might be defined.[6]
Pivot: This operation is also called rotate operation. It rotates the data in order to provide
an alternative presentation of data - the report or page display takes a different
dimensional orientation.[6]
The commercial OLAP products have different methods of creating and of linking cubes
and hypercubes (see Types of OLAP).
Linking cubes is a method of overcoming sparsity. Sparsity arises when not every cell in
the cube is filled with data and so valuable processing time is taken by effectively adding
up zeros. For example revenues may be available for each customer and product but cost
data may not be available with this amount of analysis. Instead of creating a sparse cube,
it is sometimes better to create another separate, but linked, cube in which a sub-set of the
data can be analyzed into great detail. The linking ensures that the data in the cubes
remain consistent.
The data in cubes may be updated at times, perhaps by different people. Techniques are
therefore often needed to lock parts of the cube while one of the users is writing to it and
to recalculate the cube's totals. Other facilities may allow an alert that shows previously
calculated totals are no longer valid after the new data have been added, but some
products only calculate the totals when they are needed.
W : (X,Y,Z) → W,
the attributes X, Y, and Z correspond to the axes of the cube, while the W value into which
each ( X, Y, Z ) triple maps corresponds to the data element that populates each cell of the
cube.
W : (X,Y) → W
which may suppress a primary key, but still have some semantic significance, perhaps a
slice of the triadic functional representation for a given Z value of interest.
The motivation[8] behind OLAP displays harks back to the cross-tabbed report paradigm
of 1980s DBMS. One may wish for a spreadsheet-style display, where values of X
populate row $1; values of Y populate column $A; and values of W : ( X, Y ) → W
populate the individual cells "southeast of" $B2, so to speak, $B2 itself included. While
one can certainly use the DML (Data Manipulation Language) of traditional SQL to
display ( X, Y, W ) triples, this output format is not nearly as convenient as the cross-
tabbed alternative: certainly, the former requires one to hunt linearly for a given ( X, Y )
pair in order to determine the corresponding W value, while the latter enables one to more
conveniently scan for the intersection of the proper X column with the proper Y row.