Unit-2 DM

Data warehousing involves creating an information system that consolidates historical data from various sources to aid decision-making. Key characteristics include being subject-oriented, integrated, time-variant, and non-volatile, with a focus on data modeling and analysis. The architecture typically follows a three-tier model and includes components such as ETL tools, metadata, and query tools to facilitate data access and management.

UNIT II

DATA WAREHOUSING

Data Warehousing Components – Building a Data Warehouse – Multidimensional Data Model –
OLAP Operations in the Multidimensional Data Model – Three-Tier Data Warehouse Architecture –
Schemas for the Multidimensional Data Model – Online Analytical Processing (OLAP) – OLAP vs OLTP –
Integrated OLAM and OLAP Architecture

Data Warehouse Concepts, Architecture and Components:

What is a Data Warehouse?

A data warehouse is an information system that contains historical and cumulative data from a single
source or from multiple sources. It simplifies the organization's reporting and analysis processes, and
it serves as a single version of the truth for decision making and forecasting.

Characteristics of Data warehouse:

• Subject-Oriented
• Integrated
• Time-variant
• Non-volatile

Subject-Oriented:

A data warehouse is subject-oriented because it offers information about a theme rather than about a
company's ongoing operations. These subjects can be sales, marketing, distribution, etc.

A data warehouse never focuses on ongoing operations. Instead, it puts emphasis on the modelling and
analysis of data for decision making. It also provides a simple and concise view of a specific
subject by excluding data that is not helpful in supporting the decision process.

Integrated:

In a data warehouse, integration means establishing a common unit of measure for all similar
data from dissimilar databases. The data also needs to be stored in the data warehouse in a common
and universally acceptable manner.

A data warehouse is developed by integrating data from varied sources such as mainframes, relational
databases, and flat files. Moreover, it must keep naming conventions, formats, and coding consistent.

This integration helps in the effective analysis of data. Consistency in naming conventions, attribute
measures, encoding structure, etc. has to be ensured.
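
The kind of integration described above can be sketched in a few lines of Python. This is only an illustration: the two source systems, their field names, and the sample values are assumptions made up for the example, not part of any particular warehouse product.

```python
# Hypothetical rows from two source systems describing the same kind of data
# with different field names, units, and encodings (all values are made up).
mainframe_rows = [{"cust": "C1", "gender": "M", "weight_lb": 220.0}]
crm_rows = [{"customer_id": "C2", "sex": "female", "weight_kg": 60.0}]

LB_TO_KG = 0.45359237  # kilograms per pound
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_mainframe(row):
    # Rename fields, normalize the gender code, and convert pounds to kilograms.
    return {
        "customer_id": row["cust"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 2),
    }


def from_crm(row):
    # This source already uses kilograms; only names and codes change.
    return {
        "customer_id": row["customer_id"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "weight_kg": row["weight_kg"],
    }


integrated = [from_mainframe(r) for r in mainframe_rows] + [from_crm(r) for r in crm_rows]
```

After this step every row uses the same attribute names, the same gender encoding, and the same unit of weight, whichever system it came from.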

Time-Variant:

The time horizon for data warehouse is quite extensive compared with operational systems. The data
collected in a data warehouse is recognized with a particular period and offers information from the

K.BABU/CSE/ASST PROFESSOR Page 1


historical point of view. It contains an element of time, explicitly or implicitly. One place where
data warehouse data displays time variance is in the structure of the record key. Every primary key
contained in the DW should have, either implicitly or explicitly, an element of time, such as the day,
week, or month. Another aspect of time variance is that once data is inserted in the warehouse, it
can't be updated or changed.

Non-volatile:

A data warehouse is also non-volatile, which means that previous data is not erased when new data is
entered into it. Data is read-only and periodically refreshed. This helps in analyzing historical data
and understanding what happened and when. A data warehouse does not require transaction processing,
recovery, or concurrency control mechanisms.

Activities like delete, update, and insert, which are performed in an operational application
environment, are omitted in the data warehouse environment. Only two types of data operations
are performed in data warehousing:

1. Data loading
2. Data access
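
A minimal sketch of a store that allows only these two operations is shown below. The class name `WarehouseTable` and its methods are hypothetical, chosen for illustration; a real warehouse enforces this through its load processes and access permissions rather than through a Python class.

```python
class WarehouseTable:
    """Append-only table: supports data loading and data access, nothing else."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Store copies so later changes in the source do not alter history.
        self._rows.extend(dict(row) for row in batch)

    def query(self, predicate=lambda row: True):
        # Return copies so readers cannot modify the stored rows.
        return [dict(row) for row in self._rows if predicate(row)]


sales = WarehouseTable()
sales.load([{"day": "2024-01-01", "amount": 100}])
sales.load([{"day": "2024-01-02", "amount": 150}])  # earlier data is kept
jan_first = sales.query(lambda r: r["day"] == "2024-01-01")
```

There is deliberately no update or delete method: a new load adds to the history instead of overwriting it.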

Data Warehouse Architectures:

Single-tier architecture:

The objective of a single layer is to minimize the amount of data stored by removing data
redundancy. This architecture is not frequently used in practice.

Two-tier architecture:

The two-layer architecture physically separates the available sources from the data warehouse. This
architecture is not expandable and does not support a large number of end users. It also has
connectivity problems because of network limitations.

Three-tier architecture:

This is the most widely used architecture.

It consists of the Top, Middle and Bottom Tier.

1. Bottom Tier: The database of the data warehouse serves as the bottom tier. It is usually a
relational database system. Data is cleansed, transformed, and loaded into this layer using
back-end tools.
2. Middle Tier: The middle tier is an OLAP server implemented using either the ROLAP or the MOLAP
model. For a user, this application tier presents an abstracted view of the database. This layer
also acts as a mediator between the end user and the database.
3. Top Tier: The top tier is the front-end client layer. It consists of the tools and APIs that you
use to connect to the data warehouse and get data out of it, such as query tools, reporting tools,
managed query tools, analysis tools, and data mining tools.


Data warehouse Components

The data warehouse is based on an RDBMS server, a central information repository that is
surrounded by some key components that make the entire environment functional, manageable, and
accessible.

There are mainly five components of Data Warehouse:

Data Warehouse Database:

The central database is the foundation of the data warehousing environment. This database is
implemented on RDBMS technology. However, this kind of implementation is constrained by the
fact that a traditional RDBMS is optimized for transactional database processing and not for data
warehousing. For instance, ad-hoc queries, multi-table joins, and aggregates are resource intensive
and slow down performance.

Hence, alternative approaches to the database are used, as listed below:

• In a data warehouse, relational databases are deployed in parallel to allow for scalability.
Parallel relational databases also allow shared-memory or shared-nothing models on various
multiprocessor configurations or massively parallel processors.
• New index structures are used to bypass relational table scans and improve speed.
• Multidimensional databases (MDDBs) are used to overcome limitations imposed by the relational
data model. Example: Essbase from Oracle.

Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)

The data sourcing, transformation, and migration tools are used for performing all the conversions,
summarizations, and all the changes needed to transform data into a unified format in the data
warehouse. They are also called Extract, Transform and Load (ETL) Tools.

Their functionality includes:

• Anonymizing data as per regulatory stipulations.
• Preventing unwanted data in operational databases from being loaded into the data warehouse.
• Searching for and replacing common names and definitions for data arriving from different sources.
• Calculating summaries and derived data.
• Populating missing data with defaults.
• De-duplicating repeated data arriving from multiple data sources.

These Extract, Transform, and Load tools may generate cron jobs, background jobs, COBOL programs,
shell scripts, etc. that regularly update data in the data warehouse. These tools are also helpful
for maintaining the metadata.

ETL tools have to deal with the challenges of database and data heterogeneity.
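
Several of the ETL functions listed above can be illustrated with a short Python sketch. The record layout (`order_id`, `email`, `country`, `amount`, `status`) and the `transform` function are assumptions invented for this example. Hashing the e-mail address is only a simple stand-in for anonymization (strictly speaking it is pseudonymization) and would not by itself satisfy a real regulatory requirement.

```python
import hashlib


def transform(rows, default_country="UNKNOWN"):
    seen = set()
    cleaned = []
    for row in rows:
        if row.get("status") == "test":   # eliminate unwanted data
            continue
        key = row["order_id"]
        if key in seen:                   # de-duplicate repeated data
            continue
        seen.add(key)
        cleaned.append({
            "order_id": key,
            # Replace the e-mail address with a short hash (illustrative only).
            "customer": hashlib.sha256(row["email"].encode()).hexdigest()[:12],
            # Populate missing values with a default.
            "country": row.get("country") or default_country,
            "amount": row["amount"],
        })
    total = sum(r["amount"] for r in cleaned)  # a calculated summary value
    return cleaned, total


source_rows = [
    {"order_id": 1, "email": "a@example.com", "country": "CA", "amount": 10.0, "status": "ok"},
    {"order_id": 1, "email": "a@example.com", "country": "CA", "amount": 10.0, "status": "ok"},
    {"order_id": 2, "email": "b@example.com", "country": None, "amount": 5.0, "status": "ok"},
    {"order_id": 3, "email": "c@example.com", "country": "US", "amount": 99.0, "status": "test"},
]
cleaned, total = transform(source_rows)
```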

Metadata

The name metadata suggests some high-level technological concept. However, it is quite simple.
Metadata is data about data that defines the data warehouse. It is used for building, maintaining,
and managing the data warehouse.

In the data warehouse architecture, metadata plays an important role as it specifies the source,
usage, values, and features of data warehouse data. It also defines how data can be changed and
processed. It is closely connected to the data warehouse.

Metadata helps to answer the following questions:

• What tables, attributes, and keys does the data warehouse contain?
• Where did the data come from?
• How many times does the data get reloaded?
• What transformations were applied during cleansing?

Metadata can be classified into the following categories:

1. Technical Metadata: This kind of metadata contains information about the warehouse that is
used by data warehouse designers and administrators.
2. Business Metadata: This kind of metadata contains detail that gives end users an easy way to
understand the information stored in the data warehouse.

Query Tools

One of the primary objectives of data warehousing is to provide information to businesses so that
they can make strategic decisions. Query tools allow users to interact with the data warehouse system.

These tools fall into four different categories:

1. Query and reporting tools
2. Application development tools
3. Data mining tools
4. OLAP tools

1. Query and reporting tools:

Query and reporting tools can be further divided into:

• Reporting tools
• Managed query tools

Reporting tools: Reporting tools can be further divided into desktop report writers and production
reporting tools.

1. Report writers: These reporting tools are designed for end users to perform their own analysis.
2. Production reporting: These tools allow organizations to generate regular operational
reports. They also support high-volume batch jobs such as printing and calculating. Some popular
reporting tools come from Brio, Business Objects, Oracle, PowerSoft, and SAS Institute.

Managed query tools:

These access tools shield end users from the complexities of SQL and the database structure
by inserting a meta-layer between users and the database.

2. Application development tools:

Sometimes built-in graphical and analytical tools do not satisfy the analytical needs of an organization.
In such cases, custom reports are developed using Application development tools.

3. Data mining tools:

Data mining is the process of discovering meaningful new correlations, patterns, and trends by mining
large amounts of data. Data mining tools are used to automate this process.

4. OLAP tools:

These tools are based on the concepts of a multidimensional database. They allow users to analyse the
data using elaborate and complex multidimensional views.

Data warehouse Bus Architecture

The data warehouse bus determines the flow of data in your warehouse. The data flow in a data
warehouse can be categorized as inflow, upflow, downflow, outflow, and meta flow.

While designing a data bus, one needs to consider the dimensions and facts shared across data marts.


Data Marts

A data mart is an access layer that is used to get data out to the users. It is presented as an
alternative to a large data warehouse because it takes less time and money to build. However, there is
no standard definition of a data mart, and it differs from person to person.

Put simply, a data mart is a subsidiary of a data warehouse. It is a partition of the data created
for a specific group of users.

Data marts could be created in the same database as the Data warehouse or a physically separate
Database.

Data warehouse Architecture Best Practices

To design a data warehouse architecture, follow the best practices given below:

• Use a data model that is optimized for information retrieval, which can be the dimensional
model, a denormalized model, or a hybrid approach.
• Ensure that data is processed quickly and accurately. At the same time, take an approach that
consolidates data into a single version of the truth.
• Carefully design the data acquisition and cleansing process for the data warehouse.
• Design a metadata architecture that allows sharing of metadata between the components of the
data warehouse.
• Consider implementing an ODS (operational data store) model when the information retrieval need
is near the bottom of the data abstraction pyramid or when multiple operational sources must be
accessed.
• Make sure that the data model is integrated and not just consolidated. In that case, consider
a 3NF data model. It is also ideal for acquiring ETL and data cleansing tools.
Summary:

• A data warehouse is an information system that contains historical and cumulative data from a
single source or from multiple sources.
• A data warehouse is subject-oriented because it offers information about a subject instead of the
organization's ongoing operations.
• In a data warehouse, integration means establishing a common unit of measure for all
similar data from different databases.
• A data warehouse is non-volatile, which means that previous data is not erased when new data is
entered into it.
• A data warehouse is time-variant because the data in a DW has a long shelf life.
• There are five main components of a data warehouse: 1) Database 2) ETL Tools 3) Metadata
4) Query Tools 5) Data Marts.
• There are four main categories of query tools: 1) Query and reporting tools 2) Application
development tools 3) Data mining tools 4) OLAP tools.
• The data sourcing, transformation, and migration tools are used for performing all the
conversions and summarizations.
• In the data warehouse architecture, metadata plays an important role as it specifies the
source, usage, values, and features of data warehouse data.

Building a Data Warehouse:
In general, building any data warehouse consists of the following steps:

1. Extracting the transactional data from the data sources into a staging area

2. Transforming the transactional data

3. Loading the transformed data into a dimensional database

4. Building pre-calculated summary values to speed up report generation

5. Building (or purchasing) a front-end reporting tool

DIAGRAM FOR BUILDING A DATA WAREHOUSE

Extracting Transactional Data:


A large part of building a DW is pulling data from various data sources and placing it in a central
storage area. In fact, this can be the most difficult step to accomplish due to the reasons mentioned
earlier: Most people who worked on the systems in place have moved on to other jobs. Even if they
haven't left the company, you still have a lot of work to do: You need to figure out which database
system to use for your staging area and how to pull data from various sources into that area.

Fortunately for many small to mid-size companies, Microsoft has come up with an excellent tool for
data extraction. Data Transformation Services (DTS), which is part of Microsoft SQL Server 7.0 and
2000, allows you to import and export data from any OLE DB or ODBC-compliant database as long as
you have an appropriate provider. This tool is available at no extra cost when you purchase Microsoft
SQL Server. The sad reality is that you won't always have an OLE DB or ODBC-compliant data source
to work with, however. If not, you're bound to make a considerable investment of time and effort in
writing a custom program that transfers data from the original source into the staging database.


Transforming Transactional Data:
An equally important and challenging step after extracting is transforming and relating the
data extracted from multiple sources. As I said earlier, your source systems were most likely built by
many different IT professionals. Let's face it. Each person sees the world through their own eyes, so
each solution is at least a bit different from the others. The data model of your mainframe system
might be very different from the model of the client-server system.

Most companies have their data spread out in a number of various database management systems: MS
Access, MS SQL Server, Oracle, Sybase, and so on. Many companies will also have much of their data in
flat files, spreadsheets, mail systems and other types of data stores. When building a data warehouse,
you need to relate data from all of these sources and build some type of a staging area that can handle
data extracted from any of these source systems. After all the data is in the staging area, you have to
massage it and give it a common shape. Prior to massaging data, you need to figure out a way to relate
tables and columns of one system to the tables and columns coming from the other systems.

Creating a Dimensional Model:


The third step in building a data warehouse is coming up with a dimensional model. Most modern
transactional systems are built using the relational model. The relational database is highly
normalized; when designing such a system, you try to get rid of repeating columns and make all
columns dependent on the primary key of each table. The relational systems perform well in the On-
Line Transaction Processing (OLTP) environment. On the other hand, they perform rather poorly in
the reporting (and especially DW) environment, in which joining multiple huge tables just is not the
best idea.

The relational format is not very efficient when it comes to building reports with summary and
aggregate values. The dimensional approach, on the other hand, provides a way to improve query
performance without affecting data integrity. However, the query performance improvement comes
with a storage space penalty; a dimensional database will generally take up much more space than its
relational counterpart. These days, storage space is fairly inexpensive, and most companies can afford
large hard disks with a minimal effort.

The dimensional model consists of the fact and dimension tables. The fact tables consist of foreign
keys to each dimension table, as well as measures. The measures are a factual representation of how
well (or how poorly) your business is doing (for instance, the number of parts produced per hour or
the number of cars rented per day). Dimensions, on the other hand, are what your business users
expect in the reports—the details about the measures. For example, the time dimension tells the user
that 2000 parts were produced between 7 a.m. and 7 p.m. on the specific day; the plant dimension
specifies that these parts were produced by the Northern plant.
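
The relationship between a fact table and its dimension tables can be sketched with plain Python dictionaries. The keys, the date, and the table names below are hypothetical; only the figures quoted above (2000 parts, 7 a.m. to 7 p.m., the Northern plant) come from the text.

```python
# Dimension tables: descriptive details, looked up by a key.
time_dim = {1: {"date": "2024-03-01", "shift": "07:00-19:00"}}  # date is made up
plant_dim = {10: {"plant_name": "Northern"}}

# Fact table: foreign keys to each dimension plus the measure.
production_fact = [{"time_key": 1, "plant_key": 10, "parts_produced": 2000}]


def describe(fact):
    # Follow the foreign keys to attach dimension details to the measure.
    t = time_dim[fact["time_key"]]
    p = plant_dim[fact["plant_key"]]
    return f'{fact["parts_produced"]} parts, {p["plant_name"]} plant, {t["date"]} {t["shift"]}'


report = [describe(f) for f in production_fact]
```

The fact row itself holds only keys and a number; everything a business user reads in the report comes from the dimensions.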

Just like any modeling exercise the dimensional modeling is not to be taken lightly. Figuring out the
needed dimensions is a matter of discussing the business requirements with your users over and over
again. When you first talk to the users they have very minimal requirements: "Just give me those
reports that show me how each portion of the company performs." Figuring out what "each portion of
the company" means is your job as a DW architect. The company may consist of regions, each of which
report to a different vice president of operations. Each region, on the other hand, might consist of
areas, which in turn might consist of individual stores. Each store could have several departments.
When the DW is complete, splitting the revenue among the regions won't be enough. That's when
your users will demand more features and additional drill-down capabilities. Instead of waiting for
that to happen, an architect should take proactive measures to get all the necessary requirements
ahead of time.

It's also important to realize that not every field you import from each data source may fit into the
dimensional model. Indeed, if you have a sequential key on a mainframe system, it won't have much
meaning to your business users. Other columns might have had significance eons ago when the system
was built. Since then, the management might have changed its mind about the relevance of such
columns. So don't worry if all of the columns you imported are not part of your dimensional model.

Loading the Data:


After you've built a dimensional model, it's time to populate it with the data in the staging database.
This step only sounds trivial. It might involve combining several columns together or splitting one
field into several columns. You might have to perform several lookups before calculating certain
values for your dimensional model.
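
As a rough illustration of these loading steps, the sketch below combines two columns, splits one field into two, and performs a lookup to obtain a dimension key. The staging layout and all names (`staging`, `plant_key`, `to_fact`) are assumptions made for the example.

```python
# Hypothetical staging rows (field names and values are made up).
staging = [
    {"first": "Ann", "last": "Lee", "city_state": "Austin, TX", "plant": "Northern", "parts": 2000},
    {"first": "Bob", "last": "Ray", "city_state": "Dallas, TX", "plant": "Southern", "parts": 1500},
]

plant_keys = {}  # lookup table: plant name -> surrogate key


def plant_key(name):
    # Assign the next surrogate key the first time a plant name is seen.
    return plant_keys.setdefault(name, len(plant_keys) + 1)


def to_fact(row):
    # Split one field into two columns.
    city, state = [part.strip() for part in row["city_state"].split(",", 1)]
    return {
        "operator": f'{row["first"]} {row["last"]}',  # combine two columns
        "city": city,
        "state": state,
        "plant_key": plant_key(row["plant"]),         # lookup before loading
        "parts": row["parts"],
    }


facts = [to_fact(r) for r in staging]
```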

Keep in mind that such data transformations can be performed at either of the two stages: while
extracting the data from their origins or while loading data into the dimensional model. I wouldn't
recommend one way over the other—make a decision depending on the project. If your users need to
be sure that they can extract all the data first, wait until all data is extracted prior to transforming it. If
the dimensions are known prior to extraction, go on and transform the data while extracting it.

Generating Precalculated Summary Values:


The next step is generating the precalculated summary values which are commonly referred to
as aggregations. This step has been tremendously simplified by SQL Server Analysis Services (or OLAP
Services, as it is referred to in SQL Server 7.0). After you have populated your dimensional database,
SQL Server Analysis Services does all the aggregate generation work for you. However, remember that
depending on the number of dimensions you have in your DW, building aggregations can take a long
time. As a rule of thumb, the more dimensions you have, the more time it'll take to build aggregations.
However, the size of each dimension also plays a significant role.
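
To see why the number of dimensions matters, the following sketch precomputes a total for every combination of dimensions. It is a generic illustration, not a description of how SQL Server Analysis Services builds aggregations, and the sample facts are invented. With n dimensions there are 2^n such groupings, and the number of cells in each grouping depends on how many distinct values each dimension has.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with three dimensions and one measure.
facts = [
    {"time": "Q1", "item": "Modem", "location": "Toronto", "units": 5},
    {"time": "Q1", "item": "Mobile", "location": "Toronto", "units": 3},
    {"time": "Q2", "item": "Modem", "location": "Vancouver", "units": 4},
]
dimensions = ("time", "item", "location")


def build_aggregations(rows, dims, measure):
    aggregations = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row[measure]
            aggregations[group] = dict(totals)
    return aggregations


aggs = build_aggregations(facts, dimensions, "units")
# Three dimensions give 2**3 = 8 groupings, from the grand total () up to
# the full (time, item, location) detail.
```

A report that needs units per quarter can then read `aggs[("time",)]` directly instead of scanning the fact rows again.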

Prior to generating aggregations, you need to make an important choice about which dimensional
model to use: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), or HOLAP (Hybrid OLAP).
The ROLAP model builds additional tables for storing the aggregates, but this takes much more
storage space than a dimensional database, so be careful! The MOLAP model stores the aggregations
as well as the data in multidimensional format, which is far more efficient than ROLAP. The HOLAP
approach keeps the data in the relational format, but builds aggregations in multidimensional format,
so it's a combination of ROLAP and MOLAP.

Regardless of which dimensional model you choose, ensure that SQL Server has as much memory as
possible. Building aggregations is a memory-intensive operation, and the more memory you provide,
the less time it will take to build aggregate values.

Building (or Purchasing) a Front-End Reporting Tool


After you've built the dimensional database and the aggregations you can decide how sophisticated
your reporting tools need to be. If you just need the drill-down capabilities, and your users have
Microsoft Office 2000 on their desktops, the Pivot Table Service of Microsoft Excel 2000 will do the
job. If the reporting needs are more than what Excel can offer, you'll have to investigate the alternative
of building or purchasing a reporting tool. The cost of building a custom reporting (and OLAP) tool
will usually outweigh the purchase price of a third-party tool. That is not to say that OLAP tools are
cheap.

There are several major vendors on the market that have top-notch analytical tools. In addition to the
third-party tools, Microsoft has just released its own tool, Data Analyzer, which can be a cost-effective
alternative. Consider purchasing one of these suites before delving into the process of developing your
own software because reinventing the wheel is not always beneficial or affordable. Building OLAP
tools is not a trivial exercise by any means.

Multidimensional Data Model:

The multidimensional data model stores data in the form of a data cube. Data cubes are most often
illustrated with two or three dimensions, although a cube can have any number of dimensions.

A data cube allows data to be viewed in multiple dimensions. Dimensions are entities with respect to
which an organization wants to keep records. For example, in a store's sales records, dimensions allow
the store to keep track of things like monthly sales of items and the branches and locations at which
they were sold. A multidimensional database helps to provide answers to complex business queries
quickly and accurately. Data warehouses and Online Analytical Processing (OLAP) tools are based on a
multidimensional data model. OLAP in data warehousing enables users to view data from different
angles and dimensions.

Schema:

A schema is a logical description of the entire database. It includes the name and description of
records of all record types, including all associated data items and aggregates. Much like a database,
a data warehouse also needs to maintain a schema. A database uses the relational model, while a data
warehouse uses the Star, Snowflake, or Fact Constellation schema. In this chapter, we will discuss the
schemas used in a data warehouse.
Star Schema
• Each dimension in a star schema is represented with only one dimension table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to the four dimensions,
namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the measures, namely dollars sold and units sold.
Note − Each dimension has only one dimension table and each table holds a set of attributes. For
example, the location dimension table contains the attribute set {location_key, street, city,
province_or_state, country}. This constraint may cause data redundancy. For example, the cities
"Vancouver" and "Victoria" are both in the Canadian province of British Columbia. The entries for
such cities cause data redundancy along the attributes province_or_state and country.
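
The star schema described above can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumption-laden miniature: it creates only the location dimension and the fact table, and the street names and sales figures are invented. It shows both a typical star join and the redundancy mentioned in the note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street TEXT, city TEXT, province_or_state TEXT, country TEXT
);
CREATE TABLE sales_fact (
    time_key INTEGER, item_key INTEGER, branch_key INTEGER,
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL, units_sold INTEGER
);
-- Hypothetical sample rows.
INSERT INTO location_dim VALUES
    (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada'),
    (2, '2 Bay St',  'Victoria',  'British Columbia', 'Canada');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 100.0, 2), (1, 1, 1, 2, 50.0, 1);
""")

# A typical star join: the fact table joined to one dimension, then aggregated.
by_province = conn.execute("""
    SELECT l.province_or_state, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.province_or_state
""").fetchall()

# The redundancy from the note: the same province and country are stored twice.
repeats = conn.execute("""
    SELECT COUNT(*) FROM location_dim
    WHERE province_or_state = 'British Columbia' AND country = 'Canada'
""").fetchone()[0]
```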
Snowflake Schema
• Some dimension tables in the snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike in the star schema, the dimension tables in a snowflake schema are normalized. For example,
the item dimension table of the star schema is normalized and split into two dimension tables,
namely the item and supplier tables.
• Now the item dimension table contains the attributes item_key, item_name, type, brand, and
supplier_key.
• The supplier key is linked to the supplier dimension table. The supplier dimension table
contains the attributes supplier_key and supplier_type.
Note − Due to normalization in the snowflake schema, redundancy is reduced; the schema therefore
becomes easier to maintain and saves storage space.
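
The split of the item dimension can be sketched as follows. The input rows are hypothetical; to keep the example short, they are assumed to already carry a supplier_key next to the supplier_type.

```python
# Hypothetical denormalized item rows: supplier_type repeats for every item
# that comes from the same supplier.
star_items = [
    {"item_key": 1, "item_name": "Modem X", "type": "Modem", "brand": "Acme",
     "supplier_key": 7, "supplier_type": "wholesale"},
    {"item_key": 2, "item_name": "Phone Y", "type": "Mobile", "brand": "Acme",
     "supplier_key": 7, "supplier_type": "wholesale"},
]

supplier_dim = {}  # supplier_key -> supplier attributes, stored once
item_dim = []      # item rows keep only the supplier_key

for row in star_items:
    supplier_dim[row["supplier_key"]] = {"supplier_type": row["supplier_type"]}
    item_dim.append({k: v for k, v in row.items() if k != "supplier_type"})
```

After the split, the supplier type is stored once per supplier rather than once per item, which is where the reduction in redundancy comes from. The price is an extra lookup (a join) whenever a query needs supplier details.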
Fact Constellation Schema
• A fact constellation has multiple fact tables. It is also known as a galaxy schema.
• The following diagram shows two fact tables, namely sales and shipping.
• The sales fact table is the same as that in the star schema.
• The shipping fact table has five dimensions, namely item_key, time_key, shipper_key,
from_location, and to_location.
• The shipping fact table also contains two measures, namely dollars cost and units shipped.
• It is also possible to share dimension tables between fact tables. For example, the time, item, and
location dimension tables are shared between the sales and shipping fact tables.

Online Analytical Processing Server (OLAP):

Online Analytical Processing (OLAP) is based on the multidimensional data model. It allows
managers and analysts to gain insight into information through fast, consistent, and interactive
access. This chapter covers the types of OLAP servers, the OLAP operations, and the differences
between OLAP, statistical databases, and OLTP.
Types of OLAP Servers
We have four types of OLAP servers −

• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
Relational OLAP
ROLAP servers are placed between the relational back-end server and the client front-end tools. To
store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following −

• Implementation of aggregation navigation logic.
• Optimization for each DBMS back end.
• Additional tools and services.
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With
multidimensional data stores, storage utilization may be low if the data set is sparse. Therefore,
many MOLAP servers use two levels of data storage representation to handle dense and sparse data
sets.
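
The storage problem with sparse data can be made concrete with a small calculation. The cube shape and cell values below are hypothetical, and the dictionary of coordinates is just one simple way to store sparse data; real MOLAP servers use their own storage structures.

```python
# Hypothetical cube shape: 4 quarters x 3 items x 1000 locations.
shape = (4, 3, 1000)

# Sparse representation: only the non-empty cells are stored, keyed by coordinates.
sparse_cells = {(0, 1, 42): 5, (2, 0, 7): 9}


def dense_size(dims):
    # Number of cells a dense array would have to allocate.
    size = 1
    for d in dims:
        size *= d
    return size


def get(cells, coord):
    # Empty cells are not stored; treat them as zero.
    return cells.get(coord, 0)


utilization = len(sparse_cells) / dense_size(shape)
```

Here a dense array would allocate 12,000 cells to hold two values, which is the low storage utilization the text refers to.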
Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and
the faster computation of MOLAP. HOLAP servers can store large volumes of detailed data. The
aggregations are stored separately in a MOLAP store.
Specialized SQL Servers
Specialized SQL servers provide advanced query language and query processing support for SQL
queries over star and snowflake schemas in a read-only environment.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on
multidimensional data.
Here is the list of OLAP operations −

• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −

• By climbing up a concept hierarchy for a dimension
• By dimension reduction
The following diagram illustrates how roll-up works.


• Roll-up is performed by climbing up a concept hierarchy for the dimension location.
• Initially the concept hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of city
to the level of country.
• The data is grouped into countries rather than cities.
• When roll-up is performed by dimension reduction, one or more dimensions are removed from the
data cube.
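
Both forms of roll-up can be sketched in Python on a tiny cube stored as a dictionary. The cities, quarters, and unit counts are hypothetical sample data, and the function names are chosen only for this example.

```python
from collections import defaultdict

# Hypothetical cube: (quarter, city) -> units sold.
sales = {
    ("Q1", "Toronto"): 10,
    ("Q1", "Vancouver"): 20,
    ("Q1", "Chicago"): 30,
    ("Q2", "Toronto"): 5,
}
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}


def roll_up_location(cube, hierarchy):
    # Climb the concept hierarchy: replace each city by its country and sum.
    totals = defaultdict(int)
    for (quarter, city), units in cube.items():
        totals[(quarter, hierarchy[city])] += units
    return dict(totals)


def roll_up_drop_time(cube):
    # Dimension reduction: remove the time dimension and sum over it.
    totals = defaultdict(int)
    for (_quarter, city), units in cube.items():
        totals[city] += units
    return dict(totals)


by_country = roll_up_location(sales, city_to_country)
by_city_all_time = roll_up_drop_time(sales)
```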
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways −

• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
The following diagram illustrates how drill-down works −


• Drill-down is performed by stepping down a concept hierarchy for the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level of
month.
• When drill-down is performed by introducing a new dimension, one or more dimensions are added to
the data cube.
• It navigates from less detailed data to highly detailed data.
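
A drill-down from quarter to month can be sketched as below. The point of the example is that the finer-grained data must already be stored: the quarterly view is derived from the monthly detail, and drilling down simply returns to that detail. All figures and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly detail: (month, item) -> units sold.
monthly_sales = {
    ("Jan", "Modem"): 3,
    ("Feb", "Modem"): 4,
    ("Mar", "Modem"): 5,
    ("Apr", "Modem"): 6,
}
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}


def quarterly_view(detail):
    # The less detailed view a user might start from.
    totals = defaultdict(int)
    for (month, item), units in detail.items():
        totals[(month_to_quarter[month], item)] += units
    return dict(totals)


def drill_down(detail, quarter):
    # Step down the time hierarchy: show the months that make up one quarter.
    return {key: units for key, units in detail.items()
            if month_to_quarter[key[0]] == quarter}


summary = quarterly_view(monthly_sales)
q1_months = drill_down(monthly_sales, "Q1")
```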
Slice
The slice operation performs a selection on one particular dimension of a given cube and provides a
new sub-cube. Consider the following diagram that shows how slice works.


• Here slice is performed for the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube by fixing a single value on that one dimension.
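
A slice on time = "Q1" can be sketched as follows, using a hypothetical cube keyed by (time, location, item). The function name `slice_cube` and the sample values are assumptions for the example.

```python
# Hypothetical cube: (time, location, item) -> units sold.
cube = {
    ("Q1", "Toronto", "Modem"): 5,
    ("Q1", "Vancouver", "Mobile"): 3,
    ("Q2", "Toronto", "Modem"): 4,
}


def slice_cube(cells, axis, value):
    # Keep the cells where the chosen dimension equals the value, and drop
    # that dimension from the key, since it is now fixed.
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cells.items() if key[axis] == value}


q1 = slice_cube(cube, 0, "Q1")
```

The result has one dimension fewer than the original cube: only location and item remain.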
Dice
Dice performs a selection on two or more dimensions of a given cube and provides a new sub-cube.
Consider the following diagram that shows the dice operation.


The dice operation on the cube based on the following selection criteria involves three dimensions.

• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
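
The three criteria above can be applied in a short Python sketch. The cube contents are hypothetical; only the selection values (Toronto, Vancouver, Q1, Q2, Mobile, Modem) come from the text.

```python
# Hypothetical cube: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 6,
    ("Vancouver", "Q2", "Modem"): 4,
    ("Chicago", "Q1", "Mobile"): 7,    # excluded: location not selected
    ("Toronto", "Q3", "Modem"): 2,     # excluded: time not selected
    ("Vancouver", "Q1", "Phone"): 9,   # excluded: item not selected
}


def dice(cells, criteria):
    # criteria maps a dimension position to the set of allowed values.
    return {key: units for key, units in cells.items()
            if all(key[axis] in allowed for axis, allowed in criteria.items())}


sub_cube = dice(cube, {
    0: {"Toronto", "Vancouver"},
    1: {"Q1", "Q2"},
    2: {"Mobile", "Modem"},
})
```

Unlike a slice, the dice keeps all three dimensions; it only narrows the range of values on each.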
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an
alternative presentation of data. Consider the following diagram that shows the pivot operation.
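
For a two-dimensional view, a pivot amounts to swapping rows and columns. The sketch below uses a hypothetical view with locations as rows and items as columns.

```python
# Hypothetical 2-D view: rows are locations, columns are items.
view = {
    "Toronto": {"Mobile": 6, "Modem": 8},
    "Vancouver": {"Mobile": 2, "Modem": 4},
}


def pivot(table):
    # Rotate the axes: items become rows and locations become columns.
    rotated = {}
    for row_label, columns in table.items():
        for column_label, value in columns.items():
            rotated.setdefault(column_label, {})[row_label] = value
    return rotated


rotated_view = pivot(view)
```

The values are unchanged; only their arrangement differs, and rotating twice gives back the original view.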

Three-Tier Data Warehouse Architecture


Generally, a data warehouse adopts a three-tier architecture. The three tiers are:

1. Bottom Tier (Data warehouse server)
2. Middle Tier (OLAP server)
3. Top Tier (Front end tools)

 Bottom Tier − The bottom tier of the architecture is the data warehouse database server. It is
typically a relational database system. Back-end tools and utilities are used to feed data into the
bottom tier. These back-end tools and utilities perform the extract, clean, load, and refresh
functions.
 Middle Tier − In the middle tier, we have the OLAP server, which can be implemented in either
of the following ways.
o By the Relational OLAP (ROLAP) model, which is an extended relational database management
system. ROLAP maps the operations on multidimensional data to standard
relational operations.
o By the Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
 Top Tier − This tier is the front-end client layer. This layer holds the query tools,
reporting tools, analysis tools, and data mining tools.
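To illustrate how a ROLAP server might map a multidimensional operation to standard relational operations, the sketch below expresses a roll-up from city to country as a SQL join with GROUP BY. It uses Python's built-in `sqlite3` module purely as a stand-in relational engine; the table names, column names, and figures are assumptions, and a real ROLAP server generates such queries internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension table: one row per city.
    CREATE TABLE location (city TEXT PRIMARY KEY, country TEXT);
    -- Hypothetical fact table: one row per (city, quarter).
    CREATE TABLE sales (city TEXT, quarter TEXT, units INTEGER);

    INSERT INTO location VALUES
        ('Toronto', 'Canada'), ('Vancouver', 'Canada'), ('Chicago', 'USA');
    INSERT INTO sales VALUES
        ('Toronto', 'Q1', 605), ('Vancouver', 'Q1', 825), ('Chicago', 'Q1', 440);
""")

# Roll-up on location (city -> country) written as a relational query.
rows = conn.execute("""
    SELECT l.country, s.quarter, SUM(s.units)
    FROM sales AS s
    JOIN location AS l ON s.city = l.city
    GROUP BY l.country, s.quarter
    ORDER BY l.country
""").fetchall()
conn.close()
```

A MOLAP server would instead keep the cells in a multidimensional array structure and aggregate along an axis directly, without translating to SQL.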
The following diagram depicts the three-tier architecture of data warehouse −

Data Warehouse Models


From the perspective of data warehouse architecture, we have the following data warehouse models −

 Virtual Warehouse
 Data mart
 Enterprise Warehouse
Virtual Warehouse
A virtual warehouse is a set of views over operational databases. It is easy to build, but it
requires excess capacity on the operational database servers, because queries run against them.
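A virtual warehouse can be sketched as a summary view defined directly over an operational table. The example uses Python's built-in `sqlite3` module as a stand-in operational database; the `orders` table, the `sales_by_city` view, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational table.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'Toronto', 120.0), (2, 'Toronto', 80.0), (3, 'Vancouver', 50.0);

    -- The "virtual warehouse": a view that stores no data of its own.
    CREATE VIEW sales_by_city AS
        SELECT city, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM orders
        GROUP BY city;
""")

before = conn.execute("SELECT * FROM sales_by_city ORDER BY city").fetchall()

# A new operational row is visible through the view immediately,
# because the view is recomputed from the operational table on each query.
conn.execute("INSERT INTO orders VALUES (4, 'Vancouver', 25.0)")
after = conn.execute("SELECT * FROM sales_by_city ORDER BY city").fetchall()
conn.close()
```

Because every query on the view is evaluated against the operational table, the analytical load lands on the operational server, which is why spare capacity is needed there.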
Data Mart
A data mart contains a subset of organization-wide data that is valuable to specific groups within
the organization. In other words, a data mart contains data specific to a particular group. For example,
the marketing data mart may contain data related to items, customers, and sales. Data marts are
confined to subjects.
Points to remember about data marts −
 Windows-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.
 The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather
than months or years.
 The life cycle of a data mart may become complex in the long run if its planning and design are not
organization-wide.
 Data marts are small in size.
 Data marts are customized by department.
 The source of a data mart is a departmentally structured data warehouse.
 Data marts are flexible.
Enterprise Warehouse
 An enterprise warehouse collects all the information about the subjects spanning an entire
organization.
 It provides enterprise-wide data integration.
 The data is integrated from operational systems and external information providers.
 This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or beyond.

Features of OLTP and OLAP:

The major distinguishing features between OLTP and OLAP are summarized as follows.

1. Users and system orientation: An OLTP system is customer-oriented and is used for transaction
and query processing by clerks, clients, and information technology professionals. An OLAP system is
market-oriented and is used for data analysis by knowledge workers, including managers, executives,
and analysts.

2. Data contents: An OLTP system manages current data that, typically, are too detailed to be easily
used for decision making. An OLAP system manages large amounts of historical data, provides
facilities for summarization and aggregation, and stores and manages information at different levels of
granularity. These features make the data easier for use in informed decision making.

3. Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an
application-oriented database design. An OLAP system typically adopts either a star or a snowflake
model and a subject-oriented database design.

4. View: An OLTP system focuses mainly on the current data within an enterprise or department,
without referring to historical data or data in different organizations. In contrast, an OLAP system
often spans multiple versions of a database schema. OLAP systems also deal with information that
originates from different organizations, integrating information from many data stores. Because of
their huge volume, OLAP data are stored on multiple storage media.

5. Access patterns: The access patterns of an OLTP system consist mainly of short, atomic
transactions. Such a system requires concurrency control and recovery mechanisms. In contrast,
accesses to OLAP systems are mostly read-only operations, although many could be complex queries.
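The contrast in access patterns can be sketched with Python's built-in `sqlite3` module: a short atomic transaction that updates two rows (OLTP style), followed by a read-only aggregate query (OLAP style). The `account` table, the balances, and the `transfer` function are hypothetical, and running both against one small table is only for illustration; in practice the two workloads usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES (1, 100), (2, 50);
""")


def transfer(connection, source, target, amount):
    """OLTP style: a short transaction that applies both updates or neither."""
    with connection:  # commits on success, rolls back if an exception is raised
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, source),
        )
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, target),
        )


transfer(conn, 1, 2, 30)

# OLAP style: a read-only query that summarizes many rows at once.
total, average = conn.execute(
    "SELECT SUM(balance), AVG(balance) FROM account"
).fetchone()
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
conn.close()
```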
Comparison between OLTP and OLAP systems.

Integrated OLAP and OLAM Architecture:

Online Analytical Mining (OLAM) integrates Online Analytical Processing (OLAP) with data mining
and with mining knowledge in multidimensional databases. Here is the
diagram that shows the integration of both OLAP and OLAM

Importance of OLAM
OLAM is important for the following reasons −
 High quality of data in data warehouses − Data mining tools need to work on
integrated, consistent, and cleaned data, which requires costly cleaning, integration, and
transformation as preprocessing steps. The data warehouses constructed by such preprocessing are
valuable sources of high-quality data for OLAP as well as for data mining.
 Available information processing infrastructure surrounding data warehouses −
Information processing infrastructure refers to the accessing, integration, consolidation, and
transformation of multiple heterogeneous databases, web-accessing and service facilities, and
reporting and OLAP analysis tools.
 OLAP-based exploratory data analysis − Exploratory data analysis is required for effective
data mining. OLAM provides facilities for data mining on various subsets of data and at different
levels of abstraction.
 Online selection of data mining functions − Integrating OLAP with multiple data mining
functions gives users the flexibility to select desired data
mining functions and swap data mining tasks dynamically.
