Data Warehouse Unit-I

The document provides an overview of Data Warehousing (DW), detailing its architecture, types, components, and implementation steps. It highlights the advantages and disadvantages of DW, the need for various user types, and the tools available for data warehousing. Additionally, it discusses the principles of data warehousing, including load performance, query performance, and data quality management.

Uploaded by

Saritha Sajeesh
Copyright
© All Rights Reserved

Module III

4
BI USING DATA WAREHOUSING
4.1 Introduction to DW
4.2 DW architecture
4.3 ETL Process
4.4 Data Warehouse Design

4.1 INTRODUCTION TO DW
A Data Warehouse (DW) is an environment that is maintained separately from
the organization's operational database. Its architectural construct
provides users with current and historical decision support information
that is not available in a traditional operational data store. A DW
provides a design that reduces response time and enhances
the performance of queries for reports and analytics.
A data warehouse system is also known by the following names:
❖ Decision Support System (DSS)
❖ Executive Information System
❖ Management Information System
❖ Business Intelligence Solution
❖ Analytic Application
❖ Data Warehouse

Fig 1: Data warehouse System

History of Data Warehouse
Data warehousing emerged from the need to handle increasing amounts of information.

How does a Data Warehouse work?

A DW works as a central repository into which data flows from one or more
data sources, such as transactional systems and other relational
databases.
Data may be:
1. Structured
2. Semi-structured
3. Unstructured data

Types of Data Warehouse


Three main types of Data Warehouses (DWH) are:
1. Enterprise Data Warehouse (EDW):
An Enterprise Data Warehouse (EDW) is a centralized warehouse that provides
decision support services across the enterprise and a unified approach for
organizing and representing data.

2. Operational Data Store:


An Operational Data Store (ODS) is a data store required when neither the DW
nor the OLTP systems support an organization's reporting needs. It is preferred
for routine activities such as storing employee records.

3. Data Mart:
A data mart is a subset of the DW designed for a particular line of business,
such as sales or finance.

General stages of Data Warehouse


❖ Offline Operational Database
❖ Offline Data Warehouse
❖ Real-time Data Warehouse
❖ Integrated Data Warehouse

Components of Data warehouse


The four components of a Data Warehouse are:
❖ Load Manager
❖ Warehouse Manager
❖ Query Manager
❖ End-user access tools:
These are categorized into five different groups: 1. Data reporting tools
2. Query tools 3. Application development tools 4. EIS tools 5. OLAP tools
and data mining tools.

Who needs Data warehouse?


A DWH (Data Warehouse) is needed by all types of users, such as:
❖ Decision makers
❖ Users who use customized, complex processes to obtain information
from multiple data sources
❖ Technology-savvy users
❖ Users who want a systematic approach to making decisions
❖ Users who need fast performance on a huge amount of data
❖ Users who want to discover hidden patterns in data

What is a Data Warehouse used for?


❖ Airline
❖ Banking
❖ Healthcare
❖ Public sector
❖ Investment and Insurance sector
❖ Retail chain
❖ Telecommunication
❖ Hospitality Industry

Steps to Implement Data Warehouse


1. Enterprise strategy
2. Phased delivery
3. Iterative Prototyping
Here are the key steps in data warehouse implementation, along with their
deliverables.

Fig 2: Data Warehouse implementation


Advantages of Data Warehouse (DWH) [1-15]:
● DW allows business users to quickly access critical data.
● DW provides consistent information on various cross-functional
activities and supports ad-hoc reporting and query.
● DW helps to integrate many sources of data.
● DW helps to reduce total turnaround time for analysis and reporting.
● Restructuring and integration of data make it easier for the user to use it for
reporting and analysis.
● DW holds historical data, which helps users analyze different time periods
and trends to make future predictions.

Disadvantages of Data Warehouse:


❖ Not an ideal option for unstructured data.
❖ Creating and implementing a DW is a time-consuming process.
❖ It is difficult to make changes in data types and ranges, data source
schema, indexes, and queries.
❖ Organizations need to spend a lot of their resources on training and
implementation.

The Future of Data Warehousing
❖ Changes in regulatory constraints.
❖ Size of the database.
❖ Multimedia data.

Data Warehouse Tools


1. MarkLogic:
https://www.marklogic.com/product/getting-started/

2. Oracle:
https://www.oracle.com/index.html

3. Amazon RedShift:
https://aws.amazon.com/redshift/?nc2=h_m1
Here is a list of useful data warehouse tools.

Sl.No  DWM Tool           Platform                    Features

1      CData Sync         Windows, Mac, Linux, Cloud  - Automated intelligent incremental data replication
                                                      - Fully customizable ETL/ELT data transformation
                                                      - Runs anywhere: on-premise or in the cloud

2      Integrate.io       Cloud                       - Connects to all major e-commerce providers such as
                                                        Shopify, NetSuite, BigCommerce, and Magento
                                                      - Meets compliance requirements with security features
                                                        such as field-level data encryption, SOC II
                                                        certification, GDPR compliance, and data masking

3      QuerySurge         Windows, Linux              - Speeds up the testing process by up to 1,000x and
                                                        provides up to 100% data coverage
                                                      - Integrates an out-of-the-box DevOps solution for most
                                                        build, ETL and QA management software

4      Astera DW Builder  Windows                     - Automates ETL operations through job scheduling and
                                                        workflow automation

5      MS SSIS            Windows                     - Tightly integrated with Microsoft Visual Studio and
                                                        SQL Server
                                                      - Consumes data from difficult sources such as FTP, HTTP,
                                                        MSMQ, and Analysis Services
                                                      - Data can be loaded in parallel to many varied
                                                        destinations

Characteristics of Data warehouse


A data warehouse has the following characteristics:
❖ Subject-Oriented
❖ Integrated
❖ Time-variant
❖ Non-volatile

Fig 3: Data Integration Issues

4.2 DW ARCHITECTURE
Business Analysis Framework
Business analysts get information from the data warehouse to
measure performance and make critical adjustments in order to win
over other business holders in the market. Having a data warehouse has the
following advantages:
❖ It can enhance business productivity.
❖ It helps manage customer relationships.
❖ It brings down costs by tracking trends and patterns over a long period
in a consistent and reliable manner.
To design an effective and efficient data warehouse, we need to
understand and analyze the business needs and construct a business
analysis framework. The views are as follows:
❖ The top-down view
❖ The data source view
❖ The data warehouse view
❖ The business query view
Data warehouses and their architectures vary depending upon the elements
of an organization's situation and are classified as:
❖ Data Warehouse Architecture: Basic
❖ Data Warehouse Architecture: With Staging Area
❖ Data Warehouse Architecture: With Staging Area and Data Marts

Data Warehouse Architecture: Basic

Fig 4: Data Warehouse Architecture - Basic


Data Warehouse Architecture: With Staging Area
We must clean and process operational data before putting it into
the warehouse.

Fig 5: Data Warehouse Architecture with a staging area


Note: The data warehouse staging area is a temporary location to which
records from source systems are copied.

Fig 6: Data Warehouse Architecture with Staging Area and Data Marts (a)

Figure 6 illustrates an example where purchasing, sales, and stocks are
separated. In this example, a financial analyst wants to analyze historical
data for purchases and sales or mine historical information to make
predictions about customer behavior.

Fig 7: Architecture of Data Warehouse with Staging Area and Data Marts (b)

Properties of Data Warehouse Architectures
The following architecture properties are necessary for a data warehouse
system:
1. Separation
2. Scalability
3. Extensibility
4. Security
5. Administerability

DATA WAREHOUSE ARCHITECTURES


Single-Tier Architecture
Single-tier architecture is rarely used in practice. Its purpose is
to minimize the amount of data stored; to reach this goal, it removes data
redundancies.
The figure shows that the only layer physically available is the source layer. In
this method, data warehouses are virtual. This means that the data
warehouse is implemented as a multidimensional view of operational data
created by specific middleware, or an intermediate processing layer.

Fig 8: Single Tier Data warehouse Architecture


The vulnerability of this architecture lies in its failure to meet the
requirement for separation between analytical and transactional
processing. Analysis queries are submitted to operational data after the
middleware interprets them. In this way, queries affect transactional
workloads.
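
The virtual warehouse described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `orders` table, the `sales_by_region` view and the `sales_totals` helper are hypothetical names, and the view stands in for the middleware layer. No data is copied, so every analytical query reads the operational table directly.

```python
import sqlite3

# Hypothetical operational (source-layer) table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 100.0), ("South", 50.0), ("North", 25.0)],
)

# The "virtual warehouse": nothing is copied, only a view is defined.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

def sales_totals(connection):
    """Run the analytical query; it executes against the operational table."""
    rows = connection.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    )
    return dict(rows.fetchall())

totals = sales_totals(conn)
```

Because the view is evaluated at query time, a new row in `orders` is visible to the next analytical query immediately, which is also why such queries compete with the transactional workload.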

Two-Tier Architecture
The requirement for separation plays an essential role in defining the
two-tier architecture for a data warehouse system, as shown in Fig 9:

Fig 9: Two Tier Data warehouse Architecture


Although it is typically called a two-layer architecture to highlight the
separation between physically available sources and data warehouses, it
in fact consists of four subsequent data flow stages:
1. Source layer: A data warehouse system uses heterogeneous sources
of data.
2. Data Staging: The data stored in the sources should be extracted,
cleansed to remove inconsistencies and fill gaps, and integrated to
merge heterogeneous sources into one standard schema. The
so-called Extraction, Transformation, and Loading (ETL) tools can
combine heterogeneous schemata, extract, transform, cleanse, validate,
filter, and load source data into a data warehouse.
3. Data Warehouse layer: Information is saved to one centralized
repository: a data warehouse. The data warehouse can be directly
accessed, but it can also be used as a source for creating data marts,
which partially replicate data warehouse contents and are designed for
specific enterprise departments.
4. Analysis: In this layer, integrated data is efficiently and flexibly accessed
to issue reports, dynamically analyze information, and simulate hypothetical
business scenarios.

Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple
source systems), the reconciled layer and the data warehouse layer
(containing both data warehouses and data marts). The reconciled layer
sits between the source data and the data warehouse.
The main advantage of the reconciled layer is that it creates a standard
reference data model for the whole enterprise. At the same time, it separates
the problems of source data extraction and integration from those of data
warehouse population.

Fig 10: Three Tier Data warehouse Architecture


Three-Tier Data Warehouse Architecture
Data Warehouses usually have a three-level (tier) architecture that
includes:
1. Bottom Tier (Data Warehouse Server)
2. Middle Tier (OLAP Server)
3. Top Tier (Front end Tools).
The bottom tier consists of the data warehouse server, which is
almost always an RDBMS. It may include several specialized data marts
and a metadata repository.
Data from operational databases and external sources (such as user profile
data provided by external consultants) are extracted using application
program interfaces called gateways. A gateway is provided by the
underlying DBMS and allows client programs to generate SQL code to
be executed at a server.
Examples of gateways include ODBC (Open Database Connectivity)
and OLE DB (Object Linking and Embedding, Database), by
Microsoft, and JDBC (Java Database Connectivity).
The middle tier consists of an OLAP server for fast querying of the
data warehouse.
The OLAP server is implemented using either
(1) A Relational OLAP (ROLAP) model, i.e., an extended relational
DBMS that maps operations on multidimensional data to standard
relational operations, or
(2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose
server that directly implements multidimensional data and
operations.
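
As a rough illustration of the ROLAP idea, the sketch below expresses a roll-up over chosen dimensions as an ordinary SQL GROUP BY. It uses Python's sqlite3 module; the `sales` table and the `roll_up` helper are hypothetical and not part of any OLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "North", "pen", 10.0),
        (2023, "North", "ink", 5.0),
        (2023, "South", "pen", 7.0),
        (2024, "North", "pen", 12.0),
    ],
)

ALLOWED_DIMENSIONS = {"year", "region", "product"}

def roll_up(connection, dimensions):
    """Aggregate the amount measure over the chosen dimensions with GROUP BY."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown or empty dimension list")
    cols = ", ".join(dimensions)  # safe: validated against the whitelist above
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return connection.execute(sql).fetchall()

by_year_region = roll_up(conn, ["year", "region"])
by_year = roll_up(conn, ["year"])
```

Dropping a dimension from the list (from year and region down to year alone) corresponds to rolling up to a coarser level of the cube.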
The top tier contains front-end tools for displaying results provided by
OLAP, as well as additional tools for data mining of the OLAP-generated
data.
The overall Data Warehouse Architecture is shown in Fig 11.

Fig 11: Overall Data warehouse Architecture


The metadata repository stores information that defines DW objects. It
includes the following parameters and information for the middle and the
top-tier applications:
1. A description of the DW structure, including the warehouse schema,
dimension, hierarchies, data mart locations, and contents, etc.
2. Operational metadata, which usually describes the currency level of
the stored data.
3. System performance data, which includes indices, used to improve
data access and retrieval performance.
4. Summarization algorithms

Principles of Data Warehousing

Fig 12: Principles of Data warehousing

Load Performance
Data warehouses require incremental loading of new data on a periodic basis
within narrow time windows; performance of the load process should be
measured in hundreds of millions of rows and gigabytes per hour and must
not artificially constrain the volume of data the business requires.

Load Processing
Many steps must be taken to load new or updated data into the data
warehouse, including data conversion, filtering, reformatting, indexing,
and metadata update.

Data Quality Management


Fact-based management demands the highest data quality.

Query Performance
Fact-based management must not be slowed by the performance of the
data warehouse RDBMS; large, complex queries must complete in
seconds.

Terabyte Scalability
Data warehouse sizes are growing at enormous rates. Today they range
from a few gigabytes to hundreds of gigabytes, up to terabyte-sized data warehouses.
Types of Data Warehouses
There are different types of data warehouses, which are as follows:

Fig 13: Types of Data warehouses

Host-Based Data Warehouses
There are two types of host-based data warehouses which can be
implemented:
❖ Host-based mainframe warehouses, which reside on a high-volume
database. They are supported by robust and reliable high-capacity structures such as
IBM System/390, UNISYS and Data General sequent systems, and
databases such as Sybase, Oracle, Informix, and DB2.
❖ Host-based LAN data warehouses, where data delivery can be
handled either centrally or from the workgroup environment. The size of
the data warehouse database depends on the platform.
Data extraction and transformation tools allow the automated extraction
and cleaning of data from production systems. Giving query tools direct
access to these production systems is not advisable for the following reasons:
1. A huge load of complex warehousing queries could have too
much of a harmful impact on the mission-critical transaction
processing (TP)-oriented applications.
2. These transaction processing systems have been optimized in their
database design for transaction throughput.
3. There is no assurance that data remains consistent.
Host-Based (MVS) Data Warehouses
Data warehouses that reside on large-volume databases on MVS
are the host-based type of data warehouse. Often the DBMS is DB2, with
a huge variety of original sources for legacy information such as VSAM, DB2,
flat files, and Information Management System (IMS).

Fig 14: Host based (MVS) Data warehouse


Before embarking on designing, building and implementing such a
warehouse, some further considerations must be taken into account:
1. Databases have very high volumes of data storage.
2. Warehouses may require support for both MVS and client-based
report and query facilities.
3. The DW has complicated source systems.
4. It needs continuous maintenance.
To build such data warehouses successfully, the following phases
are generally followed:
❖ Unload Phase
❖ Transform Phase
❖ Load Phase
An integrated metadata repository is central to any data warehouse
environment. It provides a dynamic link between the multiple data
source databases and the DB2 database of the data warehouse.
A metadata repository is necessary to design, build, and maintain data
warehouse processes. It should be capable of describing what
data exists in both the operational system and the data warehouse, and where the
data is located. Query, reporting, and maintenance facilities are another
indispensable part of such a data warehouse, for example an MVS-based query and
reporting tool for DB2.
Host-Based (UNIX) Data Warehouses
Oracle and Informix RDBMSs provide the facilities for such data
warehouses. Both of these databases can extract information from MVS-based
databases as well as a large number of other UNIX-based
databases.
LAN-Based Workgroup Data Warehouses
A LAN-based workgroup warehouse is an integrated structure for building
and maintaining a data warehouse in a LAN environment. We can extract
information from a variety of sources and support multiple LAN-based
warehouses. Commonly chosen warehouse databases include the DB2 family,
Oracle, Sybase, and Informix. Other sources that can also be included,
though infrequently, are IMS, VSAM, Flat File, MVS, and VM.

Fig 15: LAN based Work Group Warehouse


Designed for the workgroup environment, a LAN-based workgroup
warehouse is optimal for any business organization that wants to build a
data warehouse, often called a data mart.
Data Delivery: With a LAN-based workgroup warehouse, the customer needs
minimal technical knowledge to create and maintain a store of data that is
customized for use at the department, business unit, or workgroup level.
It ensures the delivery of information from corporate resources by
providing transparent access to the data in the warehouse.

Host-Based Single Stage (LAN) Data Warehouses


Within a LAN-based data warehouse, data delivery can be handled either
centrally or from the workgroup environment, so business groups can
process their data as needed without burdening centralized IT resources,
enjoying the autonomy of their data mart without compromising overall data
integrity and security in the enterprise.

Fig 16: LAN based Single Stage Warehouse


A LAN-based warehouse provides data from many sources while requiring a
minimal initial investment and technical knowledge. A LAN-based
warehouse can also work with replication tools for populating and updating the
data warehouse. This type of warehouse can include business views,
histories, aggregation, versioning, and heterogeneous source support, such
as
❖ DB2 Family
❖ IMS, VSAM, Flat File [MVS and VM]
A single store frequently drives a LAN-based warehouse and provides
existing DSS applications, enabling the business user to locate data in their
data warehouse. The LAN-based warehouse can support business users
with a complete data-to-information solution.

Multi-Stage Data Warehouses


A multi-stage data warehouse is well suited to environments where end users
in numerous capacities require access to both summarized data for
up-to-the-minute tactical decisions and summarized, cumulative records for
long-term strategic decisions. Both the Operational Data Store (ODS) and the
data warehouse may reside on host-based or LAN-based databases,
depending on volume and custom requirements. These include DB2,
Oracle, Informix, IMS, Flat Files, and Sybase.

Fig 17: Multistage Data Warehouse


Stationary Data Warehouses
In this type of data warehouse, the data is not changed from the sources,
as depicted in Fig 18:

Fig 18: Stationary data Warehouse


The customer is given direct access to the data. Problems generated by this
scheme are:
❖ Identifying the location of the information for the users.
❖ Providing clients the ability to query different DBMSs as if they were
all a single DBMS with a single API.
❖ Impacting performance, since the customer will be competing with the
production data stores.

Distributed Data Warehouses


There are two types of distributed data warehouses and their variations:
local enterprise warehouses, which are distributed throughout the enterprise,
and a global warehouse, as shown in Fig 19:

Fig 19: Distributed Data Warehouse


Characteristics of Local data warehouses
❖ Each has its own unique architecture and data contents
❖ The data is unique
❖ The majority of the data is local and not replicated
❖ Any intersection of data between local data warehouses is
circumstantial
❖ Each local warehouse serves a different technical community

Virtual Data Warehouses


A virtual data warehouse is created in the following stages:
1. Installing a set of data access, data dictionary, and process
management facilities.
2. Training end users.
3. Monitoring how the DW facilities will be used.
This strategy provides ultimate flexibility as well as the minimum amount
of redundant information that must be loaded and maintained, and is
termed the 'virtual data warehouse.'
To accomplish this, there is a need to define four kinds of data:
1. A data dictionary including the definitions of the various databases.
2. A description of the relationships between the data components.
3. A description of the way the user will interface with the system.
4. The algorithms and business rules that describe what to do and how to
do it.

Disadvantages
1. Queries competing with production transactions can degrade
performance.
2. There is no metadata, no summary data, and no individual DSS (Decision
Support System) integration or history.
3. There is no refreshing process, causing the queries to be very complex.

Fig 20: Virtual data Warehouse

4.3 ETL PROCESS


ETL (Extract, Transform and Load) is a process that extracts data from
different source systems, transforms the data, and finally loads the data
into the Data Warehouse system.

ETL Process in Data Warehouses


ETL is a 3-step process


Fig 21: ETL Process in Data Warehouse


Step 1) Extraction
In this step, data is extracted from the source systems into the staging area.
Transformations are done in the staging area so that the performance of the source
system is not degraded. The staging area also gives an opportunity to validate
extracted data before it moves into the DW.

Three Data Extraction methods:


1. Full Extraction
2. Partial Extraction - without update notification
3. Partial Extraction - with update notification
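
The three extraction methods listed above can be sketched as follows. This is only an illustration under assumptions: the source rows carry a hypothetical `updated_at` field, the timestamp comparison is just one way to find changes when the source does not report them, and the "update notification" is modeled as a list of changed ids supplied by the source.

```python
from datetime import datetime

# Hypothetical source rows; field names are assumptions for illustration.
SOURCE = [
    {"id": 1, "name": "Asha", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Ravi", "updated_at": datetime(2024, 2, 1)},
    {"id": 3, "name": "Mina", "updated_at": datetime(2024, 3, 1)},
]

def full_extraction(source):
    """Copy every row, regardless of whether it changed."""
    return list(source)

def partial_without_notification(source, last_run):
    """The source does not report changes, so compare timestamps ourselves."""
    return [row for row in source if row["updated_at"] > last_run]

def partial_with_notification(source, changed_ids):
    """The source reports which ids changed; extract only those rows."""
    wanted = set(changed_ids)
    return [row for row in source if row["id"] in wanted]

recent = partial_without_notification(SOURCE, datetime(2024, 1, 15))
notified = partial_with_notification(SOURCE, [3])
```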

Step 2) Transformation
Data extracted from the source server is raw and not usable in its original
form; it needs to be cleansed, mapped and transformed.
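
A minimal sketch of cleansing, mapping and reformatting a raw record is shown below. The field names, the `GENDER_MAP` lookup and the source date format (`dd/mm/yyyy`) are assumptions made for the example, not rules from any particular ETL tool.

```python
from datetime import datetime

# Hypothetical mapping from source codes to warehouse values.
GENDER_MAP = {"M": "Male", "F": "Female"}

def transform(record):
    """Cleanse, map and reformat one raw record; return None if it is unusable."""
    name = record.get("name", "").strip()
    if not name:
        return None  # cleansing: reject rows with no usable name
    code = record.get("gender", "").upper()
    joined = datetime.strptime(record["joined"], "%d/%m/%Y").date()
    return {
        "name": name.title(),                     # cleansing: trim and normalize case
        "gender": GENDER_MAP.get(code, "Unknown"),  # mapping: source code to label
        "joined": joined.isoformat(),             # reformatting: ISO date string
    }

raw = [
    {"name": "  asha rao ", "gender": "f", "joined": "05/01/2024"},
    {"name": "", "gender": "M", "joined": "01/02/2024"},
]
clean = [row for row in map(transform, raw) if row is not None]
```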

Fig 22: Data Integration Issues

Step 3) Loading
A large volume of data needs to be loaded in a relatively short period, so the
load process needs to be optimized for performance.
In case of load failure, recovery mechanisms should be configured to restart
from the point of failure without loss of data integrity.
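
One possible way to restart a load from the point of failure is to record a checkpoint in the same transaction as each batch. The sketch below uses Python's sqlite3 module; the `fact` and `checkpoint` tables and the `fail_on_batch` switch (used only to simulate a failure) are hypothetical.

```python
import sqlite3

def load_in_batches(conn, rows, batch_size=2, fail_on_batch=None):
    """Load rows batch by batch, recording progress in the same transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (loaded INTEGER NOT NULL)")
    if conn.execute("SELECT COUNT(*) FROM checkpoint").fetchone()[0] == 0:
        conn.execute("INSERT INTO checkpoint VALUES (0)")
    conn.commit()
    # Resume from the number of rows already committed by earlier runs.
    start = conn.execute("SELECT loaded FROM checkpoint").fetchone()[0]
    for offset in range(start, len(rows), batch_size):
        batch = rows[offset:offset + batch_size]
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany("INSERT INTO fact VALUES (?, ?)", batch)
            if fail_on_batch == offset // batch_size:
                raise RuntimeError("simulated load failure")
            conn.execute("UPDATE checkpoint SET loaded = ?", (offset + len(batch),))
    return conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]

conn = sqlite3.connect(":memory:")
rows = [(i, f"row-{i}") for i in range(1, 6)]
try:
    load_in_batches(conn, rows, fail_on_batch=1)  # fails during the second batch
except RuntimeError:
    pass
loaded_after_failure = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
total = load_in_batches(conn, rows)  # resumes after the first committed batch
```

Because the failed batch is rolled back together with its checkpoint update, the second run neither skips nor duplicates rows.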

Types of Loading:
❖ Initial Load: populating all the Data Warehouse tables.
❖ Incremental Load: applying ongoing changes periodically, as and when
needed.
❖ Full Refresh: erasing the contents of one or more tables and
reloading with fresh data.
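
The three loading types can be sketched against a single table. The example below assumes a hypothetical `dim_customer` table in an in-memory SQLite database and uses SQLite's `INSERT OR REPLACE` for the incremental case; other databases offer different upsert syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, city TEXT)")

def initial_load(connection, rows):
    """Populate the empty table for the first time."""
    with connection:
        connection.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)

def incremental_load(connection, changes):
    """Apply only new or changed rows; rows with existing ids are overwritten."""
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO dim_customer VALUES (?, ?)", changes
        )

def full_refresh(connection, rows):
    """Erase the table contents and reload them with fresh data."""
    with connection:
        connection.execute("DELETE FROM dim_customer")
        connection.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)

def snapshot(connection):
    """Return the current table contents ordered by id."""
    return connection.execute(
        "SELECT id, city FROM dim_customer ORDER BY id"
    ).fetchall()

initial_load(conn, [(1, "Pune"), (2, "Delhi")])
incremental_load(conn, [(2, "Mumbai"), (3, "Kochi")])
after_incremental = snapshot(conn)
full_refresh(conn, [(1, "Pune")])
after_refresh = snapshot(conn)
```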

ETL Tools
Prominent data warehousing tools available in the market are:

1. MarkLogic:
https://www.marklogic.com/product/getting-started/

2. Oracle:
https://www.oracle.com/index.html
3. Amazon RedShift:
https://aws.amazon.com/redshift/?nc2=h_m1

Best practices for the ETL process


❖ Never try to cleanse all the data
❖ Never skip cleansing altogether; always plan to clean something
❖ Determine the cost of cleansing the data
❖ To speed up query processing, maintain auxiliary views and indexes
Difference between ETL and ELT

ETL (Extract, Transform, and Load)


Extract, Transform and Load is the technique of extracting data from
sources to a staging area, then transforming or reformatting it by applying
business rules so that it fits the operational needs or data
analysis, and later loading it into the target database or data
warehouse.

Fig 23: ETL


Strengths
❖ Development Time
❖ Targeted data
❖ Tools Availability

Weaknesses
❖ Flexibility
❖ Hardware
❖ Learning Curve
ELT (Extract, Load and Transform)
ELT (Extract, Load and Transform) is a different way of
looking at data migration or movement. ELT involves extracting
data from the source system and loading it into the target
system, instead of transforming it between the extraction and loading
phases. Once the data has been copied or loaded into the target system, the
transformation takes place.
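
A small sketch of the ELT ordering, assuming an in-memory SQLite database stands in for the target system: the raw rows are loaded unchanged first, and the transformation then runs inside the target using its own SQL engine. The `raw_sales` and `sales` table names are hypothetical.

```python
import sqlite3

# The in-memory database stands in for the target system.
target = sqlite3.connect(":memory:")

# Extract + Load: raw source rows are copied into the target unchanged.
raw_rows = [("2024-01-05", " north ", "100"), ("2024-01-06", "South", "40")]
target.execute("CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount TEXT)")
target.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# Transform: performed afterwards, inside the target, with plain SQL.
target.execute(
    "CREATE TABLE sales AS "
    "SELECT sale_date, UPPER(TRIM(region)) AS region, "
    "CAST(amount AS REAL) AS amount "
    "FROM raw_sales"
)
target.commit()

transformed = target.execute(
    "SELECT region, amount FROM sales ORDER BY sale_date"
).fetchall()
```

In the ETL ordering, the trimming and type conversion would instead happen on a separate ETL server before any row reaches the target.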

Fig 24: ELT


Strengths
❖ Project Management
❖ Flexible & Future Proof

❖ Risk minimization
❖ Utilize Existing Hardware
❖ Utilize Existing Skill sets
Weaknesses
❖ Against the Norm
❖ Tools Availability
Difference between ETL vs. ELT

Basics            ETL                                      ELT

Process           Data is transferred to the ETL server    Data remains in the DB except for
                  and moved back to the DB. High network   cross-database loads (e.g. source to
                  bandwidth required.                      object).

Transformation    Transformations are performed in the     Transformations are performed (in the
                  ETL server.                              source or) in the target.

Code Usage        Typically used for                       Typically used for
                  ❖ Source to target transfer              ❖ High amounts of data
                  ❖ Compute-intensive transformations
                  ❖ Small amount of data

Time-Maintenance  It needs high maintenance as you need    Low maintenance as data is always
                  to select data to load and transform.    available.

Calculations      Overwrites existing column, or need to   Easily add the calculated column to
                  append the dataset and push to the       the existing table.
                  target platform.

Analysis

4.4 DATA WAREHOUSE DESIGN


A data warehouse is a single data repository where records from multiple
data sources are integrated for online analytical processing
(OLAP). This implies that a data warehouse needs to meet the requirements
of all the business stages within the entire organization, which makes data
warehouse design a hugely complex, lengthy, and hence error-prone process.
Furthermore, data warehouse and OLAP systems are dynamic, and the design
process is continuous.
Data warehouse design takes a method different from view materialization
in the industries and has two approaches:
1. "top-down" approach
2. "bottom-up" approach

Top-down Design Approach


In the "Top-Down" design approach, a data warehouse is described as a
subject-oriented, time-variant, non-volatile and integrated data repository
for the entire enterprise. Data from different sources are validated,
reformatted and saved in a normalized (up to 3NF) database as the data
warehouse. The main advantage of this method is that it supports a single
integrated data source.

Advantages of top-down design


❖ Data Marts are loaded from the data warehouses.
❖ Developing new data mart from the data warehouse is very easy.

Disadvantages of top-down design


❖ This technique is inflexible to changing departmental needs.
❖ The cost of implementing the project is high.

Fig 25: Top Down Design Approach

Bottom-Up Design Approach
In the "Bottom-Up" approach, a DW is described as "a copy of transaction
data specifically structured for query and analysis," termed the star schema.
In this approach, a data mart is created first to provide the necessary reporting and
analytical capabilities for particular business processes. Data marts include
the lowest-grain data and, if needed, aggregated data.
The main advantage of the "bottom-up" design approach is that it has a quick ROI and
takes less time and effort than developing an enterprise-wide data
warehouse. In addition, the risk of failure is lower. This method is
inherently incremental and allows the project team to learn and
grow.
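
To make the star schema mentioned above concrete, here is a minimal sketch of one fact table surrounded by two dimension tables, built in an in-memory SQLite database. The table and column names (`fact_sales`, `dim_product`, `dim_date`) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
-- Fact table at the lowest grain: one row per sale, keyed by the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Stationery'), (2, 'Books');
INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales VALUES (1, 10, 20.0), (2, 10, 30.0), (1, 11, 5.0);
""")

# A typical star join: facts joined to their dimensions, then aggregated.
report = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
```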

Fig 26: Bottom Up Design Approach


Advantages of bottom-up design
❖ Documents can be generated quickly.
❖ The DW can be extended to accommodate new business units.
❖ Extending it is a matter of developing new data marts and then
integrating them with other data marts.

Disadvantages of bottom-up design


The locations of the data warehouse and the data marts are reversed in the
bottom-up approach design.

Differentiate between Top-Down Design Approach and Bottom-Up
Design Approach

Top-Down Design Approach                    Bottom-Up Design Approach

Breaks the vast problem into smaller        Solves the essential low-level problems
sub-problems.                               and integrates them into a higher one.

Inherently architected, not a union of      Inherently incremental; can schedule
several data marts.                         essential data marts first.

Single, central storage of information      Departmental information stored.
about the content.

Centralized rules and control.              Departmental rules and control.

It includes redundant information.          Redundancy can be removed.

It may see quick results if implemented     Less risk of failure, favorable return on
with repetitions.                           investment, and proof of techniques.

5
DATA MART
Unit Structure
5.1 Data mart
5.2 OLAP
5.3 Dimensional Modeling
5.4 Operations on Data Cube
5.5 Schema
5.6 References
5.7 MOOCs
5.8 Video Lectures
5.9 Quiz

5.1 DATA MART


A data mart is focused on a single functional area of an organization and contains a
subset of the data stored in a Data Warehouse. A Data Mart is a condensed
version of a Data Warehouse and is designed for use by a specific
department, unit or set of users in an organization. Data marts are small in
size and are more flexible compared to a Data Warehouse.
Types of Data Mart
There are three main types of data mart:
1. Dependent
2. Independent
3. Hybrid

Dependent Data Mart


It allows sourcing an organization's data from a single Data Warehouse.
A dependent data mart can be built in two different ways:
❖ A user can access both the data mart and the data warehouse, depending
on need.
❖ Access is limited only to the data mart. This approach can produce a data
junkyard (all data begins with a common source but is scrapped and mostly junked).
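
As an illustration of a dependent data mart, the sketch below copies one department's subset out of a single warehouse table. It uses Python's sqlite3 module; the `warehouse_sales` table, the `mart_<dept>` naming scheme and the `build_mart` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("sales", "North", 10.0), ("finance", "North", 99.0), ("sales", "South", 4.0)],
)

def build_mart(connection, dept):
    """Copy one department's subset of the warehouse into its own mart table."""
    if not dept.isalnum():
        raise ValueError("department name must be alphanumeric")
    table = f"mart_{dept}"  # safe to interpolate: dept was validated above
    with connection:
        connection.execute(f"DROP TABLE IF EXISTS {table}")
        connection.execute(f"CREATE TABLE {table} (region TEXT, amount REAL)")
        connection.execute(
            f"INSERT INTO {table} "
            "SELECT region, amount FROM warehouse_sales WHERE dept = ?",
            (dept,),
        )
    return table

mart = build_mart(conn, "sales")
mart_rows = conn.execute(
    f"SELECT region, amount FROM {mart} ORDER BY region"
).fetchall()
```

The mart holds only the rows for its own department, and the warehouse remains its single source.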


Fig 27: Dependent Data Mart

Independent Data Mart


It is created without the use of a central Data Warehouse and is an ideal
option for smaller groups within an organization. It has no
relationship with the enterprise data warehouse or with any other data
mart.

Fig 28: Independent Data Mart

Hybrid Data Mart:
It combines input from sources apart from the Data Warehouse and is helpful
in integration. A hybrid data mart also supports large storage structures, and
it is best suited for flexible, smaller data-centric applications.

Fig 29: Hybrid Data Mart

Steps in Implementing a Data Mart

Fig 30: Steps in implementing a Data mart


Designing
Designing is the first phase of Data Mart implementation. It includes the
following tasks:
❖ Gathering the business & technical requirements and Identifying data
sources.
❖ Selecting the appropriate subset of data.
❖ Designing the logical and physical structure of the data mart.
Data could be partitioned based on the following criteria:
❖ Date
❖ Business or Functional Unit
❖ Geography
❖ Any combination of the above
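The partitioning criteria above can be sketched as a grouping step. The row layout (`sold_on`, `unit`, `country`) and the `partition_rows` helper are assumptions made for this illustration only.

```python
from collections import defaultdict
from datetime import date


def partition_rows(rows, key_func):
    """Group rows into partitions keyed by whatever key_func returns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_func(row)].append(row)
    return dict(partitions)


# Hypothetical source rows.
rows = [
    {"sold_on": date(2023, 9, 14), "unit": "retail", "country": "IN", "amount": 40},
    {"sold_on": date(2023, 9, 30), "unit": "online", "country": "US", "amount": 25},
    {"sold_on": date(2023, 10, 2), "unit": "retail", "country": "IN", "amount": 55},
]

# Partition by date (year and month).
by_month = partition_rows(rows, lambda r: (r["sold_on"].year, r["sold_on"].month))
# Partition by business or functional unit.
by_unit = partition_rows(rows, lambda r: r["unit"])
# Partition by geography.
by_geography = partition_rows(rows, lambda r: r["country"])
# Partition by a combination of the above (month and country).
by_month_and_country = partition_rows(
    rows, lambda r: (r["sold_on"].year, r["sold_on"].month, r["country"])
)
```

Passing the partition key as a function keeps a single helper usable for every criterion, including combinations.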

Constructing
The second phase of implementation involves creating the physical
database and the logical structures. It involves the following tasks:
● Implementing the physical database designed in the earlier phase.
Database schema objects such as tables, indexes and views are created.
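A minimal sketch of the construction phase, using Python's built-in sqlite3 module as a stand-in for whatever database the data mart actually runs on. The table, index and view names (`sales`, `idx_sales_region`, `sales_by_region`) are hypothetical.

```python
import sqlite3


def construct_data_mart(conn):
    """Create the schema objects: one table, one index and one view."""
    conn.executescript(
        """
        CREATE TABLE sales (
            sale_id INTEGER PRIMARY KEY,
            sold_on TEXT NOT NULL,
            region  TEXT NOT NULL,
            amount  REAL NOT NULL
        );
        CREATE INDEX idx_sales_region ON sales (region);
        CREATE VIEW sales_by_region AS
            SELECT region, SUM(amount) AS total_amount
            FROM sales
            GROUP BY region;
        """
    )


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
construct_data_mart(conn)

# List what was created by reading SQLite's schema catalog.
objects = {
    (row[0], row[1])
    for row in conn.execute("SELECT type, name FROM sqlite_master")
}
```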

Populating
In the third phase, data is populated in the data mart. This involves the
following tasks:
❖ Data Mapping
❖ Extraction of source data
❖ Cleaning and transformation operations
❖ Loading data into the data mart
❖ Creating and storing metadata
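The populating tasks listed above can be sketched end to end as a tiny pipeline. Everything named here (`SOURCE_ROWS`, `COLUMN_MAP`, the `mart_sales` and `load_metadata` tables) is a hypothetical example, and sqlite3 again stands in for the real target database.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw source data, with messy values on purpose.
SOURCE_ROWS = [
    {"cust": " Asha ", "amt": "120.50", "city": "pune"},
    {"cust": "Ravi", "amt": "n/a", "city": "Mumbai"},  # unusable amount
    {"cust": "Meera", "amt": "80", "city": "DELHI"},
]

# Data mapping: source column -> data mart column.
COLUMN_MAP = {"cust": "customer_name", "amt": "amount", "city": "city"}


def extract():
    """Extraction of source data (here just a copy of the sample rows)."""
    return list(SOURCE_ROWS)


def clean_and_transform(rows):
    """Drop rows with unparseable amounts; trim and normalize text."""
    result = []
    for row in rows:
        try:
            amount = float(row["amt"])
        except ValueError:
            continue  # cleaning: skip rows whose amount cannot be parsed
        result.append({
            COLUMN_MAP["cust"]: row["cust"].strip(),
            COLUMN_MAP["amt"]: amount,
            COLUMN_MAP["city"]: row["city"].strip().title(),
        })
    return result


def load(conn, rows):
    """Load the transformed rows into the data mart table."""
    conn.execute(
        "CREATE TABLE mart_sales (customer_name TEXT, amount REAL, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO mart_sales VALUES (:customer_name, :amount, :city)", rows
    )


def store_metadata(conn, rows_extracted, rows_loaded):
    """Record when the load ran and how many rows it handled."""
    conn.execute(
        "CREATE TABLE load_metadata "
        "(loaded_at TEXT, rows_extracted INTEGER, rows_loaded INTEGER)"
    )
    conn.execute(
        "INSERT INTO load_metadata VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), rows_extracted, rows_loaded),
    )


conn = sqlite3.connect(":memory:")
extracted = extract()
transformed = clean_and_transform(extracted)
load(conn, transformed)
store_metadata(conn, len(extracted), len(transformed))
conn.commit()
```

Keeping the row counts in a metadata table is one simple way to notice that rows were rejected during cleaning.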

Accessing
Accessing is the fourth step. It involves putting the data to use:
submitting queries to the database and displaying the results of the
queries. The accessing step needs to perform the following tasks:
❖ Translate database structures and object names into business terms.
❖ Set up and maintain database structures.
❖ Set up APIs and interfaces if required.
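Translating database object names into business terms can be sketched as a lookup applied to query results. The `BUSINESS_TERMS` mapping, the column names and the `query_with_business_terms` helper are assumptions for illustration.

```python
import sqlite3

# Hypothetical mapping from physical column names to business terms.
BUSINESS_TERMS = {"cust_nm": "Customer Name", "tot_amt": "Total Sales"}


def query_with_business_terms(conn, sql):
    """Run a query and label each result column with its business term."""
    cursor = conn.execute(sql)
    headers = [BUSINESS_TERMS.get(col[0], col[0]) for col in cursor.description]
    return [dict(zip(headers, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mart_sales (cust_nm TEXT, tot_amt REAL)")
conn.executemany(
    "INSERT INTO mart_sales VALUES (?, ?)",
    [("Asha", 120.5), ("Meera", 80.0)],
)

report = query_with_business_terms(
    conn, "SELECT cust_nm, tot_amt FROM mart_sales ORDER BY cust_nm"
)
```

Columns without an entry in the mapping keep their physical name, so a missing translation does not break the report.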

Managing
Managing is the last step of the Data Mart implementation process and
covers management tasks like:
❖ User access management.
❖ System optimization and fine-tuning.
❖ Adding and managing fresh data in the data mart.
❖ Planning recovery scenarios and ensuring system availability in case
the system fails.

Advantages and Disadvantages of a Data Mart


Advantages
● Data is valuable to a specific group of people in an organization.
● Cost-effective
● Easy to use and can accelerate business processes.
● Consumes less implementation time as compared to Data Warehouse
systems.
● Contains historical data enabling the analyst to determine data trends.

Disadvantages
● Maintenance problem.
● Data analysis is limited.

5.2 OLAP
Online Analytical Processing (OLAP) provides analysis of data for business
decisions and allows users to analyze database information from multiple
database systems at one time.
The primary objective is data analysis and not data processing.
Example of OLAP
Any Data warehouse system is an OLAP system.
Uses of OLAP:
❖ A company might compare its mobile phone sales in September with
sales in October, then compare those results with another location
whose data may be stored in a separate database.
❖ Amazon analyzes purchases by its customers to come up with a
personalized homepage showing products that are likely to interest
each customer.
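The first use above (comparing September and October phone sales across locations held in separate databases) can be sketched as follows. The two in-memory SQLite databases, the `phone_sales` table and the sample figures are invented for this illustration.

```python
import sqlite3


def make_store_db(rows):
    """Create one location's database with a small phone_sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone_sales (month TEXT, units INTEGER)")
    conn.executemany("INSERT INTO phone_sales VALUES (?, ?)", rows)
    return conn


def units_by_month(conn):
    """Aggregate the units sold per month in one database."""
    query = "SELECT month, SUM(units) FROM phone_sales GROUP BY month"
    return dict(conn.execute(query).fetchall())


# Each location keeps its data in a separate database (hypothetical data).
databases = {
    "location_a": make_store_db(
        [("2023-09", 30), ("2023-09", 12), ("2023-10", 50)]
    ),
    "location_b": make_store_db([("2023-09", 20), ("2023-10", 18)]),
}

# Analyze both databases at one time and compare month against month.
comparison = {}
for location, conn in databases.items():
    totals = units_by_month(conn)
    september = totals.get("2023-09", 0)
    october = totals.get("2023-10", 0)
    comparison[location] = {
        "september": september,
        "october": october,
        "change": october - september,
    }
```

With this sample data, `comparison` shows a rise of 8 units at `location_a` and a fall of 2 units at `location_b`.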

Benefits of using OLAP services


❖ OLAP creates a single platform for all types of business analytical
needs, including planning, budgeting, forecasting, and analysis.
❖ The main benefit of OLAP is the consistency of information and
calculations.
❖ Easily apply security restrictions on users and objects to comply with
regulations and protect sensitive data.

Drawbacks of OLAP service


❖ Implementation and maintenance depend on IT professionals because
traditional OLAP tools require a complicated modeling procedure.
❖ OLAP tools need cooperation between people of various departments
to be effective, which might not always be possible.

OLTP
Online Transaction Processing (OLTP) supports transaction-oriented
applications in a 3-tier architecture. It administers the day-to-day
transactions of an organization.

Example of OLTP system


An example of an OLTP system is an ATM center. Assume that a couple has a
joint account with a bank. One day both reach different ATM centers at
precisely the same time and each wants to withdraw the total amount
present in their bank account.
However, the person who completes the authentication process first will be
able to get the money. In this case, the OLTP system makes sure that the
amount withdrawn is never more than the amount present in the account. The
key point to note here is that OLTP systems are optimized for
transactional superiority instead of data analysis.
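The ATM scenario can be sketched with a conditional update inside a transaction. This is a simplified, single-connection illustration using sqlite3; the `accounts` table and `withdraw` function are hypothetical, and a real banking system would involve far more than this check.

```python
import sqlite3


def withdraw(conn, account_id, amount):
    """Withdraw only if the balance covers the amount; report success."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        # rowcount is 1 when the update matched, 0 when funds were short.
        return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES (1, 1000)")
conn.commit()

# Both account holders try to withdraw the full balance.
first_ok = withdraw(conn, 1, 1000)   # first request succeeds
second_ok = withdraw(conn, 1, 1000)  # second request is refused
remaining = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
```

Because the balance check and the deduction happen in one statement, the second request cannot take money that the first one has already withdrawn.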
Other examples of OLTP applications are:
❖ Online banking
❖ Online airline ticket booking
❖ Sending a text message
❖ Order entry
❖ Adding a book to a shopping cart

Benefits of OLTP method


❖ It administers daily transactions of an organization.
❖ OLTP widens the customer base of an organization by simplifying
individual processes.

Drawbacks of OLTP method


❖ If an OLTP system faces hardware failures, online transactions are
severely affected.
❖ OLTP systems allow multiple users to access and change the same
data at the same time, which often creates unprecedented
situations.

OLTP Vs OLAP

Fig 31: OLTP VS OLAP

5.3 DIMENSIONAL MODELING


Dimensional modeling represents data as a cube, which gives a logical data
representation well suited to OLAP data management.
Each transaction record is divided into either "facts," which are
frequently numerical transaction data, or "dimensions," which are the
reference information that gives context to the facts.
Fact: A fact is a collection of associated data items, consisting of
measures and context data, that represents business items or business
transactions.
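The split between facts and dimensions can be sketched with plain Python structures. The product and store dimensions, the fact rows and the `revenue_by` helper are all hypothetical examples.

```python
from collections import defaultdict

# Dimensions: reference information that gives context to the facts.
product_dim = {
    1: {"name": "Phone", "category": "Electronics"},
    2: {"name": "Novel", "category": "Books"},
}
store_dim = {
    10: {"city": "Pune"},
    20: {"city": "Delhi"},
}

# Facts: numerical measures (units, revenue) plus keys into the dimensions.
sales_facts = [
    {"product_id": 1, "store_id": 10, "units": 2, "revenue": 600.0},
    {"product_id": 2, "store_id": 10, "units": 5, "revenue": 50.0},
    {"product_id": 1, "store_id": 20, "units": 1, "revenue": 300.0},
]


def revenue_by(dimension, key, attribute):
    """Sum the revenue measure grouped by one dimension attribute."""
    totals = defaultdict(float)
    for fact in sales_facts:
        label = dimension[fact[key]][attribute]
        totals[label] += fact["revenue"]
    return dict(totals)


revenue_by_category = revenue_by(product_dim, "product_id", "category")
revenue_by_city = revenue_by(store_dim, "store_id", "city")
```

The same fact rows answer different questions depending on which dimension they are grouped by, which is the property that makes this layout convenient for OLAP.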
