Ch2 Literature
LITERATURE REVIEW
2.1 Introduction
In this chapter, we review the literature on data warehousing and its application in higher education (HE). We
start with an overview of the data warehouse (DW). Thereafter, we briefly compare DW and data mart (DM). This is
followed by a review of the available architectures of DWs. Then, we review the most famous
DW in HE.
A review of the literature shows that there are two major definitions of DW; those are [1] and
nonvolatile, and time-variant collection of data in support of the managerial decision-making
process. [3] discusses at length the four parts that constitute Inmon's definition. First, a DW being
subject oriented means that it is organized around the major subjects of an enterprise (such as students,
programs, and courses) instead of major areas of application. This means that a DW stores
decisional data instead of application-oriented data. Further, a DW being integrated means that data
is consolidated, as the source data is normally inconsistent, opposite to the fact that data in DW is
for some time period and not valid elsewhere. Also, a DW is non-volatile: data is never
updated in real time, but is rather refreshed from source systems regularly. On the
system (DBMS) are built for online transaction processing (OLTP) support, and are thus regarded as unsuitable for data
warehousing. OLTP systems were designed to maximize transaction-processing
capability, whereas online analytical processing (OLAP) systems were designed to support managerial query processing.
Further, OLTP systems contain current data, whereas OLAP systems contain historical data.
Additionally, OLTP systems provide transactional data for a large number of operational users,
whereas OLAP systems provide decisional data for a relatively small number of managerial users.
Perhaps the most interesting difference is that OLTP systems support daily decisions,
such, corporations tend to build data marts (DMs); a DM is a departmental unit meant for
supporting one department of the corporation. Autonomous DMs will most probably be integrated
in the future to form an enterprise-wide DW. The next subsection discusses the main differences between
DW and DM.
A DW holds information related to many subject areas, often the entire enterprise, whereas a
DM holds data related to one subject area, department, business unit, or work group. A DM
provides data for a small group of users. Also, building a DM is simpler than building an
enterprise-wide DW. However, designers of autonomous DMs must consider the fact that these DMs will
be merged in a future process to constitute a DW that serves all units of an organization. As such,
the technical architecture of autonomous DMs must be standardized during the design process.
It is necessary to review the traditional architecture of a DW in order to understand its working
mechanism.
of the workflow associated. Figure 2.1 depicts the main components of a DW and their relations.
[Figure 2.1: Main components of a DW: operational sources, ETL, data warehouse, data marts (Students, Human Resources), and access tools.]
Operational sources contain the source data for populating DWs. A DW holds information
concerning many organizational units, whereas a DM presents information to one unit or
harmonize data before moving it to a DW [4]. Data is first extracted from operational
sources [5]. In the transformation step, various methods are applied to alter the source data and make it conform to the DW
format. Thereafter, data is loaded into DW storage for access by
the user.
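The extract, transform, and load sequence described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the two in-memory "operational sources", the dw_student table, and the gender code mapping are hypothetical and are not taken from any of the reviewed works.

```python
import sqlite3

# Hypothetical rows as they might arrive from two operational sources
# with inconsistent formats (names and values are illustrative only).
REGISTRATION_ROWS = [
    {"student_id": "S001", "gender": "M", "gpa": "3.40"},
    {"student_id": "S002", "gender": "F", "gpa": "3.90"},
]
ADMISSION_ROWS = [
    {"student_id": "s003", "gender": "male", "gpa": "2.75"},
]

# Assumed mapping used to harmonize gender codes across sources.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def extract():
    """Extract: collect raw rows from every operational source."""
    return REGISTRATION_ROWS + ADMISSION_ROWS


def transform(rows):
    """Transform: harmonize casing, codes, and types to one DW format."""
    return [
        (
            row["student_id"].upper(),
            GENDER_CODES[row["gender"].lower()],
            float(row["gpa"]),
        )
        for row in rows
    ]


def load(conn, rows):
    """Load: write the conformed rows into the DW table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_student "
        "(student_id TEXT PRIMARY KEY, gender TEXT, gpa REAL)"
    )
    conn.executemany("INSERT INTO dw_student VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory database and read back."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT student_id, gender, gpa FROM dw_student ORDER BY student_id"
    ).fetchall()


result = run_etl()
```

After the run, every row in dw_student uses the same identifier casing and the same gender codes, regardless of the source it came from.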
The main purpose of DWs is to enable managers to perform analysis using business
intelligence (BI) tools. These tools can be as simple as queries or as complex as data mining
applications [2].
[Figure: DW architecture with four components: operational source systems, data staging area, data presentation area, and data access tools.]
Operational source systems are queried in a narrow fashion. The only difference
between the traditional architecture and Kimball’s is that the latter contains a data staging
area. According to [6], the data staging area is a storage space and a group of processes called
extract, transform, and load (ETL).
In Kimball’s architecture, the data presentation area is considered the DW. It can be
[3] provided another view of DW architecture. They combine the ETL process with the data
presentation area in one block, which means that the user is able to query the results of the ETL process
This architecture is generic enough to be applied in combination with any DW design and
development method.
2.5 Data warehouse design and development methods
A data warehouse (DW) is a group of databases integrated to support strategic decision-making
capabilities in organizations. DWs integrate data from heterogeneous sources and present it in
a multidimensional view that facilitates reporting and querying. The design and development of
DWs is a cumbersome task that may take many years to complete. This, in turn, raises many
challenges that need to be resolved in order to avoid DW project failure. Among those
difficulties are requirements engineering for DWs and design and development methods for
DWs. As a consequence, various researchers in the literature have presented different approaches
DWs play a vital role in the success of today’s strategic planning in all kinds of businesses. As
with other information system projects, a DW project follows a design method that consists of many steps
and phases. There is no consensus in the literature about what an ideal DW design method
should include. However, all of the available methods contain two main stages, namely business
requirements definition and logical design. Our focus in this section is on the logical design
methods available in the literature. We summarize some of the available DW logical design
methods and describe the activities involved in each one of them. Further,
we shed light on some potential problems that these methods incorporate.
Extensive research related to the design and development of
DWs in various arenas is found in the literature. However, only a few studies have been devoted to the design and development
of DWs in HE. Moreover, these studies did not apply a systematic approach to the
design of DWs in HE. In this subsection, we review some of the basic methods that are
Applying this method means that an autonomous model will be designed for each DM
separately. Thereafter, local models will be integrated to construct a global DW model for the all
The method can be seen as divided into two circles. The first is the DW design circle, which
comprises the first three phases: planning, business requirements definition, and dimensional
modeling. The last three phases of the figure above are development-related phases. In planning,
the DW designer is responsible for planning ahead for the design and development processes to be
included in the data warehousing lifecycle. In business requirements definition, interviews
are conducted with decision makers. Dimensional modeling is the process of mapping
requirements to a relational logical design graph called a dimensional model [6]. In dimensional
Here we review some of the basic terms of dimensional modeling. A fact table is the main
table in a dimensional model, where numerical measurements of business performance are
captured. Facts in a fact table are business measurements. The grain of a fact table is
the level of detail of the information available in the fact table. Dimension tables describe the data held
by fact tables. A star schema consists of a fact table at its center and various dimension tables
[6] proposed a nine-step dimensional modeling method. First, select a business process to
be modeled; a fact table represents a business process. Second, declare the grain of the business
process by specifying the contents of an individual row of the fact table. Third, choose the
dimensions that apply to every row of the fact table. Fourth, identify the numeric facts that populate
every row of the fact table. Recalling the business development method of Figure 2.4, physical
design means defining the fact and dimension tables in the DW database. In data staging, ETL tools are
utilized to extract, transform, and load data into the tables of the DW in the database. In deployment,
the data of the DW is made available for the benefit of the end user [6].
[9] proposed a top-down design approach, where a global model is designed for an
enterprise-wide DW; thereafter, a DM for every business process is derived. In this
method, the DW dimensional model is revisited for refinement each time a new
departmental database is added, which is a time-consuming task. Because DW design is driven by
data, this method is data-driven. The method starts by gathering data, followed by integrating and testing, and
the last step is to define the business requirements. Inmon's DW approach is top-down. Each
DM is a representation of a business function of an individual department in the company.
Bonifati et al. [10] proposed a DW logical design method that is a hybrid of the two methods,
bottom-up and top-down. Their method consists of three main steps: top-down
analysis, bottom-up analysis, and integration. In top-down analysis, user requirements are
gathered by conducting interviews. The outcome of this is an ideal star schema. In the bottom-up
analysis, the conceptual model of the operational database is examined. The outcome of this
is a set of candidate star schemas for the DW. The integration step matches each ideal star schema with
all candidate star schemas and ranks them using a set of metrics. Afterwards, the designer
selects the candidate that best matches the ideal schema. Figure 2 depicts the schematic diagram of
this method.
[Figure 2: Schematic diagram of the hybrid method. Step 1 (top-down): user requirements gathered by interviews lead to an ideal star schema. Step 2 (bottom-up): candidate star schemas #1, #2, ... Step 3: integration.]
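The integration step, in which an ideal schema is matched against the candidate schemas and the candidates are ranked, can be sketched as follows. The metric used here (the mean Jaccard overlap of fact names and dimension names) is a stand-in chosen for this illustration; it is not the set of metrics defined in [10], and the schema contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StarSchema:
    """A star schema reduced to the names of its facts and dimensions."""
    name: str
    facts: frozenset
    dimensions: frozenset


def jaccard(a, b):
    """Share of elements common to both sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_score(ideal, candidate):
    """Illustrative metric: mean overlap of facts and of dimensions."""
    return (
        jaccard(ideal.facts, candidate.facts)
        + jaccard(ideal.dimensions, candidate.dimensions)
    ) / 2


def rank_candidates(ideal, candidates):
    """Return (score, candidate name) pairs, best match first."""
    scored = [(match_score(ideal, c), c.name) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical ideal schema (from requirements) and candidates (from sources).
IDEAL = StarSchema(
    "ideal",
    frozenset({"grade", "credits"}),
    frozenset({"student", "course", "term"}),
)
CANDIDATES = [
    StarSchema("candidate_1", frozenset({"grade"}),
               frozenset({"student", "course"})),
    StarSchema("candidate_2", frozenset({"grade", "credits"}),
               frozenset({"student", "course", "term", "lecturer"})),
]

ranked = rank_candidates(IDEAL, CANDIDATES)
```

In this example candidate_2 ranks first because it covers every fact and dimension of the ideal schema and adds only one extra dimension; the final choice would still rest with the designer.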
Mazón et al. [11] proposed a hybrid method that is semi-automated. In this approach, user
requirements lead to the construction of an initial conceptual schema. Later on, Query-View-
Transformation (QVT) relations are used to test the correctness of those schemas. The first
process in their approach is requirement analysis, where they suggest that the end user states the
requirements as highly abstract statements that reflect the business goals of the
organization. Afterwards, those requirements are modeled using a framework they proposed.
Then, the modeled requirements lead to the construction of multidimensional schemas using
UML notations proposed by the authors as an extension to conventional UML. As a final step,
they check the correctness of the resulting schemas using the QVT relations that they have
presented. The goal is to check the correctness of the schemas by comparing them with
those of the source data and to find out whether they can potentially satisfy the requirements.
Song et al. [12] proposed an automatic method that is capable of deriving logical DW schemas
from the entity-relationship models of the transactional sources. They introduced the
connection topology value (CTV) to automate the identification of the facts and fact tables from
This method works as follows: they investigate the relationships among the tables of the
source schema. Wherever they find a many-to-one relationship, they consider it a potential
schema for facts and dimensions, which are normally related by such relationships. Attributes on
the many side are potential candidates as facts. Also, attributes on the one side are potential
candidates as dimensions.
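The many-to-one heuristic described above can be sketched in a few lines of Python. This is a simplified illustration, not the CTV computation of [12]: the source schema, the table names, and the min_references threshold are assumptions made for this sketch.

```python
# Hypothetical source schema: each table maps to the tables it references
# through a foreign key, i.e. it sits on the "many" side of a many-to-one
# relationship with each of them.
FOREIGN_KEYS = {
    "enrollment": ["student", "course", "term"],
    "student": ["program"],
    "course": [],
    "term": [],
    "program": [],
}


def candidate_facts_and_dimensions(foreign_keys, min_references=2):
    """Propose many-side tables as facts and their targets as dimensions.

    min_references is an assumed threshold for this sketch, not a value
    taken from the reviewed method.
    """
    proposals = {}
    for table, referenced in foreign_keys.items():
        if len(referenced) >= min_references:
            proposals[table] = sorted(referenced)
    return proposals


proposals = candidate_facts_and_dimensions(FOREIGN_KEYS)
```

Here enrollment is proposed as a candidate fact table with course, student, and term as candidate dimensions, since it lies on the many side of three many-to-one relationships.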
In our research, since no single method in the literature is specific to DW design in HE, we
propose a hybrid DW design method that combines the benefits of various existing methods.
The next section reviews some studies that have applied these methods in the HE arena.
2.6 Applying data warehouses at higher education
Successful organizations are able to utilize information effectively [13]. In addition, the huge
amount of data, which accumulates exponentially, forces HE to apply decision support systems like DWs
It was not until the 1990s that HE recognized the effectiveness of applying DWs [15]. Arizona State
University (ASU) was the first HE organization to apply a DW. Subsequently, various HE
Various authors, such as [15, 17-24], have applied different methods for designing DWs at their HE
institutions. At the University of Georgetown, [23] researched issues related to modeling student data.
Various multidimensional models, like the one for a registrar’s DM, were provided. Also, [15]
addressed matters associated with designing and developing a DW for the information system of
Croatia’s HE. They designed two schemas related to exam matters. Lin [25] adopted
Florida (UF) in order to help UF make better strategic decisions. [24] applied a top-down
approach for design. [17-22, 26] applied a dimensional modeling approach, where they
first modeled DMs for autonomous subject areas, considering that they will be integrated in a
DW design for HE has received little attention in the literature. It is clear that there is a
significant gap in the literature formed by the absence of a sequence of methods that describes the
application of a design method in detail for HE. In order to fill this gap in the literature, this thesis
In this chapter, we have reviewed the literature on relevant aspects of DWs. Aspects such as DW
architecture, DW design, and the application of DWs in HE have been reviewed at length. The
aim of reviewing in this manner was to show that HE’s Information Systems (IS) cannot survive
[1] W. H. Inmon, Building the Data Warehouse, 4th ed. New York: John Wiley, 2005.
[2] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Definitive Guide to
Dimensional Modeling. New York: John Wiley, 2013.
[3] T. Connolly and C. Begg, Database systems: A Practical Approach to Design,
Implementation, and Management. England: Pearson Education, 2015.
[4] A. Simitsis, "Mapping conceptual to logical models for ETL processes," presented at the
Proceedings of the 8th ACM international workshop on Data warehousing and OLAP,
Bremen, Germany, 2005.
[5] R. Kimball and J. Caserta, The Data Warehouse ETL Toolkit: Practical Techniques for
Extracting, Cleaning, Conforming, and Delivering Data. New York: John Wiley, 2004.
[6] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling. New York: John Wiley, 2002.
[7] S. Chaudhuri and U. Dayal, "An overview of data warehousing and OLAP technology,"
SIGMOD Rec., vol. 26, pp. 65-74, 1997.
[8] S. Rizzi, A. Abelló, J. Lechtenbörger, and J. Trujillo, "Research in data warehouse
modeling and design: dead or alive?," presented at the Proceedings of the 9th ACM
international workshop on Data warehousing and OLAP, Arlington, Virginia, USA,
2006.
[9] W. H. Inmon, Building the Data Warehouse. New York: John Wiley, 2002.
[10] A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta, and S. Paraboschi, "Designing data marts for
data warehouses," ACM Trans. Softw. Eng. Methodol., vol. 10, pp. 452-483, 2001.
[11] J.-N. Mazón, J. Trujillo, and J. Lechtenbörger, "Reconciling requirement-driven data
warehouses with data sources via multidimensional normal forms," Data & Knowledge
Engineering, vol. 63, pp. 725-751, Dec. 2007.
[12] I. Y. Song, R. Khare, and B. Dai, "SAMSTAR: a semi-automated lexical method for
generating star schemas from an entity-relationship diagram," in Proceedings of the ACM
tenth international workshop on Data warehousing and OLAP, 2007, pp. 9-16.
[13] S. Grotevant and D. Foth, "The power of multidimensional analysis (OLAP) in higher
education enterprise reporting strategies," presented at the College and University
Machine Records Conference, 1999.
[14] E. Şuşnea, "Improving Decision Making Process in Universities: A Conceptual Model of
Intelligent Decision Support System," Procedia - Social and Behavioral Sciences, vol.
76, pp. 795-800, 2013.
[15] M. Baranovic, M. Madunic, and I. Mekterovic, "Data Warehouse as a Part of the Higher
Education Information System in Croatia," presented at the Proceedings of the 25th
International Conference on Information Technology Interfaces (ITI), Cavtat, Croatia,
2003.
[16] J. D. Porter and J. J. Rome, "Lessons from a Successful Data Warehouse
Implementation," CAUSE/EFFECT, vol. 18, pp. 43-50, 1995.
[17] T. T. Wai and A. Sint Sint, "Metadata Based Student Data Extraction from Universities
Data Warehouse," presented at the International Conference on Signal Processing
Systems, Singapore, 2009.
[18] K. Rabuzin, "The use of a data warehouse for analyzing exams at the university,"
presented at the 3rd International Conference on Data Mining and Intelligent Information
Technology Applications (ICMiA), Macao, China, 2011.
[19] P. Tanuska, O. Vlkovic, A. Vorstermans, and W. Verschelde, "The proposal of ontology
as a part of University data warehouse," presented at the 2nd International Conference on
Education Technology and Computer (ICETC), Shanghai, China, 2010.
[20] D.-P. Zhang, "A Data Warehouse Based on University Human Resource Management of
Performance Evaluation," presented at the International Forum on Information
Technology and Applications (IFITA), Chengdu, China, 2009.
[21] J.-F. Desnos, "A National Data Warehouse Project for French Universities," presented at
the 7th International Conference of European University Information Systems on The
Changing Universities - The Role of Technology, Berlin, Germany, 2001.
[22] A. Flory, P. Soupirot, and A. Tchounikine, "A Design and implementation of a data
warehouse for research administration universities," presented at the Proceedings of the
7th International Conference of European University Information Systems on The
Changing Universities - The Role of Technology, Berlin, Germany, 2001.
[23] R. G. Allan and D. R. May, "Data Models for a Registrar's Data Mart," presented at the
Higher Education Administrative Technology conference (CUMREC), Arlington,
Virginia, USA, 2000.
[24] J. W. Graham, "Constructing a student data warehouse," presented at the Proceedings of
the 30th annual ACM SIGUCCS conference on User services, Providence, Rhode Island,
USA, 2002.
[25] M. C. Lin, "University Data Warehouse Design Issues: A Case Study," presented at the
Proceedings of the 2001 American Society for Engineering Education Annual
Conference & Exposition, Albuquerque, New Mexico, USA, 2001.
[26] I. O. Awoyelu, Scalable Distributed Data Warehouse System: Higher Educational
Institutions Data Warehouse System. LAP Lambert Academic Publishing, 2011.