
Ch2 Literature

This chapter reviews the literature on data warehousing and its application in higher education. It begins with an overview of data warehousing, comparing data warehouses and data marts. It then reviews common data warehouse architectures and the most famous design and development processes. Finally, it discusses applications of data warehousing in higher education.

Uploaded by

sami hasan
CHAPTER 2

LITERATURE REVIEW
2.1 Introduction

In this chapter, we review the literature on data warehousing (DW) and its application in higher education (HE), including example implementations for HE. The chapter is organized as follows. We start with an overview of DW. Thereafter, we briefly compare DWs and data marts (DMs). This is followed by a review of the available DW architectures. Then, we review the best-known DW design and development processes. Finally, we conclude with a review of applications of DW in HE.

2.2 Data Warehouse Overview

A review of the literature shows that there are two major definitions of a DW: those given in [1] and [2]. On one hand, from Inmon's [1] perspective, a DW is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of the managerial decision-making process. [3] discusses in detail the four properties that constitute Inmon's definition. First, a DW is subject-oriented: it is organized around the major subjects of an enterprise (such as students, programs, and courses) rather than around the major application areas. This means that a DW stores decisional data rather than application-oriented data. Second, a DW is integrated: data is consolidated, because source data is normally inconsistent, whereas data in the DW is consistent and provides a unified view to users. Third, a DW is time-variant: data in the DW is associated with, and valid for, a specific time period. Finally, a DW is nonvolatile: data is not updated in real time, but is refreshed from the source systems at regular intervals. On the other hand, [2] defines a DW as a queryable source of data in the enterprise.
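To make the time-variant and nonvolatile properties concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table name `enrollment_snapshot`, the helper `refresh`, and the enrollment figures are hypothetical. The point is only that each periodic refresh appends rows stamped with a load date instead of overwriting earlier values, so history remains available for analysis.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Time-variant: every row carries the date of the load it belongs to.
conn.execute("""
    CREATE TABLE enrollment_snapshot (
        snapshot_date TEXT NOT NULL,
        program       TEXT NOT NULL,
        enrolled      INTEGER NOT NULL,
        PRIMARY KEY (snapshot_date, program)
    )
""")

def refresh(conn, snapshot_date, source_rows):
    """Periodic refresh: append a new snapshot; earlier rows are never updated (nonvolatile)."""
    conn.executemany(
        "INSERT INTO enrollment_snapshot VALUES (?, ?, ?)",
        [(snapshot_date.isoformat(), program, count) for program, count in source_rows],
    )
    conn.commit()

# Two refreshes taken from a (hypothetical) operational source whose values changed.
refresh(conn, date(2023, 9, 1), [("CS", 120), ("Math", 80)])
refresh(conn, date(2024, 9, 1), [("CS", 150), ("Math", 75)])

# Both periods remain queryable, so trends over time can be analyzed.
history = conn.execute(
    "SELECT snapshot_date, enrolled FROM enrollment_snapshot "
    "WHERE program = 'CS' ORDER BY snapshot_date"
).fetchall()
print(history)  # [('2023-09-01', 120), ('2024-09-01', 150)]
```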


According to [3], DWs support online analytical processing (OLAP), while operational databases support online transaction processing (OLTP). Traditional database management systems (DBMSs) are built to support OLTP and are therefore regarded as unsuitable for data warehousing. OLTP systems are designed to maximize transaction processing capacity, whereas OLAP systems are designed to support managerial query processing. Further, OLTP systems contain current data, whereas OLAP systems contain historical data. Additionally, OLTP systems provide transactional data to a large number of operational users, whereas OLAP systems provide decisional data to a relatively small number of managerial users. Perhaps the most important difference is that OLTP systems support day-to-day decisions, whereas OLAP systems support strategic decision making.
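The contrast between the two workloads can be sketched as follows, again with sqlite3 and hypothetical tables (`registration`, `registration_history`). Both tables live in one in-memory database only to keep the example short; in practice the OLTP and OLAP data would sit in separate systems. The OLTP side changes a single current record in place, while the OLAP side scans historical rows and summarizes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side (hypothetical registration system): current state, updated in place.
conn.execute("CREATE TABLE registration (student_id INTEGER PRIMARY KEY, course TEXT, status TEXT)")
conn.executemany("INSERT INTO registration VALUES (?, ?, ?)",
                 [(1, "DB101", "registered"), (2, "DB101", "registered")])

# A typical OLTP transaction: touch one record and keep only the freshest value.
with conn:
    conn.execute("UPDATE registration SET status = 'dropped' WHERE student_id = 2")

# OLAP side (hypothetical warehouse table): historical rows, read mostly by aggregate queries.
conn.execute("CREATE TABLE registration_history (term TEXT, course TEXT, registered INTEGER)")
conn.executemany("INSERT INTO registration_history VALUES (?, ?, ?)",
                 [("2022-F", "DB101", 40), ("2023-F", "DB101", 55), ("2024-F", "DB101", 61)])

# A typical OLAP query: scan the history and summarize it for a managerial question.
trend = conn.execute(
    "SELECT course, SUM(registered), AVG(registered) "
    "FROM registration_history GROUP BY course"
).fetchall()
current = conn.execute("SELECT status FROM registration WHERE student_id = 2").fetchone()
print(current)  # ('dropped',)
print(trend)    # [('DB101', 156, 52.0)]
```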

Building an enterprise-wide DW is a cumbersome task that can take years to complete. As a result, corporations tend to build data marts (DMs), which are departmental units meant to support a single department of the corporation. Autonomous DMs will most likely be integrated in the future to form an enterprise-wide DW. The next section discusses the main differences between a DW and a DM.

2.3 Data Mart

A DW holds information related to many subject areas, often the entire enterprise, whereas a DM holds data related to one subject area, department, business unit, or work group. A DM provides data for a small group of users, and building a DM is simpler than building an enterprise-wide DW. However, designers of autonomous DMs must consider that these DMs will later be combined to constitute a DW that serves all units of an organization. Therefore, the technical architecture of autonomous DMs must be standardized during the design process.
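One common way to standardize autonomous DMs so that they can later be combined is to have them share conformed dimensions, as Kimball's approach [2] recommends. The sketch below assumes two hypothetical marts (admissions and graduation) that share a `dim_program` table. Because the dimension key is shared, results from both marts can be combined ("drilled across") in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by both marts (hypothetical names).
CREATE TABLE dim_program (program_key INTEGER PRIMARY KEY, program_name TEXT);
-- Data mart 1: admissions (one subject area).
CREATE TABLE fact_admission (program_key INTEGER REFERENCES dim_program, admitted INTEGER);
-- Data mart 2: graduation (another subject area).
CREATE TABLE fact_graduation (program_key INTEGER REFERENCES dim_program, graduated INTEGER);

INSERT INTO dim_program VALUES (1, 'Computer Science'), (2, 'Mathematics');
INSERT INTO fact_admission VALUES (1, 100), (1, 20), (2, 50);
INSERT INTO fact_graduation VALUES (1, 90), (2, 40);
""")

# Drill-across: summarize each mart separately, then join on the shared dimension key.
rows = conn.execute("""
SELECT d.program_name, a.admitted, g.graduated
FROM dim_program AS d
JOIN (SELECT program_key, SUM(admitted) AS admitted
      FROM fact_admission GROUP BY program_key) AS a ON a.program_key = d.program_key
JOIN (SELECT program_key, SUM(graduated) AS graduated
      FROM fact_graduation GROUP BY program_key) AS g ON g.program_key = d.program_key
ORDER BY d.program_name
""").fetchall()
print(rows)  # [('Computer Science', 120, 90), ('Mathematics', 50, 40)]
```

Each mart is aggregated before the join so that a program with several admission rows is not double-counted against its graduation rows.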

It is essential to review the traditional DW architecture in order to understand its working mechanism.

2.4 Data Warehouse Architecture

An architecture describes the basic components that constitute a DW and their relationships in terms of the associated workflow. Figure 2.1 depicts the main components of a DW and their relations.

[Figure: operational sources, ETL, data warehouse, data marts (Students, Human Resources), end-user access tools]

Figure 2.1: Traditional DW Architecture

Operational sources contain the source data used to populate DWs. A DW holds information concerning many organizational units, whereas a DM presents information to one unit or department at a time [1].

A set of processes, referred to as extract-transform-load (ETL), is applied to harmonize data before moving it to a DW [4]. Data is first extracted from the operational sources [5]. In the transformation step, various methods are applied to alter the source data and make it conform to the DW format. Thereafter, the data is loaded into DW storage for access by users.
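The three ETL steps can be illustrated with a small, self-contained Python sketch. The two source exports, their column names, and the gender-code mapping are hypothetical; real ETL tools add scheduling, error handling, and incremental loading, all of which are omitted here.

```python
import csv
import io
import sqlite3

# Hypothetical operational exports with inconsistent conventions.
SOURCE_A = "student_id,gender,gpa\n1,M,3.5\n2,F, 3.9\n"
SOURCE_B = "id;sex;grade_point\n3;male;2.8\n4;female;3.1\n"

GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}

def extract(text, delimiter):
    """Extract: read raw rows from a source export."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform(rows, id_col, gender_col, gpa_col):
    """Transform: map source columns, harmonize codes, and convert types to the DW format."""
    return [
        (int(r[id_col]), GENDER_MAP[r[gender_col].strip().lower()], float(r[gpa_col]))
        for r in rows
    ]

def load(conn, rows):
    """Load: insert the conformed rows into the warehouse table."""
    conn.executemany("INSERT INTO dw_student VALUES (?, ?, ?)", rows)
    conn.commit()

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_student (student_id INTEGER PRIMARY KEY, gender TEXT, gpa REAL)")
load(dw, transform(extract(SOURCE_A, ","), "student_id", "gender", "gpa"))
load(dw, transform(extract(SOURCE_B, ";"), "id", "sex", "grade_point"))

result = dw.execute("SELECT * FROM dw_student ORDER BY student_id").fetchall()
print(result)  # [(1, 'M', 3.5), (2, 'F', 3.9), (3, 'M', 2.8), (4, 'F', 3.1)]
```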

The main purpose of DWs is to enable managers to perform analysis using business intelligence (BI) tools. These tools can be as simple as queries or as complex as data mining applications [2].

[Figure: four areas linked by extract, load, and access flows. Operational source systems → data staging area (services: clean, combine, and conform dimensions; data stores: flat files and relational tables; processing such as sorting; user queries are not allowed here) → data presentation area (Data Mart #1, Data Mart #2, …; dimensional, single business process; DW Bus: conformed facts and dimensions) → data access tools (query tools, analytic applications, modeling applications such as forecasting and data mining)]

Figure 2.2: Kimball's data warehouse architecture (adapted from [2])

In Kimball's architecture, operational source systems are queried only in a narrow fashion, to extract data. The main difference between the traditional architecture and Kimball's is that the latter contains a data staging area. According to [6], the data staging area is both a storage area and a set of processes, commonly referred to as ETL. In Kimball's architecture, the data presentation area is considered to be the DW, and it can be decomposed into multiple DMs.

[3] provides another view of DW architecture, combining the ETL process and the data presentation area in a single block. This means that users are able to query the results of the ETL process before data preprocessing.

[7] provides the DW architecture shown in Figure 2.3.

Figure 2.3: Data Warehousing Architecture (adapted from [7])

This architecture is generic enough to be applied in combination with any DW design and development method.

2.5 Data Warehouse Design and Development Methods

A data warehouse (DW) is a group of integrated databases that supports strategic decision-making capabilities in organizations. DWs integrate data from heterogeneous sources and present it in a multidimensional view that facilitates reporting and querying. The design and development of DWs is a cumbersome task that may take many years to complete. This, in turn, raises many challenges that must be resolved in order to avoid DW project failure, among them requirements engineering for DWs and the choice of design and development methods for DWs. As a consequence, various researchers have presented different approaches to DW design [8].

DWs play a vital role in the success of today's strategic planning in all kinds of businesses. As with other information system projects, DW development follows a design method that consists of many steps and phases. There is no consensus in the literature about what an ideal DW design method should include. However, all of the available methods contain two main stages, namely business requirements definition and logical design. Our focus in this section is on the logical design methods available in the literature. We summarize some of these methods, describe the activities involved in each of them, and shed light on some potential problems they incorporate.

Extensive research in the literature addresses the design and development of DWs in various domains. However, only a few studies have been devoted to the design and development of DWs in HE, and these studies did not apply a systematic approach to the design of DWs in HE. In this section, we review some of the basic methods that are commonly used for DW design and development across domains.


Two major DW design and development approaches can be traced in the literature. [2] proposed a bottom-up method referred to as the business dimensional lifecycle methodology, depicted in Figure 2.4.

[Figure: Planning and growth → Business requirements definition → Dimensional modeling → Physical design → Data staging design and development → Deployment]

Figure 2.4: Business dimensional lifecycle methodology

Applying this method means that an autonomous model is designed for each DM separately. Thereafter, the local models are integrated to construct a global DW model covering all business processes of the enterprise.

The method can be seen as divided into two cycles. The DW design cycle constitutes the first three phases: planning, business requirements definition, and dimensional modeling. The last three phases of Figure 2.4 are development-related. In planning, the DW designer plans ahead the design and development processes to be included in the data warehousing lifecycle. In business requirements definition, interviews are conducted with decision makers. Dimensional modeling is the process of mapping requirements to a logical relational design called a dimensional model [6]. In dimensional models, data is visualized as schemas, either star or snowflake schemas.

We now review some basic terms of dimensional modeling. A fact table is the main table in a dimensional model, where the numerical measurements of business performance are captured. The facts in a fact table are these business measurements. The grain of a fact table is the level of detail of the information it holds. Dimension tables describe the data held by fact tables. A star schema consists of a fact table at its center, surrounded by various dimension tables [6].
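The terms above can be made concrete with a small star schema for a hypothetical course-enrollment process, created in SQLite through Python's sqlite3 module. The table and column names are illustrative assumptions, not taken from [6]. The fact table's grain is one row per student, per course, per term, and the query shows the typical star join: dimension attributes are used to filter and group, and the numeric facts are aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT, program TEXT);
CREATE TABLE dim_course  (course_key  INTEGER PRIMARY KEY, code TEXT, department TEXT);
CREATE TABLE dim_term    (term_key    INTEGER PRIMARY KEY, year INTEGER, semester TEXT);

-- Fact table at the center of the star.
-- Grain: one row per student, per course, per term.
CREATE TABLE fact_enrollment (
    student_key  INTEGER REFERENCES dim_student,
    course_key   INTEGER REFERENCES dim_course,
    term_key     INTEGER REFERENCES dim_term,
    credit_hours INTEGER,   -- numeric fact
    grade_points REAL,      -- numeric fact
    PRIMARY KEY (student_key, course_key, term_key)
);

INSERT INTO dim_student VALUES (1, 'Student A', 'CS'), (2, 'Student B', 'Math');
INSERT INTO dim_course  VALUES (10, 'CS101', 'Computing'), (20, 'MA101', 'Mathematics');
INSERT INTO dim_term    VALUES (100, 2024, 'Fall');
INSERT INTO fact_enrollment VALUES
    (1, 10, 100, 3, 12.0), (2, 10, 100, 3, 9.0), (2, 20, 100, 4, 16.0);
""")

# Star join: filter and group by dimension attributes, aggregate the facts.
rows = conn.execute("""
SELECT c.department, SUM(f.credit_hours) AS total_credits
FROM fact_enrollment AS f
JOIN dim_course AS c ON c.course_key = f.course_key
JOIN dim_term   AS t ON t.term_key   = f.term_key
WHERE t.year = 2024
GROUP BY c.department
ORDER BY c.department
""").fetchall()
print(rows)  # [('Computing', 6), ('Mathematics', 4)]
```

A snowflake schema would differ only in that some dimension tables (for example, department inside `dim_course`) are further normalized into their own tables.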

[6] proposed a four-step dimensional design process. First, select the business process to be modeled; a fact table represents a business process. Second, declare the grain of the business process by specifying exactly what an individual row of the fact table represents. Third, choose the dimensions that apply to each row of the fact table. Fourth, identify the numeric facts that will populate each row of the fact table. Returning to the business dimensional lifecycle of Figure 2.4, physical design means defining the fact and dimension tables in the DW database. In data staging, ETL tools are used to extract, transform, and load data into the DW tables. Finally, in deployment, the DW data is made available for the benefit of end users [6].
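The four design decisions can be recorded explicitly and then carried into physical design. The sketch below uses a hypothetical `DimensionalDesign` class (not part of any library, and not defined in [6]) that stores the outcome of each step and emits a matching CREATE TABLE statement for the fact table, which is then created in an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class DimensionalDesign:
    """Hypothetical record of the four design decisions for one business process."""
    business_process: str    # Step 1: the process the fact table represents
    grain: str               # Step 2: what exactly one fact-table row means
    dimensions: list[str]    # Step 3: dimensions that apply to every row
    facts: list[str]         # Step 4: numeric measurements for every row

    def fact_table_ddl(self) -> str:
        """Physical design: turn the decisions into a CREATE TABLE statement."""
        keys = [f"{d}_key INTEGER NOT NULL REFERENCES dim_{d}" for d in self.dimensions]
        measures = [f"{m} REAL" for m in self.facts]
        columns = ",\n    ".join(keys + measures)
        return (f"-- Grain: {self.grain}\n"
                f"CREATE TABLE fact_{self.business_process} (\n    {columns}\n);")

design = DimensionalDesign(
    business_process="exam",
    grain="one row per student, per course, per exam date",
    dimensions=["student", "course", "date"],
    facts=["score"],
)
ddl = design.fact_table_ddl()
print(ddl)

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)  # create the fact table in the (hypothetical) DW database
cols = [row[1] for row in conn.execute("PRAGMA table_info(fact_exam)")]
print(cols)  # ['student_key', 'course_key', 'date_key', 'score']
```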

[9] proposed a top-down design approach, in which a global model is first designed for an enterprise-wide DW and a DM is then derived for every business process. In this method, the global DW model is revisited for refinement each time a new departmental database is added, which is time-consuming. Because DW design is driven by the data, this method is data-driven: it starts by gathering, integrating, and testing data, and defines the business requirements last. Inmon's DW approach is thus top-down, and each DM represents a business function of an individual department in the company.

Bonifati et al. [10] proposed a DW logical design method that is a hybrid of the bottom-up and top-down methods. Their method consists of three main steps: top-down analysis, bottom-up analysis, and integration. In top-down analysis, user requirements are gathered by conducting interviews; the outcome is an ideal star schema. In bottom-up analysis, the conceptual model of the operational database is examined; the outcome is a set of candidate star schemas for the DW. The integration step matches each ideal star schema with all candidate star schemas and ranks them using a set of metrics. Afterwards, the designer selects the candidate that best matches the ideal schema. Figure 2.5 depicts the schematic diagram of this method.

[Figure: Step 1 (top-down): user requirements gathered through interviews → ideal star schema. Step 2 (bottom-up): examine the database conceptual model → candidate star schemas #1, #2, …, #n. Step 3 (integration): match each ideal star schema with all candidate star schemas and rank them in terms of a set of metrics → the candidate star schema that best fits the ideal star schema]

Figure 2.5: Hybrid Data Warehouse Design Method [10]
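The ranking in the integration step can be sketched as follows. Bonifati et al. [10] define their own set of metrics; the score used here (the average Jaccard overlap of measures and of dimensions between the ideal schema and a candidate) is a hypothetical stand-in chosen only to show how candidates are matched and ranked. The schemas themselves are also hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StarSchema:
    name: str
    measures: frozenset
    dimensions: frozenset

def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of elements two sets have in common (1.0 means identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def score(ideal: StarSchema, candidate: StarSchema) -> float:
    """Hypothetical match metric: average overlap of measures and of dimensions."""
    return (jaccard(ideal.measures, candidate.measures)
            + jaccard(ideal.dimensions, candidate.dimensions)) / 2

def rank(ideal, candidates):
    """Integration step: rank all candidates against the ideal schema, best first."""
    return sorted(candidates, key=lambda c: score(ideal, c), reverse=True)

# Ideal schema from top-down analysis; candidates from bottom-up analysis.
ideal = StarSchema("ideal", frozenset({"grade"}), frozenset({"student", "course", "term"}))
candidates = [
    StarSchema("cand1", frozenset({"grade", "credits"}), frozenset({"student", "course", "term"})),
    StarSchema("cand2", frozenset({"fee"}), frozenset({"student", "term"})),
]
best = rank(ideal, candidates)[0]
print(best.name, score(ideal, best))  # cand1 0.75
```

The final choice in [10] is left to the designer; a ranking like this only orders the candidates for review.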

Mazón et al. [11] proposed a semi-automated hybrid method. In this approach, user requirements lead to the construction of an initial conceptual schema, and Query/View/Transformation (QVT) relations are later used to check the correctness of these schemas. The first process in their approach is requirements analysis, in which end users express requirements as highly abstract statements that reflect the business goals of the organization. These requirements are then modeled using a framework proposed by the authors. The modeled requirements lead to the construction of multidimensional schemas using UML notation that the authors proposed as an extension of conventional UML. As a final step, they check the correctness of the resulting schemas using the QVT relations they presented, comparing the schemas with those of the source data to determine whether they can potentially satisfy the requirements.

Song et al. [12] proposed a semi-automated method capable of deriving logical DW schemas from the entity-relationship (ER) models of the transactional sources. They introduced the connection topology value (CTV) to automate the identification of facts and fact tables from the ER schemas of the sources.

The method works as follows. The authors investigate the relationships among the tables of the source schema. Wherever they find a many-to-one relationship, they consider it a potential link between facts and dimensions, which are normally related by such relationships. Attributes on the many side are potential fact candidates, and attributes on the one side are potential dimension candidates.
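The many-to-one heuristic can be sketched as follows, applied here at the level of entities (tables) rather than individual attributes. This is a simplified illustration, not the published CTV formula of [12]: each entity is scored by how many other entities it reaches by following many-to-one relationships, directly or indirectly. Entities with high scores are treated as fact candidates, and entities that appear only on the one side are treated as dimension candidates. The ER model is hypothetical.

```python
from collections import defaultdict

# Hypothetical ER model of a registration system, as (many_side, one_side) pairs.
MANY_TO_ONE = [
    ("enrollment", "student"),
    ("enrollment", "course_offering"),
    ("course_offering", "course"),
    ("course_offering", "term"),
    ("student", "program"),
]

def reachable(entity, links):
    """Entities reachable from `entity` via many-to-one links, directly or indirectly."""
    seen, stack = set(), [entity]
    while stack:
        for target in links.get(stack.pop(), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def classify(relationships):
    """Simplified heuristic: rank fact candidates and list dimension candidates."""
    links = defaultdict(list)
    for many, one in relationships:
        links[many].append(one)
    # Score each many-side entity by its direct and indirect many-to-one reach.
    scores = {e: len(reachable(e, links)) for e in list(links)}
    fact_candidates = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Entities that never appear on the many side are dimension candidates.
    one_side = {one for _, one in relationships}
    dimension_candidates = sorted(one_side - set(links))
    return fact_candidates, dimension_candidates

facts, dims = classify(MANY_TO_ONE)
print(facts)  # [('enrollment', 5), ('course_offering', 2), ('student', 1)]
print(dims)   # ['course', 'program', 'term']
```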

In our research, since no single method in the literature is specific to DW design in HE, we propose a hybrid DW design method that combines the benefits of various existing methods. The next section reviews research that has applied these methods in the HE arena.

2.6 Applying Data Warehouses in Higher Education

Successful organizations are able to use information effectively [13]. In addition, the exponentially growing amount of accumulated data forces HE institutions to apply decision support systems such as DWs to facilitate their strategic decisions [14].

It was not until the 1990s that HE recognized the effectiveness of applying DWs [15]. Arizona State University (ASU) was the first HE organization to apply a DW, and various HE organizations subsequently followed [16].

Various authors, such as [15, 17-24], have applied different methods for designing DWs at their HE institutions. At Georgetown University, [23] researched issues related to modeling student data and provided various multidimensional models, such as one for a registrar's DM. Also, [15] addressed matters associated with designing and developing a DW for the information system of Croatia's HE, designing two schemas related to exams. Lin [25] adopted dimensional modeling as a logical design method to design a DW for the University of Florida (UF), in order to help UF make better strategic decisions. [24] applied a top-down design approach. [17-22, 26] applied a dimensional modeling approach, first modeling DMs for autonomous subject areas on the assumption that these would later be integrated to constitute an enterprise-wide DW.

DW design for HE has received little attention in the literature, and there is a clear and significant gap: the literature lacks a sequence of methods that describes in detail how a design method is applied in HE. To fill this gap, this thesis aims to provide a systematic approach for designing a DW in HE.


2.7 Conclusion

In this chapter, we have reviewed the literature on relevant aspects of DWs. Aspects such as DW architecture, DW design, and the application of DWs in HE have been reviewed in detail. The aim of this review was to show that HE information systems (IS) can no longer survive without applying DWs.


References

[1] W. H. Inmon, Building the Data Warehouse, 4th ed. New York: John Wiley, 2005.
[2] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Definitive Guide to
Dimensional Modeling. New York: John Wiley, 2013.
[3] T. Connolly and C. Begg, Database systems: A Practical Approach to Design,
Implementation, and Management. England: Pearson Education, 2015.
[4] A. Simitsis, "Mapping conceptual to logical models for ETL processes," presented at the
Proceedings of the 8th ACM international workshop on Data warehousing and OLAP,
Bremen, Germany, 2005.
[5] R. Kimball and J. Caserta, The Data Warehouse ETL Toolkit: Practical Techniques for
Extracting, Cleaning, Conforming, and Delivering Data. New York: John Wiley, 2004.
[6] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling. New York: John Wiley, 2002.
[7] S. Chaudhuri and U. Dayal, "An overview of data warehousing and OLAP technology,"
SIGMOD Rec., vol. 26, pp. 65-74, 1997.
[8] S. Rizzi, A. Abelló, J. Lechtenbörger, and J. Trujillo, "Research in data warehouse
modeling and design: dead or alive?," presented at the Proceedings of the 9th ACM
International Workshop on Data Warehousing and OLAP, Arlington, Virginia, USA,
2006.
[9] W. H. Inmon, Building the Data Warehouse. New York: John Wiley, 2002.
[10] A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta, and S. Paraboschi, "Designing data marts for
data warehouses," ACM Trans. Softw. Eng. Methodol., vol. 10, pp. 452-483, 2001.
[11] J.-N. Mazón, J. Trujillo, and J. Lechtenbörger, "Reconciling requirement-driven data
warehouses with data sources via multidimensional normal forms," Data & Knowledge
Engineering, vol. 63, pp. 725-751, Dec. 2007.
[12] I. Y. Song, R. Khare, and B. Dai, "SAMSTAR: a semi-automated lexical method for
generating star schemas from an entity-relationship diagram," in Proceedings of the ACM
tenth international workshop on Data warehousing and OLAP, 2007, pp. 9-16.
[13] S. Grotevant and D. Foth, "The power of multidimensional analysis (OLAP) in higher
education enterprise reporting strategies," presented at the College and University
Machine Records Conference, 1999.
[14] E. Şuşnea, "Improving Decision Making Process in Universities: A Conceptual Model of
Intelligent Decision Support System," Procedia - Social and Behavioral Sciences, vol.
76, pp. 795-800, 2013.
[15] M. Baranovic, M. Madunic, and I. Mekterovic, "Data Warehouse as a Part of the Higher
Education Information System in Croatia," presented at the Proceedings of the 25th
International Conference on Information Technology Interfaces (ITI), Cavtat, Croatia,
2003.
[16] J. D. Porter and J. J. Rome, "Lessons from a Successful Data Warehouse
Implementation," CAUSE/EFFECT, vol. 18, pp. 43-50, 1995.
[17] T. T. Wai and A. Sint Sint, "Metadata Based Student Data Extraction from Universities
Data Warehouse," presented at the International Conference on Signal Processing
Systems, Singapore, 2009.
[18] K. Rabuzin, "The use of a data warehouse for analyzing exams at the university,"
presented at the 3rd International Conference on Data Mining and Intelligent Information
Technology Applications (ICMiA), Macao, China, 2011.
[19] P. Tanuska, O. Vlkovic, A. Vorstermans, and W. Verschelde, "The proposal of ontology
as a part of University data warehouse," presented at the 2nd International Conference on
Education Technology and Computer (ICETC), Shanghai, China, 2010.
[20] D.-P. Zhang, "A Data Warehouse Based on University Human Resource Management of
Performance Evaluation," presented at the International Forum on Information
Technology and Applications (IFITA), Chengdu, China, 2009.
[21] J.-F. Desnos, "A National Data Warehouse Project for French Universities," presented at
the 7th International Conference of European University Information Systems on The
Changing Universities - The Role of Technology, Berlin, Germany, 2001.
[22] A. Flory, P. Soupirot, and A. Tchounikine, "A Design and implementation of a data
warehouse for research administration universities," presented at the Proceedings of the
7th International Conference of European University Information Systems on The
Changing Universities - The Role of Technology, Berlin, Germany, 2001.
[23] R. G. Allan and D. R. May, "Data Models for a Registrar's Data Mart," presented at the
Higher Education Administrative Technology conference (CUMREC), Arlington,
Virginia, USA, 2000.
[24] J. W. Graham, "Constructing a student data warehouse," presented at the Proceedings of
the 30th annual ACM SIGUCCS conference on User services, Providence, Rhode Island,
USA, 2002.
[25] M. C. Lin, "University Data Warehouse Design Issues: A Case Study," presented at the
Proceedings of the 2001 American Society for Engineering Education Annual
Conference & Exposition, Albuquerque, New Mexico, USA, 2001.
[26] I. O. Awoyelu, Scalable Distributed Data Warehouse System: Higher Educational
Institutions Data Warehouse System. LAP Lambert Academic Publishing, 2011.
