International Journal of Computer Applications (0975 – 8887)

Volume 44– No.5, April 2012

Data Warehouse Implementation of Examination Databases

Muheet Ahmed Butt
Scientist, Directorate of Information Technology & Support Systems, University of Kashmir, Srinagar, J&K, India

S. M. K. Quadri
Head & Director, PG Department of Computer Science, University of Kashmir, Srinagar, J&K, India

Majid Zaman
Scientist, Directorate of Information Technology & Support Systems, University of Kashmir, Srinagar, J&K, India

ABSTRACT
A data warehouse is an asset for an enterprise and exists for the benefit of the entire enterprise, including business units, individual customers, students, etc. Data in a data warehouse does not conform specifically to the preferences of any single enterprise entity. Instead, it is intended to provide data to the entire enterprise in such a way that all members can use the data in the warehouse throughout its lifespan [7]. This work explores using the star schema for automation of a data warehouse. An implementation of a data warehouse for an Examination Automation System is presented as an example.

General Terms
Data Warehouse, Star Schema, Examination Databases, Third Normal Form, Normalization, Dimension, Snowflake, Joins, Decision Support.

Keywords
Data Warehousing, Data Mining, Third Normal Form, Data Set.

1. INTRODUCTION
A 'data warehouse' is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis [5]. This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage dictionary data are also considered essential components of a data warehousing system [11]. These operations depend largely on the way the data is stored.

There are two leading approaches to storing data in a data warehouse:
i. Dimensional approach, and
ii. Normalized approach

In the dimensional approach, transaction data are partitioned into "facts", which are generally numeric transaction data, and "dimensions", which are the reference information that gives context to the facts [9]. A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. The retrieval of data from the data warehouse also tends to operate very quickly. The main disadvantages of the dimensional approach are:

i. in order to maintain the integrity of facts and dimensions, loading of data from different operational systems is complicated, and
ii. it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.

In the normalized approach, the data in the data warehouse are stored following, to a degree, Codd's normalization rules. Tables are grouped together by subject areas that reflect general data categories. The main advantage of this approach is that it is very easy to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to join data from different sources into meaningful information and then to access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.

These approaches are not exact opposites of each other. Dimensional approaches can involve normalizing data to a degree [12]. In this paper we have implemented a Star Schema model of a data warehouse for a Central Examination Automation System catering to many colleges, departments, courses, subjects, subject groups and marks, and have prepared result notifications at various levels, which will enable us to build a Decision Support Database for future analysis.

The rest of the paper is organized as follows: Sections 2 and 3 describe the various data warehouse schemas and their advantages. Section 4 provides the design of an example data warehouse for the Examination Automation System, giving detailed attribute information pertaining to the fact table. Section 5 provides the overall association of the various dimension tables with the fact table. Section 6 provides the association of the fact dimension of the star schema implementation for this example with the other dimensions in the schema. It also provides the results of the simulations of the said implementation. Section 7 provides the means for aggregation of the data present in the star schema data warehouse design for decision support systems. Section 8 provides a brief description of the On-line Analytical Processing (OLAP) capabilities provided by the data warehouse or data mart. Section 9 provides a brief comparison between the 3rd normal form and star schema implementations on the same test data. Conclusions drawn are presented in Section 10. Section 11 lists the references, and Appendix 1 provides a pictorial representation of the star schema and the relationship of its fact table with the other dimensions.
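As an illustration of disadvantage (i) of the dimensional approach noted in this section, the following minimal Python sketch shows why loading is laborious: every incoming fact row must carry keys that already exist in the dimensions before it can be accepted. The dimension contents, row fields and the `load_facts` helper are hypothetical and are not taken from the system described in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical dimension tables, keyed by the code carried in each fact row.
college_dim: Dict[str, str] = {"C01": "College A", "C02": "College B"}
course_dim: Dict[str, str] = {"BSC": "Bachelor of Science", "BA": "Bachelor of Arts"}


def load_facts(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split incoming rows into loadable facts and rejected rows.

    A row is loadable only if every dimension key it carries exists in the
    corresponding dimension table.
    """
    accepted, rejected = [], []
    for row in rows:
        if row["college_id"] in college_dim and row["course_code"] in course_dim:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


# Hypothetical rows arriving from an operational system.
incoming = [
    {"rollno": 1001, "college_id": "C01", "course_code": "BSC", "totalm": 412},
    {"rollno": 1002, "college_id": "C02", "course_code": "BA", "totalm": 388},
    {"rollno": 1003, "college_id": "C09", "course_code": "BSC", "totalm": 301},  # unknown college
]

facts, rejects = load_facts(incoming)

# Dimensions give context to the numeric facts: total marks per college name.
totals: Dict[str, int] = {}
for fact in facts:
    name = college_dim[fact["college_id"]]
    totals[name] = totals.get(name, 0) + fact["totalm"]

print(totals)
print([row["rollno"] for row in rejects])
```

In a real load the rejected rows would typically be routed back for correction rather than dropped, which is part of the loading effort the text refers to.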


2. SCHEMAS IN DATA WAREHOUSE

A schema is a collection of database objects, including tables, views, indexes, and synonyms. There is a variety of ways of arranging schema objects in the schema models designed for data warehousing. The main database schemas are described below.

2.1 Star Schemas
The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table [6]. The center of the star consists of a large fact table and the points of the star are the dimension tables. A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The optimizer recognizes star queries and generates efficient execution plans for them. It is not mandatory to have any foreign keys on the fact table for star transformation to take effect. A star join is a primary key to foreign key join of the dimension tables to a fact table. The main advantages of star schemas are that they:

• Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• Provide highly optimized performance for typical star queries.
• Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables.

Star schemas are used for both simple data marts and very large data warehouses.

3. SNOWFLAKE SCHEMAS

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema [6]. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy, i.e., the dimension data is grouped into multiple tables instead of one large table. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The main advantage of snowflake schemas is that they save storage space for data. Their main disadvantages are that they:

• increase the number of dimension tables and require more foreign key joins.
• result in more complex queries.

3.1 Third Normal Form (3NF)

Third normal form modeling is a classical relational-database modeling technique that minimizes data redundancy through normalization [6]. When compared to a star schema, a 3NF schema typically has a larger number of tables due to this normalization process. 3NF schemas are typically chosen for large data warehouses, especially environments with significant data-loading requirements that are used to feed data marts and execute long-running queries. The main advantages of 3NF schemas are that they:

• Provide a neutral schema design, independent of any application or data-usage considerations.
• May require less data transformation than denormalized schemas such as star schemas.

4. DATA WAREHOUSE DESIGN

An example of a record in a fact table for an Examination Automation System for a university, describing a single event such as the result of a student in a particular session of an academic year at under/post graduate level, has been considered.

In addition to the fact table (Table 1), there are also dimension tables in the database. These dimension tables describe the options to "cut" or view the data in the fact table. Star and snowflake schemas both use more than one dimension table in their database [2][3]. The records in a single dimension table represent the levels or choices of aggregation for the given dimension [7][17]. The data warehouse example used here is the Result dimension [10][12]. The records in the Result dimension table indicate that the fact table data can be aggregated by subjects assigned, enrollment of students, marks obtained, etc. Another dimension would be date. Using the date dimension we would be able to analyze data by a single date or by dates aggregated by month, quarter, fiscal year, calendar year, holidays, etc.

For an Examination Automation System, a simple fact table would have the column variables shown in Table 1.

Table 1: Fact Dimension Details (fact_DIM)

Column        Key         Description                        Data type
ROLLNO        Key         Roll no. pertaining to a session   Numeric
REGNO         Key         Registration no. of student        Alphanumeric
RESULT                    Calculated result                  Alphanumeric
TOTALM                    Total marks                        Numeric
RESGAZ                    Showing statute result             Alphanumeric
Sesson_ID     Key, FKey   Session of examination             Alphanumeric
college_id    Key, FKey   College code                       Alphanumeric
dateID        Key, FKey   Date                               Alphanumeric
course_code   Key, FKey   Course opted                       Alphanumeric
facultyID     Key, FKey   Faculty opted                      Alphanumeric
groupID       Key, FKey   Subject group table                Alphanumeric

5. STAR JOIN SCHEMA

The star join schema (also known as the star schema) is a database in which there is a single fact table and many dimension tables. These tables are not normalized. They are unlike traditional operational databases, where one attempts to normalize the tables [10][14]. In the fact table there is one segment for each dimension. The fact table uses a compound key made up of the group of the dimensions. In addition, the fact table usually contains additional variables, which typically are additive numbers, i.e., numeric facts. In our Examination Automation System example, the individual dimension tables would capture views by:

• Enrollment, containing registration no., name and parentage
• Subjects taken by the student
• Student enrolled in the course
• Marks obtained in every subject
• Date of declaration, session, year
• College information
• Course information
• Faculty details, etc.

For the full star schema of the Examination Automation System, see Appendix 1 at the end of the paper.

6. USING THE STAR SCHEMA FOR BUILDING DATASETS

Users of the Examination Automation System will want to look at the data summarized to various levels. Joining selected dimension tables to the fact table will provide the user with a dataset on which to aggregate the needed information [1].

6.1 GENERATING THE FINAL RESULT NOTIFICATION
For example, generating the result of a student requires the join of five tables, namely the fact table, the Enrollment dimension table, the Course dimension table, the subject_groups dimension table and the marks dimension table. The resultant data file is then aggregated by using the Proc Summary step to produce a dataset for analysis. Below is a demonstration of this approach.

An Examination Automation System with 2,500,000 records in the fact table and 12 column variables, totaling 30 megabytes of space, was considered. The space taken by the dimension tables is shown in Table 2.

Table 2: Dimension Table Sizes in Megabytes

Dimension name   Size
Enrollment       176.388 MB
Subject          0.056 MB
Marks            150.5 MB
Course           0.020 MB
College          0.008 MB
Date             0.015 MB
Faculty          0.012 MB
Group            0.095 MB
Session          0.010 MB
Total space      327.10 MB

An algorithm was developed and the code implemented in SQL, using SQL Server Management Studio Express as the front end and Microsoft SQL Server 2005 as the back end, for testing the described schema. The results of the simulation are presented in the table below.
Table 3: Client statistics for the above query

Client Statistics Information           Trial 3    Trial 2    Trial 1    Average
Client Execution Time                   20:25:51   20:25:46   20:25:36

Query Profile Statistics
Rows returned by SELECT statements      27453      27453      27453      27453

Network Statistics
Number of server roundtrips             3          3          3          3
TDS packets sent from client            3          3          3          3
TDS packets received from server        1974       1974       1974       1974
Bytes sent from client                  2220       2220       2220       2220
Bytes received from server              8074601    8074601    8074601    8074601

Time Statistics (ms)
Client processing time                  551        568        817        645.3333
Total execution time                    859        861        1127       949
Wait time on server replies             308        293        310        303.6667
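The query behind Table 3 is not reproduced in the text. The following is only a minimal sketch of the five-table join described in Section 6.1, followed by a summary step, written with Python's built-in sqlite3 module rather than the SQL Server 2005 setup used in the paper. The table names follow Appendix 1; the reduced column lists, the join keys and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enroll_DIM (REGNO TEXT PRIMARY KEY, NAME TEXT, FNAME TEXT);
CREATE TABLE course_DIM (course_code TEXT PRIMARY KEY, course_des TEXT);
CREATE TABLE group_DIM  (groupID TEXT PRIMARY KEY, group_des TEXT);
CREATE TABLE mks_DIM    (ROLLNO INTEGER, SUBJECT TEXT, MK1 INTEGER);
CREATE TABLE fact_DIM   (ROLLNO INTEGER, REGNO TEXT, RESULT TEXT,
                         TOTALM INTEGER, course_code TEXT, groupID TEXT);

-- Hypothetical sample rows.
INSERT INTO enroll_DIM VALUES ('R1', 'Student One', 'Parent One'),
                              ('R2', 'Student Two', 'Parent Two'),
                              ('R3', 'Student Three', 'Parent Three');
INSERT INTO course_DIM VALUES ('BSC', 'Bachelor of Science'),
                              ('BA', 'Bachelor of Arts');
INSERT INTO group_DIM  VALUES ('G1', 'Physics, Chemistry'),
                              ('G2', 'History, Economics');
INSERT INTO mks_DIM    VALUES (101, 'PHY', 70), (101, 'CHE', 65),
                              (102, 'PHY', 30), (102, 'CHE', 45),
                              (103, 'HIS', 80), (103, 'ECO', 75);
INSERT INTO fact_DIM   VALUES (101, 'R1', 'PASS', 135, 'BSC', 'G1'),
                              (102, 'R2', 'FAIL', 75, 'BSC', 'G1'),
                              (103, 'R3', 'PASS', 155, 'BA', 'G2');
""")

# Star join: the fact table joined to four dimension tables (five tables in all).
notification = conn.execute("""
    SELECT f.ROLLNO, e.NAME, c.course_des, g.group_des,
           m.SUBJECT, m.MK1, f.TOTALM, f.RESULT
    FROM fact_DIM f
    JOIN enroll_DIM e ON e.REGNO = f.REGNO
    JOIN course_DIM c ON c.course_code = f.course_code
    JOIN group_DIM  g ON g.groupID = f.groupID
    JOIN mks_DIM    m ON m.ROLLNO = f.ROLLNO
    ORDER BY f.ROLLNO, m.SUBJECT
""").fetchall()

# Summary step: students and passes per course, aggregated from the fact table.
summary = conn.execute("""
    SELECT c.course_des, COUNT(*) AS students, SUM(f.RESULT = 'PASS') AS passed
    FROM fact_DIM f
    JOIN course_DIM c ON c.course_code = f.course_code
    GROUP BY c.course_des
    ORDER BY c.course_des
""").fetchall()

for row in notification:
    print(row)
print(summary)
```

The second query stands in for the summary step mentioned in Section 6.1: the dimension supplies the grouping label and the fact table supplies the rows being counted.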


7. BUILDING THE DECISION SUPPORT DATABASE

Similarly, other datasets could be generated for analysis. Using the building blocks of the fact table and the various dimension tables, one has thousands of ways to aggregate the data. For expedient analysis purposes, frequently needed aggregated datasets should be created in advance for the users [15][16]. Having data readily and easily available is a major tenet of data warehousing. For the Examination Automation System, some aggregated datasets were:

• Generating the Final Result Notification per subject, college, subject group, year, gender, etc.
• Remuneration for paper checkers, checking assistants and other officials.
• Student counts by age, gender, pass, fail and reappear in a subject, per college, subject, year and group of subjects.
• Interests of students in courses, colleges, subjects, etc., and improvements to be made in the education system.
• Number of students enrolled for a particular course, subject, college, course within a college, or subject within a college.
• Students who have passed with and without statutes.
• Percentage of result, subject wise, college wise, course wise and group wise.

As one can see, the star schema lends itself well to custom analysis.

8. OLAP AND DATA MINING

On-line Analytical Processing (OLAP) is the analytical capability provided by the data warehouse or data mart. One can view granular data or various aggregations of data for business analyses using graphical, user-friendly tools [4][18]. Data warehouses and data marts exist to answer questions and find business opportunities. There are many ways to analyze data using procedures such as ProcdecodeMks, ProcgetResult, Procfmaster, Procrollidx and Proc Tabulate.

Finally, data mining is the name given to newer statistical techniques used to explore voluminous data stores. These techniques include decision trees and neural networks. These methods, like neural networks, can sometimes handle co-linearity better than the older statistical techniques.

9. COMPARISON WITH 3RD NORMAL FORM

A comparative study was also performed using the same amount of test data, and the observations are shown in Tables 4 and 5. It was observed that there is a significant tradeoff between memory and speed in the implementations of 3rd Normal Form and Star Schema.

Table 4: Execution Time for Result Preparation
[Bar chart: Execution Time for Preparation of Result Notification; categories: Star Schema, 3rd Normal Form; vertical axis 0 to 40]

Table 5: Total Memory Utilized
[Bar chart: Total Memory Utilized; categories: Star Schema, 3rd Normal Form; vertical axis 0 to 400]

10. CONCLUSION

Data warehousing technology is gaining wide attention, and many organizations are building data warehouses (or data marts) to help them in data analysis for decision support. Data warehousing is a newly emerged field of study in the computing sciences. Due to its multidisciplinary nature, it overlaps with three different computing disciplines. This overlap may sometimes cause contradictory definitions for a specific concept. To address this problem for the Examination Automation System, a star schema design was adopted. In this regard, various functional dimensions of the Examination System were designed and connected to a Fact Transaction Dimension. Furthermore, general issues such as client statistics and query design were taken up, and various decision support databases were designed and implemented using the same star schema.

11. REFERENCES
[1] A. Gupta, V. Harinarayan, and D. Quass. Aggregate query processing in data-warehousing environments. In Proc. 21st Int. Conf. on Very Large Data Bases, Zurich, Switzerland, 1995.
[2] ACM/IEEE-CS Joint Task Force for Computing Curriculum 2005. "Computing Curriculum 2005: The Overview Report", 30 Sep, 2005.
[3] C. Fahrner and G. Vossen. A survey of database transformations based on the Entity-Relationship model. Data & Knowledge Engineering, vol. 15, no. 3, pp. 213-250, 1995.
[4] CAI Yong, HE Guangsheng, "Designing Model of Data Warehouse with OO Method [J]", Computer Engineering and Applications, 2003.6.
[5] Inmon, W. H., "Building the Data Warehouse", Second Edition, John Wiley & Sons, Inc., 1996.


[6] Fon Silvers, "Building and Maintaining a Data Warehouse", An Auerbach Book, CRC Press, Taylor & Francis Group.
[7] Jeff Lawyer, Shamsul Chowdhury, "Best Practices in Data Warehousing to Support Business Initiatives and Needs", Proceedings of the 37th Hawaii International Conference on System Sciences, 2004.
[8] Jorge Bernardino, Pedro Furtado, Henrique Madeira, "A Cost Effective Approach for Very Large Data Warehouses", Proceedings of the International Database Engineering and Applications Symposium, 2002.
[9] Kimball, Ralph, "The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses", John Wiley & Sons, Inc., 1996.
[10] Krishna, "Principles of Curriculum Design and Revision: A Case Study in Implementing Computing Curricula CC2001", ITiCSE'05, June 27-29, 2005.
[11] Larry Greenfield, LGI Systems Inc., "The Data Warehousing Information Center", 1997, http://pwp.starnetinc.com/larryg/index.html.
[12] LIN Yu, etc., "The Principles and Applications of Data Warehouse [M]", Posts & Telecommunications Press, 2003.1.
[13] R. Barquin and S. Edelstein, "Planning and Designing the Data Warehouse", Prentice Hall, 1996.
[14] REN Jinluan, GU Peiliang, ZENG Zhenxiang, "Research on the Methods of Designing Data Structure of Data Warehouse [J]", Computer Engineering and Applications, 2001.22.
[15] Svetlozar Nestorov, Nenad Jukic, "Ad-Hoc Association-Rule Mining within the Data Warehouse", Proceedings of the 36th Hawaii International Conference on System Sciences, 2002.
[16] Syed Najam-ul-Hassan, Maqbool Uddin Shaikh, Uzair Iqbal Janjua, "Data Warehousing an Academic Discipline: Curriculum Development Approach, Methodologies and Issues", 2006.
[17] Wu Shuning, Cui Deguang, Cheng Peng, "The Four-stage Standardized Modeling Method in Data Warehouse System Development", Proceedings of the IEEE International Conference on Mechatronics & Automation, Niagara Falls, Canada, July 2005.
[18] YUAN Hong, HE Houcun, "Online Analysis and Data Warehouse Modeling Technologies [J]", Computer Application Research, 1999.12.


Appendix 1

Star schema of the Examination Automation System: the fact table (fact_DIM) and its dimension tables, with their columns.

fact_DIM: ROLLNO, REGNO, RESULT, TOTALM, RESGAZ, sesson_ID, college_id, dateID, course_code, facultyID, groupID

subject_DIM: NAME, CODE, THMAX, THMIN, PRMAX, PRMIN, PRMINI, PRMINE, NTP

sess_DIM: session_id, session_des, year

group_DIM: groupID, group_des, sub1, sub2

college_DIM: college_name, college_id, yearEstab, street, district, contactno

faculty_DIM: facultyID, faculty_desc

mks_DIM: ROLLNO, SUBJECT, SUB1, MK1A, CD1A, MK1B, CD1B, MK1C, MK1, MK1S, PMK1I, PMK1E, PMK1, PEST1, PST1

enroll_DIM: MM, REGNO, SNO, COLLEGE, NAME, FNAME

date_DIM: date_ID, date, FullDateDesc, dayofWeek, CalendarMonth

course_DIM: course_code, course_des, noofyears
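The star shape of this schema can be sketched as table definitions. The following minimal example uses Python's built-in sqlite3 module, not the SQL Server 2005 setup used in the paper. It takes the table and column names from the listing in this appendix and the foreign keys from Table 1; the column types, the primary keys on the dimension tables, the omission of some columns and the sample college row are assumptions made for illustration.

```python
import sqlite3

DDL = """
CREATE TABLE sess_DIM    (session_id TEXT PRIMARY KEY, session_des TEXT, year TEXT);
CREATE TABLE college_DIM (college_id TEXT PRIMARY KEY, college_name TEXT, district TEXT);
CREATE TABLE date_DIM    (date_ID TEXT PRIMARY KEY, FullDateDesc TEXT, CalendarMonth TEXT);
CREATE TABLE course_DIM  (course_code TEXT PRIMARY KEY, course_des TEXT, noofyears INTEGER);
CREATE TABLE faculty_DIM (facultyID TEXT PRIMARY KEY, faculty_desc TEXT);
CREATE TABLE group_DIM   (groupID TEXT PRIMARY KEY, group_des TEXT, sub1 TEXT, sub2 TEXT);

-- Central fact table: one foreign key per dimension, plus the numeric facts.
CREATE TABLE fact_DIM (
    ROLLNO      INTEGER,
    REGNO       TEXT,
    RESULT      TEXT,
    TOTALM      INTEGER,
    RESGAZ      TEXT,
    sesson_ID   TEXT REFERENCES sess_DIM(session_id),
    college_id  TEXT REFERENCES college_DIM(college_id),
    dateID      TEXT REFERENCES date_DIM(date_ID),
    course_code TEXT REFERENCES course_DIM(course_code),
    facultyID   TEXT REFERENCES faculty_DIM(facultyID),
    groupID     TEXT REFERENCES group_DIM(groupID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(DDL)


def referenced_tables(table: str) -> list:
    """Return the sorted parent tables that `table` points to via foreign keys."""
    # Each PRAGMA foreign_key_list row describes one key column; index 2 is the parent table.
    return sorted(row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})"))


dims = ["sess_DIM", "college_DIM", "date_DIM", "course_DIM", "faculty_DIM", "group_DIM"]
fact_parents = referenced_tables("fact_DIM")
dim_parents = {d: referenced_tables(d) for d in dims}

# A fact row pointing at a college that is not in the dimension is refused.
conn.execute("INSERT INTO college_DIM (college_id, college_name) VALUES ('C01', 'College A')")
try:
    conn.execute("INSERT INTO fact_DIM (ROLLNO, college_id) VALUES (1, 'C99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(fact_parents)
print(dim_parents)
print(rejected)
```

The check on `dim_parents` reflects the property stated in Section 2.1: every dimension joins to the fact table, and no dimension joins to another dimension.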
