Zhang 2013
Abstract: In the development of a software project, system architects, software programmers, product test engineers, and clients, who may be distributed across the world, need good communication and collaboration. They require easy access to necessary data and the ability to share relevant project documents, which are distributed over a network of resources and users. Hence, web-based applications, decentralized repositories, and databases are needed to store and manage project and process development information. However, relational databases perform poorly in terms of performance, scalability, and reliability when processing large amounts of data spread across many servers. This paper addresses the management of software project documents in an Internet-based collaborative environment using NoSQL database technology. The research focuses on the development of a web-based application using Apache CouchDB, an open source document-oriented database with a REST web services interface. The proposed application is described, and its main functionalities are illustrated through examples of use.

S. Zhang is with the Network Center, Zhejiang Radio & Television University, Hangzhou, Zhejiang, 310012, China (e-mail: zhangsl@zjtvu.edu.cn).

I. INTRODUCTION

In the development of a software project, many project documents, such as software requirements documents, software design documents, and project progress reports, need to be archived and shared between stakeholders such as system architects, software programmers, product test engineers, and clients. This sharing can be enhanced through the use of computer and communication technology, especially the Internet, which can be considered the best channel for collaboration and knowledge exchange. The distributed storage system plays the most important role here; it needs good scalability for simple read/write operations on large data files. However, relational databases such as MySQL, MS SQL Server, and Oracle store data in two-dimensional tables, which are good at storing highly structured and interrelated data but perform poorly in terms of performance, scalability, and reliability when processing large amounts of data spread across many servers [1,2].

In this paper, we propose a new approach to managing distributed software project documents that is based on a document-oriented database (DoDB) instead of the usual relational database. We implemented a web-based prototype application that lets users conveniently manage software project documents in a collaborative distributed environment.

A document-oriented database is one kind of NoSQL database. The definition of NoSQL, which stands for “Not Only SQL” or “Not Relational”, is not entirely agreed upon. The key feature of NoSQL systems is “shared nothing” horizontal scaling, that is, replicating and partitioning data over many servers. This allows them to support a large number of simple read/write operations per second.

In the past few years, many scalable NoSQL-related distributed storage systems have been proposed, e.g., Google’s Bigtable [3], Amazon’s Dynamo [4], and Yahoo’s PNUTS [5].

Unlike relational (SQL) DBMSs, NoSQL systems have no single data model. There are four popular data store types according to their data model: key-value stores, document stores, extensible record stores, and relational databases [6]. Document stores support more complex data than the others.

The term “document” can refer to any kind of “pointerless object”, including text, Microsoft Word files, video files, Microsoft PowerPoint files, etc. Since we aim to deal with all kinds of project documents, a document-oriented NoSQL database is the best choice for our work. In our research, we implement the storage system on Apache CouchDB, an open source document-oriented database supported by the Apache project.

The rest of this paper is organized as follows. Section II gives a brief description of the basic model of Apache CouchDB [7]. The system architecture and the software design are illustrated in Section III, and the result of our implementation is shown in Section IV. Section V presents the conclusion and analyzes the advantages and disadvantages compared with relational databases.

II. OVERVIEW OF APACHE COUCHDB

Apache CouchDB is a scalable, fault-tolerant, and schema-free document-oriented database written in Erlang. CouchDB's reliability and scalability are further enhanced by its implementation in the Erlang programming language, which is used to build massively scalable soft real-time systems with high-availability requirements [8]. In CouchDB, data is stored in documents, represented as key-value maps using the data types of JavaScript Object Notation (JSON).

A. Documents And Views In CouchDB

A CouchDB server hosts named databases, which store documents. A CouchDB document is an object that consists of named fields. Field values may be strings, numbers, dates, or even ordered lists and associative maps. A CouchDB document is simply represented as a JSON object with some associated metadata. Documents can have attachments, just like email. All kinds of files can be attached to a document.

CouchDB is designed to store and report on large amounts of semi-structured, document-oriented data. With CouchDB,
Since an embedded array is only an option when there are not too many items to store, and bigger documents mean slower handling and slower network transfers, we choose to use separate relationship documents.

Some sample documents in a typical case are:

{
  "_id": "a_case_of_project",
  "_rev": "ad2d1c186f853",
  "type": "project",
  "name": "E-Learning",
  "project_id": "E-123480"
}

{
  "_id": "a_case_of_user",
  "_rev": "2f7d520c0d628a",
  "type": "user",
  "name": "Steve Zhang",
  "user_id": "ee001",
  "project_id": "E-123480"
}

{
  "_id": "a_case_of_item",
  "_rev": "0d36967e465",
  "type": "item",
  "item_name": "Project Requirements",
  "item_id": "sys_reqs",
  "item_description": "any project requirement documents",
  "project_id": "E-123480"
}

{
  "_id": "a_case_of_document",
  "_rev": "f0c0926e7b51b",
  "type": "document",
  "document_type": "DOC",
  "document_name": "project hardware requirement",
  "document_id": "dc001",
  "item_id": "intros",
  "_attachments": {
    "project hardware requirement.doc": {
      "content_type": "application/vnd.ms-word",
      "revpos": 2,
      "digest": "md5-zFFMEjuhk/5Fe7pndcFNhA==",
      "length": 54800,
      "stub": true
    }
  }
}

C. Database View Design

Unlike SQL for relational databases, CouchDB uses views for querying and reporting on CouchDB documents. Views are the method of aggregating and reporting on the documents.

CouchDB uses Google’s MapReduce programming model [9] for views. Views are defined by a JavaScript function that maps view keys to values. To create a view, the functions must first be saved into special design documents. The IDs of design documents must begin with _design/, and each design document has a special views attribute whose entries have a map member and an optional reduce member to hold the view functions.

In our work, a design document that defines the user_project, all_items, and documents_of_item views might look like this:

{
  "_id": "_design/dbs",
  "_rev": "10-de6a28a0d90a01c4b42428d658f211c7",
  "views": {
    "user_project": {
      "map": "function(doc){ if (doc.type == 'user') emit(doc.user_id, doc.project_id)}"
    },
    "all_items": {
      "map": "function(doc){ if (doc.type == 'item') emit(doc.project_id, {item_id: doc.item_id, item_name: doc.item_name})}"
    },
    "documents_of_item": {
      "map": "function(doc){ if (doc.type == 'document') emit(doc.item_id, doc)}"
    }
  }
}

With URL query arguments, we can use views to get the data we need. For example, since the all_items view emits project_id as its key, we can retrieve all items of the sample project from a URL like http://example.com:5984/test/_design/dbs/_view/all_items?key="E-123480".

D. The Implementation Of Web Client

Although CouchDB can work as both a web server and a database server, in which HTML pages and JavaScript files can be stored, it is a weaker web server than dedicated web servers such as Apache HTTP Server and MS IIS. In our work, we use Apache HTTP Server as a reverse proxy for CouchDB. Browsers typically enforce the same-origin policy and reject requests to fetch data unless the protocol, port, and host are identical to those of the current page. Using a reverse proxy allows browser-hosted applications to access CouchDB while conforming to the same-origin policy. The whole process can be described with a UML sequence diagram.

Fig. 3. UML Sequence diagram

When users open the HTML page, events are triggered automatically or by the user. An event function written in JavaScript sends a REST request, which is intercepted by the Ajax engine using an XMLHttpRequest object. The web server accepts the request, forwards it to CouchDB, and returns the data to the client.
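The request issued by such an event function might be sketched as follows. This is only an illustration under stated assumptions, not the prototype's actual code: the function names buildViewUrl, rowsToItems, and fetchItems are hypothetical, the same-origin proxy path /couchdb is an assumed reverse proxy mapping, and the fetch API is used in place of the XMLHttpRequest object purely for brevity. The database name test, the design document dbs, and the all_items view are taken from the view example above.

```javascript
// Hypothetical helper names; "test", "dbs" and "all_items" come from the
// design document example in the text.

// Build the URL of a CouchDB view query. View keys are JSON values, so a
// string key must be JSON-encoded (quoted) and then URL-encoded.
function buildViewUrl(baseUrl, db, designDoc, view, key) {
  const query = "key=" + encodeURIComponent(JSON.stringify(key));
  return `${baseUrl}/${db}/_design/${designDoc}/_view/${view}?${query}`;
}

// Extract the emitted values from a view response, whose body has the
// shape { total_rows, offset, rows: [{ id, key, value }, ...] }.
function rowsToItems(viewResponse) {
  return viewResponse.rows.map((row) => row.value);
}

// Query the all_items view for one project. `baseUrl` is assumed to be a
// same-origin path that the reverse proxy forwards to CouchDB, so the
// browser's same-origin policy is not violated.
async function fetchItems(projectId, baseUrl = "/couchdb", fetchImpl = fetch) {
  const url = buildViewUrl(baseUrl, "test", "dbs", "all_items", projectId);
  const response = await fetchImpl(url, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`View query failed with HTTP status ${response.status}`);
  }
  return rowsToItems(await response.json());
}
```

An event handler could then call fetchItems("E-123480") and render the returned item_id and item_name pairs once the promise resolves, leaving the page responsive in the meantime.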
While the data is being transferred, users can continue with other tasks.

IV. CURRENT PROTOTYPE

At this stage of the work, we have produced a simple prototype of a web application that can access software project documents stored in CouchDB in a distributed network environment. The application is designed to allow users to log on to the system to download or upload any documents for sharing.

The user logs on to the system at the local node server, where all items are listed, and can then click an item to see all the document information for that item. Attachments are also shown when they are documents such as PowerPoint files, Word files, images, or video files.

Fig. 4. Page of documents management

The user can also download, add, or delete documents of this item.

V. CONCLUSION

In this work, we explore the use of recent NoSQL database technologies, namely the Apache CouchDB DBMS and REST web services, to develop a system for distributed software project document management. Compared with a relational database, an application based on CouchDB can take advantage of efficient storage of large data files and data replication over many servers. The peer-to-peer distributed infrastructure of CouchDB also provides elastic scalability, which means that we can easily add capacity by adding more servers.

However, NoSQL also has some disadvantages. NoSQL databases have no common query language comparable to SQL for relational databases. Each NoSQL database, such as CouchDB, provides a different query language, which makes complex query programming difficult. NoSQL databases also have weaker authentication and security management than relational databases such as Oracle and MS SQL Server. CouchDB, for instance, only provides an administrator privilege for manipulating a database.

Nevertheless, NoSQL databases like CouchDB show their potential for massive data storage in a distributed network environment, where they may be more appropriate than relational databases in the Web 2.0 age.

REFERENCES

[1] M. Stonebraker, “SQL Databases v. NoSQL Databases”, Communications of the ACM, vol. 53, no. 4, pp. 10-11, April 2010.
[2] N. Leavitt, “Will NoSQL Databases Live Up to Their Promise?”, Computer, vol. 43, no. 2, pp. 12-14, February 2010.
[3] F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data”, in Seventh Symposium on Operating System Design and Implementation, Seattle, USA, 2006, pp. 205-218.
[4] G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-Value Store”, in Proceedings 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, USA, 2007, pp. 205-220.
[5] B. F. Cooper et al., “PNUTS: Yahoo!’s Hosted Data Serving Platform”, in Proceedings of the VLDB Endowment, Auckland, New Zealand, 2008, pp. 1277-1288.
[6] R. Cattell, “Scalable SQL and NoSQL Data Stores”, ACM SIGMOD Record, vol. 39, no. 4, pp. 12-27, December 2010.
[7] Apache Software Foundation. Apache CouchDB: The Apache CouchDB Project. [Online]. Available: http://couchdb.apache.org/
[8] Erlang.org. Erlang programming language. [Online]. Available: http://www.erlang.org/
[9] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters”, Communications of the ACM, vol. 51, no. 1, pp. 107-113, January 2008.