
Third International Conference on Information Science and Technology

March 23-25, 2013; Yangzhou, Jiangsu, China

Application of Document-Oriented NoSQL Database Technology in Web-based Software Project Documents Management System
Shaolong Zhang

S. Zhang is with the Network Center, Zhejiang Radio & Television University, Hangzhou, Zhejiang, 310012, China (e-mail: zhangsl@zjtvu.edu.cn).

978-1-4673-2764-0/13/$31.00 ©2013 IEEE 504

Abstract—In the development of a software project, system architects, software programmers, product test engineers and clients, who may be distributed all over the world, need good communication and collaboration. They require easy access to the necessary data and to the relevant project documents, which are distributed over a network of resources and users. Hence, web-based applications, decentralized repositories and databases are needed to store and manage project and process development information. However, relational databases show poor performance, scalability and reliability when processing large amounts of data spread across many servers. This paper addresses the management of software project documents in an Internet-based collaborative environment using NoSQL database technology. The research focuses on the development of a web-based application using Apache CouchDB, an open source document-oriented database with a REST web services interface. The proposed application is described, and its main functionalities are illustrated through examples of use.

I. INTRODUCTION

In the development of a software project, many project documents, such as software requirements documents, software design documents and project progress reports, need to be archived and shared among stakeholders like system architects, software programmers, product test engineers and clients. This sharing can be enhanced through computer and communication technology, especially the Internet, which can be considered the best channel for collaboration and knowledge exchange. The distributed storage system plays the most important role here, and it needs good scalability for simple read/write operations on big data files. However, relational databases such as MySQL, MS SQL Server and Oracle store data in the form of two-dimensional tables; they are good at storing highly structured and interrelated data, but they show poor performance, scalability and reliability when processing large amounts of data spread across many servers [1,2].

In this paper, we propose a new approach to managing distributed software project documents that is based on a Document-oriented Database (DoDB) instead of the usual relational database. We implemented a Web-based prototype application that makes it convenient for users to manage software project documents in a collaborative distributed environment.

A document-oriented database is one kind of NoSQL database. The definition of NoSQL, which stands for “Not Only SQL” or “Not Relational”, is not entirely agreed upon. The key feature of NoSQL systems is “shared nothing” horizontal scaling: replicating and partitioning data over many servers. This allows them to support a large number of simple read/write operations per second.

In the past few years, many scalable NoSQL distributed storage systems have been proposed, e.g. Google’s Bigtable [3], Amazon’s Dynamo [4] and Yahoo’s PNUTS [5].

Unlike relational (SQL) DBMSs, NoSQL systems have no single common data model. Scalable data stores are commonly grouped into four types according to their data model: Key-value Stores, Document Stores, Extensible Record Stores and scalable Relational Databases [6]. Document stores support more complex data than key-value stores.

The term “document” can refer to any kind of “pointerless object”, including texts, Microsoft Word files, video files, Microsoft PowerPoint files, etc. Since we aim to deal with all kinds of project documents, a document-oriented NoSQL database is the best choice for our work. In our research, we implement the storage system on Apache CouchDB, an open source document-oriented database developed as an Apache project.

The rest of this paper is organized as follows: Section II gives a brief description of the basic model of Apache CouchDB [7]. The system architecture and the software design are illustrated in Section III, and the result of our implementation is shown in Section IV. Section V concludes the paper and analyzes the advantages and disadvantages of the approach compared with relational databases.

II. OVERVIEW OF APACHE COUCHDB

Apache CouchDB is a scalable, fault-tolerant and schema-free document-oriented database. Its reliability and scalability are enhanced by its implementation in Erlang, a programming language used to build massively scalable soft real-time systems with high-availability requirements [8]. In CouchDB, data is stored in documents, represented as key-value maps using the data types of JavaScript Object Notation (JSON).

A. Documents And Views In CouchDB

A CouchDB server hosts named databases, which store documents. A CouchDB document is an object that consists of named fields. Field values may be strings, numbers, dates (encoded as strings or numbers, since JSON has no date type), or even ordered lists and associative maps. A CouchDB document is simply a JSON object with some associated metadata. Documents can have attachments, just like email; any kind of file can be attached to a document. CouchDB is designed to store and report on large amounts of semi-structured, document-oriented data. With CouchDB, no schema is enforced, so new document types with new meaning can be safely added alongside the old ones. CouchDB greatly simplifies the development of document-oriented applications, which make up the bulk of collaborative web applications.
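To make the attachment mechanism concrete, the following JavaScript sketch (for Node.js) builds a document in the style of the sample documents in Section III, with a file embedded inline under _attachments. CouchDB accepts inline attachments as base64 text in a data member next to content_type; the helper name makeDocumentWithAttachment and the attached file are hypothetical illustrations, not part of the described prototype.

```javascript
// Sketch (Node.js): a CouchDB-style JSON document that carries a file
// inline. CouchDB accepts inline attachments as base64 text in a "data"
// member; makeDocumentWithAttachment is a hypothetical helper name.
function makeDocumentWithAttachment(id, fields, fileName, contentType, content) {
  return {
    _id: id,
    ...fields, // schema-free: any extra fields can be added
    _attachments: {
      [fileName]: {
        content_type: contentType,
        // Buffer encodes the file content (string or bytes) as base64.
        data: Buffer.from(content).toString("base64"),
      },
    },
  };
}

// Fields mirror the sample "document" documents in Section III;
// the attached file is a made-up example.
const doc = makeDocumentWithAttachment(
  "a_case_of_document",
  { type: "document", document_name: "project hardware requirement", item_id: "intros" },
  "notes.txt",
  "text/plain",
  "hello"
);

console.log(JSON.stringify(doc, null, 2));
```

Sending such an object in a PUT request to /{db}/{docid} stores the file together with the document. For large files such as videos, CouchDB's standalone attachment request (PUT /{db}/{docid}/{attachment}?rev=...) avoids the roughly one-third size overhead of base64.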
Queries are done with what CouchDB calls “views”, which
are defined with JavaScript to specify field constraints.
Queries can be distributed in parallel over multiple nodes
using a map-reduce mechanism.
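The view mechanism can be illustrated with a toy, in-memory imitation written in JavaScript. It is only a sketch of the idea, not CouchDB's indexer: a map function (here the all_items function from the design document in Section III-C, kept as source text, as CouchDB stores it) is run over every document, each emit(key, value) call produces a row, and the rows are sorted by key so that a key lookup selects a contiguous range. Reduce functions, incremental index updates and CouchDB's collation rules are left out; the helper names compileMap and runView and the sample documents are hypothetical.

```javascript
// Toy imitation of a CouchDB view, for illustration only: it is NOT how
// CouchDB builds its indexes, and key ordering is simplified.
function compileMap(source, emit) {
  // Evaluate the map source with `emit` in scope, as CouchDB's JavaScript
  // query server makes emit() available to view code.
  return new Function("emit", `return (${source});`)(emit);
}

function runView(docs, mapSource) {
  const rows = [];
  let current = null;
  const map = compileMap(mapSource, (key, value) =>
    rows.push({ id: current._id, key, value })
  );
  for (const doc of docs) {
    current = doc;
    map(doc);
  }
  // Sort by key, then by document ID. Real CouchDB uses its own collation
  // rules; plain string comparison suffices for the string keys here.
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  rows.sort((a, b) => cmp(String(a.key), String(b.key)) || cmp(a.id, b.id));
  return rows;
}

// The all_items map function from the design document in Section III-C.
const allItems =
  "function(doc){ if (doc.type == 'item') emit(doc.project_id, {item_id: doc.item_id, item_name: doc.item_name}) }";

const docs = [
  { _id: "a_case_of_project", type: "project", name: "E-Learning", project_id: "E-123480" },
  { _id: "item_b", type: "item", item_id: "design", item_name: "Design Documents", project_id: "E-123480" },
  { _id: "item_a", type: "item", item_id: "sys_reqs", item_name: "Project Requirements", project_id: "E-123480" },
  { _id: "item_c", type: "item", item_id: "plan", item_name: "Plans", project_id: "E-999999" },
];

const rows = runView(docs, allItems);
// Equivalent of ?key="E-123480": keep only the rows whose key matches.
const itemsOfProject = rows.filter((row) => row.key === "E-123480");
console.log(itemsOfProject.map((row) => row.value.item_name));
```

Non-item documents emit nothing, so the project document never appears in the view; this is how a single database can hold several document types while each view sees only the ones it selects.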
B. Asynchronous Replication Of CouchDB
CouchDB is a peer-based distributed database system. Any
number of CouchDB servers can have independent “replica
copies” of the same database, where applications have full
database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally.

CouchDB has built-in conflict detection and management, and the replication process is incremental and fast, copying only the documents changed since the previous replication. CouchDB will notify an application if someone else has updated a document since it was fetched. The application can then try to combine the updates, or it can simply retry its update and overwrite the other change.

C. HTTP REST Web Services

CouchDB provides a set of RESTful HTTP APIs for reading and updating (add, edit, delete) database documents. Every item in the database has a unique URI that is exposed via HTTP. REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources.

With these features, it is possible to build a distributed document database management application that holds digitized information, such as text, images, audio or video, together with its replicas.

In the next section, the system architecture that supports this purpose is presented.

III. SYSTEM DESIGN

A. System Architecture

The goal of our work is to provide a Web-based application that makes it convenient for users to access software project documents on multiple storage servers. Users from anywhere can log on to the local node server in the system and download or upload project documents.

The system is based on a distributed CouchDB DoDB. CouchDB works as both a database server and a Web application server, providing RESTful HTTP APIs for client development. CouchDB can be used from many programming languages, including C, C#, Java and JavaScript. In this work, we implement the client application in JavaScript, which makes it easy to connect to the server with AJAX and to handle the JSON objects returned by CouchDB. A schematic representation of how the system has been implemented is shown in Figure 1.

Fig. 1. Distributed system architecture based on CouchDB

Any number of database servers can be deployed. Only one of them acts as the master database server; the others are slave database servers.

Replication between servers is triggered by sending a POST request to the _replicate URL with a JSON object in the body that includes a source and a target member:

POST /_replicate HTTP/1.1
Content-Type: application/json

{"source":"example-database","target":"http://example.org/example-database"}

In the next sections, we describe in detail the design of the database and the Web client.

B. Entity Relationships Model Of Database

Before implementing the database, we need to design the entity relationships model of the system. In our system the core entities are users, projects, project items and project documents. The relationships between them can be described as follows:
1) Every user belongs to a software project, and each item belongs to a software project.
2) For each item, only users of the project have rights to add/update documents in this item.
3) Each item can have any number of software project documents, and each document belongs to only one item.
The E-R diagram is shown in Fig. 2.

Fig. 2. E-R diagram

The relations between these entities are of the one-to-many type. In CouchDB, there are two ways to model this type of relation:
1) Use separate relationship documents.
2) Use an embedded array in a document.
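The two options can be contrasted with a small JavaScript sketch. It starts from a project document that embeds its items in an array (option 2) and converts it into separate documents in which each item carries a project_id back-reference (option 1), the shape used by the sample documents that follow. The items array, the helper toSeparateDocuments and the item_ ID scheme are hypothetical.

```javascript
// Option 2: one project document embedding its items in an array
// (hypothetical shape; the paper does not use this form).
const embedded = {
  _id: "a_case_of_project",
  type: "project",
  name: "E-Learning",
  project_id: "E-123480",
  items: [
    { item_id: "sys_reqs", item_name: "Project Requirements" },
    { item_id: "design", item_name: "Design Documents" },
  ],
};

// Option 1: split into separate documents. Each item stores the ID of
// its project, so the "many" side holds the reference.
function toSeparateDocuments(projectDoc) {
  // Copy every field except `items`; the input object is not modified.
  const { items = [], ...project } = projectDoc;
  const itemDocs = items.map((item) => ({
    _id: `item_${item.item_id}`, // hypothetical ID scheme
    type: "item",
    ...item,
    project_id: project.project_id,
  }));
  return [project, ...itemDocs];
}

const separate = toSeparateDocuments(embedded);
console.log(separate);
```

In the separate form, adding an item writes one small new document instead of rewriting the whole, growing project document, and two users adding items at the same time do not compete for the same document revision.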

Since an embedded array is only an option when there are not too many items to store, and bigger documents mean slower handling and slower network transfers, we choose the separate relationship documents approach.

Some sample documents in a typical case are:

{
  "_id": "a_case_of_project",
  "_rev": "ad2d1c186f853",
  "type": "project",
  "name": "E-Learning",
  "project_id": "E-123480"
}
{
  "_id": "a_case_of_user",
  "_rev": "2f7d520c0d628a",
  "type": "user",
  "name": "Steve Zhang",
  "user_id": "ee001",
  "project_id": "E-123480"
}
{
  "_id": "a_case_of_item",
  "_rev": "0d36967e465",
  "type": "item",
  "item_name": "Project Requirements",
  "item_id": "sys_reqs",
  "item_description": "any project requirement documents",
  "project_id": "E-123480"
}
{
  "_id": "a_case_of_document",
  "_rev": "f0c0926e7b51b",
  "type": "document",
  "document_type": "DOC",
  "document_name": "project hardware requirement",
  "document_id": "dc001",
  "item_id": "intros",
  "_attachments": {
    "project hardware requirement.doc": {
      "content_type": "application/vnd.ms-word",
      "revpos": 2,
      "digest": "md5-zFFMEjuhk/5Fe7pndcFNhA==",
      "length": 54800,
      "stub": true
    }
  }
}

C. Database View Design

Unlike relational databases, which are queried with SQL, CouchDB uses views for querying and reporting on documents. Views are the method of aggregating and reporting on the documents.

CouchDB uses Google’s MapReduce programming model [9] for views. A view is defined by a JavaScript function that maps view keys to values. To create a view, the functions must first be saved into special design documents. The IDs of design documents must begin with _design/, and a design document has a special views attribute whose entries have a map member and an optional reduce member holding the view functions.

In our work, a design document that defines the user_project, all_items and documents_of_item views might look like this:

{
  "_id": "_design/dbs",
  "_rev": "10-de6a28a0d90a01c4b42428d658f211c7",
  "views": {
    "user_project": {
      "map": "function(doc){ if (doc.type == 'user') emit(doc.user_id, doc.project_id) }"
    },
    "all_items": {
      "map": "function(doc){ if (doc.type == 'item') emit(doc.project_id, {item_id: doc.item_id, item_name: doc.item_name}) }"
    },
    "documents_of_item": {
      "map": "function(doc){ if (doc.type == 'document') emit(doc.item_id, doc) }"
    }
  }
}

With URL query arguments, we can use views to get the data we need. For example, we can retrieve all items of a project from a URL like

http://example.com:5984/test/_design/dbs/_view/all_items?key="E-123480"

Because the all_items view is keyed on project_id, the key is the project ID ("E-123480") rather than the project name.

D. The Implementation Of Web Client

Although CouchDB can work as both a Web server and a database server, storing HTML pages and JavaScript files, it is a weaker Web server than dedicated Web servers such as the Apache HTTP Server and MS IIS. In our work we use the Apache HTTP Server as a reverse proxy for CouchDB. Browsers typically enforce the same-origin policy and reject requests to fetch data unless the protocol, port and host are identical to those of the current page. Using a reverse proxy allows browser-hosted applications to access CouchDB while conforming to the same-origin policy. The whole process can be described with a UML sequence diagram, shown in Fig. 3.

Fig. 3. UML Sequence diagram

When users open the HTML page, events are triggered either automatically or by the user. An event function written in JavaScript sends a REST request, which is handled by the Ajax engine using an XMLHttpRequest object. The Web server accepts the request, forwards it to CouchDB and returns the data to the client; during the data transfer, users can continue with other tasks.
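A minimal sketch of the client side follows, in JavaScript. It assumes that the Apache reverse proxy maps a path prefix /couchdb on the page's own origin to CouchDB (the prefix and all function names are hypothetical, not part of the described system), so every request is same-origin. It builds the view query from Section III-C with a properly encoded JSON key, maps the CRUD operations onto CouchDB document URLs, and sends requests asynchronously with fetch, used here as a modern substitute for the XMLHttpRequest object of the prototype.

```javascript
// Hypothetical proxy prefix: Apache is assumed to forward /couchdb/... to CouchDB.
const COUCH_PREFIX = "/couchdb";

// Same-origin view URL. CouchDB expects `key` to be a JSON value, so it is
// JSON-encoded first and then URL-encoded.
function viewUrl(db, designDoc, view, key) {
  const path =
    `${COUCH_PREFIX}/${encodeURIComponent(db)}/_design/` +
    `${encodeURIComponent(designDoc)}/_view/${encodeURIComponent(view)}`;
  return key === undefined ? path : `${path}?key=${encodeURIComponent(JSON.stringify(key))}`;
}

// Map CRUD operations on one document to CouchDB's REST interface.
function documentRequest(op, db, docId, body, rev) {
  const url = `${COUCH_PREFIX}/${encodeURIComponent(db)}/${encodeURIComponent(docId)}`;
  switch (op) {
    case "read":
      return { method: "GET", url };
    case "create":
    case "update":
      // An update must carry the current _rev, otherwise CouchDB answers 409 Conflict.
      return { method: "PUT", url, body: JSON.stringify(rev ? { ...body, _rev: rev } : body) };
    case "delete":
      if (!rev) throw new Error("delete requires the current _rev");
      return { method: "DELETE", url: `${url}?rev=${encodeURIComponent(rev)}` };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

// Asynchronous request, so the page stays usable during the transfer.
async function send(req) {
  const res = await fetch(req.url, {
    method: req.method,
    headers: { "Content-Type": "application/json" },
    body: req.body,
  });
  if (!res.ok) throw new Error(`${req.method} ${req.url} failed: ${res.status}`);
  return res.json();
}
```

For example, send({ method: "GET", url: viewUrl("test", "dbs", "all_items", "E-123480") }) resolves to the view result, whose rows array lists the items of the project. A document without a client-chosen ID could instead be created with POST /{db}, letting CouchDB generate the ID.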
IV. CURRENT PROTOTYPE

At this stage of the work, we have produced a simple prototype of a web application that is able to access software project documents stored in CouchDB in a distributed network environment. The application allows users to log on to the system to download or upload any documents for sharing.

The user logs on to the system on the local node server, where all items are listed; the user can then click an item to see information on all the documents of that item. Attachments such as PowerPoint files, Word files, images or video files are also shown.

Fig. 4. Page of documents management

The user can also download, add or delete documents of this item.

V. CONCLUSION

In this work we explore the use of recent NoSQL database technologies, namely the Apache CouchDB DBMS and REST web services, to develop a system for distributed software project documents management. Compared with a relational database, an application based on CouchDB can take advantage of efficient storage of big data files and data replication over many servers. The peer-to-peer distributed infrastructure of CouchDB also provides elastic scalability, which means that capacity can be added easily by adding more servers.

However, NoSQL also has some disadvantages. NoSQL databases have no common query language like SQL for relational databases. Each NoSQL database, such as CouchDB, provides a different query language, which can make complex query programming difficult. NoSQL databases also have weaker authentication and security management than relational databases like Oracle and MS SQL Server. In the case of CouchDB, it only provides administrator privileges to manipulate the database.

Nevertheless, NoSQL databases like CouchDB show their potential for massive data storage in distributed network environments, where they may be more appropriate than relational databases in the Web 2.0 age.

REFERENCES
[1] M. Stonebraker, “SQL Databases v. NoSQL Databases,” Communications of the ACM, vol. 53, no. 4, pp. 10-11, Apr. 2010.
[2] N. Leavitt, “Will NoSQL Databases Live Up to Their Promise?,” Computer, vol. 43, no. 2, pp. 12-14, Feb. 2010.
[3] F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data,” in Proc. Seventh Symposium on Operating System Design and Implementation, Seattle, USA, 2006, pp. 205-218.
[4] G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-Value Store,” in Proc. 21st ACM SIGOPS Symposium on Operating Systems Principles, Stevenson, USA, 2007, pp. 205-220.
[5] B. F. Cooper et al., “PNUTS: Yahoo!’s Hosted Data Serving Platform,” in Proc. of the VLDB Endowment, Auckland, New Zealand, 2008, pp. 1277-1288.
[6] R. Cattell, “Scalable SQL and NoSQL Data Stores,” ACM SIGMOD Record, vol. 39, no. 4, pp. 12-27, Dec. 2010.
[7] Apache Software Foundation, Apache CouchDB: The Apache CouchDB Project. [Online]. Available: http://couchdb.apache.org/
[8] Erlang.org, Erlang programming language. [Online]. Available: http://www.erlang.org/
[9] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107-113, Jan. 2008.
