Document-Oriented Database
A document-oriented database, or document store, is a computer program and data storage system
designed for storing, retrieving and managing document-oriented information, also known as semi-
structured data.[1]
Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the
term "document-oriented database" has grown[2] with the use of the term NoSQL itself. XML databases are
a subclass of document-oriented databases that are optimized to work with XML documents. Graph
databases are similar, but add another layer, the relationship, which allows them to link documents for rapid
traversal.
Document-oriented databases are a subclass of the key-value store, another NoSQL database
concept. The difference lies in how the data is processed: in a key-value store, the data is considered
inherently opaque to the database, whereas a document-oriented system relies on the internal structure of the
document to extract metadata that the database engine uses for further optimization. Although the
difference is often negligible due to tools in the systems,[a] conceptually the document store is designed to
offer a richer experience with modern programming techniques.
Document databases[b] contrast strongly with the traditional relational database (RDB). Relational
databases generally store data in separate tables that are defined by the programmer, and a single object
may be spread across several tables. Document databases store all information for a given object in a single
instance in the database, and every stored object can be different from every other. This eliminates the need
for object-relational mapping while loading data into the database.
Documents
The central concept of a document-oriented database is the notion of a document. While each
document-oriented database implementation differs on the details of this definition, in general they all
assume that documents encapsulate and encode data (or information) in some standard format or encoding.
Encodings in use include XML, YAML, and JSON, as well as binary forms like BSON.
Documents in a document store are roughly equivalent to the programming concept of an object. They are
not required to adhere to a standard schema, nor will they have all the same sections, slots, parts or keys.
Generally, programs using objects have many different types of objects, and those objects often have many
optional fields. Every object, even those of the same class, can look very different. Document stores are
similar in that they allow different types of documents in a single store, allow the fields within them to be
optional, and often allow them to be encoded using different encoding systems. For example, the following
is a document, encoded in JSON:
{
    "FirstName": "Bob",
    "Address": "5 Oak St.",
    "Hobby": "sailing"
}
A second document in the same store could share some structural elements with this one, such as FirstName
and Address, while also having elements of its own. The structure, text and other data inside a document are
usually referred to as the document's content and may be referenced via retrieval or editing methods (see
below). Unlike a relational database, where every record contains the same fields and unused fields are left
empty, a document has no empty 'fields': the example above simply omits anything it does not need. This
approach allows new information to be added to some records without requiring that every other record in
the database share the same structure.
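The point about shared and unique elements can be illustrated with a small sketch. The second document below (the name Alice, her address and the Children field) is invented for illustration and does not come from any particular database.

```python
import json

# First document: the example from the text.
doc_a = json.loads('{"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}')

# Second document: hypothetical, made up to show a different shape.
doc_b = json.loads(
    '{"FirstName": "Alice", "Address": "12 Elm Rd.",'
    ' "Children": [{"Name": "Sam", "Age": 10}]}'
)

# Compare the top-level field names of the two documents.
shared = sorted(doc_a.keys() & doc_b.keys())
only_a = sorted(doc_a.keys() - doc_b.keys())
only_b = sorted(doc_b.keys() - doc_a.keys())

print("shared:", shared)
print("only in first:", only_a)
print("only in second:", only_b)
```

Neither document carries a placeholder for the field it lacks; each one stores only what it has.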
Document databases typically allow additional metadata to be associated with and stored along with
the document content. That metadata may be related to facilities the datastore provides for organizing
documents, providing security, or other implementation-specific features.
CRUD operations
The core operations that a document-oriented database supports for documents are similar to those of other
databases, and while the terminology is not perfectly standardized, most practitioners will recognize them as
CRUD: create (insert), read (retrieve or query), update (edit) and delete (remove).
Keys
Documents are addressed in the database via a unique key that represents that document. This key is a
simple identifier (or ID), typically a string, a URI, or a path. The key can be used to retrieve the document
from the database. Typically the database retains an index on the key to speed up document retrieval, and in
some cases the key is required to create or insert the document into the database.
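As a rough sketch of key-addressed CRUD, the toy class below keeps documents in a Python dict, which stands in for the index on the key. The class name DocumentStore, its method names and the key "contacts/bob" are assumptions made for this example, not the API of any real product.

```python
import uuid


class DocumentStore:
    """Toy in-memory store: a dict mapping unique keys to documents."""

    def __init__(self):
        self._docs = {}  # the dict plays the role of the index on the key

    def create(self, doc, key=None):
        # Generate a key when the caller does not supply one.
        key = key or str(uuid.uuid4())
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = dict(doc)
        return key

    def read(self, key):
        return self._docs[key]

    def update(self, key, doc):
        # Whole-document replacement; the key must already exist.
        if key not in self._docs:
            raise KeyError(key)
        self._docs[key] = dict(doc)

    def delete(self, key):
        del self._docs[key]


store = DocumentStore()
key = store.create({"FirstName": "Bob", "Hobby": "sailing"}, key="contacts/bob")
fetched = store.read(key)
store.update(key, {"FirstName": "Bob", "Hobby": "rowing"})
updated = store.read(key)
store.delete(key)
remaining = len(store._docs)
```

Here the key is a path-like string supplied by the caller; a generated identifier is used when none is given.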
Retrieval
Another defining characteristic of a document-oriented database is that, beyond the simple key-to-document
lookup that can be used to retrieve a document, the database offers an API or query language that allows
the user to retrieve documents based on content (or metadata). For example, a user may want a query that
retrieves all the documents with a certain field set to a certain value. The set of query APIs or query
language features available, as well as the expected performance of the queries, varies significantly from
one implementation to another. Likewise, the specific set of indexing options and configuration that is
available varies greatly by implementation.
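A minimal sketch of content-based retrieval, assuming documents are plain dicts held in memory. The helper name find and the sample data are hypothetical; a real system would normally consult an index rather than scan every document.

```python
# Hypothetical sample data: key -> document.
docs = {
    "c1": {"FirstName": "Bob", "Hobby": "sailing"},
    "c2": {"FirstName": "Alice", "Hobby": "chess"},
    "c3": {"FirstName": "Carol", "Hobby": "sailing", "City": "Oslo"},
}


def find(docs, **criteria):
    """Return the keys of documents whose fields equal all given values.

    This is a full scan; documents lacking a field simply do not match.
    """
    return sorted(
        key
        for key, doc in docs.items()
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    )


sailors = find(docs, Hobby="sailing")
oslo_sailors = find(docs, Hobby="sailing", City="Oslo")
print(sailors, oslo_sailors)
```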
It is here that the document store varies most from the key-value store. In theory, the values in a key-value
store are opaque to the store; they are essentially black boxes. Key-value stores may offer search systems
similar to those of a document store, but they may have less understanding of the organization of the content.
Document stores use the metadata in the document to classify the content, allowing them, for instance, to
understand that one series of digits is a phone number, and another is a postal code. This allows them to
search on those types of data, for instance, all phone numbers containing 555, which would ignore the postal
code 55555.
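The phone number example can be sketched as follows. The field names Phone and PostalCode, the sample values and both helper functions are assumptions made for illustration: one function treats each value as an opaque blob, the other uses the document's structure.

```python
import json

# Hypothetical records; only the first has "555" in its phone number.
records = {
    "k1": {"Phone": "212-555-0147", "PostalCode": "10001"},
    "k2": {"Phone": "415-867-5309", "PostalCode": "55555"},
}


def opaque_search(records, text):
    """Blob-style search: match the text anywhere in the serialized value."""
    return sorted(k for k, v in records.items() if text in json.dumps(v))


def field_search(records, field, text):
    """Structure-aware search: match the text only inside the named field."""
    return sorted(k for k, v in records.items() if text in v.get(field, ""))


blob_hits = opaque_search(records, "555")
phone_hits = field_search(records, "Phone", "555")
print(blob_hits, phone_hits)
```

The blob-style search also returns the record whose postal code is 55555, while the field-aware search returns only the record with a matching phone number.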
Editing
Document databases typically provide some mechanism for updating or editing the content (or metadata) of
a document, either by replacing the entire document or by changing individual structural pieces of the
document.
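The two editing styles can be contrasted in a short sketch. The function names, the path-as-list convention and the city name are assumptions chosen for this example; real databases expose their own update syntax.

```python
import copy


def replace_document(store, key, new_doc):
    """Whole-document replacement: the old content is discarded."""
    store[key] = copy.deepcopy(new_doc)


def update_field(store, key, path, value):
    """Partial update: set one nested field, leaving the rest untouched.

    path is a list of dict keys, e.g. ["Address", "City"].
    """
    node = store[key]
    for part in path[:-1]:
        node = node.setdefault(part, {})  # create intermediate objects as needed
    node[path[-1]] = value


store = {"contacts/bob": {"FirstName": "Bob", "Address": {"Street": "5 Oak St."}}}

update_field(store, "contacts/bob", ["Address", "City"], "Springfield")
after_partial = copy.deepcopy(store["contacts/bob"])

replace_document(store, "contacts/bob", {"FirstName": "Robert"})
after_replace = store["contacts/bob"]
```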
Organization
Document database implementations offer a variety of ways of organizing documents, including notions of
collections, tags, non-visible metadata, and directory hierarchies. These organizational notions vary in how
far they are logical rather than physical (e.g. on disk or in memory) representations.
A document-oriented database is a specialized key-value store, which itself is another NoSQL database
category. In a simple key-value store, the document content is opaque. A document-oriented database
provides APIs or a query/update language that exposes the ability to query or update based on the internal
structure in the document. This difference may be minor for users who do not need the richer query, retrieval,
or editing APIs that document databases typically provide. Modern key-value stores often include
features for working with metadata, blurring the line between them and document stores.
In a relational database, data is first categorized into a number of predefined types, and tables are created to
hold individual entries, or records, of each type. The tables define the data within each record's fields,
meaning that every record in the table has the same overall form. The administrator also defines the
relationships between the tables, and selects certain fields that they believe will be most commonly used for
searching and defines indexes on them. A key concept in the relational design is that any data that may be
repeated is normally placed in its own table, and if these instances are related to each other, a column is
selected to group them together, the foreign key. This design is known as database normalization.[3]
For example, an address book application will generally need to store the contact name, an optional image,
one or more phone numbers, one or more mailing addresses, and one or more email addresses. In a
canonical relational database, tables would be created for each of these items with predefined fields for each
bit of data: the CONTACT table might include FIRST_NAME, LAST_NAME and IMAGE columns,
while the PHONE_NUMBER table might include COUNTRY_CODE, AREA_CODE,
PHONE_NUMBER and TYPE (home, work, etc.). The PHONE_NUMBER table also contains a foreign
key column, "CONTACT_ID", which holds the unique ID number assigned to the contact when it was
created. In order to recreate the original contact, the database engine uses the foreign keys to look for the
related items across the group of tables and reconstruct the original data.
In contrast, in a document-oriented database there may be no internal structure that maps directly onto the
concept of a table, and the fields and relationships generally don't exist as predefined concepts. Instead, all
of the data for an object is placed in a single document, and stored in the database as a single entry. In the
address book example, the document would contain the contact's name, image, and any contact info, all in
a single record. That entry is accessed through its key, which allows the database to retrieve and return the
document to the application. No additional work is needed to retrieve the related data; all of this is returned
in a single object.
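The address book comparison might be sketched like this, using Python's built-in sqlite3 module for the relational side and a JSON string held in a dict for the document side. The table and column names follow the text; the sample contact, the key "contact/1" and the document field names are invented for illustration.

```python
import json
import sqlite3

# Relational side: the contact is split across two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CONTACT (ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, LAST_NAME TEXT);
    CREATE TABLE PHONE_NUMBER (
        CONTACT_ID INTEGER REFERENCES CONTACT(ID),
        AREA_CODE TEXT, PHONE_NUMBER TEXT, TYPE TEXT
    );
    INSERT INTO CONTACT VALUES (1, 'Bob', 'Smith');
    INSERT INTO PHONE_NUMBER VALUES (1, '212', '555-0147', 'home');
    INSERT INTO PHONE_NUMBER VALUES (1, '212', '555-0199', 'work');
""")

# Rebuilding the contact requires following the foreign key with a join.
rows = conn.execute("""
    SELECT c.FIRST_NAME, p.TYPE, p.AREA_CODE || '-' || p.PHONE_NUMBER
    FROM CONTACT c JOIN PHONE_NUMBER p ON p.CONTACT_ID = c.ID
    WHERE c.ID = 1
    ORDER BY p.TYPE
""").fetchall()

# Document side: everything about the contact lives under one key.
documents = {
    "contact/1": json.dumps({
        "FirstName": "Bob",
        "LastName": "Smith",
        "Phones": [
            {"Type": "home", "Number": "212-555-0147"},
            {"Type": "work", "Number": "212-555-0199"},
        ],
    })
}
contact = json.loads(documents["contact/1"])  # one lookup, no join
numbers = [p["Number"] for p in contact["Phones"]]
```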
A key difference between the document-oriented and relational models is that the data formats are not
predefined in the document case. In most cases, any sort of document can be stored in any database, and
those documents can change in type and form at any time. If one wishes to add a COUNTRY_FLAG to a
CONTACT, this field can be added to new documents as they are inserted; this will have no effect on the
database or the existing documents already stored. To aid retrieval of information from the database,
document-oriented systems generally allow the administrator to provide hints to the database to look for
certain types of information. These work in a similar fashion to indexes in the relational case. Most also
offer the ability to add additional metadata outside of the content of the document itself, for instance,
tagging entries as being part of an address book, which allows the programmer to retrieve related types of
information, like "all the address book entries". This provides functionality similar to a table, but separates
the concept (categories of data) from its physical implementation (tables).
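A minimal sketch of these ideas, assuming an in-memory store: tags are kept as metadata outside the document content, and a secondary index on one field plays the role of the administrator's hint. The insert helper, the tag names and the index on "Hobby" are all hypothetical.

```python
from collections import defaultdict

documents = {}                   # key -> document content
tags = defaultdict(set)          # tag -> keys; metadata stored outside the content
hobby_index = defaultdict(set)   # hypothetical secondary index on the "Hobby" field


def insert(key, doc, tag):
    """Store a document, record its tag, and maintain the Hobby index."""
    documents[key] = doc
    tags[tag].add(key)
    if "Hobby" in doc:
        hobby_index[doc["Hobby"]].add(key)


insert("c1", {"FirstName": "Bob", "Hobby": "sailing"}, tag="address-book")
# A newer document adds COUNTRY_FLAG; the older one is left as it was.
insert("c2", {"FirstName": "Alice", "Hobby": "sailing", "COUNTRY_FLAG": "NO"},
       tag="address-book")
insert("n1", {"Title": "Shopping list"}, tag="notes")

address_book = sorted(tags["address-book"])   # "all the address book entries"
sailors = sorted(hobby_index["sailing"])      # answered from the index, no scan
has_flag = sorted(k for k, d in documents.items() if "COUNTRY_FLAG" in d)
```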
In the classic normalized relational model, objects in the database are represented as separate rows of data
with no inherent structure beyond that given to them as they are retrieved. This leads to problems when
trying to translate programming objects to and from their associated database rows, a problem known as
object-relational impedance mismatch.[4] Document stores map programming objects into the store more
closely, or in some cases directly. Such systems are often marketed using the term NoSQL.
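To illustrate the closer mapping, the sketch below serializes a nested Python object to a JSON document and rebuilds it without any table layout in between. The Contact and Phone classes and their fields are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Phone:
    type: str
    number: str


@dataclass
class Contact:
    first_name: str
    phones: list = field(default_factory=list)


contact = Contact("Bob", [Phone("home", "212-555-0147")])

# The nested object becomes one document; asdict() recurses into the list.
stored = json.dumps(asdict(contact))

# Reading it back needs only a direct reconstruction, not a set of joins.
raw = json.loads(stored)
restored = Contact(raw["first_name"], [Phone(**p) for p in raw["phones"]])
```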
Implementations
Name | Publisher | License | Languages supported | Notes | RESTful API
---- | --------- | ------- | ------------------- | ----- | -----------
Aerospike | Aerospike | AGPL and Proprietary | C, C#, Java, Scala, Python, Node.js, PHP, Go, Rust, Spring Framework | Aerospike is a flash-optimized and in-memory distributed key value NoSQL database which also supports a document store model.[5] | Yes[6]
Caché | InterSystems Corporation | Proprietary | Java, C#, Node.js | Commonly used in Health, Business and Government applications. | Yes
Cloudant | Cloudant, Inc. | Proprietary | Erlang, Java, Scala, and C | Distributed database service based on BigCouch, the company's open source fork of the Apache-backed CouchDB project. Uses JSON model. | Yes
Couchbase Server | Couchbase, Inc. | Apache License | C, C#, Java, Python, Node.js, PHP, SQL, Go, Spring Framework, LINQ | Distributed NoSQL Document Database, JSON model and SQL based Query Language. | Yes[9]
CouchDB | Apache Software Foundation | Apache License | Any language that can make HTTP requests | JSON over REST/HTTP with Multi-Version Concurrency Control and limited ACID properties. Uses map and reduce for views and queries.[10] | Yes[11]
CrateIO | CRATE Technology GmbH | Apache License | Java | Use familiar SQL syntax for real time distributed queries across a cluster. Based on Lucene / Elasticsearch ecosystem with built-in support for binary objects (BLOBs). | Yes[12]
Cosmos DB | Microsoft | Proprietary | C#, Java, Python, Node.js, JavaScript, SQL | Platform-as-a-Service offering, part of the Microsoft Azure platform. Builds upon and extends the earlier Azure DocumentDB. | Yes
DocumentDB | Amazon Web Services | Proprietary online service | various, REST | Fully managed MongoDB v3.6-compatible database service | Yes
DynamoDB | Amazon Web Services | Proprietary | Java, JavaScript, Node.js, Go, C# .NET, Perl, PHP, Python, Ruby, Rust, Haskell, Erlang, Django, and Grails | Fully managed proprietary NoSQL database service that supports key–value and document data structures | Yes
Jackrabbit | Apache Foundation | Apache License | Java | Java Content Repository implementation | ?
HCL Notes (HCL Domino) | HCL | Proprietary | LotusScript, Java, Notes Formula Language | MultiValue | Yes
MarkLogic | MarkLogic Corporation | Free Developer license or Commercial[15] | Java, JavaScript, Node.js, XQuery, SPARQL, XSLT, C++ | Distributed document-oriented database for JSON, XML, and RDF triples. Built-in full-text search, ACID transactions, high availability and disaster recovery, certified security. | Yes
Oracle NoSQL Database | Oracle Corp | Apache and proprietary | C, C#, Java, Python, node.js, Go | Shared nothing, horizontally scalable database with support for schema-less JSON, fixed schema tables, and key/value pairs. Also supports ACID transactions. | Yes
Qizx | Qualcomm | Proprietary | REST, Java, XQuery, XSLT, C, C++, Python | Distributed document-oriented XML database with integrated full-text search; support for JSON, text, and binaries. | Yes
See also
Database theory
Data hierarchy
Data analysis
Full-text search
In-memory database
Internet Message Access Protocol (IMAP)
Machine-readable document
Multi-model database
NoSQL
Object database
Online database
Real-time database
Relational database
Content management system
Notes
a. To the point that document-oriented and key-value systems can often be interchanged in
operation.
b. And key-value stores in general.
References
1. Drake, Mark (9 August 2019). "A Comparison of NoSQL Database Management Systems
and Models"
(https://web.archive.org/web/20190813163612/https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models).
DigitalOcean. Archived from the original
(https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models)
on 13 August 2019.
Retrieved 23 August 2019. "Document-oriented databases, or document stores, are NoSQL
databases that store data in the form of documents. Document stores are a type of key-value
store: each document has a unique identifier — its key — and the document itself serves as
the value."
2. "DB-Engines Ranking per database model category" (http://db-engines.com/en/ranking_categories).
3. "Description of the database normalization basics" (https://support.microsoft.com/en-ca/kb/283878).
Microsoft.
4. Ambler, Scott W. "The Object-Relational Impedance Mismatch"
(http://www.agiledata.org/essays/impedanceMismatch.html). Agile Data.
5. "Documentation | Aerospike - Key-Value Store" (https://docs.aerospike.com/docs/guide/kvs.html).
docs.aerospike.com. Retrieved 3 May 2021.
6. "Documentation | Aerospike" (https://docs.aerospike.com/docs/client/rest/index.html).
docs.aerospike.com. Retrieved 3 May 2021.
7. "HTTP Protocol for AllegroGraph"
(https://franz.com/agraph/support/documentation/current/http-protocol.html).
8. "Multi-model highly available NoSQL database" (https://www.arangodb.com/). ArangoDB.
9. Documentation (http://www.couchbase.com/docs/) Archived
(https://web.archive.org/web/20120820182153/http://www.couchbase.com/docs/) 2012-08-20 at the
Wayback Machine. Couchbase. Retrieved on 2013-09-18.
10. "Apache CouchDB"
(https://web.archive.org/web/20111020074113/http://couchdb.apache.org/docs/overview.html).
Apache Couchdb. Archived from the original (http://couchdb.apache.org/) on October 20, 2011.
11. "HTTP_Document_API - Couchdb Wiki"
(https://web.archive.org/web/20130301093229/http://wiki.apache.org/couchdb/HTTP_Document_API).
Archived from the original (http://wiki.apache.org/couchdb/HTTP_Document_API) on 2013-03-01.
Retrieved 2011-10-14.
12. "Crate SQL HTTP Endpoint (Archived copy)"
(https://web.archive.org/web/20150622174526/https://crate.io/docs/stable/sql/rest.html).
Archived from the original (https://crate.io/docs/stable/sql/rest.html) on 2015-06-22.
Retrieved 2015-06-22.
13. eXist-db Open Source Native XML Database (http://exist-db.org). Exist-db.org. Retrieved on
2013-09-18.
14. "Compare the Informix Version 12 editions"
(http://www.ibm.com/developerworks/data/library/techarticle/dm-0801doe/). 22 July 2016.
15. "MarkLogic Licensing"
(https://web.archive.org/web/20120112032849/http://developer.marklogic.com/licensing).
Archived from the original (http://developer.marklogic.com/licensing) on 2012-01-12.
Retrieved 2011-12-28.
16. "MongoDB Licensing" (http://www.mongodb.org/about/licensing/).
17. "The New MongoDB Rust Driver" (https://www.mongodb.com/blog/post/the-new-mongodb-rust-driver).
MongoDB. Retrieved 2018-02-01.
18. "Community Supported Drivers Reference"
(http://docs.mongodb.org/ecosystem/drivers/community-supported-drivers/).
19. "HTTP Interface — MongoDB Ecosystem" (https://docs.mongodb.com/ecosystem/tools/http-i
nterfaces/). MongoDB Docs.
20. "MongoDB Ecosystem Documentation" (https://github.com/mongodb/docs-ecosystem).
GitHub. June 27, 2019.
21. "GT.M High end TP database engine" (https://sourceforge.net/projects/fis-gtm/).
22. "RedisJSON - a JSON data type for Redis" (https://oss.redis.com/redisjson/).
23. "Transferring copyright to The Linux Foundation, relicensing RethinkDB under ASLv2"
(https://github.com/rethinkdb/rethinkdb/commit/b0ec8bc5a874d5241d8af1166d664083edc5f750#diff-97d9303acdfc078a050e61dc5c1a9a76).
github.com. Retrieved 27 January 2020.
24. "solr/LICENSE.txt at main · apache/solr · GitHub"
(https://github.com/apache/solr/blob/main/LICENSE.txt). github.com. Retrieved 24 December 2022.
25. "Response Writers :: Apache Solr Reference Guide"
(https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html). solr.apache.org.
Retrieved 24 December 2022.
26. "Managed Resources :: Apache Solr Reference Guide"
(https://solr.apache.org/guide/solr/latest/configuration-guide/managed-resources.html).
solr.apache.org. Retrieved 24 December 2022.
27. "TerminusX - Why TerminusX" (https://terminusdb.com/why-terminus/). terminusdb.com.
Retrieved 2021-12-16.
Further reading
Assaf Arkin. (2007, September 20). Read Consistency: Dumb Databases, Smart Services.
(https://web.archive.org/web/20080327222152/http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/)
External links
DB-Engines Ranking of Document Stores (http://db-engines.com/en/ranking/document+store)
by popularity, updated monthly