Couchbase Server Report
1. Introduction
1.1. Overview
1.3.1. Keys
1.3.2. Values
1.3.4. MetaData
1.5. Bucket
1.5.2. Vbuckets
References
IS211.O12.HTCL
1. Introduction
1.1. Overview
Distributed Database Management System: CouchBase Server
- Couchbase Server is an open source, NoSQL, JSON document-oriented,
distributed data platform. Data is stored as items, each having a key and a value.
- Sub-millisecond data operations are provided by powerful services for querying
and indexing, and by a feature-rich, document-oriented query language, SQL++.
- Multiple instances can be combined into a single cluster. A Cluster Manager
program coordinates all node activities.
- Data can be retained either in memory only, or in both memory and storage. Data
can also be replicated across the nodes of the cluster to ensure that node-loss does
not entail data-loss. Data items can also be selectively replicated across data
centers, for the purpose of either backup only, or simultaneous, multi-geo
application access.
1.2. Authors, management organization and history
- Developer: Couchbase, Inc.
- Written in: C++, Erlang, C, Go, Java
- Initial release: August 2010
- Stable release: 7.2.2 / September 14, 2023
- Membase was developed by several leaders of the memcached project, who had
founded a company, NorthScale, to develop a key-value store with the simplicity,
speed, and scalability of memcached, but also the storage, persistence and
querying capabilities of a database. The original membase source code was
contributed by NorthScale, and project co-sponsors Zynga and Naver Corporation
(then known as NHN) to a new project on membase.org in June 2010.
Table 1-1
Year  Milestone
2015  Couchbase Server 4.0 adds SQL-like queries with N1QL and Multi-Dimensional Scaling
2017  Couchbase Server 5.0 introduces Full-Text Search and ephemeral buckets
2018  Couchbase Server 6.0 eliminates ETL for operational workloads with the introduction of analytics based on SQL++; Couchbase Autonomous Operator for Kubernetes debuts in its 1.0 release
2021  Couchbase Server 7.0 fuses the flexibility and scalability of NoSQL with the trusted strengths of relational databases
Picture 1-1
1.3.1. Keys
Each value is identified by a unique key (ID), defined by the user or application when the
item is saved. The key is immutable: once the item is saved, the key cannot be
changed.
1.3.2. Values
The maximum size of a value is 20 MiB. A value can be either:
- Binary: Any form of binary data. Note that a binary value cannot be parsed, indexed,
or queried: it can only be retrieved by key.
- JSON: A JSON value, referred to as a document, can be parsed, indexed, and
queried. Each document consists of one or more attributes, each of which has its
own value. An attribute’s value can be a basic type, such as a number, string, or
Boolean; or a complex type, such as an embedded document or an array.
1.3.3. Document Structure
A sample JSON document is provided immediately below. Its contents are as follows:
- a1 and a2 are attributes that respectively have a single number and a single string
as their values.
- a3 is an attribute whose value is an embedded document consisting of a single
attribute, b1, whose own value is an array of three numbers.
- a4 is an attribute whose value is an array of two documents, each document
consisting of a single attribute (c1 or c2), whose values are respectively a string and a
number.
Picture 1-2
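As a rough illustration of the structure described above, the following Python sketch builds an equivalent document as a dictionary and serializes it to JSON. Only the attribute names and shapes follow the description; the concrete values and the variable name sample_document are assumptions chosen for illustration, since the original picture is not reproduced here.

```python
import json

# Hypothetical values; only the attribute names and shapes follow the description.
sample_document = {
    "a1": 1,                      # a single number
    "a2": "text",                 # a single string
    "a3": {"b1": [10, 20, 30]},   # embedded document holding an array of three numbers
    "a4": [                       # array of two documents
        {"c1": "text"},           # c1 holds a string
        {"c2": 5},                # c2 holds a number
    ],
}

# Serialize to JSON text, then parse it back to confirm that it round-trips.
encoded = json.dumps(sample_document, indent=2)
decoded = json.loads(encoded)
print(encoded)
```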
1.3.4. MetaData
Metadata is automatically generated and stored for each item saved in Couchbase
Server. For example, the following document, which contains airport-information, has
been saved with the key airport_1306:
Picture 1-3
Picture 1-4
- The Couchbase data model is based on JSON, which provides a simple, lightweight,
human-readable notation.
- An individual document often represents a single instance of an object in application
code. A document might be considered equivalent to a row in a relational table; with
each of the document’s attributes being equivalent to a column. Couchbase,
however, provides greater flexibility than relational databases, in that it can
store JSON documents with varied schemas.
- Documents can contain nested structures. This allows developers to express many-
to-many relationships without requiring a reference or junction table; and is naturally
expressive of hierarchical data.
Example: The nature and value of the document data model is clarified by comparison
with the relational model. To support an online flight-booking application, allowing users to
search for flights by date, the relational model requires multiple tables: for flights,
airlines, and schedules. The result may be as follows:
Picture 1-5
By contrast, the document model likely requires only a single document, which embeds
an array of schedules for all flights between each of two airports:
Picture 1-6
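To make the single-document approach more concrete, here is a minimal Python sketch of a document that embeds its schedules, together with a lookup that needs no join. All field names (airline, source_airport, destination_airport, schedule, day, flight, utc), the values, and the helper flights_on_day are hypothetical, and a numeric day of the week stands in for a date.

```python
# Hypothetical route document: one document embeds every schedule entry
# for flights between two airports.
route = {
    "airline": "AF",
    "source_airport": "TLV",
    "destination_airport": "MRS",
    "schedule": [
        {"day": 0, "flight": "AF198", "utc": "10:13:00"},
        {"day": 0, "flight": "AF547", "utc": "19:14:00"},
        {"day": 1, "flight": "AF943", "utc": "01:31:00"},
    ],
}


def flights_on_day(document, day):
    """Return the flight codes scheduled on the given day of the week."""
    return [entry["flight"] for entry in document["schedule"] if entry["day"] == day]


print(flights_on_day(route, 0))
```

Because the schedules live inside the document, answering the query only requires reading that one document.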
Thus, in the document model, each document can be highly self-contained. This
supports the rapid fulfillment of application-requests, and has important implications for
both scalability and latency: one document can be replicated, or atomically changed,
without other documents needing to be accessed; eradicating the need for complex inter-node
coordination, and minimizing contention.
1.4. Services
- Couchbase Services support access to and maintenance of data. Services can be
deployed with flexibility across available hardware-resources, providing Multi-
Dimensional Scaling, whereby a cluster can be tuned for optimal handling of
emergent workloads.
- Services are configured and deployed by the Full Administrator who initializes
Couchbase Server on one or more nodes. The standard configuration-sequence
allows a subset of services to be selected per node, with an individual memory-
allocation for each. Each service supports a particular form of data-access. Services
not required need not be deployed. Services intended to support a heavy workload
can be deployed across multiple cluster-nodes, to ensure optimal performance and
resource-availability.
Picture 1-7
In this revised configuration, the Data Service is the only service to run on two of the
nodes; the Index Service is the only service on two further nodes; and the Query and
Search Services share the fifth and final node.
Couchbase Server provides the following services:
- Data: Supports the storing, setting, and retrieving of data-items, specified by key.
- Query: Parses queries specified in the N1QL query-language, executes the
queries, and returns results. The Query Service interacts with both the Data and
Index services.
- Index: Creates indexes, for use by the Query and Analytics services.
- Search: Creates indexes specially purposed for Full Text Search. This supports
language-aware searching; allowing users to search for, say, the word beauties,
and additionally obtain results for beauty and beautiful.
- Analytics: Supports join, set, aggregation, and grouping operations; which are
expected to be large, long-running, and highly consumptive of memory and CPU
resources.
- Eventing: Supports near real-time handling of changes to data: code can be
executed both in response to document-mutations, and as scheduled by timers.
- Backup: Supports the scheduling of full and incremental data backups, either for
specific individual buckets, or for all buckets on the cluster. Also allows the
scheduling of merges of previously made backups.
- These services can be deployed, maintained, and provisioned independently of one
another, by means of Multi-Dimensional Scaling, to ensure the most effective
ongoing response to changing business conditions and emergent workload-requirements.
1.5. Bucket
A bucket is the fundamental space for storing data in Couchbase Server. Each bucket
contains a hierarchy of scopes and collections to group keys and values logically.
1.5.1. Bucket Types
If a Couchbase bucket’s RAM-quota is exceeded, items are ejected. This means that
data, which is resident both in memory and on disk, is removed from memory, but not
from disk. Therefore, if removed data is subsequently needed, it is reloaded into
memory from disk. For a Couchbase bucket, the ejection method is configured at the
time of bucket-creation.
If an Ephemeral bucket’s RAM-quota is exceeded, one of the following occurs, again
based on configuration performed at the time of bucket-creation:
- Resident data-items remain in RAM. No additional data can be added; and attempts
to add data therefore fail.
- Resident data-items are ejected from RAM, to make way for new data. For an
Ephemeral bucket, this means that data, which is resident in memory (but, due to
this type of bucket, can never be on disk), is removed from memory. Therefore, if
removed data is subsequently needed, it cannot be re-acquired from Couchbase
Server.
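The two behaviors of a memory-only bucket listed above can be sketched with a toy model. This is not Couchbase code: the class EphemeralBucketModel is hypothetical, the quota is counted in items rather than bytes of RAM, and ejecting the oldest item is an assumption made purely to keep the example short.

```python
from collections import OrderedDict


class EphemeralBucketModel:
    """Toy in-memory bucket with an item-count quota (not real Couchbase code)."""

    def __init__(self, quota, eject_when_full):
        self.quota = quota                      # assumed: maximum number of items
        self.eject_when_full = eject_when_full  # chosen once, at creation time
        self.items = OrderedDict()

    def add(self, key, value):
        if key not in self.items and len(self.items) >= self.quota:
            if not self.eject_when_full:
                # First option: resident items stay, the new item is rejected.
                raise MemoryError("quota exceeded: item not added")
            # Second option: eject a resident item (assumed here: the oldest one).
            self.items.popitem(last=False)
        self.items[key] = value

    def get(self, key):
        # Ejected data cannot be re-acquired: there is no disk copy to reload.
        return self.items.get(key)
```

With eject_when_full=True and a quota of two, adding a third item silently drops the first; with eject_when_full=False the third add raises an error instead.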
1.5.2. Vbuckets
vBuckets are virtual buckets that help distribute data effectively across a cluster and support
replication across multiple nodes.
- Both Couchbase and Ephemeral buckets are implemented as vBuckets, up to 1024 of
which are created for every bucket.
- A bucket’s items are distributed evenly across its vBuckets.
- The 1024 vBuckets that implement a defined bucket are referred to as active
vBuckets. If a bucket is replicated, each replica is implemented as a further 1024
vBuckets, referred to as replica vBuckets. Write operations are performed only on
active vBuckets. Most read operations are performed on active vBuckets, though
items can also be read from replica vBuckets when necessary.
- Items are written to and retrieved from vBuckets by means of a CRC32 hashing
algorithm, which is applied to the item’s key, and so produces the number of the
vBucket in which the item resides. vBuckets are mapped to individual nodes by the
Cluster Manager: the mapping is constantly updated and made generally available to
SDK and other clients.
- The relationships between a bucket, its keys (and their associated values), the
hashing algorithm, vBuckets, server-mappings, and servers, are illustrated below:
Picture 1-8
Thus, an authorized client attempting to access data performs a hash operation on the
appropriate key, and thereby calculates the number of the vBucket that owns the key.
The client then examines the vBucket map to determine the server-node on which the
vBucket resides; and finally performs its operation directly on that server-node.
- In Couchbase Server Version 7.0+, documents within a bucket are organized into
Scopes and Collections. Scopes and collections do not affect the way in which keys
are allocated to vBuckets.
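The client-side lookup described in this section (hash the key, find the vBucket, consult the map, go to the node) can be sketched as follows. This is a simplified illustration: taking the CRC32 value modulo 1024 is an assumption, since the exact bit manipulation used by real SDKs is not given in this report, and the round-robin vBucket map and node names are hypothetical stand-ins for the map published by the Cluster Manager.

```python
import zlib

NUM_VBUCKETS = 1024  # number of vBuckets per bucket, as stated in this section


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Map a key to a vBucket number with CRC32 (simplified: plain modulo)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, num_vbuckets=NUM_VBUCKETS):
    """Hypothetical round-robin map: index = vBucket number, value = node."""
    return [nodes[i % len(nodes)] for i in range(num_vbuckets)]


def node_for_key(key, vbucket_map):
    """Hash the key, then look up the owning node in the vBucket map."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
vbucket_map = build_vbucket_map(nodes)
print(vbucket_for_key("airport_1306"), node_for_key("airport_1306", vbucket_map))
```

Because the hash depends only on the key, every client that holds the same map reaches the same node without asking a central coordinator.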
1.5.3. Scope and Collections
- Collection: A data container within a bucket whose type is either Couchbase or
Ephemeral.
  - Item-names must be unique within their collections.
  - Items can optionally be assigned to different collections according to content-type.
- Scope: A mechanism for grouping multiple collections.
  - Collection-names must be unique within their scope.
Picture 1-9
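The uniqueness rules for scopes and collections can be illustrated with a small in-memory model. The class BucketModel and its methods are hypothetical and are not part of any Couchbase SDK; the sketch only shows that a collection name may repeat across scopes but not within one, and that an item name may repeat across collections but not within one.

```python
class BucketModel:
    """Toy model of the bucket -> scope -> collection -> item hierarchy."""

    def __init__(self, name):
        self.name = name
        self.scopes = {}  # scope name -> {collection name -> {item key -> value}}

    def create_scope(self, scope):
        if scope in self.scopes:
            raise ValueError(f"scope {scope!r} already exists")
        self.scopes[scope] = {}

    def create_collection(self, scope, collection):
        collections = self.scopes[scope]
        if collection in collections:
            # Collection-names must be unique within their scope.
            raise ValueError(f"collection {collection!r} already exists in {scope!r}")
        collections[collection] = {}

    def insert(self, scope, collection, key, value):
        items = self.scopes[scope][collection]
        if key in items:
            # Item-names must be unique within their collection.
            raise ValueError(f"key {key!r} already exists in {collection!r}")
        items[key] = value


bucket = BucketModel("school")                  # hypothetical bucket name
bucket.create_scope("classScope")
bucket.create_scope("archiveScope")             # hypothetical second scope
bucket.create_collection("classScope", "classInfo")
bucket.create_collection("archiveScope", "classInfo")  # allowed: different scope
bucket.insert("classScope", "classInfo", "class_1", {"name": "IS211"})
```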
2.2.1. Indexes
- Indexes enhance the performance of query and search operations.
- Indexes are used by certain services, such as Query, Analytics, and Search, as targets
for search-routines. Each index makes a predefined subset of data available for the
search.
- The Query service relies on indexes provided by the Index service. The Search and
Analytics services both provide their own indexes, internally.
- The following forms of index are available:
Primary: Provided by the Index Service, this is based on the unique key of
every item in a specified collection. Every primary index is maintained
asynchronously. A primary index is intended to be used for simple queries,
which have no filters or predicates.
Picture 1-10
Picture 1-11
Expression:
Literal value
Identifiers
Arithmetic terms
Comparison terms
Concatenation terms
Logical terms
Conditional expressions
Collection expressions
Construction expressions
Nested expressions
Function calls
Subqueries
Comments: block comments and line comments
Picture 1-12
Couchbase data is defined logically to reside in buckets. Each bucket, when defined, is
implemented by Couchbase Server as vBuckets, up to 1024 of which thereby exist in
memory (and, in the case of Couchbase buckets, on disk); the exact number at any
given time depending on the number of items to be stored. Items are associated with
vBuckets by means of a hashing algorithm, and vBuckets are assigned to nodes according
to a mapping determined and updated by the Master Services of the ns-server
component of the Cluster Manager.
Up to three replica buckets can be defined for every bucket. Each replica itself is also
implemented as 1024 vBuckets. A vBucket that is part of the original implementation of
the defined bucket is referred to as an active vBucket. Therefore, a bucket defined with
two replicas has 1024 active vBuckets and 2048 replica vBuckets. Typically, only
active vBuckets are accessed for read and write operations, although replica vBuckets
are able to support read requests. Replica vBuckets receive a continuous stream of
mutations from the active vBucket by means of the Database Change Protocol (DCP),
and are thereby kept constantly up to date. To ensure maximum availability of data in
case of node-failures, the Master Services for the cluster calculate and implement the
optimal vBucket distribution across available nodes: consequently, the chance of data-loss
through the failure of an individual node is minimized, since replicas are available
on the nodes that remain. This is shown by the following illustration:
Picture 1-13
The illustration shows active and replica vBuckets that correspond to a single, user-
defined bucket, for which a single replication-instance has been specified. The first nine
active vBuckets are shown, along with their nine corresponding, replica vBuckets;
distributed across three server-nodes. The distribution of vBuckets indicates a likely
distribution calculated by Couchbase Server: no replica resides on the same node as its
active equivalent: therefore, should any one of the three nodes fail, its data remains
available.
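The placement property described above (no replica on the same node as its active vBucket) and the vBucket counts can be checked with a small sketch. The placement rule used here, active vBuckets placed round-robin with each replica on the next node, is an assumption chosen for simplicity and is not the algorithm Couchbase Server actually uses; the node names are hypothetical.

```python
VBUCKETS_PER_BUCKET = 1024


def replica_vbucket_count(num_replicas, per_bucket=VBUCKETS_PER_BUCKET):
    """Each replica of a bucket is implemented as a further set of vBuckets."""
    return num_replicas * per_bucket


def distribute(num_vbuckets, nodes):
    """Place each active vBucket round-robin and its replica on the next node."""
    layout = {}
    for vb in range(num_vbuckets):
        active = nodes[vb % len(nodes)]
        replica = nodes[(vb + 1) % len(nodes)]  # requires at least two nodes
        layout[vb] = {"active": active, "replica": replica}
    return layout


def available_after_failure(layout, failed_node):
    """vBuckets that still have a copy on a surviving node."""
    return [vb for vb, copy in layout.items()
            if copy["active"] != failed_node or copy["replica"] != failed_node]


nodes = ["node1", "node2", "node3"]  # hypothetical node names
layout = distribute(9, nodes)        # nine vBuckets, as in the illustration
for node in nodes:
    print(node, "fails ->", len(available_after_failure(layout, node)), "vBuckets still available")
```

Since every vBucket has its two copies on different nodes, the loss of any single node leaves at least one copy of each vBucket in place.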
Picture 1-14
Picture 1-15
Note that XDCR provides only a single basic mechanism from which replications are
built: this is the unidirectional replication. A bidirectional topology is created by
implementing two unidirectional replications, in opposite directions, between two
clusters; such that a bucket on each cluster functions as both source and target.
Picture 1-16
Picture 2-17
Picture 2-18
Picture 2-19
Picture 2-20
Picture 2-21
Host Name/IP Address: The name that will be used for this node. You can use the
loopback address 127.0.0.1; when a second node is added to the cluster, the name
is changed to the IP address of the underlying host. If you wish, you can instead enter
the IP address of the underlying host now, or its fully qualified hostname.
Picture 2-22
- From the Settings screen, select the Sample Buckets tab. The Sample Buckets
screen now appears, as follows:
Picture 2-23
Picture 2-24
2. Ensure that the node to be added has been started. This can be accomplished by
checking the IP address and port number for the new node in the address bar of the
browser. The following interface is displayed:
Picture 2-25
This indicates that Couchbase Server is installed and running on the new node, but has
not yet been provisioned. Do not use this interface: instead, return to Couchbase Web
Console for the cluster, 10.142.181.101.
3. Click the ADD SERVER button.
Picture 2-26
Note the warning provided at the top of the dialog: if the node to be added has already
been provisioned, the results of such provisioning will be eliminated and replaced on
the node’s addition to the current cluster. (In fact, the node to be added in this example,
has neither been initialized nor provisioned.)
Picture 2-27
Left-click on the Add Server button to save the settings. The Servers screen is
redisplayed, with the following appearance:
This indicates that the new node, 10.142.181.102, has been successfully added.
However, it is not yet taking traffic, and will be added following a rebalance. Note, at
this point, the figure under the Items column for 10.142.181.101: this is 63.1 K/0,
which indicates that the node contains 63.1 K items in active vBuckets, and 0 items in
replica vBuckets. Meanwhile, the Items figure for 10.142.181.102 is 0/0, indicating that
no items are yet distributed onto that node in either active or replica form.
5. To perform a rebalance, left-click on the Rebalance button, at the upper right.
The new node is rebalanced into the cluster, meaning that whatever active and replica
vBuckets were previously distributed across the original cluster nodes are redistributed
across the superset of nodes created by the addition. Additionally, a Rebalance dialog is
displayed:
Picture 2-28
The dialog indicates rebalance progress for each of the services on the cluster. To see
more information on the progress related to the Data Service, left-click on the travel-
sample tab, under Data: The pane expands to provide additional information on the
progress of data-transfer for the travel-sample bucket:
Picture 2-29
When the rebalance is complete, the panel at the bottom of the screen disappears, and
the dialog appears as follows:
Picture 2-30
Picture 2-31
Note that the figure in the Items column for node 10.142.181.101 is 31.5 K/31.6 K,
which indicates that 31.5 K items are stored on the node in active vBuckets, and 31.6 K
in replica vBuckets. The figure for 10.142.181.102 indicates the converse. Therefore,
replication has successfully distributed the contents of travel-sample across both nodes,
providing a single replica vBucket for each active vBucket.
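The add-node and rebalance steps performed above in Couchbase Web Console can also be expressed as HTTP requests. The sketch below only builds the requests and does not send them. It assumes the cluster's REST endpoints /controller/addNode and /controller/rebalance on the default administration port 8091, with the form fields shown; these should be checked against the REST API reference for the server version in use. The helper names and the credentials are hypothetical.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def _form_post(cluster_host, path, fields, admin_user, admin_password):
    """Build (but do not send) an authenticated form POST to the cluster."""
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return Request(
        f"http://{cluster_host}:8091{path}",  # 8091: assumed default admin port
        data=urlencode(fields).encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def add_node_request(cluster_host, new_host, admin_user, admin_password):
    """Request asking the cluster to add new_host running the Data Service."""
    fields = {"hostname": new_host, "user": admin_user,
              "password": admin_password, "services": "kv"}  # assumed field names
    return _form_post(cluster_host, "/controller/addNode", fields,
                      admin_user, admin_password)


def rebalance_request(cluster_host, hosts, admin_user, admin_password):
    """Request asking the cluster to rebalance across all listed nodes."""
    fields = {"knownNodes": ",".join(f"ns_1@{h}" for h in hosts)}  # assumed format
    return _form_post(cluster_host, "/controller/rebalance", fields,
                      admin_user, admin_password)


# Hypothetical credentials; the addresses are the ones used in this walkthrough.
add_req = add_node_request("10.142.181.101", "10.142.181.102", "Administrator", "password")
reb_req = rebalance_request("10.142.181.101", ["10.142.181.101", "10.142.181.102"],
                            "Administrator", "password")
print(add_req.get_method(), add_req.full_url)
print(reb_req.get_method(), reb_req.full_url)
```

Sending either request would be a matter of passing it to urllib.request.urlopen against a running cluster, which is deliberately left out here.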
Picture 3-32
- An Add Bucket form pops up → Fill out the information → Click “OK”
Picture 3-33
Picture 3-34
Picture 3-35
Picture 3-36
- Choose the bucket and fill out the name of the scope we want to create (Ex: classScope)
Picture 3-37
Picture 3-38
- At the end of the row for that scope → Choose “Add Collection” → Fill out the name of
the collection (Ex: classInfo) → Save
Picture 3-39
Picture 3-40
- Choose the Servers tab → Choose “ADD SERVER” in the top right corner → Enter
the IP of the node you want to connect (Ex: 26.121.249.103 – node B) and
the credentials of an account that has admin access
Picture 3-41
Picture 3-42
Picture 3-43
- The rebalance completes successfully
Picture 3-44
Picture 3-45
- After node B is added, refresh the page → The sign-in UI is displayed
Picture 3-46
Picture 3-47
Picture 3-48
Picture 3-49
Picture 3-50
Picture 3-51
Picture 3-52
Picture 3-53
- Choose the file type and the bucket.scope.collection → Then click the “Import Data” button
at the bottom middle
Picture 3-54
- Go back to the Documents tab to check the data → The data was imported successfully
Picture 3-55
Picture 3-56
Picture 3-57
Picture 3-58
Picture 3-59
Picture 3-60
- Try again
Picture 3-61
- Result
Picture 3-62
Picture 3-63
References
Couchbase Server - Wikipedia
Couchbase Server | Couchbase Docs
Couchbase - Database of Databases (dbdb.io)