Couchbase
Couchbase Server is an open-source, distributed, multi-model NoSQL document-oriented database that is optimized for interactive applications. It has a long history and has evolved considerably over time. It natively manipulates data in key-value form or as JSON documents. Couchbase can nevertheless also be used to store non-JSON data for various use cases.
Hackolade Studio was specially adapted to support the data modeling of multiple object types within one single bucket, while supporting multiple buckets as well. The application closely follows the terminology of the database.
The data model in the picture below results from the reverse-engineering of the sample travel application described here.
Buckets
There is a fundamental difference with many other NoSQL document databases: Couchbase strongly recommends storing documents of different kinds in the same "bucket". A bucket is equivalent to a database, and objects with different characteristics or attributes are stored in it together. This may seem counter-intuitive when moving from an RDBMS or MongoDB, but records that would otherwise go into multiple tables should be stored in a single bucket, with a “type” attribute to differentiate the various objects stored in the bucket.
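As a minimal sketch of this idea, the snippet below models a bucket as a plain Python dictionary holding documents of two kinds, each carrying a "type" attribute. The keys, field names, and values are hypothetical and chosen for illustration only; no Couchbase SDK is involved.

```python
from collections import defaultdict

# Hypothetical documents that live side by side in one bucket.
# Keys and field names are made up for illustration.
bucket = {
    "airline_1": {"type": "airline", "name": "Example Air", "country": "France"},
    "airport_7": {"type": "airport", "name": "Example Field", "country": "France"},
    "airline_2": {"type": "airline", "name": "Sample Wings", "country": "Belgium"},
}


def group_by_kind(docs, kind_field="type"):
    """Group document keys by the value of the discriminator attribute."""
    groups = defaultdict(list)
    for key, value in docs.items():
        groups[value.get(kind_field, "<unknown>")].append(key)
    return dict(groups)


kinds = group_by_kind(bucket)
```

Here `kinds` maps each document kind to the keys of the documents of that kind, which is the role the “type” attribute plays when several object types share one bucket.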
Most deployments have a low number of buckets (usually 2 or 3), and only a few have more than 5. Although there is no hard limit in the software, the recommended maximum of 10 buckets stems from known CPU and disk I/O overhead of the persistence engine, and from the fact that Couchbase allocates a specific amount of memory to each bucket.
Multiple buckets can nevertheless be quite useful for different use cases:
- multi-tenancy: you want to be sure that the data of each tenant is kept separate
- different types of data: you can for example store all documents (JSON) in one bucket, and use another one to store "binary" content. The setup would have one bucket with views, and the other one without any.
- differing operational needs: for data with differing caching and RAM quota needs, compaction requirements, availability requirements, and I/O priorities, buckets act as the control boundary.
For example, suppose medical-codes data contains the drug, symptom, and operation codes for a standards-based electronic health record. This data can be recovered easily from other sources, so a single replica may be fine. Patient data, however, may require higher protection with 2 replicas. To achieve better protection for patient data without wasting additional space on medical codes, you could choose separate buckets for these 2 types of information.
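The space argument in this example can be checked with a little arithmetic. The sketch below is a simplification that assumes each replica holds one full extra copy of a bucket's data and ignores compression and metadata. The class name, bucket names, and data sizes (100 GB of patient data, 20 GB of medical codes) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketPlan:
    """Hypothetical sizing record: not a Couchbase API object."""
    name: str
    data_gb: int
    replicas: int

    def stored_gb(self) -> int:
        # One active copy plus one full copy per replica (simplifying assumption).
        return self.data_gb * (1 + self.replicas)


def total_stored_gb(plans):
    return sum(plan.stored_gb() for plan in plans)


# Option A: everything in one bucket, which must use the stricter setting (2 replicas).
single = [BucketPlan("ehr", 100 + 20, replicas=2)]

# Option B: separate buckets, each with its own replica count.
separate = [
    BucketPlan("patients", 100, replicas=2),
    BucketPlan("medical-codes", 20, replicas=1),
]

saved_gb = total_stored_gb(single) - total_stored_gb(separate)
```

Under these assumed sizes, the separate-bucket layout avoids storing one extra copy of the medical-codes data.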
There are 2 types of buckets, each with its properties: Couchbase buckets and Memcached buckets.
Documents
A document refers to an entry in the database (other databases may refer to the same concept as a row). A document has an ID (primary key in other databases), which is unique to the document and by which it can be located. The document also has a value which contains the actual application data.
Documents are stored as JSON on the server. Because JSON is a structured format, it can be subsequently searched and queried.
Document kind
When mixing different types of objects into the same bucket, it becomes necessary to specify a "type" attribute to differentiate the various objects stored in the bucket. In Hackolade Studio, each Document Kind is modeled as a separate entity or box, so its attributes can be defined separately. A specific attribute name must be identified to differentiate the different document kinds. The unique key and the document kind field are common to all document kinds in the bucket, and displayed at the top of each box in the ERD document:
Keys
Another modeling characteristic distinguishes Couchbase from some other NoSQL document databases: the unique key of each document is stored 'outside' the JSON document itself. Couchbase was originally a key-value store. With version 2.0, Couchbase bridged the gap to being a multi-model database supporting JSON documents. In essence, the key part remains, and the value part can also be a JSON document. The fundamental difference is that a pure key-value database doesn't understand what's stored in the value, while a document database understands the format in which documents are stored and can therefore provide richer functionality for developers, such as access to documents through queries.
Couchbase does not automatically generate IDs. Document IDs are assigned by the software application. A valid document ID must:
- Conform to UTF-8 encoding
- Be no longer than 250 bytes
Users are free to choose any ID for a document, as long as it conforms to the above restrictions. This freedom can be leveraged to define natural keys where possible, so that keys are human-readable, deterministic, and semantic.
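Since IDs are assigned by the application, it can be convenient to check the two restrictions above before writing a document. The helper below is a minimal sketch in plain Python; the function name is hypothetical and it only covers the two rules listed here, not any other rule the server may enforce.

```python
MAX_KEY_BYTES = 250  # limit stated in the list above


def is_valid_document_id(doc_id: str) -> bool:
    """Return True if doc_id is UTF-8 encodable and at most 250 bytes long."""
    try:
        encoded = doc_id.encode("utf-8")
    except UnicodeEncodeError:
        # For example, a lone surrogate code point cannot be encoded as UTF-8.
        return False
    return len(encoded) <= MAX_KEY_BYTES
```

Note that the limit applies to bytes, not characters: a key made of accented or non-Latin characters reaches 250 bytes with fewer than 250 characters.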
Flexible key design
Starting with v7.5.0, you can define the structure of your primary key with a flexible design using constants, patterns, separators, and existing fields. This capability allows you to implement strategies illustrated on this page.
In a basic use case, you could have a simple document key made of a single unique identifier. Unique identifiers can be natural keys, for example an email address. Their advantage is that their contextual meaning is well understood by humans, which makes them predictable. But they are less flexible when the value must change for the same document, for example if a user changes email address. Unique identifiers can also be surrogate keys, where a meaningless unique identifier such as a UUID is used as the key. A surrogate key may be unpredictable and harder to read or type, but it is more scalable and flexible.
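The trade-off between the two kinds of identifiers can be illustrated with a short sketch. The dictionaries below stand in for a key-value store, and the user record is hypothetical. Only the standard `uuid` module is used.

```python
import uuid

user = {"email": "jane@example.com", "name": "Jane"}
updated = dict(user, email="jane@new.example.com")

# Natural key: the email address itself is the document key.
by_email = {user["email"]: user}
# When the email changes, the document has to move to a new key.
by_email[updated["email"]] = updated
del by_email[user["email"]]

# Surrogate key: a random UUID that carries no meaning.
sk = str(uuid.uuid4())
by_uuid = {sk: user}
# The same change is an in-place update; the key stays stable.
by_uuid[sk] = updated
```

With the natural key, every reference to the old key must also be updated, whereas the surrogate key is unaffected by changes to the document's content.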
With this flexible key design, it is possible to structure a composite key (a.k.a. compound key) assembled from multiple segments. For example, given this document model:
and the defined PK structure:
an example of key could be this:
You may define the PK structure using 4 types of segments: constant, field, separator, and regex pattern. There is no theoretical limit to the number of segments, but the total length of the key cannot exceed 250 bytes. If you use a numeric field for a segment, it should ideally be an integer, and it will be converted to a string (as the key can only be of string data type).
Typical separators for Couchbase are colon (:), double colon (::), or hash (#). Separators are not mandatory, but they are strongly recommended, at least between fields, as they facilitate parsing and inference during reverse-engineering.
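To make the segment idea concrete, here is a minimal sketch of how an application might assemble a composite key from constant, field, and separator segments. It is not Hackolade's implementation: the `Segment` class, the `build_key` function, the document, and the key structure are all hypothetical, and regex pattern segments are left out for brevity.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Segment:
    kind: str   # "constant", "field", or "separator"
    value: str  # literal text, or the field name when kind == "field"


def build_key(doc: dict, structure: Sequence[Segment]) -> str:
    """Concatenate the segments into one string key for the given document."""
    parts = []
    for seg in structure:
        if seg.kind in ("constant", "separator"):
            parts.append(seg.value)
        elif seg.kind == "field":
            # Numeric field values are converted to strings.
            parts.append(str(doc[seg.value]))
        else:
            raise ValueError(f"unsupported segment kind: {seg.kind}")
    key = "".join(parts)
    if len(key.encode("utf-8")) > 250:
        raise ValueError("key exceeds 250 bytes")
    return key


# Hypothetical structure: constant, separator, field, separator, field.
structure = [
    Segment("constant", "order"),
    Segment("separator", "::"),
    Segment("field", "customerId"),
    Segment("separator", "::"),
    Segment("field", "orderNumber"),
]
doc = {"customerId": "c42", "orderNumber": 1007, "total": 99.5}

key = build_key(doc, structure)   # "order::c42::1007"
segments = key.split("::")        # separators make the key easy to parse back
```

The last line shows why separators are recommended: splitting on "::" recovers the individual segments, which would be ambiguous if the values were simply concatenated.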
Attributes data types
Couchbase attributes support standard JSON data types, including lists and sets (arrays), and maps (objects). The Hackolade menu items, contextual menus, toolbar icon tooltips, and documentation are adapted to Couchbase's terminology and feature set. The following words are reserved.
Hackolade was specially adapted to support the data types and attributes behavior of Couchbase.
Indexes
An index is a data structure that provides a quick and efficient means to query and access data that would otherwise require scanning many more documents. Couchbase Server provides different types of indexes, as documented here.