Amazon DocumentDB (with MongoDB compatibility) is a fully managed document database service that supports MongoDB workloads. As a document database, DocumentDB makes it easy to store, query, and index JSON data. Developers can use the same application code, drivers, and tools that they use with MongoDB to run, manage, and scale workloads on Amazon DocumentDB.


The DocumentDB architecture separates storage and compute so that each layer can scale independently, though the system is limited to a single writable master. Amazon DocumentDB uses the Aurora Storage Engine, originally built for the MySQL relational database. The storage engine is distributed, fault-tolerant, self-healing, and durable; it achieves this durability by replicating data six ways across three AWS Availability Zones (AZs).


“MongoDB compatible” means that DocumentDB interacts with the open source MongoDB 3.6 and 4.0 APIs. Users can use the same MongoDB drivers, applications, and tools, such as Hackolade, with DocumentDB with little or no change. While DocumentDB supports the vast majority of the MongoDB APIs that customers actually use, it does not support every MongoDB API. The supported APIs, operations, and data types are documented here. This other page documents the functional differences between DocumentDB and MongoDB.


To perform data modeling for DocumentDB with Hackolade, you must first download the DocumentDB plugin.  


Hackolade was specially adapted to support the data modeling of DocumentDB, including databases, collections, and indexes. The application closely follows the terminology of the database.


The data model in the picture below results from the reverse-engineering of the Yelp Challenge Dataset.





Each document stored in a collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field. ObjectIds are small, likely unique, fast to generate, and ordered. An ObjectId value consists of 12 bytes, the first four of which are a timestamp that reflects the ObjectId’s creation. Specifically, an ObjectId is made up of:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value.
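
The byte layout listed above can be illustrated with a short Python sketch that splits a 24-character hexadecimal ObjectId into its parts. The function name split_object_id is hypothetical, and the sketch assumes the 4/3/2/3 layout described here; some newer MongoDB drivers fill the middle five bytes with a random value instead, in which case only the timestamp and counter remain meaningful.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Split a 12-byte ObjectId (given as 24 hex characters) into its fields.

    Assumes the layout described in the text: 4-byte timestamp,
    3-byte machine identifier, 2-byte process id, 3-byte counter.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")

    # The timestamp is stored big-endian as seconds since the Unix epoch.
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": seconds,
        "created_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),       # kept as hex: opaque identifier
        "process_id": raw[7:9].hex(),    # kept as hex: opaque identifier
        "counter": int.from_bytes(raw[9:12], "big"),
    }
```

For example, split_object_id("507f1f77bcf86cd799439011")["created_at"] yields a creation time of 2012-10-17 21:13:27 UTC, which shows why ObjectIds sort roughly by insertion time.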


Data types

MongoDB represents JSON documents behind the scenes in a binary-encoded format called BSON. BSON extends the JSON model to provide additional data types and ordered fields, and to be efficient to encode and decode in different languages. The MongoDB BSON implementation is lightweight, fast, and highly traversable. Like JSON, MongoDB's BSON implementation supports embedding objects and arrays within other objects and arrays. DocumentDB does not support all of the BSON data types. See details here.
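
To make the idea of a binary-encoded document concrete, here is a minimal Python sketch that hand-encodes a flat document following the public BSON layout (a little-endian length prefix, typed elements, and a terminating zero byte). It is an illustration only: the function name encode_bson is hypothetical, it handles just strings and 32-bit integers, and real applications would rely on the encoder shipped with their MongoDB driver.

```python
import struct


def encode_bson(fields: dict) -> bytes:
    """Encode a flat dict of str and int32 values as a BSON document."""
    body = b""
    for name, value in fields.items():
        key = name.encode("utf-8") + b"\x00"  # field name as a C string
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # Type 0x02: int32 byte length (including the trailing NUL), then bytes.
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # Type 0x10: 32-bit signed integer, little-endian.
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

    # Total size counts the 4-byte length prefix and the final NUL terminator.
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"
```

Encoding {"hello": "world"} this way produces a 22-byte document. The explicit type byte in front of each field is what lets BSON carry data types that plain JSON cannot express.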



MongoDB provides a number of different index types to support specific types of data and queries. DocumentDB does not support all the indexes and index properties of MongoDB. More details here.

  • default _id index: DocumentDB creates a unique index on the _id field when a collection is created. The _id index prevents clients from inserting two documents with the same value for the _id field. You cannot drop this index.
  • single field: DocumentDB supports user-defined ascending/descending indexes on a single field of a document.
  • compound index: DocumentDB also supports user-defined indexes on multiple fields.
  • multikey index: DocumentDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, DocumentDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on one or more elements of the arrays. DocumentDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type.
  • geospatial index: to support efficient queries of geospatial coordinate data, DocumentDB provides 2dsphere indexes that use spherical geometry to return results.
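
As a rough illustration of the index types listed above, the following Python sketch expresses each one as a key specification in the form accepted by a MongoDB driver such as pymongo, whose create_index method takes a list of (field, direction) pairs. The field names (city, stars, categories, location) are hypothetical, and the collection argument is assumed to be any object exposing a pymongo-style create_index method.

```python
# Direction markers; these mirror the values of pymongo.ASCENDING,
# pymongo.DESCENDING and pymongo.GEOSPHERE.
ASCENDING, DESCENDING, GEOSPHERE = 1, -1, "2dsphere"

# Hypothetical field names, one entry per index type described in the text.
INDEX_KEYS = {
    "single_field": [("city", ASCENDING)],
    "compound": [("city", ASCENDING), ("stars", DESCENDING)],
    # No special syntax for multikey: it applies when "categories" holds an array.
    "multikey": [("categories", ASCENDING)],
    "geospatial": [("location", GEOSPHERE)],
}


def create_indexes(collection, index_keys=INDEX_KEYS):
    """Create every index on a pymongo-style collection; return their names."""
    return [
        collection.create_index(keys, name=name)
        for name, keys in index_keys.items()
    ]
```

The default _id index does not appear in the sketch because it is created automatically with the collection.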



Indexes can have the following properties:

  • unique indexes: the unique property for an index causes DocumentDB to reject duplicate values for the indexed field. Other than the unique constraint, unique indexes are functionally interchangeable with other indexes.
  • sparse indexes: the sparse property of an index ensures that the index only contains entries for documents that have the indexed field. The index skips documents that do not have the indexed field. The sparse index option can be combined with the unique index option to reject documents that have duplicate values for a field but ignore documents that do not have the indexed key.
  • TTL indexes: time-to-live indexes are special indexes that DocumentDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for certain types of information, like machine-generated event data, logs, and session information, that only needs to persist in a database for a finite amount of time.
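
The behavior of these properties can be modeled in a few lines of Python. The sketch below is a simplified in-memory model, not the DocumentDB implementation: the two option dictionaries use the keyword names that MongoDB drivers pass to create_index (unique, sparse, expireAfterSeconds), while the helper functions and the one-hour lifetime are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Options in the style of a driver call: create_index(keys, **options).
UNIQUE_SPARSE = {"unique": True, "sparse": True}
TTL_ONE_HOUR = {"expireAfterSeconds": 3600}


def violates_unique_sparse(existing_docs, new_doc, field):
    """Model of unique + sparse: duplicates are rejected, but documents
    that lack the indexed field are skipped by the index altogether."""
    if field not in new_doc:
        return False  # not indexed, so it cannot collide
    return any(field in doc and doc[field] == new_doc[field] for doc in existing_docs)


def is_expired(doc, field, expire_after_seconds, now):
    """Model of TTL: a document becomes eligible for removal once the
    date in the indexed field is older than the configured lifetime."""
    stamp = doc.get(field)
    if not isinstance(stamp, datetime):
        return False  # documents without a date in the field never expire
    return stamp + timedelta(seconds=expire_after_seconds) <= now
```

A session document stamped two hours ago would be reported as expired under TTL_ONE_HOUR, whereas a document without the date field is left alone.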



DocumentDB does not currently support MongoDB read-only views.



MongoDB sharding is not necessary, given the distributed storage architecture of DocumentDB.



DocumentDB does not provide an abstraction for schemas. To add value in forward-engineering, Hackolade provides an example JSON document for each collection in the model. The forward-engineering script also includes the creation of collections and indexes, and it can be applied to the DocumentDB instance.





A button lets the user apply the script to a selected instance in order to create databases and collections with indexes, as well as sample data if desired.
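
The kind of work such a script performs can be sketched in Python as follows. This is not the script that Hackolade generates; it is a hedged illustration of the same three steps (create the collection, create its indexes, optionally insert a sample document) using pymongo-style calls (create_collection, create_index, insert_one). The function name apply_model and the shape of the model dictionary are assumptions made for the example.

```python
def apply_model(db, model, with_sample_data=True):
    """Create collections, indexes and optional sample documents.

    `db` is assumed to be a pymongo-style database object.
    `model` maps a collection name to a dict with optional keys:
      "indexes": list of (keys, options) pairs for create_index
      "sample":  one example document for the collection
    """
    created = []
    for name, spec in model.items():
        collection = db.create_collection(name)
        for keys, options in spec.get("indexes", []):
            collection.create_index(keys, **options)
        if with_sample_data and "sample" in spec:
            # Copy so the driver's added _id does not mutate the model.
            collection.insert_one(dict(spec["sample"]))
        created.append(name)
    return created
```

A model entry might look like {"business": {"indexes": [([("city", 1)], {})], "sample": {"name": "Example Cafe", "city": "Phoenix"}}}, where all names and values are placeholders.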



The connection to the DocumentDB cluster instance is established using a connection string that includes the (IP) address and port (typically 27017), and authentication using username/password if applicable. The connection parameters are further described in this page. X.509 SSL encryption can be specified, and SSH tunneling to a Cloud instance is supported as well. Hackolade also supports MongoDB Enterprise security features, with LDAP and Kerberos authentication.
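
As an illustration of how these parameters fit together, the sketch below assembles a standard mongodb:// connection string in Python. The function name build_uri and the host name are placeholders. The tls=true query option is the standard MongoDB URI option for encrypted connections; a real cluster may require further options, such as a CA file, that are not shown here.

```python
from urllib.parse import quote_plus, urlencode


def build_uri(host, port=27017, username=None, password=None, tls=False):
    """Assemble a mongodb:// connection string from its parts."""
    auth = ""
    if username is not None:
        # Credentials must be percent-encoded so that characters such as
        # '@' or '/' do not break the URI.
        auth = quote_plus(username)
        if password is not None:
            auth += ":" + quote_plus(password)
        auth += "@"

    options = {"tls": "true"} if tls else {}
    query = "/?" + urlencode(options) if options else ""
    return f"mongodb://{auth}{host}:{port}{query}"
```

The resulting string can be handed to any MongoDB driver or tool that accepts a connection URI.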


The Hackolade process for reverse-engineering uses the $sample syntax to perform statistical sampling, followed by a schema inference step. You may define custom sampling with a specific aggregation pipeline query and sort. You may also enable inference of implicit relationships in the data.
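
The two steps can be pictured with a small Python sketch. It is not Hackolade's actual algorithm: the first function builds an aggregation pipeline that ends in the $sample stage, optionally preceded by a $match filter, and the second performs a deliberately naive inference that records which Python types were observed for each top-level field. Both function names are hypothetical.

```python
def sampling_pipeline(size, query=None):
    """Build an aggregation pipeline that randomly samples `size` documents,
    optionally restricted by a $match filter first."""
    pipeline = []
    if query:
        pipeline.append({"$match": query})
    pipeline.append({"$sample": {"size": size}})
    return pipeline


def infer_schema(documents):
    """Naive schema inference: map each top-level field to the sorted
    list of value type names seen across the sampled documents."""
    seen = {}
    for doc in documents:
        for field, value in doc.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}
```

With a driver such as pymongo, the pipeline would be passed to the collection's aggregate method and the returned documents fed to infer_schema. Fields that show more than one type in the result are candidates for polymorphic attributes in the model.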


For more information on DocumentDB in general, please consult the website and documentation.