Reverse engineering is a useful feature to facilitate the modeling and documentation of existing instances.  After the reverse engineering has been performed, the user can enrich the model created, by filling descriptions, recording constraints that were not obvious from the stored data, documenting relationships, etc…


Hackolade supports the reverse engineer of different types of data sources:

  • a JSON document: conversion of a basic JSON document into a Hackolade collection schema
  • a JSON Schema: conversion of a JSON Schema file into a Hackolade collection schema
  • a Data Definition Language file (DDL): from Oracle, Microsoft SQL Server, MySQL, PostgreSQL, Hadoop Hive
  • an XSD schema file from another ER tool, such as erwin, ER/Studio, PowerDesigner, or other
  • an Excel file template: for MongoDB only
  • a Mongoose schema: conversion of the JavaScript file containing a Mongoose schema, and transformation into a Hackolade collection schema
  • a MongoDB database: Hackolade accesses a MongoDB instance through a connection, samples the selected collection(s), and derives a Hackolade schema from the sampled data
  • a DynamoDB instance: Hackolade accesses DynamoDB either local or hosted at AWS.  It performs a sampling of the selected tables, and probalistically infers the Hackolade schema from the sampled data.
  • a Couchbase instance: Hackolade accesses a Couchbase instance through a connection, samples the selected bucket(s), and probalistically infers the Hackolade schema from the sampled data.
  • a Cosmos DB instance: Hackolade can reverse-engineer a local instance of Cosmos DB or hosted at Azure, using either the SQL API (previously known as DocumentDB API) or the MongoDB API.  The process performs a sampling of the selected collections and document types, and probalistically infers the Hackolade schema from the sampled data.
  • an Elasticsearch instance: Hackolade accesses an Elasticsearch instance through the REST API, samples the selected indices and types, and probalistically infers the Hackolade schema from the sampled data plus the retrieved mappings.
  • an HBase instance: Hackolade accesses an HBase instance through the REST API, reads the table and column family setup, then samples column qualifiers with JSON types to probabilistically infer the Hackolade schema from the sampled JSON data.
  • a Neo4j graph instance: Hackolade accesses a Neo4j instance through the REST API, and reads the Cypher setup for node labels and relationship types.
  • a Cassandra or DataStax (DSE) instance: Hackolade accesses a Cassandra instance through the REST API, reads the table and column setup, then samples columns with JSON types to probabilistically infer the Hackolade schema from the sampled JSON data.


This feature is also available via the Command-Line Interface.

Reverse-engineering of a MongoDB instance

This process includes a series of steps outlined below.  After going through the steps, the user is free to enrich the model for documentation purposes, and to generate forward-engineering scripts.

1. Choose the reverse-engineering function


2. Establishing a connection


TIP: please note, if you have instances with a very large number of databases, you may wish to declare a particular database directly in the connection string by appending the address with: /<name of the database you wish to access directly>, for example: localhost/NBA



For more information on the various authentication and encryption protocols, please consult these pages.


3. Selecting one or more collections


4. Sampling of one or more collections

In MongoDB 3.2, collections are sampled with the $sample operator via the /core/aggregation-pipeline. This provides efficient random sampling without replacement over the entire collection, or over the subset of documents specified by a query.


In MongoDB 3.0 and 2.6, collections are sampled via a backwards-compatible algorithm. It takes place in three stages:

- Hackolade opens a cursor on the desired collection, limited to at most 5000 documents sorted in descending order of the _id field.

- sampleSize documents are randomly selected from the stream, using reservoir sampling.

- Hackolade performs a query to select the chosen documents directly via _id.


Currently, sampleSize is set to 1000, 2500 or 5000 documents, via a user parameters in Tools > Options.


Note that this sampling mechanism is originally provided in Open Source by MongoDB, adapted to the specific needs of Hackolade.


5. Probabilistic inference of schema

From the sample set gathered above, measurable metrics can be applied to the schema, including schema depth and width, and class interactions, as described here:


Note that this inference mechanism is originally provided in Open Source by MongoDB, adapted to the specific needs of Hackolade.


In this step, when the instance being reverse-engineered is v3.2 and above, stored validator setup is leveraged as well, using the db.getCollectionInfos() method



6. Transformation into a Hackolade collection schema

In this final step, the derived schema needs to be converted and persisted into the Hackolade notation, so it can be visualized, adapted and enriched in the application.



Reverse-engineering of a DynamoDB instance

The process for DynamoDB is similar to the one for MongoDB.  Except that the connection protocol and the data query are of course adapted to the technology.


Reverse-engineering of a Couchbase instance

The process for Couchbase is similar to the one for MongoDB.  Except that the connection protocol and the data query are of course adapted to the technology.


Reverse-engineering of a Cosmos DB instance

The process for Cosmos DB is similar to the one for MongoDB.  Except that the connection protocol and the data query are of course adapted to the technology.