SQL DDL

Migrating from relational databases to NoSQL requires a mind shift, and Hackolade can help in the migration of the schema from a relational model to a model combining embedding and referencing, so you can leverage the power, performance, and flexibility of NoSQL.

For more information on NoSQL document schema design, you may want to read this 3-part blog. Then use Hackolade to facilitate the migration...

The table below summarizes denormalization strategies and best practices:

If...	Then...
Relationship is one-to-one	Merge tables into one collection
Relationship is one-to-few	Merge tables into parent collection with an array of embedded child documents
Relationship is one-to-many	Embed parent data as nested sub-document in child collection (consider bucketing)
Relationship is one-to-zillions	Store reference to parent in child document (consider child tables)
Relationship is many-to-one	Embed child data as nested sub-document in parent
Relationship is many-to-many	Store data as separate documents, with reference to the parent in the child, and vice-versa
Data reads are mostly parent fields	Store children as separate documents
Data writes are mostly parent fields or child fields (not both)	Store children as separate documents
Data writes are mostly parent fields and child fields (both)	Store children as nested objects

From the blog referenced above, here are some “rules of thumb” to guide you through these innumerable (but not infinite) choices:

think in terms of queries, updates, and data representation, rather than in terms of storage. Data access patterns (instead of 3NF) drives you data model
favor embedding unless there is a compelling reason not to
needing to access an object on its own is a compelling reason not to embed it
arrays should not grow without bound. High-cardinality arrays are a compelling reason not to embed.
don't be afraid of application-level joins (with proper indexing)
consider the write/read ratio when denormalizing. A field that will mostly be read and only seldom updated is a good candidate for denormalization.

Reverse-engineer a Data Definition Language file (DDL)

Data Definition Language (DDL) is a vocabulary used to define data structures in SQL databases. These statements are used to create, alter, or drop data structures in an SQL database instance.

If you generate a DDL from Oracle, Microsoft SQL Server, MariaDB, MySQL, PostgreSQL, Snowflake, Hadoop Hive, etc., you can then read it with Hackolade and generate a model of the schema.

Note: to speed up import, it is highly recommended to create a pure DDL, i.e. without any data, rather than a DML (Data Manipulation Language.)

Choose Tools > Reverse-Engineering > Data Definition Language file:

Reverse-Engineering -- DDL menu

Then choose the source database (as each vendor has it's own implementation of the DDL syntax). The structure of a DDL can be imported either as an entity in the Entity Relationship Diagram, or alternatively as a model definition so it could be re-used in the model:

Reverse-Engineering -- DDL select file

and select the file from your file system.

If you wish to force the destination of the reverse-engineering operation, you may specify the container in which the entities will be inserted.

Hackolade currently supports the following SQL dialects: Oracle, MariaDB, MySQL, SQL Server, DB2, PostgreSQL, Informix, Redshift, Snowflake, and Teradata.

When you click OK, Hackolade will read the DDL file and generate the schema model, for example:

Reverse-engineering -- Northwind Oracle

Suggest denormalization

You should save the resulting model as a base, for example v0. Then it is suggested you save it again as v1 and work with the denormalization function to make v1 evolve towards a denormalized model.

The user needs to have a fairly good idea of where he wants to go with the schema design, and use Hackolade to facilitate the work. For example, with the Northwind database shown above, in a NoSQL model, one would want to merge the tables 'orders' and 'order details' while embedding some data from the master tables around.

Suggest denormalization -- select tables

The successive steps could include:

1. Embed as sub-documents into 'orders': employees, customers, and shippers

2. Embed as sub-documents into 'products': suppliers and categories

3. Embed as sub-documents into 'order details': products

4. Embed as an array of sub-documents into 'order': order details

5. Delete 'order details'

Reverse-Engineering - denormalization

As a subsequent step, the modeler may want to denormalize products information into the order detail lines by embedding product sub-documents in the child, the deleting the unnecesary fields. This time, as a products master collection is justified as its own entity, it should not be deleted.

Reverse-Engineering - denomalization step 2

But denormalization, depending on how you want to access data to optimize performance, may also include 2-way referencing, or 2-way embedding.

On this page