Polyglot Data Modeling

polyglot | \ ˈpä-lē-ˌglät \ : speaking or writing several languages : multilingual

Hackolade Studio was designed from the start and positioned as a physical-only data modeling tool. The reasoning was that there are plenty of excellent and mature ER tools that provide conceptual and logical modeling capabilities. And Hackolade's contribution has always been about helping solve challenges with no previously existing solution.

As coverage grew for more and more databases and communication protocols, however, users started to ask for the ability to define structures once, and be able to represent them in a variety of schema syntaxes. Hackolade converts DDLs into all kinds of different schema syntaxes. It can also export data models for any target technology into JSON Schema, or generate a Swagger/OpenAPI specification for any Hackolade data model.

Why not go a step further, and allow the creation of a technology-agnostic data model?

Logical or Polyglot Data Model?

Many people may say: "that's the definition of a logical data model, isn't it?" We think that, with the technologies of the 21st century and Agile development, the strict definition of logical data modeling is a bit too constraining.

Today's Big Data technologies not only allow but promote denormalization and the use of complex data types, neither of which is compatible with the strict definition of logical modeling. In addition, physical schema designs are application-specific and query-driven, based on access patterns.
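To make the contrast concrete, here is a minimal sketch of denormalization. The customers and orders rows are hypothetical sample data, and the denormalize helper is an illustration only, not part of any Hackolade API. It turns two normalized row sets into documents where each customer embeds its orders as a nested array, the kind of complex data type a strictly logical model does not express.

```python
from collections import defaultdict

# Hypothetical normalized rows, as two relational tables would hold them.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.5},
]


def denormalize(customers, orders):
    """Embed each customer's orders as a nested array inside the customer."""
    by_customer = defaultdict(list)
    for order in orders:
        # Drop the foreign key: the nesting itself now carries the relationship.
        embedded = {k: v for k, v in order.items() if k != "customer_id"}
        by_customer[order["customer_id"]].append(embedded)
    return [
        {**customer, "orders": by_customer[customer["customer_id"]]}
        for customer in customers
    ]


documents = denormalize(customers, orders)
```

Whether to embed or to reference depends on the access pattern: the embedded form answers "show a customer with all orders" in a single read, at the cost of duplicating or rewriting data when orders change.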

Let's consider a traditional description of the different levels of data modeling:

Feature                 Conceptual   Logical   Physical
Entity names                X           X
Entity relationships        X           X
Primary keys                            X         X
Foreign keys                            X         X
Table names                                       X
Column names                                      X
Column data types                                 X

While it may be fairly straightforward to go from Logical to Physical in the relational world, that is not the case with NoSQL or analytical big data.

A Polyglot Data Model for your Polyglot Data

There is a need for a data model which allows complex data types and denormalization, yet can be easily translated into vastly different syntaxes on the physical side. We call it a "Polyglot Data Model", a term inspired by the brilliant Polyglot Persistence approach promoted by Martin Fowler and the folks at ThoughtWorks.

[Figure: Polyglot data model in a data-centric enterprise landscape]

A Polyglot Data Model is a model that should:

  • use physical entity names (which could be called tables, collections, nodes, vertices, files, etc. in the target technology);
  • use physical attribute names (which could be called columns, fields, etc. in the target technology);
  • allow complex data types;
  • allow denormalization;
  • be able to generate schemas for a variety of technologies.

The last point is especially important. In the RDBMS world, the different dialects of SQL lead to fairly similar DDLs, whereas the schema syntaxes for Avro, Parquet, OpenAPI, HiveQL, Neo4j Cypher, MongoDB, etc. are vastly different.
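As an illustration of "define once, generate many", the following is a minimal sketch under stated assumptions. The Attribute and Entity classes, the type names, and the SQL type mapping are all hypothetical and chosen for brevity; they are not Hackolade's internal model format. One technology-agnostic definition with a nested object is rendered both as JSON Schema, which keeps the nesting, and as a SQL DDL statement, which has to flatten it.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    type: str  # assumed vocabulary: "string", "integer", "number", "object"
    children: List["Attribute"] = field(default_factory=list)  # for "object"


@dataclass
class Entity:
    name: str
    attributes: List[Attribute]


# Assumed mapping to generic SQL types; real dialects differ.
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "BIGINT", "number": "DOUBLE PRECISION"}


def to_json_schema(entity):
    """Render the entity as a JSON Schema object, preserving nesting."""
    def prop(attr):
        if attr.type == "object":
            return {"type": "object",
                    "properties": {c.name: prop(c) for c in attr.children}}
        return {"type": attr.type}

    return {"title": entity.name, "type": "object",
            "properties": {a.name: prop(a) for a in entity.attributes}}


def to_sql_ddl(entity):
    """Render the entity as CREATE TABLE, flattening nested objects."""
    def columns(attrs, prefix=""):
        for attr in attrs:
            if attr.type == "object":
                # Nested objects become prefixed columns, e.g. address_city.
                yield from columns(attr.children, prefix + attr.name + "_")
            else:
                yield f"  {prefix}{attr.name} {SQL_TYPES[attr.type]}"

    body = ",\n".join(columns(entity.attributes))
    return f"CREATE TABLE {entity.name} (\n{body}\n);"


customer = Entity("customer", [
    Attribute("id", "integer"),
    Attribute("name", "string"),
    Attribute("address", "object",
              [Attribute("city", "string"), Attribute("zip", "string")]),
])

print(json.dumps(to_json_schema(customer), indent=2))
print(to_sql_ddl(customer))
```

Flattening with a name prefix is only one possible choice for the relational target; a separate child table with a foreign key would be equally valid, which is exactly why the mapping has to be decided per technology.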

Data Model vs Schema Design

A data model is an abstraction describing and documenting the information system of an enterprise. Data models provide value in understanding, communication, collaboration, governance, and more.

But the true value of data models is in the artifacts they help create: schemas. A schema is a “consumable” collection of objects describing the layout or structure of a file, a transaction, or a database. A schema is a scope contract between producers and consumers of data, and an authoritative source of structure and meaning of the context.
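The idea of a schema as a contract can be sketched in a few lines. The example below is an assumption-laden illustration: the violations function is a hypothetical helper that understands only a tiny subset of JSON Schema (type, properties, required), and order_schema is made-up sample data. A real project would normally rely on a complete JSON Schema validator instead.

```python
# Maps the JSON Schema type names supported by this sketch to Python types.
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "object": dict}


def violations(record, schema, path="$"):
    """Return a list of contract violations; an empty list means conformance."""
    expected = PY_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly
    # (this sketch does not support a "boolean" type).
    if isinstance(record, bool) or not isinstance(record, expected):
        return [f"{path}: expected {schema['type']}"]
    problems = []
    if schema["type"] == "object":
        for name in schema.get("required", []):
            if name not in record:
                problems.append(f"{path}.{name}: missing required field")
        for name, sub in schema.get("properties", {}).items():
            if name in record:
                problems.extend(violations(record[name], sub, f"{path}.{name}"))
    return problems


# Hypothetical contract agreed between a producer and its consumers.
order_schema = {
    "type": "object",
    "required": ["order_id", "total"],
    "properties": {
        "order_id": {"type": "integer"},
        "total": {"type": "number"},
        "customer": {
            "type": "object",
            "required": ["name"],
            "properties": {"name": {"type": "string"}},
        },
    },
}

good = {"order_id": 10, "total": 25.0, "customer": {"name": "Ada"}}
bad = {"order_id": "10", "customer": {}}

print(violations(good, order_schema))
print(violations(bad, order_schema))
```

A producer can run such a check before publishing, and a consumer can run the same check on receipt, so both sides rely on one authoritative definition of structure.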

The authors of the Agile Manifesto wanted to restore a balance, and said: "We embrace modeling, but not in order to file some diagram in a dusty corporate repository." At Hackolade, we think (data) modeling is indispensable, so schemas can be generated, consumed, and managed.

[Figure: Data Model Schema]

The Polyglot Data Model concept is currently under construction and will be released soon. In the meantime, you may convert a model to another target by forward-engineering to JSON Schema from the source model, then going to the destination model and reverse-engineering the JSON Schema.
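The role of JSON Schema as a pivot format in that workaround can be sketched as follows. This is not how Hackolade performs the conversion internally; json_schema_to_avro is a hypothetical function covering only a small JSON Schema subset, the type mapping is an assumption, and the exported schema is made-up sample data. It shows a JSON Schema document being translated into an Avro record schema, with optional properties becoming nullable unions.

```python
import json

# Assumed mapping from JSON Schema scalar types to Avro primitive types.
AVRO_PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def json_schema_to_avro(schema, name):
    """Translate a small JSON Schema subset into an Avro schema."""
    kind = schema["type"]
    if kind == "object":
        required = set(schema.get("required", []))
        fields = []
        for prop_name, prop_schema in schema.get("properties", {}).items():
            avro_type = json_schema_to_avro(prop_schema, prop_name)
            if prop_name in required:
                fields.append({"name": prop_name, "type": avro_type})
            else:
                # Optional properties become a union with null, defaulting to null.
                fields.append({"name": prop_name,
                               "type": ["null", avro_type],
                               "default": None})
        return {"type": "record", "name": name, "fields": fields}
    if kind == "array":
        items = json_schema_to_avro(schema["items"], name + "_item")
        return {"type": "array", "items": items}
    return AVRO_PRIMITIVES[kind]


# Hypothetical JSON Schema, standing in for one exported from a source model.
exported = {
    "title": "customer",
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

avro = json_schema_to_avro(exported, exported["title"])
print(json.dumps(avro, indent=2))
```

Anything the pivot format cannot express, such as indexes or partitioning, is lost in such a round trip and has to be added again in the destination model.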