Polyglot Data Modeling

polyglot | \ ˈpä-lē-ˌglät \ : speaking or writing several languages : multilingual

Hackolade Studio was originally designed and positioned as a physical-only data modeling tool. As coverage grew to more and more databases and communication protocols, users started to ask for the ability to define structures once and represent them in a variety of schema syntaxes. Hackolade already converted DDLs into all kinds of different schema syntaxes. It could also export data models for any target technology to JSON Schema, or generate a Swagger/OpenAPI specification from any Hackolade data model.

Why not go a step further then, and allow the creation of a technology-agnostic data model?

Logical or Polyglot Data Model?

Many people may say: "Technology-agnostic model? Isn't that the definition of a logical data model?" Maybe, but with the technologies of the 21st century and Agile development, the strict definition of logical data modeling is too constraining.

Today's big data not only allows but promotes denormalization and the use of complex data types, which are not exactly compatible with the strict definition of logical modeling. Moreover, physical schema designs are application-specific and query-driven, based on access patterns.
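
To make this concrete, here is a sketch (in Python, with invented field names) of the kind of denormalized, nested structure that document databases encourage, and that a strictly normalized logical model would reject:

```python
# A denormalized "order" document with complex data types.
# The field names are illustrative. The embedded customer object and
# the repeating array of line items would violate first normal form,
# yet they are idiomatic in document databases because they match the
# access pattern "fetch an order with all of its lines in one read".
order = {
    "order_id": "ord-1001",
    "customer": {                # embedded object instead of a foreign key
        "customer_id": "cust-42",
        "name": "Acme Corp",
    },
    "lines": [                   # nested repeating group
        {"sku": "A-100", "qty": 2, "unit_price": 9.99},
        {"sku": "B-200", "qty": 1, "unit_price": 24.50},
    ],
}
```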

Let's consider a traditional description of the different levels of data modeling:

Features                Conceptual   Logical   Physical
Entity names                X           X
Entity relationships        X           X
Attributes                  X           X
Primary keys                            X          X
Foreign keys                            X          X
Table names                                        X
Column names                                       X
Column data types                                  X

While it is fairly straightforward to go from logical to physical in a relational world, because the databases implement different dialects of the same SQL specification, that is not the case with NoSQL or analytical big data.

A Polyglot Data Model for your Polyglot Data

There is a need for a data model that allows complex data types and denormalization, yet can be easily translated into vastly different syntaxes on the physical side. We call it a "Polyglot Data Model", a term inspired by the brilliant Polyglot Persistence approach promoted by Pramod Sadalage and Martin Fowler in their 2012 book NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence.

In our experience with customers, we observe two different types of polyglot persistence. The first type is the one originally described by Martin Fowler: best-of-breed persistence technology applied to the different use cases within a single application:

Polyglot persistence in a single application

A second type of polyglot persistence is even more pervasive: data pipelines from operational data stores, through object storage and multi-stage data lakes, streamed or served via APIs to self-service analytics, data warehouses, ML, and AI.

Polyglot data pipeline

In either case, customers expect a modern data modeling tool to help them design and manage schemas across the entire data landscape.

Polyglot data model in data-centric enterprise landscape

A Polyglot Data Model sits astride the traditional boundary between logical and physical. Some users continue to call it a logical model, but we think of it as a "logical model on steroids" with the following features:

  • allows denormalization, if desired, given access patterns;
  • allows complex data types;
  • generates schemas for a variety of technologies, with automatic mapping to the specific data types of the respective target technologies.
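
The last point can be pictured as a lookup table. Below is a minimal sketch in Python; the polyglot and target type names are simplified assumptions, not Hackolade's actual mapping:

```python
# Simplified sketch of automatic data type mapping from a polyglot
# model to target technologies (type names are illustrative).
TYPE_MAP = {
    "string":  {"postgres": "varchar", "cassandra": "text",    "avro": "string"},
    "integer": {"postgres": "integer", "cassandra": "int",     "avro": "int"},
    "decimal": {"postgres": "numeric", "cassandra": "decimal", "avro": "bytes (logicalType: decimal)"},
    "map":     {"postgres": "jsonb",   "cassandra": "map",     "avro": "map"},
}

def map_type(polyglot_type: str, target: str) -> str:
    """Resolve a polyglot data type to a target-specific type name."""
    return TYPE_MAP[polyglot_type][target]

print(map_type("decimal", "avro"))   # -> bytes (logicalType: decimal)
```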

In RDBMS, the different dialects of SQL lead to fairly similar DDLs, whereas the schema syntaxes for Avro, Parquet, OpenAPI, HiveQL, Neo4j Cypher, MongoDB, etc. are vastly different.
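
To illustrate how far apart these syntaxes are, here is the same two-attribute entity hand-written as a PostgreSQL DDL and as an Avro schema (a sketch, not tool-generated output):

```python
import json

# The same "customer" entity in two physical syntaxes (hand-written sketch).
postgres_ddl = """
CREATE TABLE customer (
    customer_id  uuid PRIMARY KEY,
    name         varchar(100) NOT NULL
);
"""

avro_schema = json.dumps({
    "type": "record",
    "name": "customer",
    "fields": [
        {"name": "customer_id", "type": {"type": "string", "logicalType": "uuid"}},
        {"name": "name", "type": "string"},
    ],
}, indent=2)

print(postgres_ddl)
print(avro_schema)
```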

The generation of physical names for entities (called tables, collections, nodes, vertices, files, etc. depending on the target technology) and attributes (called columns, fields, etc.) should be capable of following different transformation rules: Cassandra, for example, folds unquoted identifiers to lowercase, so snake_case is the safe convention there, while MongoDB conventions favor camelCase.
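
A minimal sketch of two such transformation rules in Python (illustrative, not Hackolade's actual implementation):

```python
import re

def to_snake_case(name: str) -> str:
    """Cassandra-friendly: unquoted CQL identifiers are folded to
    lowercase, so lower_snake_case is the safe convention."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def to_camel_case(name: str) -> str:
    """MongoDB-friendly: field names conventionally use camelCase."""
    parts = re.split(r"[_\s]+", name)
    return parts[0].lower() + "".join(p.capitalize() for p in parts[1:])

print(to_snake_case("OrderLineItem"))    # -> order_line_item
print(to_camel_case("order_line_item"))  # -> orderLineItem
```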

Our Polyglot Data Model also provides conceptual data modeling capabilities. We do this by leveraging the principles of Domain-Driven Design, including aggregates to store together what belongs together, and a graph view to represent concepts.

Data Model vs Schema Design

A data model is an abstraction that describes and documents the information system of an enterprise. Data models provide value in understanding, communication, collaboration, and governance. They help document the context and meaning of the data.

But the value of data models at a technical level is in the artifacts they help create: schemas. A schema is a “consumable” collection of objects describing the layout or structure of a file, a transaction, or a database. A schema is a contract between producers and consumers of data, and an authoritative source for the structure and meaning of the data.
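
One way to picture that contract role: a consumer, or a CI pipeline, validates incoming records against the schema. A minimal sketch using the Python jsonschema package, with invented field names:

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

# A schema acting as a contract: producers must emit records that
# validate against it; consumers can rely on its guarantees.
customer_schema = {
    "type": "object",
    "required": ["customer_id", "name"],
    "properties": {
        "customer_id": {"type": "string"},
        "name": {"type": "string"},
        "segment": {"type": "string", "enum": ["SMB", "Enterprise"]},
    },
}

try:
    validate({"customer_id": "cust-42", "name": "Acme Corp"}, customer_schema)
    print("record honors the contract")
except ValidationError as err:
    print(f"contract violation: {err.message}")
```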

The authors of the Agile Manifesto wanted to restore a balance, and said: "We embrace modeling, but not in order to file some diagram in a dusty corporate repository." At Hackolade, we think that (data) modeling is indispensable, so schemas can be generated, consumed, and managed.

Data model vs schema


The Polyglot Data Model concept was built so you could create a library of canonical objects for your domains, and use them consistently across physical data models for different target technologies.
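
In JSON Schema terms, the idea of a canonical object reused across models can be sketched with references; this is an illustrative simplification of the reuse mechanism:

```python
import json

# A canonical "Address" object defined once and referenced twice
# (illustrative sketch of definition reuse).
customer = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "billing_address":  {"$ref": "#/definitions/Address"},
        "shipping_address": {"$ref": "#/definitions/Address"},
    },
    "definitions": {
        "Address": {
            "type": "object",
            "properties": {
                "street":  {"type": "string"},
                "city":    {"type": "string"},
                "country": {"type": "string"},
            },
        }
    },
}

print(json.dumps(customer, indent=2))
```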
