Avro Schema Editor & Design Tool

Avro Schema Design

Apache Avro is an open source project providing data serialization and data exchange services. It is often used with Apache Kafka and Apache Hadoop to exchange large volumes of data between applications. It is also used for efficient storage in Apache Hive or Oracle NoSQL Database, or as a data source in Apache Spark or Apache NiFi.


It is a row-oriented object container storage format and is language-independent. Avro uses JSON to define the schema and data types, allowing for convenient schema evolution. The data storage is compact and efficient, with both the data itself and the data definition being stored in one message or file, meaning that a serialized item can be read without knowing the schema ahead of time.


An Avro container file consists of a header and one or more file storage blocks. The header contains the file metadata, including the schema definition for the storage blocks. Avro follows its own standard for defining schemas, expressed in JSON.
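To make the header layout concrete, here is a minimal sketch of a header reader written against the object container file layout in the Avro specification (a 4-byte magic, a metadata map, then a 16-byte sync marker). The helper `demo_header` and the `Event` schema are hypothetical and exist only so the sketch can run without a real file; production code would normally rely on an Avro library instead.

```python
import io
import json

MAGIC = b"Obj\x01"  # 4-byte magic that starts an Avro object container file

def read_long(stream):
    """Decode one Avro long (zig-zag, variable-length) from a binary stream."""
    shift, accum = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of stream")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def write_long(value):
    """Encode one Avro long as zig-zag, variable-length bytes."""
    n = (value << 1) ^ (value >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def read_bytes(stream):
    """Read a length-prefixed byte sequence (Avro 'bytes' and 'string' share this layout)."""
    return stream.read(read_long(stream))

def read_header(stream):
    """Return (metadata dict, sync marker) from the start of a container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:            # a zero count terminates the map
            break
        if count < 0:             # a negative count is followed by the block size in bytes
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, stream.read(16)

def demo_header(schema):
    """Hypothetical helper: build a header in memory (real files use a random sync marker)."""
    def prefixed(raw):
        return write_long(len(raw)) + raw
    entries = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    body = b"".join(prefixed(k.encode("utf-8")) + prefixed(v) for k, v in entries.items())
    return MAGIC + write_long(len(entries)) + body + write_long(0) + bytes(16)

schema = {"type": "record", "name": "Event", "fields": [{"name": "id", "type": "long"}]}
meta, sync = read_header(io.BytesIO(demo_header(schema)))
embedded_schema = json.loads(meta["avro.schema"])
```

The schema recovered from `meta["avro.schema"]` is what lets a reader decode the blocks that follow without knowing the schema ahead of time.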



Avro schemas provide robustness in streaming architectures like Kafka, where producers and an unknown number of consumers evolve on different timelines. Because Avro schemas support evolution of the metadata and document the data they describe, data pipelines become more robust and future-proof.


Producers and consumers are further decoupled by defining constraints for the way schemas are allowed to evolve over time. These evolution rules can be published in a schema registry, as provided by Confluent, Hortonworks, or NiFi. With a schema registry, you may also increase memory and network efficiency by sending the ID that references the schema in the registry instead of repeating the schema itself with each message.
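As an illustration of sending a schema reference instead of the schema, the sketch below frames a message the way Confluent documents its wire format: a zero magic byte, a 4-byte big-endian schema ID, then the Avro binary payload. Other registries may use a different framing, and the function names, schema ID, and payload here are hypothetical.

```python
import struct

MAGIC_BYTE = 0  # first byte of a message framed in the Confluent wire format

def frame(schema_id, avro_payload):
    """Prefix an Avro-encoded payload with the magic byte and a 4-byte schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message):
    """Split a framed message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return schema_id, message[5:]

# Hypothetical schema ID 42 and a one-byte payload (the Avro encoding of the long 1).
message = frame(42, b"\x02")
schema_id, payload = unframe(message)
```

The overhead per message is five bytes, regardless of how large the schema is; the consumer looks up the schema by ID and typically caches it.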

Avro Schema Evolution

As applications evolve, it is typical that the corresponding schema requirements change, with additions, deletions, or modifications to schema structure and fields. An important aspect of data management is to maximize schema compatibility and ensure that consumers are able to seamlessly read old and new data.

Backward compatibility means that data produced with an old version of the schema can be read with a newer version of the schema. Forward compatibility means that data written with a new version of the schema can be read with an older version of the schema. Full compatibility means that schemas are both backward and forward compatible: old data can be read with a newer version of the schema, and new data can be read with an older version of the schema.

When managing Avro schema, you may want to keep the following guidelines and best practices in mind:

  • you may safely add, change, or remove non-mandatory field attributes, such as “doc” or “order”;
  • you may safely change the sorting order of field attributes;
  • you may safely add or remove field aliases;
  • you may safely add a field, as long as it has a default value;
  • you may safely add or change a default value to an existing field;
  • do not remove a required field, unless it had a default value previously, or you will lose forward compatibility: readers still using the old schema expect the field and have no default to fall back on;
  • do not change a field from data type long to int: Avro only promotes int to long, not the reverse, so data written as long can no longer be read;
  • do not change a required field to optional, or you will lose forward compatibility;
  • you may rename optional fields with default values, but do not rename required fields;
  • do not change field data types arbitrarily, or you will lose both forward and backward compatibility; only the promotions defined by Avro (such as int to long) remain readable, and only in one direction.
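The compatibility definitions and guidelines above can be sketched as a small checker. This is a simplified illustration, not the algorithm used by any particular schema registry: it assumes flat records whose field types are primitive names or unions, it treats writer unions conservatively (every branch must be readable), and the `User` schemas are hypothetical.

```python
# Type promotions the Avro specification allows when resolving writer to reader.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}

def type_readable(writer_type, reader_type):
    """True if data written as writer_type can be resolved to reader_type."""
    if isinstance(writer_type, list):      # writer union: every branch must resolve
        return all(type_readable(w, reader_type) for w in writer_type)
    if isinstance(reader_type, list):      # reader union: any branch may match
        return any(type_readable(writer_type, r) for r in reader_type)
    if writer_type == reader_type:
        return True
    if isinstance(writer_type, str) and isinstance(reader_type, str):
        return reader_type in PROMOTIONS.get(writer_type, set())
    return False

def can_read(reader, writer):
    """True if a flat record written with `writer` can be read with `reader`."""
    written = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        names = [field["name"], *field.get("aliases", [])]
        match = next((written[n] for n in names if n in written), None)
        if match is None:
            if "default" not in field:     # nothing written and nothing to fall back on
                return False
        elif not type_readable(match["type"], field["type"]):
            return False
    return True

def compatibility(old, new):
    """Classify the change from `old` to `new` using the definitions above."""
    backward = can_read(reader=new, writer=old)   # old data, new schema
    forward = can_read(reader=old, writer=new)    # new data, old schema
    return {"backward": backward, "forward": forward, "full": backward and forward}

def record(*fields):
    return {"type": "record", "name": "User", "fields": list(fields)}

v1 = record({"name": "id", "type": "int"}, {"name": "name", "type": "string"})
added_default = record(*v1["fields"], {"name": "email", "type": ["null", "string"], "default": None})
removed_required = record({"name": "id", "type": "int"})
widened_id = record({"name": "id", "type": "long"}, {"name": "name", "type": "string"})
```

With these inputs, adding a field with a default is fully compatible, while removing the required `name` field or widening `id` from int to long remains backward compatible but breaks forward compatibility.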

Avro Schema Design Tool

Hackolade has pioneered the field of data modeling for NoSQL databases and REST APIs, introducing graphical software to perform the schema design of hierarchical and graph structures.

Avro Schema Editor

Hackolade is an Avro schema viewer and an Avro schema editor that dynamically forward-engineers Avro schema as the user visually builds an Avro data model. It can also reverse-engineer existing Avro files and Avro schema files so a data modeler or information architect can enrich the model with descriptions, metadata, and constraints.

After retrieving the Avro schema, Hackolade persists the state of the data model and generates HTML documentation of the Avro schema, which serves as a platform for a productive dialog between analysts, designers, architects, and developers. The visual Avro schema design tool supports several use cases to help enterprises manage their data.

Components of an Avro schema model

An Avro schema can be viewed as a language-agnostic contract for systems to interoperate. There are four attributes for a given Avro schema:

  • Type: specifies the data type of the schema, whether a complex type or a primitive value. The top level of an Avro schema is typically a “record” type.
  • Name: the name of the Avro schema being defined
  • Namespace: a high-level logical qualifier for the name of the Avro schema
  • Fields: the individual data elements of the record. Fields can be of primitive or complex type, and complex types can in turn be made of simple and complex data types.

Data types include primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). There are also logical types: an Avro primitive or complex type with extra attributes to represent a derived type.
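A short sketch can show how the four attributes and the type families fit together. The `Customer` schema, its namespace, and the helper functions `full_name` and `kind` are hypothetical names chosen for illustration; the schema is written as a Python dictionary that serializes directly to Avro's JSON notation.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# Hypothetical schema showing type, name, namespace, and fields.
schema = {
    "type": "record",
    "name": "Customer",
    "namespace": "com.example.crm",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "status", "type": {"type": "enum", "name": "Status",
                                    "symbols": ["ACTIVE", "INACTIVE"]}},
        {"name": "signup_date", "type": {"type": "int", "logicalType": "date"}},
    ],
}

def full_name(schema):
    """Combine namespace and name into the schema's full name."""
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]

def kind(avro_type):
    """Classify a field type as 'union', 'logical', 'complex', or 'primitive'."""
    if isinstance(avro_type, list):
        return "union"
    if isinstance(avro_type, dict):
        if "logicalType" in avro_type:
            return "logical"
        return kind(avro_type["type"])
    return "primitive" if avro_type in PRIMITIVES else "complex"

kinds = {field["name"]: kind(field["type"]) for field in schema["fields"]}
schema_json = json.dumps(schema, indent=2)  # the JSON text an Avro tool would consume
```

Note that a logical type such as `date` is still stored as its underlying primitive (`int` here); the extra attribute only tells readers how to interpret the value.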

    Hierarchical view of nested objects

    Complex types are easily represented in nested structures. This can be supplemented with detailed descriptions and a log of team comments gathered as the model adapts over time for the schema evolution.


    Entity Relationship Diagram (ERD)

    Hackolade lets users visualize an Avro schema for different but related records via an Entity-Relationship Diagram of the physical data model.


    Polymorphism

    Polymorphism in the context of Avro means that a field can be defined with multiple data types, which Avro expresses as a union.

    An example of polymorphism found in data could be a date defined in two ways: a long (which represents the number of milliseconds since January 1, 1970, 00:00:00 GMT) or a string (dd-mmm-yyyy format). Since Avro has no dedicated way to mark a field as optional, the simplest approach is to allow the field to also be of the null type.


    This would be represented in the following manner in JSON for Avro schema:

    {
     "name": "date_of_creation",
     "type" : [
      "null",
      "string",
      {
       "type" : "long",
       "logicalType" : "timestamp-millis"
      }
     ]
    }
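On the consuming side, a polymorphic field like `date_of_creation` has to be normalized before use. The following is a minimal sketch, assuming the string branch uses English month abbreviations (for example 01-Jan-2020) and should be interpreted as midnight UTC; the function name `to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_datetime(value):
    """Normalize a value from the union of null, string, and long (timestamp-millis)."""
    if value is None:                                    # "null" branch
        return None
    if isinstance(value, str):                           # "string" branch, dd-mmm-yyyy
        parsed = datetime.strptime(value, "%d-%b-%Y")
        return parsed.replace(tzinfo=timezone.utc)       # assumption: strings are UTC dates
    if isinstance(value, int) and not isinstance(value, bool):   # "long" branch
        return EPOCH + timedelta(milliseconds=value)
    raise TypeError(f"unsupported value for date_of_creation: {value!r}")

from_string = to_datetime("01-Jan-2020")
from_millis = to_datetime(1577836800000)
```

Both calls describe the same instant, which is the point of normalizing: downstream code only ever sees a timezone-aware `datetime` or `None`.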

    Things get more complex when the combination of data types includes a complex type. JSON Schema notation makes it possible to represent such a structure, for example an optional address object of complex type.
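For reference, an optional complex type can be written in Avro as a union of null and a record. The sketch below builds such a field as a Python dictionary; the `Address` record and its fields are hypothetical, and null is listed first by convention so that a null default is valid.

```python
import json

# Hypothetical Address record; the field names are illustrative only.
address_record = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "street", "type": "string"},
        {"name": "city", "type": "string"},
        {"name": "postal_code", "type": ["null", "string"], "default": None},
    ],
}

# Optional address: a union of null and the record, defaulting to null.
address_field = {"name": "address", "type": ["null", address_record], "default": None}

def is_optional(field):
    """True if the field's type is a union that includes null."""
    field_type = field["type"]
    return isinstance(field_type, list) and "null" in field_type

def branch_names(field):
    """List the union branches by primitive name or by named-type name."""
    return [b if isinstance(b, str) else b.get("name", b["type"]) for b in field["type"]]

address_json = json.dumps(address_field, indent=2)
```

Serializing `address_field` with `json.dumps` yields the JSON fragment that would sit inside the `fields` array of the enclosing record.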

    Outputs of a Visual Schema Editor for Avro

    In addition to the dynamic Avro schema creation which facilitates development, Hackolade provides a rich, human-readable HTML report, including diagrams, records, fields, relationships and all their metadata. Many additional features have been developed to help data modelers.


Benefits of Avro Schema Design

A model-first approach advocates designing the contract between producers and consumers before writing any application code: it is effective, it helps consumers understand data structures quickly, and it reduces the time to integrate. A consistent design shortens the learning curve and promotes reuse and understanding of data structures across complex, data-centric enterprises. Hackolade increases data agility by making data structures transparent and facilitating their evolution. The benefits of schema design for Avro are widespread and measurable.

Model-first schema design is a best practice to ensure that applications evolve, scale, and perform well. A good data model helps reduce development time, increase application quality, and lower execution risks across the enterprise.

Free trial

To experience the first visual Avro schema editor and try Hackolade free for 14 days, download the latest version of Hackolade and install it on your desktop. There's no risk, no obligation, and no credit card required! The software runs on Windows, Mac, and Linux, plus it supports several other leading NoSQL databases.