Avro schema

Apache Avro is a language-neutral data serialization system developed by Doug Cutting, the father of Hadoop.  Avro is a preferred tool for serializing data in Hadoop.  It is also a common choice of file format for data streaming with Kafka.  Avro serializes data that has a built-in schema into a compact binary format, which can be deserialized by any application.  Avro schemas are defined in JSON, which facilitates implementation in languages that already have JSON libraries.  Avro creates a self-describing file, called an Avro Data File, in which it stores data along with its schema in the metadata section.


Hackolade is a visual editor for Avro schema for non-programmers. To perform data modeling for Avro schema with Hackolade, you must first download the Avro plugin.  


Hackolade was specially adapted to support the design of Avro schemas. The application closely follows Avro terminology.




Avro Schema

An Avro schema is created in JSON format and contains 4 attributes: name, namespace, type, and fields.
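As a minimal sketch, the four attributes of a record schema could be written as shown here. The record name `Customer`, the namespace `com.example`, and the field names are hypothetical, chosen only for illustration. Python's standard `json` module is used simply to render the JSON text that would typically be stored in an .avsc file.

```python
import json

# Hypothetical record schema: the name, namespace, and fields are illustrative only.
customer_schema = {
    "type": "record",            # the kind of schema being declared
    "name": "Customer",          # the record name
    "namespace": "com.example",  # qualifies the name, much like a package
    "fields": [                  # the list of field definitions
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
}

# Render the schema as JSON text, as it would appear in an .avsc file.
avsc_text = json.dumps(customer_schema, indent=2)
print(avsc_text)
```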

Data Types

There are 8 primitive types (null, boolean, int, long, float, double, bytes, and string) and 6 complex types (record, enum, array, map, union, and fixed).
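To make the list of types more concrete, the following sketch declares a record that uses each of the other complex types once. All names (`Order`, `Status`, `MD5`, and the field names) are hypothetical and serve only as an illustration. The schema is built as a Python dictionary and rendered with the standard `json` module.

```python
import json

# The 8 primitive type names, written exactly as they appear in a schema.
PRIMITIVE_TYPES = ["null", "boolean", "int", "long", "float", "double", "bytes", "string"]

# Hypothetical record whose fields use enum, array, map, union, and fixed.
order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "status",
         "type": {"type": "enum", "name": "Status", "symbols": ["NEW", "SHIPPED"]}},
        {"name": "items", "type": {"type": "array", "items": "string"}},
        {"name": "attributes", "type": {"type": "map", "values": "string"}},
        # A union is written as a JSON array of types.
        {"name": "note", "type": ["null", "string"], "default": None},
        {"name": "checksum", "type": {"type": "fixed", "name": "MD5", "size": 16}},
    ],
}

print(json.dumps(order_schema, indent=2))
```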




Hackolade also supports Avro logical types.



Union types

Because fields are always technically required in Avro, forward and backward compatibility is facilitated by allowing fields to have a null type in addition to their natural data type.  In Hackolade, when you create a new field, it is created with the required property selected.  If you want to make a field logically optional, it must still be physically present, but with a default that must be null.  To do this in Hackolade, you would set the data type to null, then de-select the required property, and set the default property to null (without quotes).



Note: the position of null among the data types influences the default.  The default is based on the first data type listed.  For "default": null to appear, the null data type must be listed first among the multiple data types, and the word null (without quotes) must be entered in the default property.
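The rule in the note above can be sketched as follows. The field name `middle_name` and the helper `null_default_is_valid` are hypothetical and are not part of Hackolade or of any Avro library; the helper only illustrates the ordering rule described here.

```python
import json

# Hypothetical optional field: "null" is listed first, so the default can be null.
optional_field = {
    "name": "middle_name",
    "type": ["null", "string"],
    "default": None,  # rendered as null (without quotes) in JSON
}

def null_default_is_valid(field):
    """Check the rule from the note: a null default needs "null" as the first type."""
    if "default" not in field or field["default"] is not None:
        return True  # the rule only concerns null defaults
    types = field["type"]
    if isinstance(types, list):
        return len(types) > 0 and types[0] == "null"
    return types == "null"

print(json.dumps(optional_field))
# prints {"name": "middle_name", "type": ["null", "string"], "default": null}
```

With the two types reversed, as in `["string", "null"]`, the same helper returns `False`, because the default would then have to match `string`.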


How you handle this in the application depends on whether the data types are scalar or complex:

Scalar types

Combining a null type with a scalar data type (boolean, int, long, float, double, bytes, or string) is simple: click the + sign to the right of the type property to add another data type.


This results in multiple blocks of properties appearing below in the Properties Pane.



Complex types

If at least one data type is complex (record, enum, array, map, union, or fixed), then you must use a oneOf choice.
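As an illustration, and assuming that such a choice between null and a complex type is expressed in the generated Avro schema as a union, the resulting field might look like the sketch below. The field name `address` and the nested record `Address` are hypothetical.

```python
import json

# Hypothetical optional field whose non-null branch is a complex type (a record).
# Assumption: the oneOf choice is represented in Avro as a union of the two types.
address_field = {
    "name": "address",
    "type": [
        "null",  # listed first so that the default can be null
        {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "street", "type": "string"},
                {"name": "city", "type": "string"},
            ],
        },
    ],
    "default": None,
}

print(json.dumps(address_field, indent=2))
```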




Hackolade dynamically generates Avro schema for the structure created with the application.





This structure can be forward-engineered to a file with the .avsc extension or copied and pasted into code.  It can also be forward-engineered to a Confluent Schema Registry instance.


Hackolade easily imports the schema from .avsc or .avro files to represent the corresponding Entity Relationship Diagram and schema structure.  You may also import and convert from JSON Schema and documents.