Apache Avro is a language-neutral data serialization system, developed by Doug Cutting, the creator of Hadoop.  Avro is a preferred tool for serializing data in Hadoop, and it is also a common choice of format for data streaming with Kafka.  Avro serializes data together with its schema into a compact binary format, which can be deserialized by any application.  Avro schemas are defined in JSON, which facilitates implementation in languages that already have JSON libraries.  Avro creates a self-describing file, called an Avro Data File, in which it stores data along with its schema in the metadata section.


Hackolade is a visual editor for Avro schema for non-programmers. To perform data modeling for Avro schema with Hackolade, you must first download the Avro plugin.  


Hackolade was specially adapted to support the design of Avro schemas. The application closely follows the Avro terminology.



Avro Schema

An Avro schema is created in JSON format and contains 4 attributes: name, namespace, type, and fields.
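As an illustration of these four attributes, here is a minimal sketch that builds a record schema as a Python dictionary and prints it as JSON. The record name `Customer`, the namespace `com.example.crm`, and the two fields are hypothetical and chosen only for the example.

```python
import json

# Hypothetical record: "Customer" and "com.example.crm" are illustrative names only.
schema = {
    "type": "record",                 # the kind of schema being declared
    "name": "Customer",               # the record's name
    "namespace": "com.example.crm",   # qualifies the name to avoid collisions
    "fields": [                       # one entry per field, each with a name and a type
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
}

# An .avsc file is this JSON document saved to disk.
avsc_text = json.dumps(schema, indent=2)
print(avsc_text)
```

Only the standard `json` module is used here; an Avro library would be needed to validate the schema or to serialize data with it.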

Data Types

There are 8 primitive types (null, boolean, int, long, float, double, bytes, and string) and 6 complex types (record, enum, array, map, union, and fixed).
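The sketch below shows how each of these types appears in the JSON form of a schema, using a small helper that reports the type of a schema node. The helper `classify` and the sample names (`Point`, `Suit`, `Md5`) are assumptions made for this example; they are not part of Avro or Hackolade.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
COMPLEX = {"record", "enum", "array", "map", "union", "fixed"}

def classify(schema):
    """Return the Avro type name of a schema node (illustrative helper)."""
    if isinstance(schema, list):   # a JSON array denotes a union
        return "union"
    if isinstance(schema, dict):   # a JSON object carries its type in "type"
        return schema["type"]
    return schema                  # a bare string names a primitive or a named type

examples = [
    "string",
    {"type": "record", "name": "Point", "fields": [{"name": "x", "type": "int"}]},
    {"type": "enum", "name": "Suit", "symbols": ["HEARTS", "SPADES"]},
    {"type": "array", "items": "long"},
    {"type": "map", "values": "double"},
    ["null", "string"],
    {"type": "fixed", "name": "Md5", "size": 16},
]

for node in examples:
    kind = classify(node)
    if kind in PRIMITIVES:
        family = "primitive"
    elif kind in COMPLEX:
        family = "complex"
    else:
        family = "named type reference"
    print(f"{kind}: {family}")
```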


Note: the Union type is supported through the use of model definitions.


Hackolade also supports Avro logical types.


As fields are always required in Avro, it is necessary to facilitate forward- and backward-compatibility by allowing fields to have a null type in addition to their natural data type.  To make this easy in Hackolade, it is advised to use the property:

leading to the following Avro schema:

{
    "name": "Name",
    "type": [
        "null",
        "string"
    ],
    "default": null
}
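The same transformation can be sketched in a few lines of Python. The helper `make_nullable` is hypothetical and only illustrates the pattern; it is not how Hackolade implements the property. It lists "null" first in the union, which is the usual convention when the default value is null.

```python
import json

def make_nullable(field):
    """Return a copy of an Avro field whose type also accepts null, defaulting to null."""
    original = field["type"]
    if isinstance(original, list):
        # Already a union: put "null" first and keep the other branches in order.
        branches = ["null"] + [t for t in original if t != "null"]
    else:
        # A primitive name or a complex type definition: wrap it in a union.
        branches = ["null", original]
    return {**field, "type": branches, "default": None}

nullable_name = make_nullable({"name": "Name", "type": "string"})
print(json.dumps(nullable_name, indent=4))
```

Because the helper wraps whatever type it is given, it applies unchanged when the field type is a complex definition such as a record or an array.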


A + sign in the ERD helps visualize this condition, as does a beige-colored box in the tree view.


This becomes even handier when the data type of the field is complex.  The full logic for proper handling of the properties "null allowed", "required", and "default" is programmed as follows:


Forward-Engineering

Hackolade dynamically generates the Avro schema for the structure created with the application.




Reverse-Engineering

Hackolade easily imports the schema from .avsc or .avro files to represent the corresponding Entity Relationship Diagram and schema structure.
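Since an .avsc file is plain JSON, the first step of such an import can be sketched with the Python standard library alone. This is only an illustration under stated assumptions, not Hackolade's implementation: the function `list_fields`, the file name `person.avsc`, and the sample schema are all hypothetical. Reading an .avro data file is different, because it is a binary container and requires an Avro library to extract the embedded schema.

```python
import json
import tempfile
from pathlib import Path

def list_fields(avsc_path):
    """Load a record schema from an .avsc file and return (field name, type) pairs."""
    schema = json.loads(Path(avsc_path).read_text(encoding="utf-8"))
    return [(f["name"], f["type"]) for f in schema.get("fields", [])]

# Hypothetical schema, written to a temporary .avsc file and then read back.
sample = {
    "type": "record",
    "name": "Person",
    "namespace": "com.example",
    "fields": [
        {"name": "Name", "type": ["null", "string"], "default": None},
        {"name": "Age", "type": "int"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "person.avsc"
    path.write_text(json.dumps(sample), encoding="utf-8")
    fields = list_fields(path)

for name, avro_type in fields:
    print(name, avro_type)
```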