Storage Formats & IDLs
Apache Avro Schema
Serialize data in Hadoop and stream data with Kafka
Apache Avro is a language-neutral data serialization system created by Doug Cutting, the father of Hadoop. Avro is a preferred tool for serializing data in Hadoop, and a popular choice of format for streaming data with Kafka. Avro data is always written and read against a schema, and is serialized into a compact binary format that any application can deserialize. Because Avro schemas are defined in JSON, they are straightforward to implement in languages that already have JSON libraries. Avro also defines a self-describing container, the Avro Data File, which stores data along with its schema in a metadata section.
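To make the compact binary format concrete, here is a minimal Python sketch. It is not Avro's reference implementation and says nothing about Hackolade's internals. It assumes a hypothetical `User` record with one `string` and one `long` field, and hand-encodes it the way the Avro specification describes for those two types: a long is zig-zag encoded and written as a variable-length integer, a string is its byte length (as a long) followed by its UTF-8 bytes, and a record is the concatenation of its fields in schema order. The names `USER_SCHEMA`, `encode_long`, `encode_string`, and `encode_record` are illustrative only.

```python
import json

# Hypothetical schema for illustration; the record and field names are assumptions.
USER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "namespace": "example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Zig-zag encode a 64-bit integer, then write it as a variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit 7 bits with the continuation flag
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the two primitive types used by the hypothetical schema are supported.
ENCODERS = {"long": encode_long, "string": encode_string}


def encode_record(schema: dict, record: dict) -> bytes:
    """A record is the concatenation of its fields, in schema order."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]]) for field in schema["fields"]
    )


print(encode_record(USER_SCHEMA, {"name": "Ada", "age": 36}))  # b'\x06AdaH'
```

The encoded bytes carry no field names or type tags, which is why a reader needs the schema to decode them. In practice a library such as the official `avro` package or `fastavro` would do this work.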
Hackolade was specially adapted to support data modeling for Avro schemas. It closely follows Avro terminology, and dynamically generates the Avro schema for a structure created with a few mouse clicks. Hackolade also imports schemas from .avsc or .avro files and represents them as an Entity Relationship Diagram and schema structure.
JSON is increasingly dominating the application development world
Hackolade is a visual editor for JSON Schema draft v4. It supports all the advanced features, including choices and polymorphism. With Hackolade, it is easy to create a JSON Schema visually from scratch, without prior knowledge of the syntax. You can also easily derive a JSON Schema from JSON document files.
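As a rough illustration of what deriving a JSON Schema from a JSON document involves, the following Python sketch infers a draft-04 schema from a single sample. It is a naive approach under stated assumptions, not a description of how Hackolade performs the derivation: it looks at one document only, types an array from its first element, and marks every property it sees as required. The function names `infer_schema` and `derive_schema` and the sample document are hypothetical.

```python
import json

DRAFT_04 = "http://json-schema.org/draft-04/schema#"


def infer_schema(value):
    """Infer a schema fragment from one parsed JSON value (naive, single sample)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:  # assumption: the first element is representative of all items
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        schema = {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
        if value:  # draft-04 requires "required" to be a non-empty array
            schema["required"] = sorted(value)
        return schema
    raise TypeError(f"not a JSON value: {value!r}")


def derive_schema(json_text: str) -> dict:
    """Parse a JSON document and wrap the inferred schema with the draft-04 marker."""
    schema = {"$schema": DRAFT_04}
    schema.update(infer_schema(json.loads(json_text)))
    return schema


sample = '{"name": "Ada", "age": 36, "tags": ["math"]}'
print(json.dumps(derive_schema(sample), indent=2))
```

A real derivation would normally merge many sample documents, so that optional fields and mixed types are detected rather than guessed from one instance.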
Apache Parquet Schema
Visual schema design to serialize data in columnar format
Apache Parquet is a binary file format that stores data by column, giving a compressed, efficient data representation in the Hadoop ecosystem and in cloud-based analytics. Hackolade is a visual editor for Parquet schemas, aimed at non-programmers and specifically adapted to support the schema design of Parquet files. It supports the Parquet structure, data types, logical types, encodings, compression codecs, and all other standard metadata. Hackolade dynamically generates the Parquet schema as the model is created in the application. It also lets you reverse-engineer files on the local system, shared directories, AWS S3 buckets, Azure Blob Storage, or Google Cloud Storage.
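The benefit of a columnar layout can be sketched in a few lines of Python. This is a toy illustration only: it does not read or write the Parquet format, and the sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical. It pivots row-oriented records into one list per column, then applies a simple run-length encoding to show why values of the same column, stored next to each other, tend to compress well.

```python
from itertools import groupby

# Hypothetical row-oriented records; every row is assumed to have the same keys.
rows = [
    {"country": "BE", "status": "active", "visits": 3},
    {"country": "BE", "status": "active", "visits": 7},
    {"country": "BE", "status": "inactive", "visits": 1},
    {"country": "US", "status": "inactive", "visits": 4},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
for name, values in columns.items():
    print(name, run_length_encode(values))
```

Columns with few distinct values, such as `country` here, collapse into a handful of pairs, while the same values interleaved with other fields in a row layout would not form runs at all.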
Use JSON Schema to validate YAML documents
Reliance upon syntactic whitespace can be frustrating, particularly in the context of infrastructure-as-code and Kubernetes deployments.
While YAML has advanced features that cannot be directly mapped to JSON, most YAML files use only features that can be validated by JSON Schema. JSON Schema is the most portable and broadly supported choice for YAML validation. Hackolade now supports the reverse-engineering of YAML files and the generation of sample data in YAML, plus forward- and reverse-engineering of YAML Schema.
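The reason JSON Schema works for YAML is that a typical YAML file parses into the same data model as JSON: mappings, sequences, strings, numbers, and booleans. The Python sketch below illustrates this under several assumptions. The document is shown already parsed, as a YAML parser such as PyYAML's `yaml.safe_load` would return it, so that the example needs only the standard library. The `validate` function is a hypothetical, deliberately tiny checker covering just the `type`, `required`, and `properties` keywords; it is not a complete JSON Schema validator, and a real project would use an existing validator library instead. The manifest and schema are invented for illustration.

```python
# Parsed form of this hypothetical YAML manifest:
#   apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     name: demo
document = {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "demo"}}

# Hypothetical JSON Schema describing the manifest above.
schema = {
    "type": "object",
    "required": ["apiVersion", "kind", "metadata"],
    "properties": {
        "apiVersion": {"type": "string"},
        "kind": {"type": "string"},
        "metadata": {
            "type": "object",
            "required": ["name"],
            "properties": {"name": {"type": "string"}},
        },
    },
}

# JSON Schema type names mapped to the Python types a YAML/JSON parser produces.
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance is valid."""
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(instance, TYPES[expected])
        # bool is a subclass of int in Python, so exclude it from numeric types
        if isinstance(instance, bool) and expected != "boolean":
            matches = False
        if not matches:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


print(validate(document, schema))  # []
```

Because validation runs on the parsed data, YAML-specific syntax such as indentation or comments is never seen by the schema; only the resulting structure is checked.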