Confluent Schema Registry

The Confluent Schema Registry is a central repository with a RESTful interface for developers to define standard schemas and register applications to enable compatibility. Schema Registry is available as a software component of Confluent Platform or as a managed component of Confluent Cloud.  Confluent Schema Validation provides a direct interface between the Kafka broker and Schema Registry to validate and enforce schemas programmatically. Schema Validation can be configured at the Kafka topic level.

 

A key component of event streaming is to enable broad compatibility between applications connecting to Kafka. In large organizations, ensuring data compatibility can be difficult and ultimately ineffective, even when using Schema Registry, because schemas are handled as “agreements” at the application level. Kafka by itself cannot prevent unformatted data from being published. As a Kafka deployment scales and the number of connected systems and apps increases, so do the risk and uncertainty regarding data quality.

 

Hackolade supports Avro schema and JSON Schema maintenance in Confluent Schema Registry.  Unlike many other Hackolade targets, this does not require a dedicated plugin.  Instead, it is handled through the Avro plugin and the native JSON target, respectively.  Support for the Protobuf format is not currently available, but is planned for the near future.

 

In effect, Hackolade provides a graphical interface for the design and maintenance of Avro and JSON schemas stored in Confluent Schema Registry.

Confluent Schema Registry workspace

 

For details on how to define Avro schemas in Hackolade, consult the Avro schema page.

 

Terminology

Message

A data item that is made of a key (optional) and a value.  Messages are not handled or managed by Hackolade.

Topic

A collection of messages, where ordering is maintained for messages with the same key.  Topics are not handled or managed by Hackolade, but can be referred to in some circumstances.

Schema

A description of how data should be structured.

Subject

A named ordered history of schema versions.

 

Subject Name Strategy

Hackolade supports the 3 strategies offered by the Confluent Schema Registry: 

Strategy | Description
TopicNameStrategy | Derives the subject name from the topic name (default).
RecordNameStrategy | Derives the subject name from the record name, and provides a way to group logically related events that may have different data structures under a subject.
TopicRecordNameStrategy | Derives the subject name from the topic and record name, as a way to group logically related events that may have different data structures under a subject.

 

The TopicNameStrategy is the default strategy, and implicitly requires that all messages in the same topic conform to the same schema to enforce the subject-topic constraints.  

 

The other 2 strategies can be used when a single topic can have records that use multiple schemas.  This is useful when data represents a time-ordered sequence of events, with messages having different data structures.  They use respectively the record name, and the topic plus the record name, to determine the subject to be used for schema lookups.  
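The way each strategy derives a subject name can be sketched in a few lines. This is a minimal illustration rather than Confluent's or Hackolade's code: the function name subject_name, the enum, and the sample topic and record names are assumptions made for the example. It follows the naming convention of the Confluent serializers, where TopicNameStrategy appends "-key" or "-value" to the topic name.

```python
from enum import Enum


class SubjectNameStrategy(Enum):
    TOPIC_NAME = "TopicNameStrategy"
    RECORD_NAME = "RecordNameStrategy"
    TOPIC_RECORD_NAME = "TopicRecordNameStrategy"


def subject_name(strategy, topic, record_name, is_key=False):
    """Return the subject under which a schema would be looked up.

    `record_name` is the fully qualified record name (hypothetical
    values are used in the example below).
    """
    if strategy is SubjectNameStrategy.TOPIC_NAME:
        # One subject per topic and per message part (key or value).
        suffix = "key" if is_key else "value"
        return f"{topic}-{suffix}"
    if strategy is SubjectNameStrategy.RECORD_NAME:
        # The topic plays no role: the subject follows the record type.
        return record_name
    if strategy is SubjectNameStrategy.TOPIC_RECORD_NAME:
        # The subject is scoped to both the topic and the record type.
        return f"{topic}-{record_name}"
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Hypothetical topic and record names, for illustration only.
for strategy in SubjectNameStrategy:
    print(strategy.value, "->",
          subject_name(strategy, "orders", "com.example.OrderCreated"))
```

With RecordNameStrategy, the same record type maps to the same subject regardless of the topic it is published to, which is why the subject-topic constraint is lost.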

 

Before these 2 strategies became available, it was common to use Avro unions.  The caveats were that event types within the Avro union were difficult to evolve independently, and that the resulting Avro union could become unwieldy.  In Confluent's words: "By using either RecordNameStrategy or TopicRecordNameStrategy, you retain subject-schema constraints, eliminate the need for an Avro union, and gain the ability to evolve types independently. However, you lose subject-topic constraints, as now there is no constraint on the event types that can be stored in the topic, which means the set of event types in the topic can grow unbounded."

 

You should look at this table comparing how strategies work, as part of this page.

 

Avro unions with schema references

Introduced with Confluent Platform 5.5, a schema reference consists of a reference name, plus a subject and version for that reference.  With schema references, the Avro union is just a list of event types that could be sent to a topic.  Each event type can evolve independently, as it would with either RecordNameStrategy or TopicRecordNameStrategy, except that you can now use TopicNameStrategy and retain subject-topic constraints.
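As an illustration, a registration payload for a top-level Avro union with schema references might be assembled as follows. This is a sketch under stated assumptions: the helper name union_registration_payload, the record names, and the subjects are hypothetical; the "schema", "schemaType" and "references" fields (each reference carrying "name", "subject" and "version") follow the Schema Registry REST API.

```python
import json


def union_registration_payload(references):
    """Build a payload for a top-level Avro union of referenced types.

    `references` is a list of (record name, subject, version) tuples.
    All values used in the example below are hypothetical.
    """
    # The union itself is only a list of the referenced type names.
    union = [name for name, _subject, _version in references]
    return {
        "schemaType": "AVRO",
        # The registry expects the schema as a JSON-encoded string.
        "schema": json.dumps(union),
        "references": [
            {"name": name, "subject": subject, "version": version}
            for name, subject, version in references
        ],
    }


payload = union_registration_payload([
    ("com.example.Customer", "customer", 1),
    ("com.example.Order", "order", 1),
])
print(json.dumps(payload, indent=2))
```

Each referenced type lives in its own subject and keeps its own version history, while the union is registered under the topic's subject.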

 

You will find more details in this blog.  Hackolade does not yet support Avro unions with schema references, but design is under way.

 

Forward-Engineering

Schemas for Avro or JSON defined in Hackolade can be added to subjects in the Schema Registry through an Apply button appearing at the bottom of the Avro Schema tab.
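For readers curious about what adding a schema to a subject looks like at the REST level, here is a minimal sketch. It is not Hackolade's implementation: the function name register_schema_request, the registry address http://localhost:8081, the subject and the sample schema are assumptions. The endpoint POST /subjects/{subject}/versions and the content type are those of the Schema Registry REST API.

```python
import json
import urllib.request
from urllib.parse import quote

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def register_schema_request(base_url, subject, schema, schema_type="AVRO"):
    """Prepare (without sending) a request that registers `schema`."""
    # The schema travels as a JSON-encoded string inside the JSON body.
    body = {"schema": json.dumps(schema)}
    if schema_type != "AVRO":
        # AVRO is the registry default; JSON Schema must be declared.
        body["schemaType"] = schema_type
    url = f"{base_url.rstrip('/')}/subjects/{quote(subject, safe='')}/versions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


# Hypothetical Avro schema and subject, for illustration only.
order_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example",
    "fields": [{"name": "id", "type": "string"}],
}
request = register_schema_request(
    "http://localhost:8081", "orders-value", order_schema
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The request is only built, not sent, so the sketch runs without a registry. For a JSON Schema, schema_type="JSON" would be passed instead.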

 

Confluent Schema Registry forward-engineering

 

When you press the "Apply to instance" button, the system automatically creates a new version of the schema if it differs from the latest one.
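The "new version only if different" behavior can be pictured with a small in-memory model of a subject as an ordered history of schema versions. This is a simplified sketch, not the actual logic of Hackolade or of the registry: the class Subject, its apply method, and the comparison by sorted-key JSON are assumptions, and a real registry also runs compatibility checks that are not modeled here.

```python
import json


def canonical(schema):
    """Serialize a schema so that key order and whitespace do not matter.

    Assumption: a simplification of how a registry compares schemas.
    """
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


class Subject:
    """A named, ordered history of schema versions (in memory only)."""

    def __init__(self, name):
        self.name = name
        self.versions = []  # index 0 holds version 1

    def apply(self, schema):
        """Add `schema` as a new version unless it equals the latest one.

        Returns the version number the schema ends up under.
        """
        text = canonical(schema)
        if self.versions and self.versions[-1] == text:
            return len(self.versions)  # unchanged: no new version
        self.versions.append(text)
        return len(self.versions)


# Hypothetical subject and schemas, for illustration only.
subject = Subject("orders-value")
v1 = {"type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "string"}]}
v2 = {"type": "record", "name": "Order",
      "fields": [{"name": "id", "type": "string"},
                 {"name": "note", "type": ["null", "string"],
                  "default": None}]}

print(subject.apply(v1))  # first version
print(subject.apply(v1))  # same schema again: version number unchanged
print(subject.apply(v2))  # changed schema: a new version is appended
```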

 

Reverse-Engineering

The reverse-engineering function lets you connect to the Confluent Schema Registry, either on-premises or in the cloud.
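Conceptually, reading schemas back from a registry comes down to listing the subjects and fetching a version of each. The sketch below prepares those two REST calls. It is an illustration rather than Hackolade's implementation: the helper names, the address http://localhost:8081, and the key and secret values are assumptions. GET /subjects and GET /subjects/{subject}/versions/latest are Schema Registry REST endpoints, and HTTP Basic authentication with an API key and secret is the usual scheme for a cloud-hosted registry.

```python
import base64
import urllib.request
from urllib.parse import quote

ACCEPT = "application/vnd.schemaregistry.v1+json"


def registry_request(base_url, path, api_key=None, api_secret=None):
    """Prepare (without sending) a GET request against the registry."""
    headers = {"Accept": ACCEPT}
    if api_key is not None:
        # HTTP Basic authentication: base64("key:secret").
        token = base64.b64encode(
            f"{api_key}:{api_secret}".encode("utf-8")
        ).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + path, headers=headers, method="GET"
    )


def list_subjects_request(base_url, **auth):
    """Request listing all subject names."""
    return registry_request(base_url, "/subjects", **auth)


def latest_version_request(base_url, subject, **auth):
    """Request fetching the latest schema version of one subject."""
    path = f"/subjects/{quote(subject, safe='')}/versions/latest"
    return registry_request(base_url, path, **auth)


# Hypothetical local registry and credentials, for illustration only.
subjects_req = list_subjects_request("http://localhost:8081")
latest_req = latest_version_request(
    "http://localhost:8081", "orders-value",
    api_key="key", api_secret="secret",
)
print(subjects_req.full_url)
print(latest_req.full_url)
# To actually send one: urllib.request.urlopen(subjects_req)
```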

 

For more information, make sure to consult the Confluent Schema Registry overview.