Confluent Schema Registry

The Confluent Schema Registry is a central repository with a RESTful interface for developers to define standard schemas and register applications to enable compatibility. Schema Registry is available as a software component of Confluent Platform or as a managed component of Confluent Cloud. Confluent Schema Validation provides a direct interface between the Kafka broker and Schema Registry to validate and enforce schemas programmatically. Schema Validation can be configured at the Kafka topic level.

A key component of event streaming is to enable broad compatibility between applications connecting to Kafka. In large organizations, trying to ensure data compatibility can be difficult and ultimately ineffective, so schemas should be handled as “contracts” between producers and consumers. Kafka cannot prevent unformatted data from being published. As a Kafka deployment scales and the number of connected systems and apps increases, so does the amount of risk and uncertainty regarding data quality.

Hackolade supports Avro schema and JSON Schema maintenance in Confluent Schema Registry. Unlike many other Hackolade targets, this does not require a special plugin. It is done respectively through the Avro, Parquet or ProtoBuf plugins, or the JSON native target.

In effect, Hackolade provides a graphical interface for the design and maintenance of Avro and JSON schemas stored in Confluent Schema Registry.

Confluent Schema Registry workspace

For the details on how to define Avro schema definitions in Hackolade, you should consult this Avro schema page.

Terminology

Message

A data item that is made of a key (optional) and a value (the Kafka data payload). Messages are not handled or managed by Hackolade. A schema type suffix, key or value, may be specified in the subject name.

Topic

A collection of messages, where ordering is maintained for messages with the same key. Topics are not handled or managed by Hackolade, but can be referred to in some of the subject name strategies (see below).

Schema

A description of how data should be structured. Each schema in Confluent Schema Registry is saved (registered) under a subject.

Subject

A named ordered history of schema versions. A subject can be seen as a scope in which a schema can evolve and is versioned.

Subject Name Strategy

Hackolade supports the 3 strategies offered by the Confluent Schema Registry:

Strategy	Description	Format
TopicNameStrategy	Derives subject name from topic name (default)	<topic name>-<schema type suffix (optional)>
RecordNameStrategy	Derives subject name from the record name, and provides a way to group logically related events that may have different data structures under a subject.	<avro namespace>-<avro record name>-<schema type suffix (optional)>
TopicRecordNameStrategy	Derives the subject name from topic and record name, as a way to group logically related events that may have different structures under a subject.	<topic name>-<avro namespace>-<avro record name>-<schema type suffix (optional)>

The TopicNameStrategy is the default strategy, and implicitly requires that all messages in the same topic conform to the same schema to enforce the subject-topic constraints.

The other 2 strategies can be used when a single topic can have records that use multiple schemas. This is useful when data represents a time-ordered sequence of events, with messages having different data structures. They use respectively the record name, and the topic plus the record name, to determine the subject to be used for schema lookups.
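
To make the Format column of the table concrete, here is a minimal Python sketch that assembles a subject name for each of the three strategies. The function name subject_name, its parameters, and the sample topic and record names are assumptions made for illustration; they are not part of Hackolade or of the Confluent client libraries, and the sketch treats the Avro namespace as mandatory for brevity.

```python
from typing import Optional


def subject_name(strategy: str,
                 topic: Optional[str] = None,
                 namespace: Optional[str] = None,
                 record: Optional[str] = None,
                 suffix: Optional[str] = None) -> str:
    """Join the name parts listed in the Format column with hyphens."""
    if strategy == "TopicNameStrategy":
        parts = [topic]
    elif strategy == "RecordNameStrategy":
        parts = [namespace, record]
    elif strategy == "TopicRecordNameStrategy":
        parts = [topic, namespace, record]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Every part required by the chosen strategy must be provided.
    if any(part is None or part == "" for part in parts):
        raise ValueError(f"{strategy} is missing a required name part")

    # The schema type suffix is optional and is either 'key' or 'value'.
    if suffix is not None:
        if suffix not in ("key", "value"):
            raise ValueError("suffix must be 'key' or 'value'")
        parts.append(suffix)

    return "-".join(parts)


# Hypothetical names, used only to show the three formats side by side.
print(subject_name("TopicNameStrategy", topic="orders", suffix="value"))
print(subject_name("RecordNameStrategy", namespace="com.example", record="OrderCreated"))
print(subject_name("TopicRecordNameStrategy", topic="orders",
                   namespace="com.example", record="OrderCreated", suffix="value"))
```

With TopicNameStrategy only the topic determines the subject, which is why all messages in that topic end up sharing one schema history; the other two strategies bring the record name into the subject, so each event type gets its own history.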

Before these 2 strategies became available, it was common to use Avro unions. This came with caveats: it was difficult to evolve event types independently within the Avro union, and the resulting union could become unwieldy. "By using either RecordNameStrategy or TopicRecordNameStrategy, you retain subject-schema constraints, eliminate the need for an Avro union, and gain the ability to evolve types independently. However, you lose subject-topic constraints, as now there is no constraint on the event types that can be stored in the topic, which means the set of event types in the topic can grow unbounded."

You should look at this table comparing how strategies work, as part of this page.

Avro unions with schema references

Confluent Platform (versions 5.5.0 and later) provides full support for schema references, that is, the ability of a schema to refer to other schemas.

A schema reference consists of a reference name and a subject/version for that reference. With schema references, the Avro union is just a list of event types that could be sent to a topic. Each event type can evolve independently, as it would with either RecordNameStrategy or TopicRecordNameStrategy, except that you can now use TopicNameStrategy and retain subject-topic constraints.

You will find more details in this blog and this documentation page. Hackolade supports Avro unions with schema references.

A schema reference consists of the following:

A name for the reference. (For Avro, the reference name is the fully qualified schema name, for JSON Schema it is a URL, and for Protobuf, it is the name of another Protobuf file.)
A subject, representing the subject under which the referenced schema is registered.
A version, representing the exact version of the schema under the registered subject.

In Hackolade Studio, you can create a reference to another Avro record in the same model through a model reference. Let's say you have 2 records in your Avro model: customer and address. Now you'd like to reference the address record from the customer record. Choose Append field > Reference > Model Definition > Open definitions...

Confluent Schema Registry - add reference

Then choose the address record in the dialog:

Confluent Schema Registry - pick definition

This will result in a reference inside the customer record:

Confluent Schema Registry - union schema

This will result in a references section in the customer schema for the Confluent Schema Registry:

Confluent Schema Registry - union schema output
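
As an illustration of what a references section might contain, the following Python sketch builds a request body of the kind the Schema Registry REST API accepts when registering a schema version (POST /subjects/<subject>/versions). The com.example namespace, the address-value subject, version 1, the id field and the helper name are assumptions chosen for the example; the output actually generated by Hackolade may differ.

```python
import json


def build_registration_payload(schema: dict, references: list) -> dict:
    """Wrap an Avro schema and its references in a registration request body."""
    return {
        "schemaType": "AVRO",
        # The registry expects the schema itself as a JSON-encoded string.
        "schema": json.dumps(schema),
        # Each reference carries a name, a subject and an exact version.
        "references": references,
    }


# Hypothetical customer record that refers to a separately registered address record.
customer_schema = {
    "type": "record",
    "name": "customer",
    "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "address", "type": "com.example.address"},
    ],
}

# For Avro, the reference name is the fully qualified name of the referenced schema.
address_reference = {
    "name": "com.example.address",
    "subject": "address-value",
    "version": 1,
}

payload = build_registration_payload(customer_schema, [address_reference])
print(json.dumps(payload, indent=2))
```

The customer schema only names the address type; the references list is what tells the registry under which subject and version that type can be found.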

Forward-Engineering

Schemas for Avro or JSON defined in Hackolade can be added to subjects in the Schema Registry through an Apply button appearing at the bottom of the Avro Schema tab.

The script can also be exported to the file system via the menu Tools > Forward-Engineering, or via the Command-Line Interface.

Confluent Schema Registry forward-engineering

When you press the "Apply to instance" button, the system automatically creates a new version of the schema if it differs from the latest one.
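
The source does not describe how the comparison with the latest version is made. Purely as an illustration of the idea, the Python sketch below assumes that two schemas count as identical when their parsed JSON is equal, so that whitespace and key order are ignored. The function name needs_new_version and the sample schemas are hypothetical, and neither Hackolade nor the Schema Registry is claimed to work this way.

```python
import json
from typing import Optional


def needs_new_version(latest_schema: Optional[str], candidate_schema: str) -> bool:
    """Return True when the candidate differs from the latest registered schema."""
    if latest_schema is None:
        # Nothing is registered under the subject yet.
        return True
    # Comparing parsed JSON ignores formatting and object key order,
    # but still detects added, removed or reordered fields.
    return json.loads(latest_schema) != json.loads(candidate_schema)


latest = '{"type":"record","name":"customer","fields":[{"name":"id","type":"string"}]}'
reformatted = '''{
  "name": "customer",
  "type": "record",
  "fields": [{"type": "string", "name": "id"}]
}'''
extended = ('{"type":"record","name":"customer","fields":['
            '{"name":"id","type":"string"},{"name":"email","type":"string"}]}')

print(needs_new_version(latest, reformatted))
print(needs_new_version(latest, extended))
```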

Reverse-Engineering

The reverse-engineering function lets you connect to the Confluent Schema Registry, either on-prem or in the Cloud.

For more information, make sure to consult the Confluent Schema Registry overview.

Configure CORS to access Confluent Schema Registries from Hackolade Studio in the browser

When interacting (forward- or reverse-engineering) with Confluent Schema Registries (CSR) from https://studio.hackolade.com, a specific Cross Origin Resource Sharing (CORS) configuration must be set up on the CSR instance(s).

The CORS Access-Control-Allow-Origin response header must allow the https://studio.hackolade.com origin to interact with the instance. Otherwise, the browser blocks any interaction between Hackolade Studio in the browser and the schema registry.
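
The rule the browser applies to the Access-Control-Allow-Origin header can be sketched in Python as follows. This is a simplified illustration, not the full CORS algorithm: the function name origin_allowed and its parameters are assumptions, and whether Hackolade Studio sends credentialed requests is not stated in the source, so that case is exposed as a flag.

```python
def origin_allowed(response_headers: dict, origin: str,
                   with_credentials: bool = False) -> bool:
    """Check whether a response's CORS header admits the given origin."""
    # HTTP header names are case-insensitive.
    normalized = {name.lower(): value.strip()
                  for name, value in response_headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: the browser blocks the cross-origin response.
        return False
    if allowed == "*":
        # Browsers reject the wildcard for requests that carry credentials.
        return not with_credentials
    return allowed == origin


STUDIO = "https://studio.hackolade.com"

print(origin_allowed({"Access-Control-Allow-Origin": STUDIO}, STUDIO))
print(origin_allowed({}, STUDIO))
```

Such a helper could be applied to the headers returned by a registry or proxy endpoint to verify a configuration before trying it from the browser.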

Confluent Cloud

The Confluent Cloud platform does not allow custom configuration of the CORS Access-Control-Allow-Origin header when accessing the schema registry on its "confluent.cloud" domain.

To make it accept transactions from https://studio.hackolade.com, you (or your IT department) must first set up a CORS Reverse Proxy that serves the Confluent-provided schema registry while overriding response headers when the response is sent back to studio.hackolade.com.

We give you an example of how to deploy such a reverse proxy and use this endpoint as the host URL to execute forward- or reverse-engineering operations from https://studio.hackolade.com.

For example, with an Nginx reverse proxy served on https://cors-proxy-for-confluent-cloud.mydomain.com, and once you have looked up the public URL of your Confluent Schema Registry (something like https://<schema registry instance>.confluent.cloud):

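server {
    listen 443;
    server_name confluent-proxy;

    location / {
        # Override the CORS response headers for Hackolade Studio in the browser
        add_header Access-Control-Allow-Origin 'https://studio.hackolade.com' always;
        add_header Access-Control-Allow-Headers '*';
        add_header Access-Control-Allow-Methods '*';
        add_header Access-Control-Allow-Credentials 'true';

        # Answer CORS preflight requests directly, without contacting the registry
        if ($request_method = 'OPTIONS') {
            return 204;
        }

        # Forward everything else to the Confluent-provided schema registry
        proxy_pass https://<schema registry instance>.confluent.cloud;
    }
}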

On premises custom instance

Please refer to the Confluent Schema Registry configuration reference to properly configure your custom instance. You need to add (at least) the property "access.control.allow.origin" to your "schema-registry.properties" file. It is often useful to also set "access.control.allow.methods" to "*".

access.control.allow.origin=https://studio.hackolade.com
access.control.allow.methods=*
