Avro file or schema
Hackolade easily imports the schema from .avsc or .avro files, located on your local file system or on a shared directory, to represent the corresponding Entity Relationship Diagram and schema structure. When multiple files are selected, you have the choice to either combine the schemas of the selected files (default), or to create a separate schema in the model for each selected file.
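As an illustration of the two import choices, the following minimal sketch parses two hypothetical .avsc documents (an .avsc file is plain JSON) and either merges them into one structure or keeps one structure per file. The schemas, the `combine` helper, and the dictionary layout are assumptions made for this example; they do not describe how Hackolade represents a model internally.

```python
import json

# Two hypothetical .avsc documents, inlined as strings so the sketch is self-contained.
CUSTOMER_AVSC = """
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

ORDER_AVSC = """
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "customer_id", "type": "long"},
    {"name": "total", "type": "double"}
  ]
}
"""


def combine(schemas):
    """Map each record's full name to its field names (illustrative layout only)."""
    model = {}
    for schema in schemas:
        parts = (schema.get("namespace"), schema["name"])
        full_name = ".".join(p for p in parts if p)
        model[full_name] = [field["name"] for field in schema["fields"]]
    return model


schemas = [json.loads(CUSTOMER_AVSC), json.loads(ORDER_AVSC)]

# Option 1: combine the schemas of all selected files into one structure.
combined = combine(schemas)

# Option 2: keep a separate structure for each selected file.
separate = [combine([schema]) for schema in schemas]
```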
Avro files and schemas can also be reverse-engineered from AWS S3, Azure Blob Storage/ADLS, and Google Cloud Storage.
AWS S3
Give a meaningful name to the connection to identify it for later, and provide the proper URI to your S3 bucket and an optional folder path.
If the S3 bucket is private, you must also provide authentication parameters (Access Key ID and Secret Access Key):
If you wish to handle AWS authentication through the credentials file, you may leave the Access Key ID and Secret Access Key fields blank. In that case, Hackolade supplies credentials following the recommendations described here.
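For reference, the AWS shared credentials file is an INI-style file (typically `~/.aws/credentials`) with one section per profile. The sketch below reads such a file with Python's standard `configparser`; the sample content uses the placeholder keys from the AWS documentation, and the `read_profile` helper is a name introduced here for illustration, not something Hackolade exposes.

```python
import configparser

# Sample content in the format of the AWS shared credentials file.
# The key values are the documentation placeholders, not real credentials.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""


def read_profile(text, profile="default"):
    """Return (access key id, secret access key) for the given profile."""
    # Interpolation is disabled so that a '%' in a value is never misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


key_id, secret = read_profile(SAMPLE_CREDENTIALS)
```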
Azure Blob Storage and ADLS
Give a meaningful name to the connection to identify it for later, and provide the proper container name and storage account name.
Note: enter only the storage account name, NOT a full URL or anything else.
If you wish to filter files, you may enter a file name prefix:
If the storage is public, you may choose the anonymous method:
If the storage account is private, Azure provides a choice of 3 methods with different levels of granularity depending on your security requirements:
Storage Access Key
The storage access key is similar to a root password for your storage account. Always be careful to protect your access keys. Microsoft recommends that you use Azure Key Vault to manage your access keys, and that you regularly rotate and regenerate your keys.
Select the authentication method and paste the key into the Storage Access Key field:
Shared Access Signature
A shared access signature (SAS) provides secure delegated access to resources in your storage account, with granular control over how a client can access your data: user delegation, service or account.
For Hackolade to be able to reverse-engineer, the minimum rights are as shown here:
After clicking the button to generate, copy the SAS token from the Azure portal, and paste it in the SAS Token field:
Shared Access Token per container
It is possible to generate tokens for specific containers in the "Shared access tokens" menu option of the container:
The minimum required rights for our reverse-engineering process to succeed are: read and list.
After clicking the button to generate, copy the Blob SAS token from the Azure portal, and paste it in the SAS Token field:
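Before pasting a token, it can be handy to check that it carries the read and list rights mentioned above. A SAS token is a query string whose `sp` parameter lists the granted permissions as single letters (`r` for read, `l` for list). The following is a minimal sketch using only the standard library; the sample token is a placeholder with a fake signature, and the helper names are assumptions made for this example.

```python
from urllib.parse import parse_qs


def sas_permissions(token):
    """Return the set of permission letters found in the token's 'sp' parameter."""
    params = parse_qs(token.lstrip("?"))
    return set(params.get("sp", [""])[0])


def has_read_and_list(token):
    """True if the token grants at least read (r) and list (l)."""
    return {"r", "l"} <= sas_permissions(token)


# Placeholder token: the signature is fake and the dates are arbitrary.
sample_token = "sv=2022-11-02&sr=c&sp=rl&se=2030-01-01T00:00:00Z&sig=PLACEHOLDER"
ok = has_read_and_list(sample_token)
```

This check only inspects the token text. It does not validate the signature or the expiry date, which Azure verifies when the token is used.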
Google Cloud Storage
Give a meaningful name to the connection to identify it for later, and provide the proper URI to your GCS bucket and an optional folder path.
If the bucket is private, you must also provide the Private key:
Confluent Schema Registry on Confluent Cloud
To connect to your Schema Registry instance in the cloud, you must first obtain both an API key and an API secret for it. They are found in the Schema Registry tab, in the API endpoint section:
Give a meaningful name to the connection to identify it for later, choose Cloud as a source, and provide the URL to your Schema Registry:
Then provide the API key and API secret:
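To verify the URL, API key, and API secret outside of Hackolade, one option is to call the Schema Registry REST API directly: `GET /subjects` lists the registered subjects, and Confluent Cloud accepts the key and secret as HTTP Basic credentials. The sketch below only builds the request and does not send it. The URL and credentials are placeholders, and `build_subjects_request` is a helper name introduced here.

```python
import base64
import urllib.request


def build_subjects_request(registry_url, api_key, api_secret):
    """Build (but do not send) a GET /subjects request with HTTP Basic auth."""
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        registry_url.rstrip("/") + "/subjects",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/vnd.schemaregistry.v1+json",
        },
    )


# Placeholder values: replace with your own endpoint, key, and secret.
request = build_subjects_request(
    "https://schema-registry.example.com", "MY_API_KEY", "MY_API_SECRET"
)
# To actually send it: urllib.request.urlopen(request)
```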
Confluent Schema Registry on-premises
Give a meaningful name to the connection to identify it for later, choose on-premise as a source, and provide the URL to your Schema Registry:
Then provide your username and password:
Azure Schema Registry for Event Hubs
Give a meaningful name to the connection to identify it for later, choose Azure Schema Registry, and provide the URL to your Schema Registry:
Currently, it is not possible to automatically retrieve the list of Schema Groups, so you must provide the name of the Schema Group concerned. If you need to access more than one Schema Group, you may create one connection per Schema Group.
Then you need to provide the authentication parameters:
Hackolade communicates with the Azure Schema Registry via REST APIs. If you already use Hackolade for Cosmos DB, the following steps may have already been performed. Otherwise, please follow the instructions below:
The Hackolade application must be registered so that Azure accepts the REST API calls, as per these instructions. The Application (client) ID and the Directory (tenant) ID are retrieved from the App registration Overview screen:
Note: it is critical to assign the proper role to the application just registered. This is done following the steps outlined here.
Finally, the Application secret is obtained from the Certificates & secrets screen of the App registration:
If you don't know how to generate some of the above values, you may want to consult this document.
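To show how the three values fit together, here is a minimal sketch of an OAuth 2.0 client-credentials token request against the Microsoft identity platform, which is the usual way a registered application authenticates to Azure REST APIs. Whether Hackolade issues exactly this request is an assumption, as is the Event Hubs scope used as the default. The sketch only builds the URL and form body and sends nothing.

```python
from urllib.parse import urlencode

# Assumption: the Event Hubs resource scope, as used for Azure AD access to Event Hubs.
DEFAULT_SCOPE = "https://eventhubs.azure.net/.default"


def build_token_request(tenant_id, client_id, client_secret, scope=DEFAULT_SCOPE):
    """Return (url, form body) for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # Application (client) ID
            "client_secret": client_secret,  # Application secret
            "scope": scope,
        }
    )
    return url, body


# Placeholder values standing in for the IDs and secret described above.
url, body = build_token_request("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET")
```

The body would be sent as an `application/x-www-form-urlencoded` POST. A successful response contains an access token, which is then passed as a Bearer token on subsequent REST calls.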
Pulsar Schema Registry
Give a meaningful name to the connection to identify it for later, choose the Pulsar connection type, and provide the URL to your Schema Registry.