
    JSON Schema

    From the file system

    If you open a JSON Schema file from the file system, Hackolade detects its nature and performs the reverse-engineering of that JSON Schema. You will need to choose a database target for your model.
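
    To give an idea of what gets imported, here is a minimal sketch of such a JSON Schema file, written out with Python; the file name and properties are hypothetical examples, not a Hackolade requirement:

        # A minimal, hypothetical JSON Schema file of the kind that can be reverse-engineered.
        import json

        customer_schema = {
            "$schema": "http://json-schema.org/draft-04/schema#",
            "title": "customer",
            "type": "object",
            "properties": {
                "customerId": {"type": "string"},
                "name": {"type": "string"},
                "addresses": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "street": {"type": "string"},
                            "city": {"type": "string"}
                        }
                    }
                }
            },
            "required": ["customerId"]
        }

        # Write the schema to disk so the file can be opened from the file system.
        with open("customer.schema.json", "w") as f:
            json.dump(customer_schema, f, indent=2)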

     

    If you wish to include the schema from a JSON Schema file in an existing model, the process is slightly different.  With your model already open, choose Tools > Reverse-Engineer > JSON Schema.

     

    Tools - Reverse-Engineer - JSON Schema

     

     

     

    The structure of a JSON Schema can be imported either as an entity in the Entity Relationship Diagram, or as a model definition so it can be re-used in the model:

    JSON Schema reverse-engineering dialog

     

    If you wish to force the destination of the reverse-engineering operation, you may specify the container in which the entities should be inserted.

     

    For RDBMS targets, an additional option appears that allows automatic normalization of complex data types:

     

    JSON Schema reverse-engineering dialog - normalization

     

     

    From cloud storage and schema registries

    JSON files and schemas can also be reverse-engineered from AWS S3, Azure Blob Storage/ADLS, and Google Cloud Storage.

    Cloud Selection - combine Avro schemas

     

    AWS S3

    Give a meaningful name to the connection to identify it for later, and provide the URI to your S3 bucket and an optional folder path.

     

    Cloud Storage - AWS S3 connection for Avro schema

     

     

    If the S3 bucket is private, you must also provide authentication parameters (Access Key ID and Secret Access Key):

    Cloud Storage - AWS S3 authentication avro schema

     

    If you wish to handle AWS authentication through the credentials file, you may leave the Access Key ID and Secret Access Key fields blank; Hackolade then supplies credentials following the recommendations described here.
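
    As a point of reference, the sketch below (using the boto3 library; bucket name, folder path, and key values are hypothetical) shows how the two options map onto standard AWS credential handling:

        import boto3

        # Option 1: explicit credentials, equivalent to filling in the
        # Access Key ID and Secret Access Key fields.
        s3 = boto3.client(
            "s3",
            aws_access_key_id="AKIA...",           # hypothetical
            aws_secret_access_key="wJalrXUt...",   # hypothetical
        )

        # Option 2: leave the fields blank and rely on the default credential chain
        # (environment variables, ~/.aws/credentials, instance profile, ...).
        # s3 = boto3.client("s3")

        # List the schema files under the optional folder path (prefix).
        response = s3.list_objects_v2(Bucket="my-schemas-bucket", Prefix="json-schemas/")
        for obj in response.get("Contents", []):
            print(obj["Key"])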

     

     

    Azure Blob Storage and ADLS

    Give a meaningful name to the connection to identify it for later, and provide the proper Container name and Storage account name.

     

    Note: be careful to enter only the storage account name, NOT a full URL or anything other than the storage account name.

     

    Cloud Storage - Azure connection avro schema

     

    If you wish to filter files, you may enter a file name prefix:

    Cloud Storage - Azure prefix blob name avro schema

     

    Anonymous Authentication

    If the storage is public, you may choose the anonymous method:

    Cloud Storage - Azure anonymous auth
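
    The sketch below (using the azure-storage-blob library; account, container, and prefix names are hypothetical) illustrates anonymous access to a public container, and also shows why only the storage account name is needed: the full URL is derived from it.

        from azure.storage.blob import ContainerClient

        storage_account_name = "mystorageaccount"   # only the account name, not a full URL
        container_name = "schemas"

        container = ContainerClient(
            account_url=f"https://{storage_account_name}.blob.core.windows.net",
            container_name=container_name,
            credential=None,                        # anonymous access to a public container
        )

        # The optional file name prefix narrows down which blobs are listed.
        for blob in container.list_blobs(name_starts_with="json-schemas/"):
            print(blob.name)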

     

    If the storage account is private, Azure provides a choice of 3 methods with different levels of granularity depending on your security requirements:

    Storage Access Key

    The storage access key is similar to a root password for your storage account. Always be careful to protect your access keys. Microsoft recommends that you use Azure Key Vault to manage your access keys, and that you regularly rotate and regenerate your keys.

     

    Cloud Storage - Azure Storage Access Key conf

     

     

    Select the authentication method and paste the key into the Storage Access Key field:

    Cloud Storage - Azure Storage Access Key auth
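
    For context, here is a minimal sketch of the same operation with the azure-storage-blob library, authenticating with a storage access key; the account name and key value are hypothetical:

        from azure.storage.blob import BlobServiceClient

        service = BlobServiceClient(
            account_url="https://mystorageaccount.blob.core.windows.net",
            credential="<storage-access-key>",   # the key pasted into the Storage Access Key field
        )

        container = service.get_container_client("schemas")
        for blob in container.list_blobs():
            print(blob.name)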

     

    Shared Access Signature

    A shared access signature (SAS) provides secure delegated access to resources in your storage account, with granular control over how a client can access your data.  A SAS can be scoped at the user delegation, service, or account level.

     

    For Hackolade to be able to reverse-engineer, the minimum rights are as shown here:

    Cloud Storage - Azure Shared Access Sign conf

     

     

    Cloud Storage - Azure Shared Access Sign gen

     

    After clicking the button to generate, copy the SAS token from the Azure portal, and paste it in the SAS Token field:

    Cloud Storage - Azure SAS Token auth
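
    For reference, a minimal azure-storage-blob sketch using an account-level SAS token; the token string and names are hypothetical:

        from azure.storage.blob import BlobServiceClient

        sas_token = "sv=2022-11-02&ss=b&srt=sco&sp=rl&sig=..."   # SAS granting at least read + list

        service = BlobServiceClient(
            account_url="https://mystorageaccount.blob.core.windows.net",
            credential=sas_token,
        )

        for blob in service.get_container_client("schemas").list_blobs():
            print(blob.name)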

    Shared Access Token per container

    It is possible to generate tokens for specific containers in the "Shared access tokens" menu option of the container:

    Cloud Storage - Azure Blob SAS Token conf

     

     

    The minimum required rights for our reverse-engineering process to succeed are: read and list.

     

    Cloud Storage - Azure Blob SAS Token gen

     

     

    After clicking the button to generate, copy the Blob SAS token from the Azure portal, and paste it in the SAS Token field:

     

    Cloud Storage - Azure Blob SAS Token auth
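
    A container-scoped token works the same way, except that the client is built for that one container; a sketch with azure-storage-blob, where the names and token are hypothetical:

        from azure.storage.blob import ContainerClient

        container = ContainerClient(
            account_url="https://mystorageaccount.blob.core.windows.net",
            container_name="schemas",
            credential="sp=rl&st=...&se=...&sig=...",   # Blob SAS token with read + list
        )

        for blob in container.list_blobs():
            print(blob.name)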

     

    Google Cloud Storage

    Give a meaningful name to the connection to identify it for later, and provide the URI to your GCS bucket and an optional folder path.

    Cloud Storage - Google connection avro schema

     

    If the bucket is private, you must also provide the Private key:

    Cloud Storage - Google authentication avro schema
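
    For context, here is a minimal sketch with the google-cloud-storage library, assuming the private key comes from a service-account JSON key file; the bucket name, prefix, and key file path are hypothetical:

        from google.cloud import storage

        # Private bucket: authenticate with a service-account JSON key file
        # (which contains the private key).
        client = storage.Client.from_service_account_json("service-account-key.json")

        # Public bucket: an anonymous client is enough.
        # client = storage.Client.create_anonymous_client()

        for blob in client.list_blobs("my-schemas-bucket", prefix="json-schemas/"):
            print(blob.name)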

     

     

    Confluent Schema Registry on Confluent Cloud

    To connect to your Schema Registry instance in the cloud, you must first obtain both an API key and an API secret for it.  They are found in the Schema Registry tab, in the API endpoint section:

    Confluent Schema Registry - API endpoint key

     

     

    Give a meaningful name to the connection to identify it for later, choose Cloud as a source, and provide the URL to your Schema Registry:

     

    Confluent Schema Registry - Cloud connection avro schema

     

    Then provide the API key and API secret:

    Confluent Schema Registry - Cloud auth avro schema
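
    For context, the sketch below reads from a Confluent Cloud Schema Registry with the confluent-kafka Python client; the URL, API key, API secret, and subject name are hypothetical.  An on-premises registry with basic authentication follows the same pattern, with username:password instead of the API key pair.

        from confluent_kafka.schema_registry import SchemaRegistryClient

        client = SchemaRegistryClient({
            "url": "https://psrc-xxxxx.us-east-2.aws.confluent.cloud",
            "basic.auth.user.info": "<API_KEY>:<API_SECRET>",
        })

        # List the registered subjects and fetch the latest schema of one of them.
        for subject in client.get_subjects():
            print(subject)

        latest = client.get_latest_version("my-topic-value")
        print(latest.schema.schema_str)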

     

    Confluent Schema Registry on-premises

    Give a meaningful name to the connection to identify it for later, choose on-premise as a source, and provide the URL to your Schema Registry:

    Confluent Schema Registry - on-prem connection avro schema

     

    Then provide your username and password:

    Confluent Schema Registry - on-prem auth avro schema

     

    Pulsar Schema Registry

    Give a meaningful name to the connection to identify it for later, choose the Pulsar connection type, and provide the URL to your Schema Registry:

    Pulsar connection settings avro schema
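
    For context, Pulsar exposes its built-in schema registry through the admin REST API; the sketch below fetches a topic's schema with the requests library, where the service URL, tenant, namespace, and topic names are hypothetical:

        import requests

        admin_url = "http://pulsar-broker:8080"            # Pulsar admin service URL
        tenant, namespace, topic = "public", "default", "my-topic"

        resp = requests.get(f"{admin_url}/admin/v2/schemas/{tenant}/{namespace}/{topic}/schema")
        resp.raise_for_status()

        schema_info = resp.json()
        print(schema_info.get("type"))   # e.g. JSON or AVRO
        print(schema_info.get("data"))   # the schema definition itself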