Collibra Data Dictionary integration

Note: You may wish to view the how-to video on this subject.

 

Collibra is a leading provider of data governance and metadata management solutions.  Metadata management is a core aspect of an organization’s ability to manage its data and information assets. The term “metadata” describes the various facets of an information asset that can improve its usability throughout its life cycle. Metadata serves as a reference for business-oriented and technical projects, and lays the foundation for describing, inventorying, and understanding data for multiple use cases.

Hackolade has partnered with Collibra to provide an officially supported integration with Collibra's Data Dictionary, using its Core and Import APIs.  With this integration, users can easily publish their Hackolade data models into Collibra domains, for any of the many targets supported by Hackolade.  The process automatically:

  • checks the configuration in Collibra
  • creates the necessary custom scopes, attributeTypes and assignments to support the granularity of Hackolade features
  • then creates and keeps in sync assets for schemas, tables, views, columns, and foreign key relationships.

The integration specifically handles the complex data types, hierarchical structures, and polymorphism found in modern databases, JSON, Avro, Parquet, ProtoBuf, etc.

 

Important note: the Collibra integration is an add-on feature that requires a specific license key, which can be purchased from us here.

Integration process flow

To forward a Hackolade data model to your Collibra Data Dictionary, choose Tools > Forward-Engineering > Collibra Dictionary.

The diagram below describes the integration flow:

Image

 

Connect to your Collibra instance

In order to feed data model information to the Collibra instance, it is assumed that you have sufficient credentials to do so.  If not, please contact your Collibra administrator.

 

To connect to the Collibra instance, you must first specify connection settings:

Image

as well as authentication credentials:

Image

 

Check for the proper configuration

To ensure successful processing of the Hackolade model information, the system uses the Core API to check that the Collibra setup is OK, and if not, asks the user for permission to create the necessary setup.

 

The system will:

1) confirm that the out-of-the-box assetTypes exist: schema, table, database view, column, foreign key, mapping specification

2) confirm that the out-of-the-box relationTypes exist: schema contains table, and table contains column

3) confirm that the Hackolade setup exists:

- custom attribute scope

- custom attributeTypes to handle Hackolade-specific information

- custom assignments of attributeTypes to out-of-the-box assetTypes: schema, table, database view, column

- custom characteristics for schema, table, database view, and column

- custom relationType "Column contains Column" to allow hierarchical view of nested objects, arrays, and polymorphism

 

If the expected configuration cannot be found in Collibra, the user is prompted for confirmation that the setup should be automatically carried out in the Collibra instance.
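
The checks above can be sketched as a simple comparison between what is expected and what the instance reports. This is only an illustration under stated assumptions: the function names and data shapes are hypothetical, the type names are taken from the list above, and in practice the lists of existing types would be retrieved through the Core API rather than passed in by hand.

```python
# Hypothetical sketch of the configuration check; not the actual Hackolade code.
# Type names mirror the checklist above and are compared case-insensitively.

REQUIRED_ASSET_TYPES = {
    "schema", "table", "database view", "column",
    "foreign key", "mapping specification",
}
REQUIRED_RELATION_TYPES = {
    ("schema", "contains", "table"),
    ("table", "contains", "column"),
}
CUSTOM_RELATION_TYPES = {
    ("column", "contains", "column"),  # hierarchical view of nested structures
}


def _norm(relation):
    """Lower-case every part of a (head, role, tail) relation tuple."""
    return tuple(part.lower() for part in relation)


def find_missing_setup(asset_types, relation_types):
    """Return the expected items that the instance does not report."""
    have_assets = {name.lower() for name in asset_types}
    have_relations = {_norm(rel) for rel in relation_types}
    return {
        "asset_types": sorted(
            a for a in REQUIRED_ASSET_TYPES if a not in have_assets
        ),
        "relation_types": sorted(
            r for r in REQUIRED_RELATION_TYPES if _norm(r) not in have_relations
        ),
        "custom_relation_types": sorted(
            r for r in CUSTOM_RELATION_TYPES if _norm(r) not in have_relations
        ),
    }


def needs_user_confirmation(missing):
    """True when anything is missing and the user must approve the setup."""
    return any(missing.values())
```

For an instance that has the out-of-the-box types but has never been used with Hackolade, `find_missing_setup` would report only the custom "column contains column" relation, and `needs_user_confirmation` would return `True`. Custom scopes, attributeTypes, and assignments could be checked the same way.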

Fetch existing Communities and Domains

If the configuration is correct, the application uses the Core API to retrieve the existing Communities and Domains and display them so the user can select where the Hackolade Data Model should be loaded.  If the domain does not exist, it should be created first.  It is recommended to create a new domain with type "Physical Data Dictionary".

Image
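
As an illustration of how the retrieved Communities and Domains might be arranged for selection, the sketch below groups domains under their community. The record fields (`id`, `name`, `communityId`) are assumptions made for the example and may not match the shape of the Core API responses.

```python
# Hypothetical sketch: group domains under their community for display.
# Field names ("id", "name", "communityId") are assumed for illustration.

def group_domains_by_community(communities, domains):
    """Return {community name: sorted domain names}, sorted by community."""
    names_by_id = {c["id"]: c["name"] for c in communities}
    # Start with every community so that empty ones are still listed.
    grouped = {c["name"]: [] for c in communities}
    for domain in domains:
        community_name = names_by_id.get(domain["communityId"])
        if community_name is None:
            continue  # domain refers to a community that was not retrieved
        grouped[community_name].append(domain["name"])
    return {name: sorted(items) for name, items in sorted(grouped.items())}
```

A community with an empty list is one where a domain would have to be created before the model can be loaded.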

Select the target domain

All communities and domains are displayed in the box below so the user can select the one where the Hackolade data model should be loaded:

Image

 

Select the data model objects to be loaded

The user then selects the entities to be loaded into the selected Collibra domain:

Image

 

Load data model information into Collibra

The application uses the Import API to bulk load the selected objects' metadata and the Entity-Relationship picture into the selected Collibra domain.  The system leverages the Synchronization API to keep data in Collibra up-to-date with model evolutions when the integration is invoked repeatedly.  The synchronization is based on the internal model UUID.
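
To illustrate why keying on a UUID matters, here is a minimal sketch of a UUID-based comparison between the model and what was loaded previously. It is not the Synchronization API itself; the function name and the dictionary shapes are hypothetical.

```python
# Hypothetical sketch of UUID-keyed synchronization planning.
# Both arguments map an object's UUID to a dict of its properties.

def plan_sync(model_objects, existing_assets):
    """Classify model objects as create, update, or unchanged by UUID."""
    plan = {"create": [], "update": [], "unchanged": []}
    for uuid in sorted(model_objects):
        if uuid not in existing_assets:
            plan["create"].append(uuid)
        elif model_objects[uuid] != existing_assets[uuid]:
            plan["update"].append(uuid)
        else:
            plan["unchanged"].append(uuid)
    return plan
```

Because matching is done on the UUID rather than the name, an object that was renamed in the model is classified as an update of the existing asset and not as a new asset. What happens to objects deleted from the model is not covered by this sketch.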

View data model in Collibra console

The data model information can immediately be viewed inside Collibra:

Image

 

In order to view nested objects in the above screen, it is suggested to enable multipath hierarchy for the relation types: schema contains table, table contains column, and column contains column:

Image

 

You may also display the Full Name field to view the nesting path in dot.notation:

Image
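
As a small illustration of dot notation for nested structures, the sketch below derives nesting paths from a nested field definition. The function and the input shape are hypothetical, and the exact Full Name format displayed by Collibra may include further elements not shown here.

```python
# Hypothetical sketch: derive dot-notation nesting paths.
# `children` maps each child field name to its own nested children.

def nesting_paths(name, children):
    """Return the path of `name` followed by the paths of all descendants."""
    paths = [name]
    for child_name, grandchildren in children.items():
        for sub_path in nesting_paths(child_name, grandchildren):
            paths.append(f"{name}.{sub_path}")
    return paths


customer = {"name": {}, "address": {"street": {}, "city": {}}}
for path in nesting_paths("customer", customer):
    print(path)
```

Each nested field is listed with the full chain of its parents, which is what makes a deeply nested column identifiable in a flat list.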

 

Users will notice that the data types shown are those of the specific target technology:

Image

 

The Entity-Relationship Diagram image can also be viewed as a PNG file:

Image