Collibra Data Dictionary integration
One of the primary challenges facing organizations is making business sense of the technical data structures in their applications and databases. This makes it harder for organizations to identify critical data elements and bring them under governance.
Integrating data modeling with governance tools and processes solves this problem at the source, i.e., where data structures are designed and where schemas and their technical metadata are created.
Note: You may wish to view the how-to video on this subject.
Collibra is one of the leaders in the space of data governance and metadata management solutions. Metadata management is a core aspect of an organization’s ability to manage its data and information assets. The term “metadata” describes the various facets of an information asset that can improve its usability throughout its life cycle. Metadata is used as a reference for business-oriented and technical projects, and lays the foundations for describing, inventorying and understanding data for multiple use cases.
Hackolade has partnered with Collibra to provide an officially supported integration with Collibra's Data Dictionary, using its Core and Import APIs. With this integration, users can publish their Hackolade data models into Collibra domains and keep them synchronized, for any of the many targets supported by Hackolade, including the schema definitions of REST APIs documented in Swagger or OpenAPI.
The process automatically:
- checks the configuration in Collibra
- creates the necessary custom scopes, attributeTypes and assignments to support the granularity of Hackolade features
- then creates assets for schemas, tables, views, columns, and foreign key relationships, and keeps them in sync.
The integration specifically handles the complex data types, hierarchical structures, and polymorphism found in modern databases, JSON, Avro, Parquet, ProtoBuf, etc. Custom properties defined for a plugin are also published as custom attributeTypes in Collibra.
Important note: the Collibra integration is an add-on feature that requires a specific license key, which can be purchased from us.
Publishing process flow
To publish a Hackolade data model to your Collibra Data Dictionary, you choose Tools > Forward-Engineering > Collibra Dictionary.
The diagram below describes the integration flow:
Connect to your Collibra instance
In order to feed data model information to the Collibra instance, it is assumed that you have sufficient credentials to do so. If not, please contact your Collibra administrator.
To connect to the Collibra instance, you must first specify connection settings:
as well as authentication credentials:
To successfully import a Hackolade model into Collibra, the user must have the Author license type, and the role assigned to the user must include the following permissions:
- Global role and permissions:
  - System administration (required to apply the custom Hackolade configuration: attributeTypes, relationTypes, scope, etc.)
- Resource role and permissions:
  - Domain (required for future view enhancements and for working with the Hackolade Mapping Domain)
We also recommend assigning a user with the above permissions to the parent community of the target domain. This is needed to create, update, and delete the Hackolade Mapping Domain.
Check for the proper configuration
To ensure successful processing of the Hackolade model information, the system uses the Core API to check that the Collibra setup is correct and, if it is not, asks the user for permission to create the necessary setup.
The system will:
1) confirm that the out-of-the-box assetTypes exist: schema, table, database view, column, foreign key, mapping specification
2) confirm that the out-of-the-box relationTypes exist: schema contains table, and table contains column
3) confirm that the Hackolade setup exists:
- custom attribute scope
- custom attributeTypes to handle Hackolade-specific information
- custom assignments of attributeTypes to out-of-the-box assetTypes: schema, table, database view, column
- custom characteristics for schema, table, column, and database view
- custom relationType "Column contains Column" to allow hierarchical view of nested objects, arrays, and polymorphism
If the expected configuration cannot be found in Collibra, the user is prompted for confirmation that the setup should be automatically carried out in the Collibra instance.
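The configuration check can be pictured as a set difference between what the instance already defines and what the integration expects. The sketch below assumes the existing assetType and relationType names have already been fetched (for example through the Core API) and are passed in as plain lists. The exact type names, the (source, role, target) tuple shape for relationTypes, and the helpers `missing_configuration` and `needs_setup` are hypothetical illustrations, not Hackolade or Collibra code. For brevity it also lumps the out-of-the-box types and the custom "Column contains Column" relationType together and ignores the custom scope, attributeTypes, assignments, and characteristics.

```python
from typing import Iterable, Tuple

# Names mirror the checklist above. The exact spelling Collibra uses for each
# type, and the (source, role, target) shape for relationTypes, are assumptions.
REQUIRED_ASSET_TYPES = {
    "Schema", "Table", "Database View", "Column",
    "Foreign Key", "Mapping Specification",
}
REQUIRED_RELATION_TYPES = {
    ("Schema", "contains", "Table"),
    ("Table", "contains", "Column"),
    # Custom relationType created by the Hackolade setup for nested structures
    ("Column", "contains", "Column"),
}


def missing_configuration(asset_types: Iterable[str],
                          relation_types: Iterable[Tuple[str, str, str]]) -> dict:
    """Hypothetical helper: report required types absent from the instance."""
    return {
        "assetTypes": sorted(REQUIRED_ASSET_TYPES - set(asset_types)),
        "relationTypes": sorted(REQUIRED_RELATION_TYPES - set(relation_types)),
    }


def needs_setup(report: dict) -> bool:
    """True when something is missing, i.e. the user should be prompted."""
    return any(report.values())


# Example: an instance with the out-of-the-box types but not the custom relationType
existing_assets = ["Schema", "Table", "Database View", "Column",
                   "Foreign Key", "Mapping Specification"]
existing_relations = [("Schema", "contains", "Table"), ("Table", "contains", "Column")]
report = missing_configuration(existing_assets, existing_relations)
print(report)  # {'assetTypes': [], 'relationTypes': [('Column', 'contains', 'Column')]}
print(needs_setup(report))  # True
```

In this example only the custom relationType is missing, so the user would be prompted before the setup is created.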
Fetch existing Communities and Domains
If the configuration is correct, the application uses the Core API to retrieve the existing Communities and Domains and displays them so the user can select where the Hackolade data model should be loaded. If the target domain does not exist yet, it must be created first. We recommend creating a new domain of type "Physical Data Dictionary".
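To show how the retrieved domains could be organized for selection, here is a minimal sketch that groups them by community and flags the recommended "Physical Data Dictionary" domains. It assumes a paged response of the form `{"results": [...]}` in which each domain carries `name`, `community.name`, and `type.name`. This matches our understanding of the Collibra REST 2.0 domain listing, but verify it against your Collibra version. The function `group_domains_by_community` is hypothetical, and the sample data is made up.

```python
from collections import defaultdict
from typing import Dict, List


def group_domains_by_community(domains_page: dict) -> Dict[str, List[str]]:
    """Group domain names under their community, for display in a picker.

    Assumes a paged response {"results": [...]} where each domain carries
    "name", "community": {"name": ...} and "type": {"name": ...}.
    Domains of type "Physical Data Dictionary" get a trailing "*" because
    they are the recommended targets.
    """
    tree: Dict[str, List[str]] = defaultdict(list)
    for domain in domains_page.get("results", []):
        label = domain["name"]
        if domain.get("type", {}).get("name") == "Physical Data Dictionary":
            label += " *"
        tree[domain["community"]["name"]].append(label)
    # Sort communities and domains alphabetically for a stable display
    return {community: sorted(names) for community, names in sorted(tree.items())}


page = {"results": [
    {"name": "Sales PDD", "community": {"name": "Sales"},
     "type": {"name": "Physical Data Dictionary"}},
    {"name": "Sales Glossary", "community": {"name": "Sales"},
     "type": {"name": "Business Glossary"}},
    {"name": "HR PDD", "community": {"name": "HR"},
     "type": {"name": "Physical Data Dictionary"}},
]}
print(group_domains_by_community(page))
# {'HR': ['HR PDD *'], 'Sales': ['Sales Glossary', 'Sales PDD *']}
```

Collibra communities can be nested. This flat grouping keeps only each domain's direct community, so a real picker would also need to walk the parent communities.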
Select the target domain
All communities and domains are displayed in the box below so the user can select the one where the Hackolade data model should be loaded:
Select the data model objects to be loaded
The user then selects the entities to be loaded into the selected Collibra domain:
Publish data model to Collibra
The application uses the Import API to bulk load the metadata of the selected objects, along with an image of the Entity-Relationship diagram, into the selected Collibra domain. When the integration is invoked repeatedly, the system uses the Synchronization API to keep the data in Collibra up to date as the model evolves. Synchronization is based on the internal model UUID.
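The sketch below shows what a bulk import payload could look like. Each table and column becomes one command identified by its name within a domain and community, and the synchronization identifier is derived from the model UUID, so every publication of the same model targets the same synchronized set of assets. The command layout follows the general shape of Collibra's JSON import format as we understand it. The `hackolade-` prefix, the `Technical Data Type` attribute, and the helpers `asset_command` and `build_import_payload` are illustrative assumptions, not the actual Hackolade implementation. Relations such as "table contains column" are left out because they require instance-specific relationType identifiers.

```python
import json
from typing import Dict, Optional, Tuple


def asset_command(name: str, asset_type: str, domain: str, community: str,
                  attributes: Optional[Dict[str, str]] = None) -> dict:
    """One import command for an asset, identified by name within a domain.

    The layout is an assumption based on Collibra's JSON import format;
    verify field names against your Collibra version.
    """
    command = {
        "resourceType": "Asset",
        "identifier": {
            "name": name,
            "domain": {"name": domain, "community": {"name": community}},
        },
        "type": {"name": asset_type},
    }
    if attributes:
        # Attributes are written as lists of {"value": ...} entries
        command["attributes"] = {k: [{"value": v}] for k, v in attributes.items()}
    return command


def build_import_payload(model_uuid: str, tables: Dict[str, Dict[str, str]],
                         domain: str, community: str) -> Tuple[str, str]:
    """Return (synchronization id, JSON payload) for one model.

    Deriving the id from the model UUID means repeated publications of the
    same model are compared against the same synchronized set of assets.
    """
    commands = []
    for table, columns in tables.items():
        commands.append(asset_command(table, "Table", domain, community))
        for column, data_type in columns.items():
            # Qualify column names with the table so they stay unique in the domain
            commands.append(asset_command(
                f"{table}.{column}", "Column", domain, community,
                {"Technical Data Type": data_type}))
    sync_id = f"hackolade-{model_uuid}"  # hypothetical naming convention
    return sync_id, json.dumps(commands, indent=2)


sync_id, payload = build_import_payload(
    "3f1c2a9e-0000-4000-8000-000000000001",
    {"customer": {"id": "string", "email": "string"}},
    domain="Sales PDD", community="Sales")
print(sync_id)  # hackolade-3f1c2a9e-0000-4000-8000-000000000001
print(len(json.loads(payload)))  # 3
```

Keeping the synchronization identifier stable across runs is what lets a repeated publication update the existing assets instead of creating duplicates.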
Important note: According to the Collibra documentation, depending on the resource type, the Import API performs one of two operations: SET/REPLACE or MERGE. For attributes, the operation is SET/REPLACE. As a result, "if the resource exists with properties other than the ones defined in the input (i.e., the Hackolade data model), the resource is replaced with the one provided in the input." This means that edits made in Collibra may be lost with subsequent publications from Hackolade. Since it is possible to reverse-engineer from Collibra into Hackolade, two approaches are possible:
1) reverse-engineer from Collibra into the master Hackolade data model, and let conflict resolution kick in so the user can decide whether to merge the information from Collibra.
Once the information is merged into the Hackolade model, the whole model can be published to Collibra again.
2) for finer control over what gets merged and what does not, reverse-engineer into an empty model, save it, then run Model Compare & Merge against the master Hackolade model, and finally publish the merged model back to Collibra.
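Why approach 1 helps becomes clearer with a toy model of the two operations. In this simplified sketch, the attributes of a single asset are a plain dictionary. SET/REPLACE keeps only what the input provides, while MERGE also keeps attributes that exist only in Collibra. The attribute names and the functions `set_replace` and `merge` are hypothetical, and the sketch ignores everything the real Import API does beyond this semantic difference.

```python
def set_replace(existing: dict, incoming: dict) -> dict:
    """SET/REPLACE: the resource ends up with exactly the incoming attributes."""
    return dict(incoming)


def merge(existing: dict, incoming: dict) -> dict:
    """MERGE: incoming values win, attributes present only in `existing` survive."""
    return {**existing, **incoming}


# Attributes of one column asset, after a steward edited it in Collibra
in_collibra = {"Description": "Customer identifier", "Steward note": "Verified by finance"}
# The same column as described in the Hackolade model, unaware of the edit
in_model = {"Description": "Customer identifier"}

# Publishing as-is: the steward note is lost
print(set_replace(in_collibra, in_model))  # {'Description': 'Customer identifier'}

# Approach 1: bring the Collibra edits into the model first, then publish
reconciled_model = merge(in_collibra, in_model)
print(set_replace(in_collibra, reconciled_model) == in_collibra)  # True
```

Because the reconciled model already contains the Collibra edit, the SET/REPLACE publication no longer removes it.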
View data model in Collibra console
The data model information can immediately be viewed inside Collibra:
To view nested objects in the above screen, we suggest enabling multipath hierarchy for the relation types: schema contains table, table contains column, and column contains column:
You may also display the Full Name field to view the nesting path in dot notation, as well as the Hackolade Data Type:
Users will also notice that the data types shown are those of the specific target technology:
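The dot-notation Full Name and the "Column contains Column" relation both come from walking the nested structure. The sketch below flattens a nested object into one row per field and records the parent of each field and the relation that links them. The input shape (a field name mapped to `{"type": ..., "properties": {...}}`), the convention of prefixing full names with the table name, and the function `flatten_columns` are assumptions for illustration. Arrays and polymorphic choices are deliberately left out.

```python
from typing import Iterator, Tuple


def flatten_columns(table: str, properties: dict,
                    parent: str = "") -> Iterator[Tuple[str, str, str, str]]:
    """Yield (full_name, parent_full_name, data_type, relation) per field.

    `properties` maps a field name to {"type": ..., "properties": {...}},
    where "properties" holds the sub-fields of an object. Arrays and
    polymorphic choices are not handled in this sketch.
    """
    owner = parent or table
    for name, spec in properties.items():
        full_name = f"{owner}.{name}"
        # Top-level fields hang off the table, nested ones off their parent column
        relation = "Column contains Column" if parent else "Table contains Column"
        yield full_name, owner, spec["type"], relation
        if spec["type"] == "object":
            yield from flatten_columns(table, spec.get("properties", {}), full_name)


customer = {
    "id": {"type": "string"},
    "address": {"type": "object", "properties": {
        "street": {"type": "string"},
        "geo": {"type": "object", "properties": {"lat": {"type": "number"}}},
    }},
}
for row in flatten_columns("customer", customer):
    print(row)
```

Each nested field appears right after its parent, which is the order a multipath hierarchy view would display them in.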
The Entity-Relationship Diagram image can also be viewed as a PNG file:
Reverse-engineer a Collibra Data Dictionary
With v5.2.1, we introduced the ability to reverse-engineer a Collibra physical data dictionary into a Hackolade data model for the target of your choice. This operation can be done:
- either into an empty model you wish to create,
- or into an existing model, possibly the one used to originally publish to Collibra. This is particularly handy if maintenance occurs in Collibra for models created in Hackolade Studio. Refer to the important note above in the section "Publish data model to Collibra".
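Reverse-engineering essentially reverses the flattening performed at publication: the "Column contains Column" relations are enough to rebuild the nesting. The minimal sketch below assumes each column asset has been read back as a (full name, parent full name, data type) tuple, with dot-notation full names as displayed in the Collibra console. The tuple shape, the `{"type": ..., "properties": ...}` output format, and the function `rebuild_properties` are hypothetical and do not reflect the Hackolade implementation.

```python
from typing import Dict, List, Tuple


def rebuild_properties(table: str, columns: List[Tuple[str, str, str]]) -> Dict:
    """Rebuild a nested properties tree from flat column assets.

    Each column is (full_name, parent_full_name, data_type). The parent is
    either the table itself ("Table contains Column") or another column
    ("Column contains Column"). Field names are assumed to contain no dots.
    """
    # First pass: one node per column, so input order does not matter
    nodes: Dict[str, Dict] = {}
    for full_name, _, data_type in columns:
        node: Dict = {"type": data_type}
        if data_type == "object":
            node["properties"] = {}
        nodes[full_name] = node

    # Second pass: attach every node under the table or its parent column
    root: Dict = {}
    for full_name, parent, _ in columns:
        short_name = full_name.rsplit(".", 1)[-1]
        container = root if parent == table else nodes[parent].setdefault("properties", {})
        container[short_name] = nodes[full_name]
    return root


flat = [
    ("customer.id", "customer", "string"),
    ("customer.address", "customer", "object"),
    ("customer.address.street", "customer.address", "string"),
    ("customer.address.geo", "customer.address", "object"),
    ("customer.address.geo.lat", "customer.address.geo", "number"),
]
print(rebuild_properties("customer", flat))
```

Building all nodes before linking them means the assets can be read back in any order. A column whose parent is missing from the input would raise a `KeyError`, which a real implementation would need to report rather than ignore.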