Collibra Data Dictionary integration
One of the primary challenges facing organizations is making business sense of the technical data structures in applications and databases. This makes it harder for organizations to identify critical data elements and bring them under governance.
The integration of data modeling with governance tools and processes enables solving this problem at the source, i.e. where data structures are designed to create schemas and their technical metadata.
Note: You may wish to view the how-to video on this subject.
Important note: the Collibra integration is an add-on feature that requires a specific license key, which can be purchased from us here.
Collibra is one of the leaders in the space of data governance and metadata management solutions. Metadata management is a core aspect of an organization’s ability to manage its data and information assets. The term “metadata” describes the various facets of an information asset that can improve its usability throughout its life cycle. Metadata is used as a reference for business-oriented and technical projects, and lays the foundations for describing, inventorying and understanding data for multiple use cases.
Hackolade has partnered with Collibra to provide an officially supported integration with Collibra's Data Dictionary, using its Core, Import, and Output Module APIs. With this integration, users can easily publish their Hackolade data models into Collibra domains and keep them synchronized, for any of the many targets supported by Hackolade, including the schema definitions of REST APIs documented in Swagger or OpenAPI.
The process automatically:
- checks the configuration in Collibra
- creates the necessary custom scopes, attributeTypes and assignments to support the granularity of Hackolade features
- then creates and keeps in sync assets for schemas, tables, views, columns, models, entities, attributes, and foreign key relationships.
The integration specifically handles complex data types, hierarchical structures, and polymorphism found in modern databases, JSON, Avro, Parquet, ProtoBuf, etc. Custom properties defined for a plugin are also published as custom attributeTypes in Collibra.
Hackolade Studio data models for physical targets are published to Physical Data Dictionaries in the form of schemas/tables/columns assets in Collibra. Since v7.3.1 of Hackolade Studio, Polyglot models are published to Logical Data Dictionaries in the form of models/entities/attributes assets in Collibra.
With v7.6.1 of Hackolade Studio, we added publishing of lineage relations between logical Polyglot models and their derived physical targets for all their assets (model/schema, entity/table, attribute/column).
Publishing process flow
To publish a Hackolade data model to your Collibra Data Dictionary, you choose Tools > Forward-Engineering > Collibra Dictionary.
The diagram below describes the integration flow:
Connect to your Collibra instance
See more details on this page.
Check for the proper configuration
To ensure successful processing of the Hackolade model information, the system uses the Core API to check that the Collibra setup is OK, and if not, asks the user for permission to create the necessary setup.
The system will:
1) confirm that the out-of-the-box assetTypes exist: model, entity, attribute, schema, table, database view, column, foreign key, mapping specification
2) confirm that the out-of-the-box relationTypes exist: schema contains table, and table contains column
3) confirm that the Hackolade setup exists:
- custom attribute scope
- custom attributeTypes to handle Hackolade-specific information
- custom assignments of attributeTypes to out-of-the-box assetTypes: model, entity, attribute, schema, table, database view, column
- custom characteristics for schema, table, column, and database view
- custom relationType "Column contains Column" to allow hierarchical view of nested objects, arrays, and polymorphism
If the expected configuration cannot be found in Collibra, the user is prompted for confirmation that the setup should be automatically carried out in the Collibra instance.
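As a rough illustration of the check described above, the sketch below compares the names a Collibra instance reports against the names listed in this section, and separates what is missing from what the setup step would have to create. It is only a sketch under stated assumptions: `check_setup` and the constant names are hypothetical, the inputs are plain lists that a caller would have obtained from the Core API beforehand, and the exact spelling of type names in a given instance may differ.

```python
# Names taken from the lists above. Their exact spelling inside a given
# Collibra instance is an assumption of this sketch, so comparisons are
# case-insensitive.
EXPECTED_ASSET_TYPES = {
    "Model", "Entity", "Attribute", "Schema", "Table",
    "Database View", "Column", "Foreign Key", "Mapping Specification",
}
EXPECTED_RELATION_TYPES = {
    ("Schema", "contains", "Table"),
    ("Table", "contains", "Column"),
}
# Custom relationType added by the Hackolade setup (hierarchical nesting).
CUSTOM_RELATION_TYPES = {("Column", "contains", "Column")}


def _fold(relation):
    """Normalize a (head, role, tail) triple for case-insensitive matching."""
    return tuple(part.casefold() for part in relation)


def check_setup(asset_types, relation_types):
    """Report which expected items are absent from the instance (hypothetical helper)."""
    existing_assets = {name.casefold() for name in asset_types}
    existing_relations = {_fold(relation) for relation in relation_types}

    def missing(expected):
        return sorted(r for r in expected if _fold(r) not in existing_relations)

    return {
        "missing_asset_types": sorted(
            a for a in EXPECTED_ASSET_TYPES if a.casefold() not in existing_assets
        ),
        "missing_relation_types": missing(EXPECTED_RELATION_TYPES),
        "custom_to_create": missing(CUSTOM_RELATION_TYPES),
    }
```

An empty report means publishing can proceed. A non-empty `custom_to_create` list corresponds to the case where the user is asked to confirm the automatic setup.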
Fetch existing Communities and Domains
If the configuration is correct, the application uses the Core API to retrieve the existing Communities and Domains and display them so the user can select where the Hackolade Data Model should be loaded. If the domain does not exist, it should be created first. It is recommended to create a new domain with type "Physical Data Dictionary" for physical models of Hackolade Studio, and "Logical Data Dictionary" for polyglot models.
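The domain-type recommendation above can be sketched as a small filter. The code assumes the domains have already been retrieved and flattened into simple records with `name`, `community`, and `type` keys. That record shape and the function name `candidate_domains` are hypothetical and chosen for illustration; they are not the Core API response format. The application itself displays all communities and domains; the filter only highlights the ones matching the recommendation.

```python
from collections import defaultdict

# Domain types recommended in the text above, keyed by the kind of
# Hackolade model being published.
RECOMMENDED_DOMAIN_TYPE = {
    "physical": "Physical Data Dictionary",
    "polyglot": "Logical Data Dictionary",
}


def candidate_domains(domains, model_kind):
    """Group the domains of the recommended type by community name."""
    wanted = RECOMMENDED_DOMAIN_TYPE[model_kind]
    grouped = defaultdict(list)
    for domain in domains:
        if domain["type"] == wanted:
            grouped[domain["community"]].append(domain["name"])
    # Sort for a stable, readable presentation.
    return {community: sorted(names) for community, names in sorted(grouped.items())}
```

If the result is empty for the chosen model kind, a domain of the recommended type would need to be created in Collibra first.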
Select the target domain
All communities and domains are displayed in the box below so the user can select the one where the Hackolade data model should be loaded:
Select the data model objects to be loaded
The user then selects the entities to be loaded to the selected Collibra domains:
Publish data model to Collibra
The application uses the Import API to bulk load the metadata of the selected objects and the Entity-Relationship picture into the selected Collibra domain. The system leverages the Synchronization API to keep data in Collibra up to date with model evolutions when the integration is invoked repeatedly. The synchronization is based on the internal model UUID.
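To see why a stable identifier matters for repeated publishing, consider the minimal sketch below. It compares two snapshots keyed by UUID and classifies each object. This is not the Hackolade or Collibra implementation: `plan_sync` and the snapshot shape are hypothetical, and what actually happens to objects that are no longer in the model is not described in this page.

```python
def plan_sync(published, current):
    """Classify objects by comparing two {uuid: properties} snapshots.

    `published` is what was sent last time, `current` is the model now.
    """
    plan = {"create": [], "update": [], "unchanged": [], "no_longer_in_model": []}
    for uuid, props in current.items():
        if uuid not in published:
            plan["create"].append(uuid)
        elif published[uuid] != props:
            # Same UUID, different content: for example a renamed table.
            plan["update"].append(uuid)
        else:
            plan["unchanged"].append(uuid)
    plan["no_longer_in_model"] = [u for u in published if u not in current]
    return plan
```

Because matching is done on the UUID rather than on the name, an object renamed in the model is treated as an update of the same object rather than as a new one.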
Important note: According to the Collibra documentation, depending on the resource type, the Import API performs one of two operations: SET/REPLACE or MERGE. For attributes, the operation is SET/REPLACE. As a result, "if the resource exists with properties other than the ones defined in the input [i.e., the Hackolade data model], the resource is replaced with the one provided in the input." This means that edits made in Collibra risk disappearing with subsequent publications from Hackolade. With the ability to reverse-engineer from Collibra into Hackolade, two approaches are possible:
1) reverse-engineer from Collibra into the master Hackolade data model, and let the conflict resolution kick in, so the user can decide whether to merge the information from Collibra.
Once the information is merged into the Hackolade model, the whole model can be published to Collibra again.
2) a user might want to have more control over the granularity of what gets merged and what does not. There is the possibility to reverse-engineer into an empty model, save it, then do a Model Compare & Merge with the master Hackolade model, followed by the publishing back to Collibra of the merged model.
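The difference between the two operations named in the note can be illustrated with plain dictionaries. This is a simplified reading of the quoted behavior, not Collibra code: `set_replace` and `merge` are hypothetical names, and the attribute names in the usage example are made up.

```python
def set_replace(existing, incoming):
    """SET/REPLACE: the incoming values become the whole resource."""
    return dict(incoming)


def merge(existing, incoming):
    """MERGE: incoming values win, values present only in `existing` survive."""
    return {**existing, **incoming}


# Usage: a description edited in Collibra, then a new publication from the model.
in_collibra = {"Description": "Edited by a steward", "Data Type": "string"}
from_model = {"Data Type": "varchar(50)"}

after_replace = set_replace(in_collibra, from_model)  # the steward's edit is gone
after_merge = merge(in_collibra, from_model)          # the steward's edit is kept
```

Under SET/REPLACE, the only way to keep an edit made in Collibra is to bring it into the Hackolade model before publishing again, which is what the two approaches above achieve.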
View data model in Collibra console
The data model information can immediately be viewed inside Collibra:
In order to view nested objects in the above screen, it is suggested to enable multipath hierarchy for the relation types: schema contains table, table contains column, and column contains column:
You may also display the Full Name field to view the nesting path in dot.notation, as well as the Hackolade Data Type:
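As an illustration of the dot.notation nesting path, the sketch below walks a nested field structure and yields, for each field, its dotted path and the path of its parent. The parent link is the kind of information a "Column contains Column" relation would carry. The input shape (`name` and optional `children` keys) and the function name are hypothetical, and the exact Full Name produced in Collibra may include additional prefixes not reproduced here.

```python
def flatten_columns(fields, parent=None):
    """Yield (dotted_path, parent_path) for every field, depth first."""
    for field in fields:
        path = f"{parent}.{field['name']}" if parent else field["name"]
        yield path, parent
        # Nested objects recurse with the current path as the new parent.
        yield from flatten_columns(field.get("children", []), path)
```

For a `customer` object containing an `address` object with a `city` field, the innermost field is reported as `customer.address.city` with `customer.address` as its parent.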
Users will notice that the data types are those of the specific target technology:
The Entity-Relationship Diagram image can also be viewed as a PNG file:
Reverse-engineer a Collibra Data Dictionary
With v5.2.1, we introduced the possibility to reverse-engineer a Collibra physical data dictionary into a Hackolade data model for the target of your choice. This operation can be done:
- either into an empty model you wish to create,
- or into an existing model, possibly the one used to originally publish to Collibra. This is particularly handy if maintenance occurs in Collibra for models created in Hackolade Studio. Refer to the important note above in the section "Publish data model to Collibra".
Publish lineage relations
Since v7.6.1 of Hackolade Studio, it is possible to publish lineage relations between logical Polyglot models and their derived physical targets for all their assets (model/schema, entity/table, attribute/column).
This feature supports the Guided Stewardship operating model of Collibra:
This process requires the orchestration of several successive operations:
- publish to Collibra the Polyglot model(s) from which physical target models are derived in Hackolade. Each Polyglot model is typically published into a Collibra Logical Data Dictionary with the model/data entity/data attribute structure (possibly specifying the hierarchy "Data Attribute contains Data Attribute");
- make sure to save the model(s) in Hackolade Studio, so the Collibra internal IDs are persisted in the Polyglot model(s);
- open the derived model(s) in Hackolade Studio and make sure to refresh the references to parent Polyglot model(s), which will ensure that the links between objects are persisted in the derived model(s);
- publish the derived model(s) to Collibra, generally into a Collibra Physical Data Dictionary (unless the derived model is itself a Polyglot model, in which case it would be published to a Logical Data Dictionary).
In Collibra, it is then possible to display the lineage relations automatically created by Hackolade Studio during publishing.
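The ordering of the steps above matters because each derived object can only be linked to its logical parent once that parent has a persisted Collibra ID. The sketch below makes that dependency explicit. It is purely illustrative: `lineage_relations`, the `parent_ref` key, and the ID values are hypothetical and do not reflect how Hackolade Studio stores these references.

```python
def lineage_relations(logical_ids, derived_objects):
    """Pair each derived object with the Collibra ID of its Polyglot parent.

    `logical_ids` maps a Polyglot object reference to the Collibra ID saved
    after publishing. Objects whose parent has no saved ID are returned
    separately, since no lineage relation can be built for them.
    """
    relations, unresolved = [], []
    for obj in derived_objects:
        parent_id = logical_ids.get(obj["parent_ref"])
        if parent_id is None:
            unresolved.append(obj["name"])
        else:
            relations.append((parent_id, obj["name"]))
    return relations, unresolved
```

In this sketch, an entry in `unresolved` corresponds to a Polyglot model that was published but not saved, or to a derived model whose references were not refreshed before publishing.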