Documentation

No results

Verify Data Model

Starting with version 6.10.0, Hackolade Studio includes a tool to verify the consistency and quality of data models according to a glossary of class and prime terms, and to target-specific attribute rules such as precision/scale for numeric, and length for string data types.

The model verification is based on:

business names and not target-specific when it comes to rules verified against the glossary
technical names and target-specific when it comes to properties-related rules of attributes. Given the variety of targets supported by Hackolade Studio, and sometimes big differences in data types between these targets, it is necessary to have one file for each target.

Naming theory

The glossary verifications compare the expression of the business names of model objects with entries in a glossary file.

Business names

A business name is an English expression with specific words to describe a single data object (e.g., data model, database, schema, file, table, collection, attribute, field, data element, etc). Each business name comprises one or more optional modifying terms, prime terms, and one class term. These terms are specific to each organization and generally defined by subject matter experts. They should have an agreed upon definition, and are typically maintained in a Glossary.

Business names should be less than eight words. Nouns should be singular and verbs in present tense.

Class terms

The class term is the highest level of qualification and the most important word in a business name. The class term is always the last word of a business name (or in some rare cases a phrase such as "Unit of Measure"). It must be a noun identifying the general purpose of the data object.

Examples:

- specific instance: code, identifier, name, number

- numeric: amount, count, percent

- conditional: indicator, switch

- temporal: date, time, timestamp

- spatial: line, point, polygon

- persistent: constant

- general: document, file, text

Prime terms

A prime term can be a single word ( or a phrase such as 'Unit of Measure'.) It is the most important modifier of the class term, and directly precedes it.

Modifying terms

Modifying terms are used to add important business information to a business name, and precede the prime term.

A glossary loses a significant part of its value for a user if every possible combination of words must be declared in the glossary so model verifications could occur. Therefore the glossary verifications of Hackolade Studio span only prime terms and class terms, and ignore modifiers if they are not themselves declared separately as class terms. Typically, a user will define in the glossary the last 2 terms which are most significant in a name, then allow any kind of declination of these 2 words with other modifiers.

For example, let's say that you defined the Business Name "Currency Code" in the glossary file. The application will allow any combination of terms that is using this term at the end of the expression without requiring for these combinations to be explicitly declared in the glossary file:

Component Invoice Amount Currency Code

Component Total Amount Currency Code

Unit Price Currency Code

Item Converted Price Currency Code

Item Original Price Currency Code

Item Total Converted Price Currency Code

Item Total Original Price Currency Code

etc...

We suggest that you experiment with different combinations in a small glossary file and a small model to first understand the behavior of the application, before embarking on the definition of a large glossary file and applying the verification to large data models.

Glossary verifications

These verifications are generic, i.e. not specific to a target, and are made by checking the business names in the model with the terms in the glossary file. Any exception to an enabled verification rule is logged into a report.

Glossary file

For this step, Hackolade Studio leverage the existing abbreviations .csv file used in Naming Conventions. There's just one optional additional column needed if you want to declare synonyms to be prevented. Terms are checked for each object in the model:

- containers/groups/databases/schemas/namespaces/keyspaces/...

- entities/tables/collections/...

- and mainly for attributes/fields/columns/...

While Naming Conventions parameters are target-specific, the glossary/abbreviations file is typically the same for all targets. It is the other parameters such as case conversion which are target-specific.

The glossary file is made of several columns, with Business Name as a first column, and Technical Name as the second column. Then come Description and Synonyms. Multiple synonyms are separated by a pipe ("|"). Additional columns are ignored by Hackolade. The columns are separated by commas. The first line is the header line, and is ignored in the conversion. The lines are separated by a <CR><LF>. There is no limit to the number of lines in the file. The values in the columns are not case-sensitive.

Example:

Business Name,Technical Name,Description,Synonyms
action,axn,"The accomplishment of a process over a period of time, in stages, or with the possibility of repetition",behavior|conduct
account,acct,,
currency code,curr_cd,,
customer,cust,,
feedback,fdbak,,

Important notes:

the first line in the glossary file is a header line describing the columns in the file
if a cell contains commas, it must be stored in the CSV file in between double quotes
if your file contains diacritics (accents, ...) you must save your file with UTF-8 encoding for the function to work properly. If you edit the file using Excel, you should make sure to save it using this CSV UTF-8 option:
multiple synonyms are separated by a pipe ("|")

Glossary verifications

Current glossary-related rules are:

Invalid Term: the business name contains terms that are not in the glossary: business names must contain only approved terms drawn from the glossary. If a term in the business name does not exist in the glossary, then an entry is created in the report. Remember that a "term" can be either a single word, or a combination of words, e.g. "Unit of Measure" or "Universal Unique Identifier".
Inconsistent Terms: while each individual term in the business name matches terms in the glossary, the combination of (prime term + class term) does not exist in the terms glossary. For example, the business name "document category code" has 3 terms: "document", "category", and "code" which all exist individually in the glossary.
The application checks that "category code" exists in the glossary, and if it doesn't, then an entry is created in the report. In the business name "Item Total Converted Price Currency Code", the class term is "Code", the prime term is "Currency". So the application checks that "Currency Code" exists in the glossary.
Remember that a term could be made of several words, e.g. "Universal Unique Identifier". It counts as just one single term. So in the business name "Commodity Product Description Universal Unique Identifier", the class term is "Universal Unique Identifier", and the prime term is "Description". So the application checks that "Description Universal Unique Identifier" exists in the glossary.
Synonyms: check whether the business name contains synonyms of glossary terms: term names must not contain synonyms of glossary terms. For example, a glossary term "supplier" might have as synonyms "provider", and "seller". If either of these terms are found in business names, then an entry is created in the report.

In other words, if a class term is not in the glossary, it is flagged as invalid. Same for a prime term.

If, for a prime+class combination, they are both declared individually in the glossary but the combination is not in the glossary, then the combination is flagged as inconsistent.

And it is only thru their position in an name that terms are labeled "prime" or "class" in the logic. The combination of words in a name could be such that a certain term is a class term in some names, and a prime term in others.

Attribute properties

All the properties verifications are target-specific.

Hackolade Studio offers predefined, out-of-the-box rules for standard modeling objects such as Containers, Entities, Views, Attributes, and Relationships. These rules can be individually activated or deactivated based on user needs.

In addition, custom verification rules can be defined via configuration files, offering flexibility and control over data quality.

Technical names

Technical names are generated by applying to business names: standard abbreviations, processing of illegal characters, and case conversion. This is an optional feature of Hackolade Studio. It is not required to enable Naming Conventions to use the Model Verification feature. In the absence of a technical name, the application always considers that the technical equals the business name, without need to be declared.

The length of technical names must meet the constraints of the target. Technical names are generated by applying to business names: standard abbreviations, processing of illegal characters, and case conversion. The length of technical names must meet the constraints of the target.

Attribute propRules file

The property rules file is typically stored in the folder:

.hackolade/OPTIONS/<target>

It contains several columns:

- Technical name: [string] the name of physical attributes/columns/fields

- Required: [boolean] (default:false) whether attribute must marked as required (a.k.a. NOT NULL)

- Default: [string] the default value if applicable

- Data Type: [string] the most granular data type for the target. For example if type is numeric and subtype is float, then float should be declared here.

- Length: [number] length of string data types

- Precision: [number] total number of digits in a number, not counting the decimal point

- Scale: [number] total number of digits to the right of the decimal point

Example:

Technical Name,Required,Default,DataType,Length,Precision,Scale
action_cd,true,NO_CHANGE,varchar,10,0,0
adjustment_desc,,,varchar,100,0,0
amt,,,decimal,0,18,3

Out-of-the-box verification rules

The objects Container, Entity, View, Attribute, and the Model itself can be automatically checked for the following standard property-related rules:

check that the properties of each attribute in the model match the specification in the propRules file. An entry is created in the report for each attribute property not matching the declared value in the target-specific propRules file.
- Invalid data type: verify that the column value in the propRules file, if set, matches the property of the attribute in the model (i.e., if not declared in the file, then the value in the model can be anything.) If not, an entry is created in the report.
- Missing required property: verify that the column value in the propRules file, if set, matches the property of the attribute in the model (i.e., if not declared in the file, then the value in the model can be either true or false.) If not, an entry is created in the report.
- Invalid default property: verify that the column value in the propRules file, if set, matches the property of the attribute in the model (i.e., if not declared in the file, then the value in the model can be anything.) If not, an entry is created in the report.
- Invalid length: if a string data type, verify that the column value in the propRules file, if set, matches the property of the attribute in the model (i.e., if not declared in the file, then the value in the model can be anything.) If not, an entry is created in the report.
- Invalid precision: if a numeric data type, verify that the column value in the propRules file, if set, matches the property of the attribute in the model (i.e., if not declared in the file, then the value in the model can be anything.) If not, create an entry in the report.
- Invalid scale: if a numeric data type, verify that the column value in the propRules file, if set, matches the property of the attribute in the model (i.e., if not declared in the file, then the value in the model can be anything.) If not, an entry is created in the report.
check additional rules that are independent of propRules file:
- Business name is not empty
- Technical name is not empty
- Business name does not start with: user-defined string, default: "New"
- Technical name does not start with: user-defined string, default: "New"
- Description is not empty
- Technical name conflicts with reserved keywords: independent of glossary. Each target has a list of reserved words per level. An entry in the report is created if an attribute technical name, in full, is a reserved word.
- Excessive technical name length: independent of glossary. Some target have max length of, for example 128 characters. Some customers may have their own more restrictive standards. An entry in the report is created if an attribute technical name is greater than the user-defined length. We store this user-defined length, per target, for future reuse.

Note that some rules already pro-actively generate warnings in the tool, such as:

duplicate table names in the same schema
duplicate column name in the same table
dangling relationships
mismatched data types in relationships
etc.

So these rules did not get included in the list above.

User-defined regular expression verification rules

You can define your own verification rules for any string property of any modeling object using regular expressions in a CSV configuration file.

This CSV format enables multiple validation rules per property, supporting more complex and nuanced validation scenarios. It ensures data complies with internal standards and naming conventions.

Supported Properties

User-defined verification rules apply to both standard Hackolade properties and custom properties. Currently, only simple scalar properties (at the root level) are supported. Future releases may include complex types, such as block and group properties.

Regular expression regExpRules file

The regular expression rules file is typically located in the following directory:

.hackolade/OPTIONS/<target>

The file contains multiple columns that define the rules and their scope.

Column	Purpose	Allowed values and example
Rule Keyword	A unique identifier for the rule	Any unique string to identify the rule, example: viewNameRule
Rule Name	The name of the rule for the generated report	Any descriptive name for the rule, for example: verification rule on view name
Severity	Indication of the severity level when a rule fails	Allowed values: warning or error
Object	Identifies the nature of the modeling object for the rule	Use the abstract name of the modeling object: model, container, entity, view, relationship, attribute
Property	Specifies the property keyword of the object targeted by the rule	Use the JSON keyword found in the persistence model file. This value is case-sensitive and must match exactly how the property is defined in the model. Otherwise, the rule will not be applied. For example: version, author, description To identify the correct property keyword, you can open a model file in a text editor such as Notepad, and inspect the JSON content
Regular Expression	The pattern with which the property value must match	Any JavaScript regular expression written using the standard format /yourPattern/flags. If the expression is not valid, the application will fallback to a simple string comparison (which may yield incorrect results.) There is no verification of the syntax correctness of the regex provided. If your expression contains a comma, enclose it in double quotes to ensure correct CSV formatting. Line breaks are not allowed. See example patterns below.

Example regExpRules file

Rule keyword;Rule name;Severity;Object;Property;Regular expression
1-model-author-not-empty;Model author is specified;Warning;model;author;/^.+$/
2-model-version-not-empty;Model version is specified;Warning;model;version;/^.+$/
3-table-name-starts-with-T_-prefix;Table technical name starts with T_ prefix;warning;entity;code;/^T_[a-zA-Z0-9_-]+/
4-view-name-starts-with-V_-prefix;View technical name starts with V_ prefix;warning;view;code;/^V_[a-zA-Z0-9_-]+/
5-relationship-name-starts-with-FK_-prefix;Relationship name starts with FK_ prefix;warning;relationship;name;/^FK_[a-zA-Z0-9_-]+/

Common regular expression examples

To assist in creating effective regular expressions, here are some common patterns:

Rule	RegeEx
must be numeric only	/^\d+$/
must start with A (case sensitive)	/^A/
does not start with A (case sensitive)	/^(?!A)/
must start with A (case insensitive)	/^A/i
does not start with A (case insensitive)	/^(?!A)/i
must end with A (case sensitive)	/A$/
does not end with A (case sensitive)	/[^A]$/
must end with A (case insensitive)	/A$/i
does not end with A (case insensitive)	/[^A]$/i
must contain A (case sensitive)	/A/
does not contain A (case sensitive)	/^[^A]$/*
must contain A (case insensitive)	/A/i
does not contain A (case insensitive)	/^[^A]$/i*

Tips for authoring valid JavaScript regular expressions

For help creating more advanced regular expressions, consider using tools like ChatGPT to generate and refine your patterns. To improve the quality and usability of the generated output, make sure to include format requirements in your prompt.

For example:

"Generate a JavaScript regular expression that starts and ends with slashes (e.g., /.../), includes appropriate flags if necessary, and is suitable for insertion into a CSV file (i.e., no line breaks, and wrap the regex in double quotes if it contains a comma). The regex should validate that a name starts with <insert your rule here, for example a single uppercase letter (A-Z)>, followed by one or more alphanumeric characters (a-z, A-Z, or 0-9) and allows dashes and underscores."

You can also use regex101.com to validate your expressions by selecting the JavaScript (ECMAScript) flavor from the menu.

⚠️ Important: Regular expressions are case-sensitive by default. To ensure matches are not affected by letter casing, use the i flag as shown above.

Using the tool to Verify Data Models

A couple of default paths have been added to the user parameters in Tools > Options, for the default path of the Glossary file, and for the default path where the reports should be created:

Verify Data Model - Default paths

The tool can be found in Tools > Verify Data Model

Verify Data Model - menu

When you invoke the tool, you are first presented with a dialog box:

Verify Data Model - main dialog

With this dialog, you may select and deselect rules to be verified for the loaded data model, and you may change the severity. To select a rule or deselect it, you may toggle the checkbox with a click of the mouse. With a click of the mouse on the severity icon, you can change the severity to be listed in the report for each instance of a failed verification.

After you click the Submit button and select the report name and path for storage, the application executes the selected rules, and displays this dialog at the end of the process:

Verify Data Model - open in Excel

Interpreting the exception report

Each failed rule results in an entry in the report. If a single object in the model fails multiple rules, there multiple lines are generated in the report.

When running the verification for the first time, particularly on large models, the report could be quite long. It is strongly suggested to then focus on the biggest issues first, for example wrong data type, before adding other rules like length, precision, and scale. etc. Be aware that some rules depend on other rules. For example if an attribute has data type varchar instead of numeric, then precision and scale would be wrong as well. So it it is probably preferable to fix all the data type first before running the verification again on precision and scale.

There are many columns in the CSV report opened in Excel. The first thing to do is to go to the Data menu tab and click on Filter, then to go the View tab > Freeze Panes > Freeze Top Row. We also suggest to hide the technical columns modelGUID, containerGUID, entityGUID, relationshipGUID, and attribGUID. These columns are only useful for automated processing.

There are different groups of columns in the report:

Verification failure identification: these columns can be used to filter lines and address them in chunks of similar types of failures. They are: verifRuleCode, verifRuleName, verifRuleSeverity
Objects GUIDs: these columns are used for reference to the unique internal keys of objects in Hackolade models. They are: modelGUID, containerGUID, entityGUID, relationshipGUID, and attribGUID, and you should probably hide them to increase legibility of the report.
Objects Business Names and Technical Names: these columns contain the names of the object hierarchy and which might have been used in the glossary verification. They are: modelBusName, modelTechName, containerBusName, containerTechName, entityBusName, entityTechName, relationshipName, attribBusName, attribTechName
Attributes properties: these columns contain the properties verified by the tool. They are: attribDataType, attribLength, attribPrecision, attribScale, attribRequired, attribDefault
Glossary entries: these columns contain the values Business Names in the glossary against which the object name was verified, as well as the corresponding Technical Names, for reference. They are: glossTermBusName, glossTermTechName, glossSysnonyms
Attribute rules values: these columns contain the values for Required, Default, Data Type, Length, Precision and Scale, that are used for verification of the attribute properties. The columns are: rulesRequired, rulesDefault, rulesDataType, rulesLength, rulesPrecision, rulesScale

On this page