dbt schema models

dbt is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. It helps teams work directly within the warehouse to produce trusted datasets for reporting, ML modeling, and operational workflows. 

 

dbt performs the T (Transform) of ELT; it does not support the Extract and Load operations. It lets companies write transformations as SQL queries and orchestrate them efficiently. 

 

Multiple SQL-like databases are supported currently, including: BigQuery, Databricks, Hive, MySQL, Oracle, PostgreSQL, Redshift, Snowflake, SQL Server, Synapse, and Teradata.

 

For selected targets, Hackolade Studio facilitates the transformation of data in your warehouse with dbt, by letting you generate the model properties schema.yml for tables in your source and target models.  The feature is available in Tools > Forward-Engineering > dbt models.

 

dbt models

Warning: the term "model" in dbt terminology differs from its meaning in Hackolade.  In dbt, <model name> corresponds to the technical name of an entity/table/collection/record in Hackolade Studio.

 

Model properties are declared in .yml files, generally in a models/ directory (or possibly elsewhere).  These files can be named whatever_you_want.yml (for us, it will be either the schema's technical name or the entity's technical name, depending on the option chosen).

 

Example:

version: 2
models:
    - name: <model name>
      description: <markdown_string>
      columns:
        - name: <column_name> # required
          data_type: <string>
          description: <markdown_string>
          constraints:
            - <constraint>
          tags: [<string>]
        - name: ... # declare properties of additional columns
      constraints:
        - <constraint>
      config:
        <model_config>: <config_value>

 

Users can choose to generate one file per entity or one file per schema; in the latter case, multiple entities are listed inside the same model file.
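The two grouping options can be sketched as follows (a simple illustration in Python; the entity and schema names below are made up, and the helper is ours, not a Hackolade or dbt API):

```python
from collections import defaultdict

# hypothetical entities with their schema and technical names
entities = [
    {"schema": "sales", "name": "dim_customers"},
    {"schema": "sales", "name": "fct_orders"},
    {"schema": "finance", "name": "fct_invoices"},
]

def yml_files(entities, per="entity"):
    """Group entities into .yml files: one file per entity,
    or one file per schema (listing all entities of that schema)."""
    files = defaultdict(list)
    for e in entities:
        key = e["name"] if per == "entity" else e["schema"]
        files[key + ".yml"].append(e["name"])
    return dict(files)
```

With `per="entity"` the three entities above yield three files; with `per="schema"`, `sales.yml` lists both sales entities.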

 

dbt keywords

There are many keywords in dbt, called properties and configs.  Below is a list of the keywords that are relevant in the context of Hackolade Studio.  See also this styling guide for dbt models.

 

Name

Name: must contain only letters, digits, and underscores, and cannot start with a digit

 

name: string

 

We use the technical name if present, otherwise the business name for the object.
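The naming rule above can be checked with a simple regular expression (a sketch of our own; the helper name is illustrative, not a dbt API):

```python
import re

# letters, digits, and underscores only; must not start with a digit
_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_dbt_name(name: str) -> bool:
    return bool(_NAME_RE.match(name))

# examples
# is_valid_dbt_name("dim_customers")  -> True
# is_valid_dbt_name("2fast")          -> False (starts with a digit)
# is_valid_dbt_name("dim-customers")  -> False (hyphen not allowed)
```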

 

Description

Description: a user-defined description, used to document a model and its columns

 

version: 2
models:
    - name: dim_customers
      description: One record per customer
      columns:
        - name: customer_id
          description: Primary key

 

For multi-line descriptions, we use YAML block notation to split a longer description over multiple lines

 


version: 2
models:
  - name: dim_customers
    description: >
      One record per customer. Note that a customer must have made a purchase to
      be included in this table — customer accounts that were created but never
      used have been filtered out.
    columns:
      - name: customer_id
        description: Primary key
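As a rough illustration of what the `>` folded block scalar produces, here is a simplified model of YAML folding in Python (our own sketch, assuming a single paragraph; real YAML also treats blank lines and more-indented lines specially):

```python
def fold_block_scalar(block: str) -> str:
    """Simplified YAML '>' folding: non-empty lines are joined with
    single spaces, and one trailing newline is kept (clip chomping)."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    return " ".join(lines) + "\n"

# the multi-line source becomes a single-line string value
folded = fold_block_scalar(
    "One record per customer.\n"
    "Filtered accounts excluded."
)
```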

 

Markdown in descriptions is possible too, but requires quoting the description to ensure that the YAML parser doesn't get confused by special characters.

 

version: 2
models:
    - name: dim_customers
      description: "**[Read more](https://www.google.com/)**"
      columns:
        - name: customer_id
          description: Primary key.

 

Columns

Columns: can define sub-properties, including name, description, data_type, and constraints.

 

 

Data type

data_type: dbt supports scalar and complex semi-structured data types:

 

- numeric: integer, decimal, float

- string: string, char, varchar, text, character

- date: date, datetime, time, timestamp

- boolean

- semi-structured: JSON and array

 

models:
    - name: dim_customers
      columns:
        - name: customer_id
          data_type: int
          constraints:
            - type: not_null
        - name: customer_name
          data_type: string
        - name: non_integer
          data_type: numeric(38,3)

 

Constraints

Constraints: may be defined for a single column, or at the model level for one or more columns.

If you are defining multiple primary_key constraints for a single model (entity), those must be defined at the model level. Defining multiple primary_key constraints at the column level is not supported.  

 

The structure of a constraint is:

- type (required): one of not_null, unique, primary_key, foreign_key, check, custom

- expression: Free text input to qualify the constraint. Required for certain constraint types, and optional for others.

- name (optional): Human-friendly name for this constraint. Supported by some data platforms.

- columns (model-level only): List of column names to apply the constraint over

 

models:
    - name: <model_name>

      # required
      config:
        contract:
          enforced: true

      columns:
        - name: FIRST_COLUMN
          data_type: DATA_TYPE

          # column-level constraints
          constraints:
            - type: not_null
            - type: unique
            - type: foreign_key
              expression: OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_COLUMN)
            - type: ...

      # model-level constraints
      constraints:
        - type: primary_key
          columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
        - type: foreign_key # multi_column
          columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
          expression: "OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_FIRST_COLUMN, OTHER_MODEL_SECOND_COLUMN, ...)"
        - type: check
          columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
          expression: "FIRST_COLUMN != SECOND_COLUMN"
          name: HUMAN_FRIENDLY_NAME
        - type: ...
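A minimal validation of the constraint structure described above might look like this (our own sketch, not dbt code; which constraint types require an expression varies by data platform, so only the basics are checked):

```python
VALID_TYPES = {"not_null", "unique", "primary_key", "foreign_key", "check", "custom"}

def check_constraint(constraint: dict, model_level: bool = False) -> list:
    """Return a list of problems found in a single constraint dict."""
    problems = []
    ctype = constraint.get("type")
    if ctype not in VALID_TYPES:
        problems.append("unknown or missing type: %r" % ctype)
    if ctype in {"foreign_key", "check"} and "expression" not in constraint:
        problems.append("%s normally requires an expression" % ctype)
    if not model_level and "columns" in constraint:
        problems.append("'columns' is only valid on model-level constraints")
    return problems
```

For example, `{"type": "not_null"}` passes at the column level, while a `check` constraint without an `expression` is flagged.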

 

Config (TBA)

config: allows configuring resources at the same time as properties in YAML files.

 

version: 2
models:
    - name: <model_name>
      config:
        <model_config>: <config_value>
        ...

 

where

version: 2
models:
    - name: [<model-name>]
      config:
        database: <string>
        schema: <string>
        contract: {<dictionary>}

 

There are platform-specific configs for BigQuery, Databricks, Oracle, PostgreSQL, Redshift, Snowflake, SQL Server, Synapse,  and Teradata.  Currently, Hackolade Studio does not generate platform-specific configs.

 

Contract (TBA)

Contract: when the contract configuration is enforced, dbt will ensure that your model's returned dataset exactly matches the attributes you have defined in YAML:

 

- name and data_type for every column

- additional constraints, as supported for this materialization and data platform

 

This is to ensure that the people querying your model downstream -- both inside and outside dbt -- have a predictable and consistent set of columns to use in their analyses. dbt uses built-in type aliasing for the data_type defined in your YAML. For example, you can specify string in your contract, and on Postgres/Redshift, dbt will convert it to text. If dbt doesn't recognize the data_type name among its known aliases, it will pass it through as-is. This is enabled by default, but you can opt-out by setting alias_types to false.
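The aliasing behavior described above can be sketched as a lookup with pass-through for unknown names (an illustrative sketch only; the alias table below is abbreviated and the real mapping is adapter-specific):

```python
# illustrative subset of aliases for a Postgres/Redshift-style adapter
POSTGRES_ALIASES = {
    "string": "text",
    "int": "integer",
    "bool": "boolean",
}

def resolve_data_type(data_type: str, aliases: dict, alias_types: bool = True) -> str:
    """Return the platform type for a contract data_type.
    Unknown names pass through unchanged; alias_types=False disables aliasing."""
    if not alias_types:
        return data_type
    return aliases.get(data_type.lower(), data_type)
```

So `string` resolves to `text` on this hypothetical adapter, `numeric(38,3)` passes through as-is, and setting `alias_types=False` returns every name unchanged.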

 

models:
    - name: dim_customers
      config:
        materialized: table
        contract:
          enforced: true
          alias_types: false # true by default
      columns:
        - name: customer_id
          data_type: int
          constraints:
            - type: not_null
        - name: customer_name
          data_type: string
        - name: non_integer
          data_type: numeric(38,3)