dbt sources

In addition to model properties files, Studio can generate dbt sources files. Sources are the raw tables in your warehouse that your dbt models build on top of. Declaring them in a sources.yml file lets dbt track freshness, document lineage, and reference source tables using the {{ source() }} function.

 

To declare a container as a dbt source group, check the dbt source group property in the dbt tab of the container's properties pane. All entities inside that container are then treated as source tables, and forward-engineering produces a sources.yml file instead of a models.yml file.

 

Entities in a source group do not expose the model-level dbt configuration properties (materialized, access, contract, etc.); only source-specific properties are available. All entities of a source group produce a single sources.yml file.

 

Example of the generated file structure (placeholders in angle brackets):

 

version: 2
sources:
  - name: <source_name>
    description: <markdown_string>
    database: <database>
    schema: <schema>
    loaded_at_field: <column_name>
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: <table_name>
        description: <markdown_string>
        identifier: <identifier>
        tags: [<string>]

Source group properties (at Studio container level)

name

The logical identifier for this source group in the dbt project, used as the first argument in {{ source('group_name', 'table_name') }} references. This is not a physical database name; database: and schema: carry the physical location.

 

Studio uses the container's (business) name for this field.
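
As a minimal sketch, assuming a container whose (business) name is jaffle_shop and an entity named orders (both names are illustrative, not values Studio produces on its own), the exported name and the reference it enables could look like this:

```yaml
sources:
  - name: jaffle_shop        # container (business) name, hypothetical
    tables:
      - name: orders         # entity name, hypothetical
# A dbt model would then reference this table as
# {{ source('jaffle_shop', 'orders') }}
```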

 

description

Taken directly from the container description in Studio. Not configurable in the dbt tab.

 

database

The database in which the source tables live. Leave this empty to let Studio fall back to the database defined at the container level in the model. Only set this explicitly if it needs to differ from the container's defined database.

schema

The schema in which the source tables live. Leave this empty to let Studio fall back to the schema defined at the container level in the model. Only set this explicitly if it needs to differ from the container's defined schema.
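
For illustration, a source group whose tables live somewhere other than the container's own database and schema might be exported roughly as follows. The values raw and jaffle_shop_raw are assumptions chosen for the example:

```yaml
sources:
  - name: jaffle_shop
    database: raw              # explicit override, hypothetical value
    schema: jaffle_shop_raw    # explicit override, hypothetical value
    tables:
      - name: orders
```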

 

loader

A free-text description of the tool or process that loads data into this source (e.g. Fivetran, Stitch). Used for documentation purposes only: dbt does not use this value during runs.
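
A short sketch of how this property might appear in the output, assuming the loader value fivetran (illustrative only):

```yaml
sources:
  - name: jaffle_shop
    loader: fivetran    # documentation only, hypothetical value
    tables:
      - name: orders
```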

 

loaded_at_field

The column that dbt uses to determine when the source was last loaded. It is required for freshness checks. Defined at source group level and inherited by all tables unless overridden at table level.

 

freshness

Defines thresholds for how old source data may be before dbt raises a warning or an error. Defined at source group level and inherited by all tables unless overridden at table level.

 

sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}

quoting

Controls whether dbt quotes the database, schema, and identifier when generating SQL for source references. Each sub-property accepts true, false, or empty. An empty value leaves the dbt default in place and is not exported.

 

sources:
  - name: jaffle_shop
    quoting:
      database: false
      schema: false
      identifier: true

 

tags

One or more tags to apply to all tables in this source group. Tags defined at source group level are inherited by all tables in the group.
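
A minimal sketch of group-level tags, assuming the tag names raw and daily (both hypothetical). Every table listed under the group carries these tags without repeating them:

```yaml
sources:
  - name: jaffle_shop
    tags:
      - raw              # hypothetical tag
      - daily            # hypothetical tag
    tables:
      - name: orders     # inherits raw and daily
      - name: customers  # inherits raw and daily
```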

Source table properties (at Studio entity level)

 

name

The name of the source table, taken from the entity's (business) name in Studio. Used as the second argument in {{ source() }} references.

Note: if a technical name is defined on the entity, see identifier.

 

description

Taken directly from the entity description in Studio. Not configurable in the dbt tab.

 

identifier

The actual name of the table in the database. This property is not configurable in the dbt tab: Studio takes it automatically from the model. When a technical name is defined in Studio and differs from the (business) name, the (business) name is exported as name: (how you reference the source in dbt) and the technical name is exported as identifier: (the physical table name in the database). If only one name is defined, identifier: is omitted and dbt assumes it matches name:.

 

sources:
  - name: jaffle_shop
    tables:
      - name: transaction_records     # business name
        identifier: tr_records_raw    # technical name

 

tags

One or more tags to apply to the source table.

 

sources:
  - name: jaffle_shop
    tables:
      - name: orders
        tags:
          - nightly
          - finance

 

loaded_at_field (override)

When set at table level, overrides the source group's loaded_at_field for this specific table. If left empty, dbt inherits the value from the source group.

 

freshness (override)

When set at table level, overrides the source group's freshness thresholds for this specific table. If left empty, dbt inherits the thresholds from the source group.

 

sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        description: One record per order
        tags:
          - nightly
      - name: customers
        description: One record per customer
        identifier: raw_customers
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}

 

Column names in source tables

In dbt, a column's name: is simply the physical name of the column. There is no equivalent of identifier: at column level (this is a known limitation of dbt-core, not a Hackolade choice).

 

As a result, there is no way to maintain separate business and technical names for columns in the dbt output. Studio exports the technical name if one is defined, otherwise falls back to the (business) name.

 

If your columns have both a business name and a technical name in Studio, be aware that only the technical name will appear in the generated sources.yml.
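
As a sketch, assuming a column with the business name Customer Identifier and the technical name cust_id (both hypothetical), the column entry in the generated file would carry only the technical name:

```yaml
sources:
  - name: jaffle_shop
    tables:
      - name: customers
        columns:
          - name: cust_id    # technical name; the business name is not exported
```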