dbt sources
In addition to model properties files, Studio can generate dbt sources files. Sources are the raw tables in your warehouse that your dbt models build on top of. Declaring them in a sources.yml file lets dbt track freshness, document lineage, and reference source tables using the {{ source() }} function.
To declare a container as a dbt source group, check the dbt source group property in the dbt tab of the container's properties pane. All entities inside that container are then treated as source tables, and forward-engineering produces a sources.yml file instead of a models.yml file.
Entities in a source group do not expose the model-level dbt configuration properties (materialized, access, contract, etc.); only source-specific properties are available. All entities of a source group produce a single sources.yml file.
Example:
version: 2
sources:
  - name: <source_name>
    description: <markdown_string>
    database: <database>
    schema: <schema>
    loaded_at_field: <column_name>
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: <table_name>
        description: <markdown_string>
        identifier: <identifier>
        tags: [<string>]
Source group properties (at Studio container level)
name
The logical identifier for this source group in the dbt project, used as the first argument in {{ source('group_name', 'table_name') }} references. This is not a physical database name; database: and schema: carry the physical location.
Studio uses the container's (business) name for this field.
description
Taken directly from the container description in Studio. Not configurable in the dbt tab.
database
The database in which the source tables live. Leave this empty to let Studio fall back to the database defined at the container level in the model. Only set this explicitly if it needs to differ from the container's defined database.
schema
The schema in which the source tables live. Leave this empty to let Studio fall back to the schema defined at the container level in the model. Only set this explicitly if it needs to differ from the container's defined schema.
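As a minimal sketch of an explicit override, the fragment below sets both database and schema on a source group. The names raw and jaffle_shop_raw are hypothetical and only illustrate the shape of the output; the exact file Studio generates may differ.

```yaml
version: 2

sources:
  - name: jaffle_shop          # logical name, first argument of {{ source() }}
    database: raw              # hypothetical physical database
    schema: jaffle_shop_raw    # hypothetical physical schema
    tables:
      - name: orders           # second argument of {{ source() }}
```

With this declaration, a model that selects from {{ source('jaffle_shop', 'orders') }} would resolve to the orders table in raw.jaffle_shop_raw, subject to the quoting rules of the target platform.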
loader
A free-text description of the tool or process that loads data into this source (e.g. Fivetran, Stitch). It is used for documentation purposes only; dbt does not use this value during runs.
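A short illustration of how this property might appear in the generated file, assuming a source group named jaffle_shop that is loaded by Fivetran (both values are examples, not defaults):

```yaml
sources:
  - name: jaffle_shop
    loader: fivetran   # documentation only, ignored by dbt at run time
    tables:
      - name: orders
```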
loaded_at_field
The column that dbt uses to determine when the source was last loaded. It is required for freshness checks. It is defined at source group level and inherited by all tables unless overridden at table level.
freshness
Defines the thresholds for how old source data may be before dbt raises a warning or an error. It is defined at source group level and inherited by all tables unless overridden at table level.
sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
quoting
Controls whether dbt quotes the database, schema, and identifier when generating SQL for source references. Each sub-property accepts true or false, or can be left empty, in which case it is not exported and dbt applies its default.
sources:
  - name: jaffle_shop
    quoting:
      database: false
      schema: false
      identifier: true
tags
One or more tags applied at source group level. Tags defined here are inherited by all tables in the group.
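The inheritance can be sketched as follows. The tag names raw and pii are hypothetical, and the fragment assumes the same top-level tags: layout used for table-level tags elsewhere on this page.

```yaml
sources:
  - name: jaffle_shop
    tags:
      - raw                 # applies to every table in the group
    tables:
      - name: orders        # inherits raw
      - name: customers
        tags:
          - pii             # table-level tag, in addition to the inherited raw
```

In this sketch, orders would carry only the raw tag, while customers would carry both raw and pii, since dbt combines source-level and table-level tags rather than replacing one with the other.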
Source table properties (at Studio entity level)
name
The name of the source table, taken from the entity name in Hackolade Studio. Used as the second argument in {{ source() }} references.
Note: if a technical name is defined on the entity, see identifier.
description
Taken directly from the entity description in Studio. Not configurable in the dbt tab.
identifier
The actual name of the table in the database. This property is not configurable in the dbt tab: Studio derives it automatically from the model. When an entity has a technical name that differs from its (business) name, the (business) name is exported as name: (how you reference the source in dbt) and the technical name is exported as identifier: (the physical table name in the database). If only one name is defined, identifier: is omitted and dbt assumes the physical table name matches name:.
sources:
  - name: jaffle_shop
    tables:
      - name: transaction_records   # business name
        identifier: tr_records_raw  # technical name
tags
One or more tags to apply to this source table.
sources:
  - name: jaffle_shop
    tables:
      - name: orders
        tags:
          - nightly
          - finance
loaded_at_field (override)
When set at table level, overrides the source group's loaded_at_field for this specific table. If left empty, dbt inherits the value from the source group.
freshness (override)
When set at table level, overrides the source group's freshness thresholds for this specific table. If left empty, dbt inherits the thresholds from the source group.
sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        description: One record per order
        tags:
          - nightly
      - name: customers
        description: One record per customer
        identifier: raw_customers
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
Column names in source tables
In dbt, a column's name: is simply the physical name of the column. There is no column-level equivalent of identifier: (this is a known limitation of dbt-core, not a Hackolade choice).
As a result, the dbt output cannot carry separate business and technical names for columns. Studio exports the technical name if one is defined, and otherwise falls back to the (business) name.
If your columns have both a business name and a technical name in Studio, only the technical name appears in the generated sources.yml.
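To illustrate, assume a customers entity with a column whose business name is "Customer ID" and whose technical name is cust_id (both hypothetical). Under the rule described above, the generated fragment would look roughly like this; the exact set of column properties Studio exports is not shown here.

```yaml
sources:
  - name: jaffle_shop
    tables:
      - name: customers
        identifier: raw_customers
        columns:
          - name: cust_id   # technical name; the business name "Customer ID" is not exported
```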