Integration of dbt with YDB
Introduction
dbt (data build tool) is a tool for data transformation in analytical data platforms, designed to organize data model development using software engineering practices. It covers the transformation (T) stage in ETL pipelines and enables structured modeling of data logic — from raw data to data marts — with support for testing, documentation, and data lineage, providing end-to-end traceability of data from source to final model.
Models in dbt are defined uniformly as SELECT statements, which simplifies their maintenance and composability. The tool handles their materialization in the target system and tracks dependencies between models, including the evolution of data across transformation stages.
To integrate dbt with YDB, the dbt-ydb connector is used. It enables connecting to YDB as a target platform and supports model materialization as tables and views, incremental data processing, loading test datasets (seeds), as well as running data tests and generating documentation.
This section describes the connector's capabilities and provides steps to set up and start using it.
Warning
The dbt-ydb connector is in the Preview stage and does not currently support all dbt features. The sections below list the supported features and known limitations.
Features
Models and Their Materialization
A core concept in dbt is a data model. A model is essentially a SQL query that can reference any data source in your data warehouse, including other models. The dbt-ydb connector supports the following approaches to materializing models in YDB:
- View: stored as a YDB view.
- Table: stored as a table in YDB and re-created by dbt on each model update.

The dbt-ydb connector allows you to specify the following table parameters using model configuration:

| Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| primary_key | Yes | | Primary key of the table |
| store_type | No | row | Table type: row for a row-oriented table or column for a column-oriented table |
| auto_partitioning_by_size | No | | Automatic partitioning by size |
| auto_partitioning_partition_size_mb | No | | Partition size threshold |
| ttl | No | | Time-To-Live rule |

Example of a model materialized as a table based on another model (using ref). It is configured with a primary key, TTL, and automatic partitioning by size:

    {{ config(
        primary_key='id, created_at',
        store_type='row',
        auto_partitioning_by_size='ENABLED',
        auto_partitioning_partition_size_mb=256,
        ttl='Interval("P30D") on created_at'
    ) }}

    select
        id,
        name,
        created_at
    from {{ ref('source_table') }}
- Incremental model: created as a table in YDB. When dbt refreshes the model, the table is not re-created; it is updated with new and changed rows.

The dbt-ydb connector supports the same parameters as for table materialization, plus one parameter specific to incremental models:

| Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| incremental_strategy | No | MERGE | Incremental materialization strategy. The MERGE strategy is supported and uses the YDB UPSERT operation. Support for the APPEND strategy is under development. |
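The effect of the MERGE strategy on a table can be illustrated with a small, self-contained Python sketch. It only models the row-level behavior of an upsert keyed by primary key (matching keys are updated, missing keys are inserted) and does not talk to YDB; the `upsert` function and the sample rows are hypothetical and exist only for illustration.

```python
def upsert(target, new_rows, primary_key):
    """Merge new_rows into target by primary key and return the merged rows.

    Rows whose key already exists are updated; rows with a new key are added.
    """
    # Index copies of the existing rows by their primary-key tuple.
    merged = {tuple(row[col] for col in primary_key): dict(row) for row in target}
    for row in new_rows:
        key = tuple(row[col] for col in primary_key)
        # Columns present in the incoming row overwrite the old values;
        # columns absent from the incoming row keep their old values.
        merged[key] = {**merged.get(key, {}), **row}
    return list(merged.values())


# Made-up sample data: the current table contents and one incremental batch.
existing = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
incoming = [
    {"id": 2, "name": "robert"},  # changed row: key already present
    {"id": 3, "name": "carol"},   # new row: key not present yet
]

result = upsert(existing, incoming, primary_key=["id"])
for row in result:
    print(row)
```

Because rows are matched by key, the choice of primary_key in the model configuration determines which incoming rows count as updates and which as inserts.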
Note
Another materialization type, ephemeral model, is not currently supported by the connector.
Snapshots
The snapshot mechanism is not currently supported by dbt-ydb.
Seeds (CSV-based reference/test data)
The dbt-ydb connector supports dbt seeds, which load reference and test data from CSV files in your project into YDB so that other models can use them.
Data Testing
dbt-ydb supports standard dbt data tests, as well as singular tests within the capabilities of YQL.
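In dbt, a data test is a query that selects the rows violating a rule, and the test passes when that query returns no rows. The following Python sketch mimics this convention for two common checks on in-memory rows. It is an illustration of the pass/fail rule only: the function names and the sample data are made up, and nothing here is executed against YDB or written in YQL.

```python
def not_null(rows, column):
    """Return the rows that violate a not-null check on `column`."""
    return [row for row in rows if row.get(column) is None]


def accepted_values(rows, column, values):
    """Return the rows whose `column` value is outside the allowed list."""
    return [row for row in rows if row.get(column) not in values]


# Made-up sample data standing in for a model's output.
orders = [
    {"id": 1, "status": "placed"},
    {"id": 2, "status": "shipped"},
    {"id": None, "status": "lost"},
]

null_failures = not_null(orders, "id")
value_failures = accepted_values(orders, "status", ["placed", "shipped", "returned"])

# A check passes only when it returns no failing rows.
print("not_null passed:", not null_failures)
print("accepted_values passed:", not value_failures)
```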
Documentation Generation
dbt-ydb supports generating documentation from dbt projects for YDB.
Getting Started
Requirements
To start working with dbt on YDB, you will need:
- Python 3.10+;
- dbt Core 1.8+;
- An existing YDB cluster (a single-node installation from the quickstart is sufficient).
Note
dbt Fusion 2.0 is not supported at this time.
Installation
To install dbt-ydb, run:
pip install dbt-ydb
Connecting dbt to a YDB Cluster
dbt connects to YDB through the dbt-ydb connector using the standard YDB connection parameters. To connect successfully, specify the endpoint, database path, and authentication parameters in the dbt profiles file.
Example profile file with possible authentication options and default values (in square brackets):
profile_name:
  target: dev
  outputs:
    dev:
      type: ydb
      host: [localhost] # YDB host
      port: [2136] # YDB port
      database: [/local] # YDB database
      schema: [<empty string>] # Optional subfolder for dbt models
      secure: [False] # If enabled, grpcs protocol will be used
      root_certificates_path: [<empty string>] # Optional path to root certificates file
      # Static Credentials
      username: [<empty string>]
      password: [<empty string>]
      # Access Token Credentials
      token: [<empty string>]
      # Service Account Credentials
      service_account_credentials_file: [<empty string>]
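The host, port, and secure fields together determine the endpoint that dbt connects to. The sketch below shows one way these fields could map to a YDB endpoint string, assuming the usual grpc://host:port form and the default values from the profile above. The `ydb_endpoint` helper is hypothetical and is not part of dbt-ydb.

```python
def ydb_endpoint(host="localhost", port=2136, secure=False):
    """Build an endpoint string from profile fields (hypothetical helper).

    Defaults mirror the default values listed in the example profile.
    """
    # secure=True selects the TLS-enabled grpcs protocol, otherwise plain grpc.
    scheme = "grpcs" if secure else "grpc"
    return f"{scheme}://{host}:{port}"


# Default single-node setup.
print(ydb_endpoint())
# Illustrative secure setup; the host name and port are placeholders.
print(ydb_endpoint(host="ydb.example.com", port=2135, secure=True))
```

When secure is enabled and the cluster uses a private certificate authority, root_certificates_path would typically need to point to the matching root certificates file.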
Creating a Project from Scratch via dbt init
- Initialize a project:

    dbt init

- Follow dbt’s interactive prompts to select the dbt-ydb connector and authentication settings for your YDB cluster.
- As a result, your project directory will be created along with a dbt profiles file in your home directory, containing a new connection to YDB:

    ~/.dbt/profiles.yml

- Run dbt debug to verify the connection:

    dbt debug
- Inside your project directory, dbt creates its standard project structure, including the example model my_first_dbt_model.
- Adapt the model my_first_dbt_model. Currently, dbt does not support customizing the auto-generated example per connector, so to run this model with dbt-ydb you need to update it as follows:

    /*
        Welcome to your first dbt model!
        Did you know that you can also configure models directly within SQL files?
        This will override configurations stated in dbt_project.yml

        Try changing "table" to "view" below
    */

    {{ config(materialized='table', primary_key='id') }}

    select *
    from (
        select 1 as id
        union all
        select null as id
    )
- Now you can run your project:

    dbt run
Running the Example Project
The dbt-ydb connector comes with an example you can use to quickly test dbt functionality with YDB.
- Clone the repository:

    git clone https://github.com/ydb-platform/dbt-ydb.git
    cd dbt-ydb/examples/jaffle_shop

- Configure the connection profile in the profiles.yml file. For a single-node installation from the quickstart, the file should look like this:

    profile_name:
      target: dev
      outputs:
        dev:
          type: ydb
          host: localhost # YDB host
          port: 2136 # YDB port
          database: /local # YDB database
          schema: jaffle_shop
- Verify the connection:

    dbt debug

- Load test data (via seeds). The following command loads CSV files from data/ into raw_* tables in YDB:

    dbt seed

- Run models. The following command creates tables and views based on the project’s example models:

    dbt run
- Test model data. The following command runs the standard data tests described in the example, such as checks for null values, lists of allowed values, and others:

    dbt test

- Generate documentation and start a local web server to view it. The project documentation will be available in your browser at http://localhost:8080:

    dbt docs generate
    dbt docs serve --port 8080
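The steps above can also be scripted. The following sketch chains the same dbt commands with Python's standard subprocess module and stops at the first command that exits with a non-zero code. The `STEPS` list and the `run_steps` function are hypothetical names for illustration; the sketch assumes the dbt executable is on PATH and that it is called from the example project directory. The documentation server step is left out because it keeps running until stopped.

```python
import subprocess

# Commands taken from the steps above, in the order they are run.
STEPS = [
    ["dbt", "debug"],
    ["dbt", "seed"],
    ["dbt", "run"],
    ["dbt", "test"],
    ["dbt", "docs", "generate"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    `runner` is injectable so the function can be exercised without dbt.
    """
    completed = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"step failed: {' '.join(cmd)} (exit code {result.returncode})")
            break
        completed.append(cmd)
    return completed
```

Calling `run_steps(STEPS)` from dbt-ydb/examples/jaffle_shop runs the whole sequence; comparing the returned list with `STEPS` shows whether every step succeeded.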
Next Steps
For more information about dbt itself, see the official dbt documentation.
Additionally, you can explore the connector’s source code and contribute to its development in the public dbt-ydb repository on GitHub.