Integration of dbt with YDB

Introduction

dbt (data build tool) is a data transformation tool for analytical data platforms that applies software engineering practices to data model development. It covers the transformation (T) stage of ETL pipelines and supports structured modeling of data logic, from raw data to data marts, with testing, documentation, and data lineage that traces data end to end from source to final model.

Models in dbt are defined uniformly as SELECT statements, simplifying their maintenance and composability. The tool handles their materialization in the target system and tracks dependencies between models, including the evolution of data across transformation stages.

The dbt-ydb connector integrates dbt with YDB. It connects to YDB as a target platform and supports materializing models as tables and views, incremental data processing, loading test datasets (seeds), running data tests, and generating documentation.

This section describes the connector's capabilities and provides steps to set up and start using it.

Warning

The dbt-ydb connector is in the Preview stage and does not currently support all dbt features. The sections below list the supported features and known limitations.

Features

Models and Their Materialization

A core concept in dbt is a data model. A model is essentially a SQL query that can reference any data source in your data warehouse, including other models. The dbt-ydb connector supports the following approaches to materializing models in YDB:

  1. View — stored as a YDB view.

  2. Table — stored as a table in YDB and re-created by dbt on each model update.

    The dbt-ydb connector allows you to specify the following table parameters using model configuration:

    Parameter                            Required  Default Value  Description
    primary_key                          Yes       -              Primary key of the table
    store_type                           No        row            Table type: row for a row-oriented table, column for a column-oriented table
    auto_partitioning_by_size            No        -              Automatic partitioning by size
    auto_partitioning_partition_size_mb  No        -              Partition size threshold
    ttl                                  No        -              Time-To-Live rule

    The following example materializes a model as a table built from another model (via ref), configured with a primary key, TTL, and automatic partitioning by size:

    {{ config(
       primary_key='id, created_at',
       store_type='row',
       auto_partitioning_by_size='ENABLED',
       auto_partitioning_partition_size_mb=256,
       ttl='Interval("P30D") on created_at'
    ) }}
    
    select
       id,
       name,
       created_at
    from {{ ref('source_table') }}
    
  3. Incremental model — created as a table inside YDB, but instead of being recreated, it is updated with changed and new rows when refreshed by dbt.

    The dbt-ydb connector supports the same parameters as for table materialization, plus parameters specific to incremental models:

    Parameter             Required  Default Value  Description
    incremental_strategy  No        MERGE          Incremental materialization strategy. The MERGE strategy is supported, using the YDB UPSERT operation. APPEND strategy support is under development.
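
As an illustration of the incremental materialization described above, here is a minimal sketch of an incremental model. The file name, the one-day lookback window, and the assumption that created_at is a Timestamp column are hypothetical and not taken from the connector's documentation. The incremental_strategy parameter is left at its default. The is_incremental() macro is standard dbt; CurrentUtcTimestamp() and Interval() are YQL built-ins.

```sql
-- models/events_incremental.sql (hypothetical file name)
{{ config(
   materialized='incremental',
   primary_key='id'
) }}

select
   id,
   name,
   created_at
from {{ ref('source_table') }}

{% if is_incremental() %}
-- On incremental runs, re-read only a recent window of data (assumed: one day).
-- Assumes created_at is a Timestamp column.
where created_at >= CurrentUtcTimestamp() - Interval("P1D")
{% endif %}
```

Because the MERGE strategy relies on UPSERT, rows that fall into the lookback window again should be overwritten by primary key rather than duplicated, which is why an overlapping window is used here instead of an exact cutoff.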

Note

Another materialization type, ephemeral model, is not currently supported by the connector.

Snapshots

The snapshot mechanism is not currently supported by dbt-ydb.

Seeds (CSV-based reference/test data)

The dbt-ydb connector supports dbt seeds, which load reference and test data from CSV files in your project into YDB so that it can be used in other models.
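
A loaded seed is referenced from a model with ref, the same way as another model. The sketch below assumes a hypothetical seed file raw_customers.csv with columns id, first_name, and last_name; these names are illustrative only.

```sql
-- models/customers.sql (hypothetical model built on a seed)
{{ config(materialized='view') }}

select
   id,
   first_name,
   last_name
from {{ ref('raw_customers') }}
```

The view materialization is used here so that no table parameters are needed; a table materialization would also require primary_key.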

Data Testing

dbt-ydb supports standard (generic) dbt data tests as well as singular tests, provided the test query can be expressed in YQL.
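
In dbt, a singular test is a SQL file in the project's tests directory that selects the rows violating an expectation; the test passes when the query returns no rows. A minimal sketch, assuming a hypothetical model named orders with a numeric amount column:

```sql
-- tests/assert_order_amount_not_negative.sql (hypothetical file and model names)
-- Returns the offending rows; the test passes when this query returns nothing.
select
   id,
   amount
from {{ ref('orders') }}
where amount < 0
```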

Documentation Generation

dbt-ydb supports generating documentation from dbt projects for YDB.

Getting Started

Requirements

To start working with dbt on YDB, you will need:

  • Python 3.10+;
  • dbt Core 1.8+;
  • An existing YDB cluster (a single-node installation from the quickstart is sufficient).

Note

dbt Fusion 2.0 is not supported at this time.

Installation

To install dbt-ydb, run:

pip install dbt-ydb

Connecting dbt to a YDB Cluster

dbt connects to YDB through the dbt-ydb connector using the standard YDB connection parameters. To connect successfully, specify the endpoint, database path, and authentication parameters in the dbt profiles file.

Example profile file with possible authentication options and default values (in square brackets):

profile_name:
   target: dev
   outputs:
      dev:
         type: ydb
         host: [localhost] # YDB host
         port: [2136] # YDB port
         database: [/local] # YDB database
         schema: [<empty string>] # Optional subfolder for dbt models
         secure: [False] # If enabled, grpcs protocol will be used
         root_certificates_path: [<empty string>] # Optional path to root certificates file

         # Static Credentials
         username: [<empty string>]
         password: [<empty string>]

         # Access Token Credentials
         token: [<empty string>]

         # Service Account Credentials
         service_account_credentials_file: [<empty string>]

Creating a Project from Scratch via dbt init

  1. Initialize a project:

    dbt init
    
  2. Follow dbt’s interactive prompts to select the dbt-ydb connector and authentication settings for your YDB cluster.

  3. dbt creates your project directory and a profiles file in your home directory that contains the new YDB connection:

    ~/.dbt/profiles.yml
    
  4. Run dbt debug to verify the connection:

    dbt debug
    
  5. Inside your project directory, you will find the following structure:

    [Image: dbt-ydb new project structure]

  6. Adapt the model my_first_dbt_model.

    Currently, dbt does not support customizing the auto-generated example per connector. Therefore, to run this model with dbt-ydb, you need to update it as follows:

    /*
       Welcome to your first dbt model!
       Did you know that you can also configure models directly within SQL files?
       This will override configurations stated in dbt_project.yml
    
       Try changing "table" to "view" below
    */
    
    {{ config(materialized='table', primary_key='id') }}
    
    select *
    from (
       select 1 as id
       union all
       select null as id
    )
    
  7. Now you can run your project:

    dbt run
    

Running the Example Project

The dbt-ydb connector comes with an example you can use to quickly test dbt functionality with YDB.

  1. Clone the repository:

    git clone https://github.com/ydb-platform/dbt-ydb.git
    cd dbt-ydb/examples/jaffle_shop
    
  2. Configure the connection profile in the profiles.yml file. For a single-node installation from the quickstart, the file should look like this:

    profile_name:
       target: dev
       outputs:
          dev:
             type: ydb
             host: localhost # YDB host
             port: 2136 # YDB port
             database: /local # YDB database
             schema: jaffle_shop
    
  3. Verify the connection:

    dbt debug
    
  4. Load test data (via seeds):

    This command will load CSV files from data/ into raw_* tables in YDB.

    dbt seed
    
  5. Run models:

    This will create tables and views based on the project’s example models.

    dbt run
    
  6. Test model data:

    This will run the standard data tests described in the example, such as null checks and allowed-values lists.

    dbt test
    
  7. Generate documentation and start a local web server to view it:

    The project documentation will be available in your browser at http://localhost:8080.

    dbt docs generate
    dbt docs serve --port 8080
    

Next Steps

For more details on dbt itself, see the official dbt documentation.
Additionally, you can explore the connector’s source code and contribute to its development in the public dbt-ydb repository on GitHub.