Dump data form PostgreSQL

Warning

At the moment, YDB's compatibility with PostgreSQL is under development, so not all PostgreSQL constructs and functions are supported yet. PostgreSQL compatibility is available for testing in the form of a Docker container, which can be deployed by following these instructions.

Data from PostgreSQL can be migrated to YDB using utilities such as pg_dump, psql, and YDB CLI. The pg_dump and psql utilities are installed with PostgreSQL. YDB CLI is YDB's command-line client, which is installed separately.

To do this, you need to:

  1. Create a data dump using pg_dump with the following parameters:

    • --inserts — to add data using the INSERT command, instead of the COPY protocol.
    • --column-inserts — to add data using the INSERT command with column names.
    • --rows-per-insert=1000 — to insert data in batches to speed up the process.
    • --encoding=utf_8 — YDB only supports string data in UTF-8.
  2. Convert the dump to a format supported by YDB using the ydb tools pg-convert command from YDB CLI.

  3. Load the result into YDB in PostgreSQL compatibility mode.

pg-convert command

The ydb tools pg-convert command reads a dump file or standard input created by the pg_dump utility, performs transformations, and outputs to standard output a dump that can be sent to YDB's PostgreSQL-compatible middleware.

ydb tools pg-convert performs the following transformations:

  • Moving the creation of the primary key into the body of the CREATE TABLE command.

  • Removing the public schema from table names.

  • Deleting the WITH (...) section in CREATE TABLE.

  • Commenting out unsupported constructs (optionally):

    • SELECT pg_catalog.set_config.*
    • ALTER TABLE

If the CLI cannot find a table's primary key, it will automatically create a BIGSERIAL column named __ydb_stub_id as the primary key.

The general form of the command:

ydb [global options...] tools pg-convert [options...]

subcommand parameters

Name Description
-i The name of the file containing the original dump. If the option is not specified, the dump is read from standard input.
--ignore-unsupported When this option is specified, unsupported constructs will be commented out in the resulting dump and duplicated in standard error. By default, if unsupported constructs are detected, the command returns an error. This does not apply to ALTER TABLE expressions that define a table's primary key, as they are commented out in any case.

Warning

When loading large dumps, reading from standard input is not recommended because the entire dump will be stored in RAM. It is advised to use the file option, in which case the CLI will only keep a small portion of the dump in memory.

Example of importing a dump into YDB

As an example, data generated by pgbench will be loaded.

  1. Start Docker containers with PostgreSQL and YDB:

    docker run --name postgres_container \
        -e POSTGRES_USER=pgroot -e POSTGRES_PASSWORD=1234 \
        -e POSTGRES_DB=local \
        -p 5433:5433 -d postgres:14 -c 'port=5433'
    docker run --name ydb-postgres -d --pull always -p 5432:5432 -p 8765:8765 \
        -e POSTGRES_USER=ydbroot -e POSTGRES_PASSWORD=4321 \
        -e YDB_FEATURE_FLAGS=enable_temp_tables,enable_table_pg_types \
        -e YDB_USE_IN_MEMORY_PDISKS=true \
        ghcr.io/ydb-platform/local-ydb:latest
    

    Information about the started Docker containers:

    Database

    PostgreSQL

    YDB

    Container name

    postgres_container

    ydb-postgres

    Address

    postgres://pgroot:1234@localhost:5433/local

    postgresql://ydbroot:4321@localhost:5432/local

    Port

    5433

    5432

    User name

    pgroot

    ydbroot

    Password

    1234

    4321

  2. Generate data through pgbench:

    docker exec postgres_container pgbench postgres://pgroot:1234@localhost:5433/local -i
    
  3. Create a dump of the database using pg_dump:

    docker exec postgres_container pg_dump postgres://pgroot:1234@localhost:5433/local --inserts \
        --column-inserts --encoding=utf_8 --rows-per-insert=1000 > dump.sql
    
  4. Load the dump into YDB:

    ydb tools pg-convert --ignore-unsupported -i dump.sql | psql postgresql://ydbroot:4321@localhost:5432/local
    

    This command uses YDB CLI to convert the dump.sql file to the format readable by the YDB PostgreSQL compatibility layer. The converted dump file is then redirected to the psql utility for loading the data into YDB via PostgreSQL protocol.