Dump data from PostgreSQL

Data from PostgreSQL can be migrated to YDB using utilities such as pg_dump, psql, and YDB CLI. The pg_dump and psql utilities are installed with PostgreSQL. YDB CLI is YDB's command-line client, which is installed separately.

To do this, you need to:

  1. Create a data dump using pg_dump with the following parameters:
    • --inserts — to add data using the INSERT command instead of the COPY protocol.
    • --column-inserts — to add data using the INSERT command with column names.
    • --rows-per-insert=1000 — to insert data in batches to speed up the process.
    • --encoding=utf_8 — YDB only supports string data in UTF-8.
  2. Convert the dump to a format supported by YDB using the ydb tools pg-convert command from YDB CLI.
  3. Load the result into YDB in PostgreSQL compatibility mode.
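
Put together, these steps form a short pipeline. The sketch below assumes a source PostgreSQL database named mydb (a hypothetical name), YDB CLI available on the host, and a YDB instance whose PostgreSQL-compatible endpoint listens on localhost:5432 with a database named local, as in the example at the end of this page; adjust the connection parameters to your setup:

    # 1. Dump the source database as plain SQL with batched INSERT statements
    pg_dump --inserts --column-inserts --rows-per-insert=1000 --encoding=utf_8 mydb > dump.sql

    # 2. Convert the dump to a form accepted by YDB
    ydb tools pg-convert -i dump.sql > dump_ydb.sql

    # 3. Load the converted dump through YDB's PostgreSQL compatibility mode
    psql -h localhost -p 5432 -U root -d local -f dump_ydb.sql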

pg-convert command

The ydb tools pg-convert command reads a dump created by the pg_dump utility from a file or from standard input, applies transformations, and writes to standard output a dump that can be sent to YDB's PostgreSQL-compatible middleware.

ydb tools pg-convert performs the following transformations:

  • Moving the creation of the primary key into the body of the CREATE TABLE command.
  • Removing the public schema from table names.
  • Deleting the WITH (...) section in CREATE TABLE.
  • Commenting out unsupported constructs (optionally):
    • SELECT pg_catalog.set_config.*
    • ALTER TABLE
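
As an illustration, a table definition like the following in the original dump (a hypothetical accounts table; the exact formatting of the converted output may differ):

    CREATE TABLE public.accounts (
        aid integer NOT NULL,
        abalance integer
    )
    WITH (fillfactor='100');

    ALTER TABLE ONLY public.accounts
        ADD CONSTRAINT accounts_pkey PRIMARY KEY (aid);

would come out of ydb tools pg-convert roughly as:

    CREATE TABLE accounts (
        aid integer NOT NULL,
        abalance integer,
        PRIMARY KEY (aid)
    );

    -- ALTER TABLE ONLY public.accounts
    --     ADD CONSTRAINT accounts_pkey PRIMARY KEY (aid);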

If the CLI cannot find a table's primary key, it will automatically create a BIGSERIAL column named __ydb_stub_id as the primary key.
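
For example, a table declared without any primary key (a hypothetical log table) would be converted roughly as follows:

    -- original dump
    CREATE TABLE public.log (
        message text
    );

    -- converted dump: a stub primary key column is added
    CREATE TABLE log (
        message text,
        __ydb_stub_id BIGSERIAL,
        PRIMARY KEY (__ydb_stub_id)
    );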

The general form of the command:

ydb [global options...] tools pg-convert [options...]

Subcommand parameters

  • -i — The name of the file containing the original dump. If the option is not specified, the dump is read from standard input.
  • --ignore-unsupported — When this option is specified, unsupported constructs are commented out in the resulting dump and duplicated in standard error. By default, the command returns an error if unsupported constructs are detected. This does not apply to ALTER TABLE expressions that define a table's primary key; those are commented out in any case.

Warning

When loading large dumps, reading from standard input is not recommended, because the entire dump will be stored in RAM. It is advised to use the -i option, in which case the CLI keeps only a small portion of the dump in memory at any time.
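
For example (file names are illustrative):

    # Read the dump from a file (recommended for large dumps) and comment out
    # unsupported constructs instead of failing on them
    ydb tools pg-convert -i dump.sql --ignore-unsupported > dump_ydb.sql

    # Read the dump from standard input; the entire dump is kept in RAM
    cat dump.sql | ydb tools pg-convert > dump_ydb.sql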

Example of importing a dump into YDB

As an example, data generated by pgbench will be loaded.

  1. Start Docker containers with PostgreSQL and YDB:

    docker run -d --rm -e POSTGRES_USER=root -e POSTGRES_PASSWORD=1234 \
        -e POSTGRES_DB=local --name postgres postgres:14
    docker run --name ydb-postgres -d --pull always -p 5432:5432 -p 8765:8765 \
        -e POSTGRES_USER=root -e POSTGRES_PASSWORD=1234 \
        -e YDB_FEATURE_FLAGS=enable_temp_tables \
        -e YDB_TABLE_ENABLE_PREPARED_DDL=true \
        -e YDB_USE_IN_MEMORY_PDISKS=true \
        ghcr.io/ydb-platform/ydb-local:latest
    
  2. Generate data through pgbench:

    docker exec postgres pgbench -i postgres://root:1234@localhost/local
    
  3. Create a dump of the database using pg_dump, convert it with ydb tools pg-convert, and load the result into YDB:

    docker exec postgres pg_dump -U root --inserts --column-inserts --rows-per-insert=1000 \
        --encoding=utf_8 --file=/var/lib/postgresql/data/dump.sql local
    docker cp postgres:/var/lib/postgresql/data/dump.sql .
    ydb tools pg-convert -i dump.sql > dump_ydb.sql
    cat dump_ydb.sql | docker exec -i ydb-postgres psql -U root -d local
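
To check that the data has been loaded, you can count the rows in one of the tables created by pgbench, for example pgbench_accounts (the table name assumes the public schema prefix has been removed during conversion):

    docker exec -i ydb-postgres psql -U root -d local \
        -c "SELECT count(*) FROM pgbench_accounts;"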