Dump data form PostgreSQL
Data from PostgreSQL can be migrated to YDB using utilities such as pg_dump, psql, and YDB CLI. The pg_dump and psql utilities are installed with PostgreSQL. YDB CLI is YDB's command-line client, which is installed separately.
To do this, you need to:
-
Create a data dump using pg_dump with the following parameters:
-
Convert the dump to a format supported by YDB using the
ydb tools pg-convert
command from YDB CLI. -
Load the result into YDB in PostgreSQL compatibility mode.
pg-convert command
The ydb tools pg-convert
command reads a dump file or standard input created by the pg_dump utility, performs transformations, and outputs to standard output a dump that can be sent to YDB's PostgreSQL-compatible middleware.
ydb tools pg-convert
performs the following transformations:
-
Moving the creation of the primary key into the body of the CREATE TABLE command.
-
Removing the
public
schema from table names. -
Deleting the
WITH (...)
section inCREATE TABLE
. -
Commenting out unsupported constructs (optionally):
SELECT pg_catalog.set_config.*
ALTER TABLE
If the CLI cannot find a table's primary key, it will automatically create a BIGSERIAL column named __ydb_stub_id
as the primary key.
The general form of the command:
ydb [global options...] tools pg-convert [options...]
global options
— global parameters.options
— subcommand parameters.
subcommand parameters
Name | Description |
---|---|
-i |
The name of the file containing the original dump. If the option is not specified, the dump is read from standard input. |
--ignore-unsupported |
When this option is specified, unsupported constructs will be commented out in the resulting dump and duplicated in standard error. By default, if unsupported constructs are detected, the command returns an error. This does not apply to ALTER TABLE expressions that define a table's primary key, as they are commented out in any case. |
Warning
When loading large dumps, reading from standard input is not recommended because the entire dump will be stored in RAM. It is advised to use the file option, in which case the CLI will only keep a small portion of the dump in memory.
Example of importing a dump into YDB
As an example, data generated by pgbench will be loaded.
- Start Docker containers with PostgreSQL and YDB:
docker run -d --rm -e POSTGRES_USER=root -e POSTGRES_PASSWORD=1234 \
-e POSTGRES_DB=local --name postgres postgres:14
docker run --name ydb-postgres -d --pull always -p 5432:5432 -p 8765:8765 \
-e POSTGRES_USER=root -e POSTGRES_PASSWORD=1234 \
-e YDB_FEATURE_FLAGS=enable_temp_tables \
-e YDB_USE_IN_MEMORY_PDISKS=true \
ghcr.io/ydb-platform/ydb-local:latest
- Generate data through pgbench:
docker exec postgres pgbench -i postgres://root:1234@localhost/local
- Create a dump of the database using pg_dump and load it into YDB:
docker exec postgres pg_dump -U root --format=c --file=/var/lib/postgresql/data/dump.sql local
docker cp postgres:/var/lib/postgresql/data/dump.sql .
cat dump.sql | docker exec -i ydb-postgres psql -U root -d local