Exporting data to S3-compatible storage

The export s3 command starts a server-side export of data and of information about data schema objects to S3-compatible storage, in the format described under File structure:

ydb [connection options] export s3 [options]

where [connection options] are database connection options.

Command line parameters

[options]: Command parameters:

S3 connection parameters

To run the command to export data to S3 storage, specify the S3 connection parameters. Since the YDB server exports the data asynchronously, the specified endpoint must be reachable from the server side.

List of exported items

--item STRING: Description of the item to export. You can specify the --item parameter multiple times if you need to export multiple items. STRING is set in <property>=<value>,... format with the following mandatory properties:

  • source, src, or s: Path to the directory or table to export; a dot (.) indicates the DB root directory. If you specify a directory, all of its items whose names do not start with a dot are exported, as are, recursively, all subdirectories whose names do not start with a dot.
  • destination, dst, or d: Path (key prefix) in S3 storage to store exported items.

--exclude STRING: Template (PCRE) to exclude paths from export. Specify this parameter multiple times for different templates.
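
When several directories need to be exported, the --item arguments can be generated rather than typed by hand. The bash sketch below only assembles and prints the argument list. The directory names, endpoint, and bucket are illustrative, and the --exclude template is a hypothetical pattern: how a template is matched against paths is not described here, so it is worth checking on a test export first.

```shell
# bash sketch: assemble `export s3` arguments for several directories.
# dir1, dir2, export1, and the exclude pattern are illustrative only.
dirs=(dir1 dir2)
args=(export s3 --s3-endpoint storage.yandexcloud.net --bucket mybucket)
for d in "${dirs[@]}"; do
  # One --item per directory: source path in the DB, key prefix in the bucket.
  args+=(--item "src=${d},dst=export1/${d}")
done
# Hypothetical PCRE template: skip paths that end in "_tmp".
args+=(--exclude '_tmp$')
# Dry run: print one argument per line instead of calling ydb.
printf '%s\n' "${args[@]}"
```

To actually start the export, the printf line could be replaced with ydb -p db1 "${args[@]}". Keeping the arguments in an array means each src=...,dst=... pair is passed as a single argument.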

Additional parameters

--description STRING: Text description of the operation, saved to the history of operations.

--retries NUM: Number of export retries to be made by the server. Defaults to 10.

--compression STRING: Compress exported data. With the default Zstandard compression level, data can be compressed by a factor of 5-10. Compression uses CPU resources and may slow down other DB operations. Possible values:

  • zstd: Compression using the Zstandard algorithm with the default compression level (3).
  • zstd-N: Compression using the Zstandard algorithm, where N stands for the compression level (1-22).

--format STRING: Result format. Possible values:

  • pretty: Human-readable format (default).
  • proto-json-base64: Protobuf in JSON format, with binary strings encoded in Base64.

Running the export command

Export result

If successful, the export s3 command outputs summary information about the enqueued operation to export data to S3, in the format specified in the --format option. The export itself is performed by the server asynchronously. The output summary shows the operation ID that you can use later to check the operation status and perform actions on it:

  • In the default pretty output mode, the operation ID is displayed in the id field with semigraphics formatting:

    ┌───────────────────────────────────────────┬───────┬─────...
    | id                                        | ready | stat...
    ├───────────────────────────────────────────┼───────┼─────...
    | ydb://export/6?id=281474976788395&kind=s3 | true  | SUCC...
    ├╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴┴╴╴╴╴╴...
    | StorageClass: NOT_SET                                      
    | Items:
    ...                                                   
    
  • In the proto-json-base64 output mode, the operation ID is in the "id" attribute:

    {"id":"ydb://export/6?id=281474976788395&kind=s3","ready":true, ... }
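
To reuse the operation ID in a script, it can be read from the proto-json-base64 output with the jq utility. This is a minimal sketch, assuming the db1 profile and an installed jq; start_export is a hypothetical helper name, not a CLI command.

```shell
# bash sketch: start an export and print only its operation ID.
# Assumes the db1 profile and jq; start_export is a hypothetical helper.
start_export() {
  # All arguments are passed through to `ydb export s3` unchanged.
  # The "id" attribute is read from the top level of the JSON summary.
  ydb -p db1 export s3 "$@" --format proto-json-base64 | jq -r '.id'
}

# Example call (not executed here):
#   op_id=$(start_export --s3-endpoint storage.yandexcloud.net \
#             --bucket mybucket --item src=.,dst=export1)
```

The ID is stored and passed on as an opaque string, so it should always be quoted when used in later commands.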
    

Export status

Data is exported in the background. To find out the export status and progress, use the operation get command with the operation ID enclosed in quotation marks and passed as a command parameter. For example:

ydb -p db1 operation get "ydb://export/6?id=281474976788395&kind=s3"

The operation get output format is also set by the --format option.

Although the operation ID is in URL format, there is no guarantee that this format will be retained in the future. Treat the ID as an opaque string only.

You can track the export progress by changes in the "progress" attribute:

  • In the default pretty output mode, an export operation that completed successfully is displayed as "Done" in the progress field with semigraphics formatting:

    ┌───── ... ──┬───────┬─────────┬──────────┬─...
    | id         | ready | status  | progress | ...
    ├──────... ──┼───────┼─────────┼──────────┼─...
    | ydb:/...   | true  | SUCCESS | Done     | ...
    ├╴╴╴╴╴ ... ╴╴┴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴╴┴╴...
    ...
    
  • In the proto-json-base64 output mode, the completed export operation is indicated with the PROGRESS_DONE value of the progress attribute:

    {"id":"ydb://...", ...,"progress":"PROGRESS_DONE",... }
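
A script can wait for the export by polling operation get until the progress attribute reaches PROGRESS_DONE. The sketch below assumes the db1 profile and an installed jq. Because the exact position of "progress" inside the JSON is elided in the sample above, the filter searches the whole document for it instead of assuming a fixed path. The helper names, polling interval, and retry limit are all hypothetical.

```shell
# bash sketch: poll an export operation until it reports PROGRESS_DONE.
# Assumes the db1 profile and jq; helper names and defaults are hypothetical.

export_progress() {
  # $1: operation ID, passed through unchanged as an opaque string.
  # Prints the first string-valued "progress" attribute found at any depth,
  # or an empty line if there is none.
  ydb -p db1 operation get --format proto-json-base64 "$1" |
    jq -r '[.. | objects | .progress? | strings] | first // ""'
}

wait_for_export() {
  # $1: operation ID, $2: seconds between polls, $3: maximum number of polls.
  local id=$1 interval=${2:-10} max_tries=${3:-360} try progress
  for ((try = 0; try < max_tries; try++)); do
    progress=$(export_progress "$id")
    if [ "$progress" = "PROGRESS_DONE" ]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1  # gave up: not done after max_tries polls
}

# Example call (not executed here):
#   wait_for_export "ydb://export/6?id=281474976788395&kind=s3" 10 360
```

The retry limit keeps the loop from running forever if the operation fails or the CLI returns no output.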
    

Completing the export operation

When running the export operation, a directory named export_* is created in the root directory, where * is the numeric part of the export ID. This directory stores tables with a consistent snapshot of exported data as of the export start time.

Once the export is done, use the operation forget command to complete it: the operation is removed from the list of operations and all files created for it are deleted:

ydb -p db1 operation forget "ydb://export/6?id=281474976788395&kind=s3"

List of export operations

To get a list of export operations, run the operation list export/s3 command:

ydb -p db1 operation list export/s3

The operation list output format is also set by the --format option.

Examples

The examples use a profile named db1. For information about how to create it, see the Getting started with the YDB CLI article in the "Getting started" section.

Exporting a database

Exporting all DB objects, except those whose names start with a dot or that are stored in directories whose names start with a dot, to the export1 directory in mybucket, using the S3 authentication parameters from environment variables or the ~/.aws/credentials file:

ydb -p db1 export s3 \
  --s3-endpoint storage.yandexcloud.net --bucket mybucket \
  --item src=.,dst=export1

Exporting multiple directories

Exporting items from DB directories named dir1 and dir2 to the export1 directory in mybucket using the explicitly set S3 authentication parameters:

ydb -p db1 export s3 \
  --s3-endpoint storage.yandexcloud.net --bucket mybucket \
  --access-key VJGSOScgs-5kDGeo2hO9 --secret-key fZ_VB1Wi5-fdKSqH6074a7w0J4X0 \
  --item src=dir1,dst=export1/dir1 --item src=dir2,dst=export1/dir2

Getting operation IDs

To get a list of export operation IDs in a format suitable for handling in bash scripts, use the jq utility:

ydb -p db1 operation list export/s3 --format proto-json-base64 | jq -r ".operations[].id"

You'll get an output where each new line shows an operation's ID. For example:

ydb://export/6?id=281474976789577&kind=s3
ydb://export/6?id=281474976789526&kind=s3
ydb://export/6?id=281474976788779&kind=s3

You can use these IDs, for example, to run a loop to end all the current operations:

ydb -p db1 operation list export/s3 --format proto-json-base64 | jq -r ".operations[].id" | while read -r line; do ydb -p db1 operation forget "$line"; done
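
The loop above forgets every operation in the list, including exports that may still be running. A more cautious variant could select only finished operations first. The sketch below assumes that each entry in the operation list carries the same "ready" attribute as the export result shown earlier, which this article does not state explicitly. forget_finished_exports is a hypothetical helper name.

```shell
# bash sketch: forget only the export operations that report "ready": true.
# Assumes the db1 profile, jq, and a "ready" attribute on each list entry.
forget_finished_exports() {
  ydb -p db1 operation list export/s3 --format proto-json-base64 |
    jq -r '.operations[] | select(.ready == true) | .id' |
    while IFS= read -r id; do
      # Quote the ID: it contains ? and &, which the shell would otherwise
      # treat as special characters. stdin is redirected so the command
      # cannot consume the remaining IDs from the pipe.
      ydb -p db1 operation forget "$id" </dev/null
    done
}

# Example call (not executed here):
#   forget_finished_exports
```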