Importing data from an S3 compatible storage

The import s3 command starts, on the server side, the process of importing data and schema object details from an S3-compatible storage, in the format described in the File structure section:

ydb [connection options] import s3 [options]

where [connection options] are database connection options

As opposed to the tools restore command, the import s3 command always creates objects in entirety, so none of the imported objects (directories or tables) should already exist.

If you need to import some more data to your existing S3 tables (for example, using S3cmd), you can copy the S3 contents to the file system and use the tools restore command.

Command line parameters

[options]: Command parameters:

S3 connection parameters

To run the command to import data from an S3 storage, specify the S3 connection parameters. As data is imported by the YDB server asynchronously, the specified endpoint must be available so that a connection can be established from the server side.

List of imported objects

--item STRING: Description of the item to import. You can specify the --item parameter multiple times if you need to import multiple items. STRING is set in <property>=<value>,... format with the following mandatory properties:

source, src or s is the path (key prefix) in S3 that hosts the imported directory or table
destination, dst, or d is the database path to host the imported directory or table. The destination of the path must not exist. All the directories along the path will be created if missing.

Additional parameters

--description STRING: A text description of the operation saved in the operation history
--retries NUM: The number of import retries to be made by the server. The default value is 10.
--format STRING: The format of the results.

pretty: Human-readable format (default).
proto-json-base64: Protobuf in JSON format, binary strings are Base64-encoded.

Importing

Export result

If successful, the import s3 command prints summary information about the enqueued operation to import data from S3 in the format specified in the --format option. The import itself is performed by the server asynchronously. The summary shows the operation ID that you can use later to check the operation status and perform actions on it:

In the default pretty mode, the operation ID is displayed in the id field with semigraphics formatting:

┌───────────────────────────────────────────┬───────┬─────...
| id                                        | ready | stat...
├───────────────────────────────────────────┼───────┼─────...
| ydb://import/8?id=281474976788395&kind=s3 | true  | SUCC...
├╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴┴╴╴╴╴╴...
| Items:
...

In the proto-json-base64 mode, the operation ID is in the "id" attribute:
```
{"id":"ydb://export/8?id=281474976788395&kind=s3","ready":true, ... }
```

Import status

Data is imported in the background. To get information on import status, use the operation get command with the operation ID enclosed in quotation marks and passed as a command parameter. For example:

ydb -p quickstart operation get "ydb://import/8?id=281474976788395&kind=s3"

The operation get format is also set by the --format option.

Although the operation ID is in URL format, there is no guarantee that it is maintained in the future. It should only be interpreted as a string.

You can track the import by changes in the "progress" attribute:

In the default pretty mode, successfully completed export operations are displayed as "Done" in the progress field with semigraphics formatting:

┌───── ... ──┬───────┬─────────┬──────────┬─...
| id         | ready | status  | progress | ...
├──────... ──┼───────┼─────────┼──────────┼─...
| ydb:/...   | true  | SUCCESS | Done     | ...
├╴╴╴╴╴ ... ╴╴┴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴╴┴╴...
...

In the proto-json-base64 mode, the completed export operation is indicated with the PROGRESS_DONE value of the progress attribute:
```
{"id":"ydb://...", ...,"progress":"PROGRESS_DONE",... }
```

Completing the import operation

When the import is complete, use operation forget to delete the import from the operation list:

ydb -p quickstart operation forget "ydb://import/8?id=281474976788395&kind=s3"

List of import operations

To get a list of import operations, run the operation list import/s3 command:

ydb -p quickstart operation list import/s3

The operation list format is also set by the --format option.

Examples

Note

The examples use the quickstart profile. To learn more, see Creating a profile to connect to a test database.

Importing to the database root

Importing to the database root the contents of the export1 directory in the mybucket bucket using the S3 authentication parameters taken from the environment variables or the ~/.aws/credentials file:

ydb -p quickstart import s3 \
  --s3-endpoint storage.yandexcloud.net --bucket mybucket \
  --item src=export1,dst=.

Importing multiple directories

Importing items from the dir1 and dir2 directories in the mybucket S3 bucket to the same-name database directories using explicitly specified S3 authentication parameters:

ydb -p quickstart import s3 \
  --s3-endpoint storage.yandexcloud.net --bucket mybucket \
  --access-key VJGSOScgs-5kDGeo2hO9 --secret-key fZ_VB1Wi5-fdKSqH6074a7w0J4X0 \
  --item src=export/dir1,dst=dir1 --item src=export/dir2,dst=dir2

Getting operation IDs

To get a list of import operation IDs in a bash-friendly format, use the jq utility:

ydb -p quickstart operation list import/s3 --format proto-json-base64 | jq -r ".operations[].id"

You'll get a result where each new line shows an operation's ID. For example:

ydb://import/8?id=281474976789577&kind=s3
ydb://import/8?id=281474976789526&kind=s3
ydb://import/8?id=281474976788779&kind=s3

You can use these IDs, for example, to run a loop to end all the current operations:

ydb -p quickstart operation list import/s3 --format proto-json-base64 | jq -r ".operations[].id" | while read line; do ydb -p quickstart operation forget $line;done