Importing data from the file system

Cluster

The admin cluster restore command restores a cluster from a backup on the file system. The backup must have been previously exported or prepared manually as described in the File structure of an export article:

ydb [connection options] admin cluster restore -i <PATH> [options]

where [connection options] are database connection options

The destination cluster must be running and initialized before it can be restored.

When restoring a cluster' metadata, databases and their administrators are created. Refer to Database for further details on restoring databases.

The restore operation requires that for each database to be restored, its database nodes must be available. You can start database nodes before running the restore command or while the restore operation is waiting for available nodes. If you encounter problems with available database nodes, you can restart the restore operation.

A cluster configuration is restored separately using the following steps:

  1. Load the saved configuration using the ydb admin cluster config replace command.
  2. Restart the cluster nodes.

Required parameters

-i <PATH> or --input <PATH>: Path to the directory in the client system from which the data will be imported.

Optional parameters

[options] – optional parameters of the command:

--wait-nodes-duration <DURATION>: The period of time that the restore command waits for available database nodes. Example: 10s, 5m, 1h, 1.5d, 30. Duration can be expressed in weeks, days, hours, minutes, seconds, microseconds, nanoseconds. If no suffix is specified, the duration is seconds. The duration can be fractional. Combined duration like 1h30m is not supported. If the duration is 0, the restore command does not wait for available nodes.

Database

The admin database restore command restores the database from a backup on the file system. The backup must have been previously exported with the admin database dump command or prepared manually as described in the File structure of an export article:

ydb [connection options] admin database restore -i <PATH> [options]

where [connection options] are database connection options

The restore operation requires that for each database to be restored, its database nodes must be available. You can start database nodes before running the restore command or while the restore operation is waiting for available nodes. If you encounter problems with available database nodes, you can restart the restore operation.

Restoring database schema objects follows the same process described in Schema objects.

Database configuration is restored separately using the following steps:

  1. Load the saved configuration using the ydb admin database config replace command.
  2. Restart the database nodes.

Required parameters

-i <PATH> or --input <PATH>: Path to the directory in the client system from which the data will be imported.

Optional parameters

[options] – optional parameters of the command:

--wait-nodes-duration <DURATION>: The period of time that the restore command waits for available database nodes. Example: 10s, 5m, 1h, 1.5d, 30. Duration can be expressed in weeks, days, hours, minutes, seconds, microseconds, nanoseconds. If no suffix is specified, the duration is seconds. The duration can be fractional. Combined duration like 1h30m is not supported. If the duration is 0, the restore command does not wait for available nodes.

Schema objets

The tools restore command creates the items of the database schema in the database, and populates them with the data previously exported there with the tools dump command or prepared manually as per the rules from the File structure of an export article:

ydb [connection options] tools restore -p <PATH> -i <PATH> [options]

where [connection options] are database connection options

If the table or directory already exists in the database, no changes will be made to its schema and ACL. If some columns present in the imported files are missing in the database or have mismatching types, this may lead to the data import operation failing.

To import data to the table, use the YQL REPLACE command. If the table included any records before the import, the records whose keys are present in the imported files are replaced by the data from the file. The records whose keys are absent in the imported files aren't affected.

Required parameters

  • -p <PATH> or --path <PATH>: Path to the database directory the data will be imported to. To import data to the root directory, specify .. All the missing directories along the path will be created.

  • -i <PATH> or --input <PATH>: Path to the directory in the client system from which the data will be imported.

Optional parameters

[options] – optional parameters of the command:

  • --restore-data <VAL>: Enables/disables data import, 1 (yes) or 0 (no), defaults to 1. If set to 0, the import only creates items in the schema without populating them with data. If there's no data in the file system (only the schema has been exported), it doesn't make sense to change this option.

  • --restore-indexes <VAL>: Enables/disables import of indexes, 1 (yes) or 0 (no), defaults to 1. If set to 0, the import won't either register secondary indexes in the data schema or populate them with data.

  • --restore-acl <VAL>: Enables/disables import of ACL, 1 (yes) or 0 (no), defaults to 1. If set to 0, the import creates items in the schema with an empty ACL, and their owner will be the user who started the import.

  • --dry-run: Matching the data schemas in the database and file system without updating the database, 1 (yes) or 0 (no), defaults to 0. When enabled, the system checks that:

    • All tables in the file system are present in the database
    • These items are based on the same schema, both in the file system and in the database
  • --save-partial-result: Save the partial import result. If disabled, an import error results in reverting to the database state before the import.

  • --import-data: Use ImportData, a more efficient method for uploading data than the default approach. This method sends data to the server partitioned by the client and in a lighter format. However, it returns an error when attempting to import exported data into an existing table that already has secondary indexes or is in the process of building them. To restore a table with secondary indexes, ensure they are not already present in the schema (for example, using the ydb scheme ls command). By default, ImportData is disabled.

Workload restriction parameters

Using the below parameters, you can limit the import workload against the database.

Attention!

Some of the below parameters have default values. This means that the workload will be limited even if none of them is mentioned in tools restore.

  • --rps <VAL>: Limits the number of queries used to upload batches to the database per second, the default value is 30.
  • --bandwidth <VAL>: Limit the workload per second, defaults to 0 (not set). <VAL> specifies the data amount with a unit, for example, 2MiB. If this value is set, the --rps limit (see above) is not applied.
  • --in-flight <VAL>: Limits the number of queries that can be run in parallel, the default value is 10. To achieve maximum parallelism, set the parameter value to the number of cores allocated for the restore process.
  • --upload-batch-rows <VAL>: Limits the number of records in the uploaded batch, the default value is 0 (unlimited). <VAL> determines the number of records and is set as a number with an optional unit, for example, 1K.
  • --upload-batch-bytes <VAL>: Limits the batch size of uploaded data, the default value is 512KB. <VAL> specifies the data amount with a unit, for example, 1MiB. Maximum value is 16 MiB.
  • --upload-batch-rus <VAL>: Applies only to Serverless databases to limit Request Units (RU) that can be consumed to upload one batch, defaults to 30 RU. The batch size is selected to match the specified value. <VAL> determines the number of RU and is set as a number with an optional unit, for example, 100 or 1K.

Examples

Note

The examples use the quickstart profile. To learn more, see Creating a profile to connect to a test database.

Restoring cluster

From the current file system directory:

ydb -e <endpoint> admin cluster restore -i .

From the specified file system directory:

ydb -e <endpoint> admin cluster restore -i ~/backup_cluster

Restoring database

From the current file system directory:

ydb -e <endpoint> -d <database> admin database restore -i .

From the specified file system directory:

ydb -e <endpoint> -d <database> admin database restore -i ~/backup_db

Importing schema objets to the database root

From the current file system directory:

ydb -p quickstart tools restore -p . -i .

From the current file system directory:

ydb -p quickstart tools restore -p . -i ~/backup_quickstart

Uploading data to the specified directory in the database

From the current file system directory:

ydb -p quickstart tools restore -p dir1/dir2 -i .

From the current file system directory:

ydb -p quickstart tools restore -p dir1/dir2 -i ~/backup_quickstart

Matching schemas between the database and file system:

ydb -p quickstart tools restore -p dir1/dir2 -i ~/backup_quickstart --dry-run

Example options for better performance

ydb -p quickstart tools restore -p . -i . --import-data --bandwidth=10GiB --in-flight=16 --upload-batch-bytes=16MiB