User workload

Allows you to perform load testing according to a specified user scenario.

A test scenario is a directory called suite, which contains a set of subdirectories: init, import, and run, corresponding to the ydb workload query commands.

The init directory can have an arbitrary structure and contains SQL queries for creating and configuring objects in the database, such as tables or indexes.
The import directory contains data to be loaded into tables. Each table corresponds to a subdirectory in the import directory.
The run directory can have an arbitrary structure and contains SQL queries for load testing.

Example of a suite directory on GitHub.

The user specifies the path to the suite directory using the --suite-path option in each command.

General syntax

General syntax for calling user workload commands:

ydb [global options...] workload query [--path <path/in/db>] <command> [options...]

global options — global parameters.
--path — the path in the database for load testing objects.
<command> — one of the commands: init, import, run, clean.
options — parameters for a specific command.

Common command parameters

All commands support the common --path parameter, which sets the path in the database for load testing objects. All created objects will be created at this path. The path will also be taken into account during test execution and cleanup.

Available parameters

Name	Description	Default value
`--path` or `-p`	Path in the database for load testing objects.	`/`

Load test initialization

Before starting the test, create tables and other database objects:

ydb workload query --path user_path init --suite-path ~/user-suite

The init command works as follows:

If a suite is specified and it contains an init directory, the init directory is scanned, including its nested directories, using a depth-first search with lexicographical ordering of files and subdirectories at each level.
Files with .sql and .yql extensions are selected.
The content of each file is read and executed sequentially as an SQL query.
After that, queries specified directly from the command line using the --query parameter are executed.
If any of the queries result in an error, the command execution stops.

It is assumed that relatively "lightweight" queries, such as creating tables and indexes, are performed at this stage. It is not recommended to include scripts for filling tables with data in the files in the init directory. These operations should be moved to the import phase.

Macros can be used in queries:

{db} — The absolute path in the database to the testing directory. It is a combination of the --database and --path option values.

View the command description for initializing tables:

ydb workload query init --help

Available parameters

Name	Description	Default value
`--suite-path <path>`	Path to the `suite` directory.
`--query <query>` or `-q <query>`	DDL query to execute. Can be used multiple times.
`--clear`	If a table being created already exists at the specified path, it will be deleted first.
`--dry-run`	Do not execute initialization queries, only print them.

Loading data into a table

Load data into tables:

ydb workload query --path user_path import --suite-path ~/user-suite

The import command works as follows:

If a suite is specified and it contains an import directory, it performs a sequential traversal of its subdirectories.
It is assumed that each subdirectory corresponds to a table in the database.
Files in supported data formats are used in the subdirectory, namely csv, tsv, csv.gz, tsv.gz, and parquet.
Data is loaded from files into the corresponding tables. Loading is multi-threaded, with both the loading of different files and parts of a single file being parallelized. The order of loading is not guaranteed.

The loading process can take a long time and may be interrupted for some reason. To be able to continue loading from the point of interruption, a state mechanism can be applied. Using the --state option, you can specify the path to a state file. If the state file exists, it will be used to continue loading. If there is no state file, it will be created during the loading process. It can also be cleared using the --clear-state option.

View the command description for loading data:

ydb workload query import --help

Available parameters

Name	Description	Default value
`--suite-path <path>`	Path to the `suite` directory.
`--state <path>`	Path to the loading state file. If the loading was interrupted for some reason, it will be continued from the same place upon a new start.
`--clear-state`	Relevant if the `--state` parameter is specified. Clear the state file and start loading from the beginning.
`--dry-run`	Do not load data into the database, only prepare the data and output a message about it.

Common parameters of the import command

Name	Description	Default value
`--upload-threads <value>` or `-t <value>`	The number of execution threads for data preparation.	The number of available cores on the client.
`--bulk-size <value>`	The size of the chunk for sending data, in rows.	10000
`--max-in-flight <value>`	The maximum number of data chunks that can be processed simultaneously.	128
`--file-output-path <value>` or `-f <path>`	If this option is set, the data will not be loaded into the database, but will be saved to the directory .

Running a load test

Start the load:

ydb workload query --path user_path run --suite-path ~/user-suite

The run command works as follows:

If the suite contains a run directory, the command traverses it and all its nested directories using a depth-first search, ordering files and subdirectories alphabetically at each level. If the run directory is missing, execution stops.
It selects files with .sql and .yql extensions and includes them in the list of SQL queries.
After that, it adds queries specified directly from the command line using the --query parameter to the list.
Depending on the options, it either executes all queries sequentially, each query a specified number of times, or in parallel in multiple threads with query shuffling.
Errors in queries will be recorded in the statistics but will not cause the testing to stop.

During the test, load statistics for each query and aggregated statistics for all queries are displayed on the screen.

View the command description for starting the load:

ydb workload query run --help

Common parameters for all load types

Name	Description	Default value
`--dry-run`	Do not execute initialization queries, but only display their text.
`--check-canonical` or `-c`	Use special version of queries (they have deterministic answers) and compare results with canonical ones.
`--output <value>`	The name of the file where the query execution results will be saved.	`results.out`
`--iterations <value>`	The number of times each load query will be executed.	`1`
`--json <name>`	The name of the file where query execution statistics will be saved in `json` format.	Not saved by default
`--ministat <name>`	The name of the file where query execution statistics will be saved in `ministat` format.	Not saved by default
`--csv <name>`	The name of the file to save the CSV version of the result table.	Not saved by default
`--plan <name>`	The name of the file to save the query plan. Files like `<name>.<query number>.explain` and `<name>.<query number>.<iteration number>` will be saved in formats: `ast`, `json`, `svg`, and `table`.	Not saved by default
`--query-prefix <setting>`	Query prefix. Every prefix is a line that will be added to the beginning of each query. For multiple prefix lines use this option several times.	Not specified by default
`--retries`	Max retry count for every request.	`0`
`--include`	Names, numbers or ranges of query numbers to be executed as part of the load. Specified as a comma-separated list, e.g.: `1,2,4-6`.	All queries executed
`--exclude`	Names, numbers or ranges of query numbers to be excluded from the load. Specified as a comma-separated list, e.g.: `1,2,4-6`.	None excluded by default
`--verbose` or `-v`	Print additional information to the screen during query execution.
`--global-timeout <value>`	Global timeout for all queries. Supports time units (e.g., '5s', '1m'). Plain number interpreted as milliseconds.	Not specified by default. The time is unlimited.
`--request-timeout <value>`	Timeout for each iteration of each query. Supports time units (e.g., '5s', '1m'). Plain number interpreted as milliseconds.	Not specified by default. The time is unlimited.
`--threads <value>` or `-t <value>`	The number of parallel threads generating the load. Zero means that queries will be executed in the main thread; otherwise, queries will be mixed.	`0`
`--stats <value>`	Extended execution statistics collection mode. Available values: `full`, `profile`.	`full`

Query-specific options

Name	Description	Default value
`--suite-path <path>`	Path to the `suite` directory.
`--query <query>` or `-q <query>`	Query to execute. Can be used multiple times.

Canonical results

A canonical (expected) result can be specified for each query from a file. If the query response does not match the canonical one, this will be reflected in the statistics and command output. Differences will also be provided.

The canonical result is specified using a file with the same name and an additional .result extension. These are CSV files with headers and some additional syntax:

If a query has more than one result set, the result file must contain the corresponding number of datasets separated by empty lines.

For example, the query
```
SELECT COUNT(*) AS count FROM table_1;
SELECT MIN(col1) AS min, MAX(col1) AS max FROM table_2;
```
Might have the following result file:
```
count
10

min,max
4,5
```
The last line can be specified as ..., which means that the query result can have more lines than in the canonical result, but the first lines of the result must match the canonical ones.
By default, floating-point numbers are compared with a relative precision of 1e-3 percent, but in the canonical result, you can specify any absolute or relative precision for each value, for example: 1.5+-0.01, 2.4e+10+-1%.

For queries specified with the --query option, a canonical result cannot be set.

The canonical result will not be used unless the --check-canonical flag is set.

Cleaning up test data

Start the cleanup:

ydb workload query --path user_path clean

Performs recursive deletion of database objects at the path specified in the --path parameter.

The command has no parameters.

Example of running a user test

Download an example from the repository

git clone https://github.com/ydb-platform/ydb.git
cd ydb/ydb/tests/functional/tpc/data/e1

Start initialization

ydb workload query --path test_path init --suite-path .

Result:

Init tables ...
Init tables ...Ok

Start data import

ydb workload query --path test_path import --suite-path .

Result:

Fill table test_table_1...
Fill table test_table_1 OK 0 / 0 (0.353816s)
Fill table test_table_2...

Fill table test_table_2 OK 0 / 0 (0.102385s)

Start the load test

ydb workload query --path user_path run --suite-path e1

Result:

first_query_set.1.sql:
    iteration 0:    ok    0.131573s seconds
first_query_set.2.yql:
    iteration 0:    ok    0.089327s seconds
second_query_set.join.sql:
    iteration 0:    ok    0.145536s seconds

Results for 1 iterations
┌───────────────────────────┬──────────┬─────────┬─────────┬─────────┬─────────┬───────────┬─────────┬─────────┬─────────┬─────────┬────────────────┬────────────────┬────────────────┬───────────┬──────────────┬────────────┬────────────┐
│ Query #                   │ ColdTime │ Min     │ Max     │ Mean    │ Median  │ UnixBench │ Std     │ RttMin  │ RttMax  │ RttAvg  │ CompilationMin │ CompilationMax │ CompilationAvg │ GrossTime │ SuccessCount │ FailsCount │ DiffsCount │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ first_query_set.1.sql     │   0.086  │   0.086 │   0.086 │   0.086 │   0.086 │   0.086   │   0.000 │   0.045 │   0.045 │   0.045 │   0.041        │   0.041        │   0.041        │   0.132   │ 1            │            │            │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ first_query_set.2.yql     │   0.081  │   0.081 │   0.081 │   0.081 │   0.080 │   0.081   │   0.001 │   0.008 │   0.008 │   0.008 │   0.039        │   0.039        │   0.039        │   0.089   │ 1            │            │            │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ second_query_set.join.sql │   0.131  │   0.131 │   0.131 │   0.131 │   0.131 │   0.131   │   0.000 │   0.014 │   0.014 │   0.014 │   0.082        │   0.082        │   0.082        │   0.146   │ 1            │            │            │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ Sum                       │   0.299  │   0.299 │   0.299 │   0.299 │   0.297 │   0.299   │   0.000 │   0.068 │   0.068 │   0.068 │   0.161        │   0.161        │   0.162        │   0.367   │ 3            │            │            │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ Avg                       │   0.100  │   0.100 │   0.100 │   0.100 │   0.099 │   0.100   │   0.000 │   0.023 │   0.023 │   0.023 │   0.054        │   0.054        │   0.054        │   0.122   │ 3            │            │            │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ GAvg                      │   0.097  │   0.097 │   0.097 │   0.097 │   0.097 │   0.097   │   0.000 │   0.018 │   0.018 │   0.001 │   0.051        │   0.051        │   0.001        │   0.000   │ 3            │            │            │
└───────────────────────────┴──────────┴─────────┴─────────┴─────────┴─────────┴───────────┴─────────┴─────────┴─────────┴─────────┴────────────────┴────────────────┴────────────────┴───────────┴──────────────┴────────────┴────────────┘

Results saved to results.out

Clean up tables

ydb workload query --path user_path clean

Was the article helpful?

TPC-DS load

Fulltext load