User workload
Allows you to perform load testing according to a specified user scenario.
A test scenario is a directory called suite, which contains a set of subdirectories: init, import, and run, corresponding to the ydb workload query commands.
- The
initdirectory can have an arbitrary structure and contains SQL queries for creating and configuring objects in the database, such as tables or indexes. - The
importdirectory contains data to be loaded into tables. Each table corresponds to a subdirectory in theimportdirectory. - The
rundirectory can have an arbitrary structure and contains SQL queries for load testing.
Example of a suite directory on GitHub.
The user specifies the path to the suite directory using the --suite-path option in each command.
General syntax
General syntax for calling user workload commands:
ydb [global options...] workload query [--path <path/in/db>] <command> [options...]
global options— global parameters.--path— the path in the database for load testing objects.<command>— one of the commands:init,import,run,clean.options— parameters for a specific command.
Common command parameters
All commands support the common --path parameter, which sets the path in the database for load testing objects. All created objects will be created at this path. The path will also be taken into account during test execution and cleanup.
Available parameters
| Name | Description | Default value |
|---|---|---|
--path or -p |
Path in the database for load testing objects. | / |
Load test initialization
Before starting the test, create tables and other database objects:
ydb workload query --path user_path init --suite-path ~/user-suite
The init command works as follows:
- If a
suiteis specified and it contains aninitdirectory, theinitdirectory is scanned, including its nested directories, using a depth-first search with lexicographical ordering of files and subdirectories at each level. - Files with
.sqland.yqlextensions are selected. - The content of each file is read and executed sequentially as an SQL query.
- After that, queries specified directly from the command line using the
--queryparameter are executed. - If any of the queries result in an error, the command execution stops.
It is assumed that relatively "lightweight" queries, such as creating tables and indexes, are performed at this stage. It is not recommended to include scripts for filling tables with data in the files in the init directory. These operations should be moved to the import phase.
Macros can be used in queries:
{db}— The absolute path in the database to the testing directory. It is a combination of the--databaseand--pathoption values.
View the command description for initializing tables:
ydb workload query init --help
Available parameters
| Name | Description | Default value |
|---|---|---|
--suite-path <path> |
Path to the suite directory. |
|
--query <query> or -q <query> |
DDL query to execute. Can be used multiple times. | |
--clear |
If a table being created already exists at the specified path, it will be deleted first. | |
--dry-run |
Do not execute initialization queries, only print them. |
Loading data into a table
Load data into tables:
ydb workload query --path user_path import --suite-path ~/user-suite
The import command works as follows:
- If a
suiteis specified and it contains animportdirectory, it performs a sequential traversal of its subdirectories. - It is assumed that each subdirectory corresponds to a table in the database.
- Files in supported data formats are used in the subdirectory, namely
csv,tsv,csv.gz,tsv.gz, andparquet. - Data is loaded from files into the corresponding tables. Loading is multi-threaded, with both the loading of different files and parts of a single file being parallelized. The order of loading is not guaranteed.
The loading process can take a long time and may be interrupted for some reason. To be able to continue loading from the point of interruption, a state mechanism can be applied. Using the --state option, you can specify the path to a state file. If the state file exists, it will be used to continue loading. If there is no state file, it will be created during the loading process. It can also be cleared using the --clear-state option.
View the command description for loading data:
ydb workload query import --help
Available parameters
| Name | Description | Default value |
|---|---|---|
--suite-path <path> |
Path to the suite directory. |
|
--state <path> |
Path to the loading state file. If the loading was interrupted for some reason, it will be continued from the same place upon a new start. | |
--clear-state |
Relevant if the --state parameter is specified. Clear the state file and start loading from the beginning. |
|
--dry-run |
Do not load data into the database, only prepare the data and output a message about it. |
Common parameters of the import command
| Name | Description | Default value |
|---|---|---|
--upload-threads <value> or -t <value> |
The number of execution threads for data preparation. | The number of available cores on the client. |
--bulk-size <value> |
The size of the chunk for sending data, in rows. | 10000 |
--max-in-flight <value> |
The maximum number of data chunks that can be processed simultaneously. | 128 |
--file-output-path <value> or -f <path> |
If this option is set, the data will not be loaded into the database, but will be saved to the directory |
Running a load test
Start the load:
ydb workload query --path user_path run --suite-path ~/user-suite
The run command works as follows:
- If the
suitecontains arundirectory, the command traverses it and all its nested directories using a depth-first search, ordering files and subdirectories alphabetically at each level. If therundirectory is missing, execution stops. - It selects files with
.sqland.yqlextensions and includes them in the list of SQL queries. - After that, it adds queries specified directly from the command line using the
--queryparameter to the list. - Depending on the options, it either executes all queries sequentially, each query a specified number of times, or in parallel in multiple threads with query shuffling.
- Errors in queries will be recorded in the statistics but will not cause the testing to stop.
During the test, load statistics for each query and aggregated statistics for all queries are displayed on the screen.
View the command description for starting the load:
ydb workload query run --help
Common parameters for all load types
| Name | Description | Default value |
|---|---|---|
--dry-run |
Do not execute initialization queries, but only display their text. | |
--check-canonical or -c |
Use special version of queries (they have deterministic answers) and compare results with canonical ones. | |
--output <value> |
The name of the file where the query execution results will be saved. | results.out |
--iterations <value> |
The number of times each load query will be executed. | 1 |
--json <name> |
The name of the file where query execution statistics will be saved in json format. |
Not saved by default |
--ministat <name> |
The name of the file where query execution statistics will be saved in ministat format. |
Not saved by default |
--csv <name> |
The name of the file to save the CSV version of the result table. | Not saved by default |
--plan <name> |
The name of the file to save the query plan. Files like <name>.<query number>.explain and <name>.<query number>.<iteration number> will be saved in formats: ast, json, svg, and table. |
Not saved by default |
--query-prefix <setting> |
Query prefix. Every prefix is a line that will be added to the beginning of each query. For multiple prefix lines use this option several times. | Not specified by default |
--retries |
Max retry count for every request. | 0 |
--include |
Names, numbers or ranges of query numbers to be executed as part of the load. Specified as a comma-separated list, e.g.: 1,2,4-6. |
All queries executed |
--exclude |
Names, numbers or ranges of query numbers to be excluded from the load. Specified as a comma-separated list, e.g.: 1,2,4-6. |
None excluded by default |
--verbose or -v |
Print additional information to the screen during query execution. | |
--global-timeout <value> |
Global timeout for all queries. Supports time units (e.g., '5s', '1m'). Plain number interpreted as milliseconds. | Not specified by default. The time is unlimited. |
--request-timeout <value> |
Timeout for each iteration of each query. Supports time units (e.g., '5s', '1m'). Plain number interpreted as milliseconds. | Not specified by default. The time is unlimited. |
--threads <value> or -t <value> |
The number of parallel threads generating the load. Zero means that queries will be executed in the main thread; otherwise, queries will be mixed. | 0 |
--stats <value> |
Extended execution statistics collection mode. Available values: full, profile. |
full |
Query-specific options
| Name | Description | Default value |
|---|---|---|
--suite-path <path> |
Path to the suite directory. |
|
--query <query> or -q <query> |
Query to execute. Can be used multiple times. |
Canonical results
A canonical (expected) result can be specified for each query from a file. If the query response does not match the canonical one, this will be reflected in the statistics and command output. Differences will also be provided.
The canonical result is specified using a file with the same name and an additional .result extension. These are CSV files with headers and some additional syntax:
-
If a query has more than one result set, the result file must contain the corresponding number of datasets separated by empty lines.
For example, the query
SELECT COUNT(*) AS count FROM table_1; SELECT MIN(col1) AS min, MAX(col1) AS max FROM table_2;Might have the following result file:
count 10 min,max 4,5 -
The last line can be specified as
..., which means that the query result can have more lines than in the canonical result, but the first lines of the result must match the canonical ones. -
By default, floating-point numbers are compared with a relative precision of
1e-3percent, but in the canonical result, you can specify any absolute or relative precision for each value, for example:1.5+-0.01,2.4e+10+-1%.
For queries specified with the --query option, a canonical result cannot be set.
The canonical result will not be used unless the --check-canonical flag is set.
Cleaning up test data
Start the cleanup:
ydb workload query --path user_path clean
Performs recursive deletion of database objects at the path specified in the --path parameter.
The command has no parameters.
Example of running a user test
Download an example from the repository
git clone https://github.com/ydb-platform/ydb.git
cd ydb/ydb/tests/functional/tpc/data/e1
Start initialization
ydb workload query --path test_path init --suite-path .
Result:
Init tables ...
Init tables ...Ok
Start data import
ydb workload query --path test_path import --suite-path .
Result:
Fill table test_table_1...
Fill table test_table_1 OK 0 / 0 (0.353816s)
Fill table test_table_2...
Fill table test_table_2 OK 0 / 0 (0.102385s)
Start the load test
ydb workload query --path user_path run --suite-path e1
Result:
first_query_set.1.sql:
iteration 0: ok 0.131573s seconds
first_query_set.2.yql:
iteration 0: ok 0.089327s seconds
second_query_set.join.sql:
iteration 0: ok 0.145536s seconds
Results for 1 iterations
┌───────────────────────────┬──────────┬─────────┬─────────┬─────────┬─────────┬───────────┬─────────┬─────────┬─────────┬─────────┬────────────────┬────────────────┬────────────────┬───────────┬──────────────┬────────────┬────────────┐
│ Query # │ ColdTime │ Min │ Max │ Mean │ Median │ UnixBench │ Std │ RttMin │ RttMax │ RttAvg │ CompilationMin │ CompilationMax │ CompilationAvg │ GrossTime │ SuccessCount │ FailsCount │ DiffsCount │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ first_query_set.1.sql │ 0.086 │ 0.086 │ 0.086 │ 0.086 │ 0.086 │ 0.086 │ 0.000 │ 0.045 │ 0.045 │ 0.045 │ 0.041 │ 0.041 │ 0.041 │ 0.132 │ 1 │ │ │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ first_query_set.2.yql │ 0.081 │ 0.081 │ 0.081 │ 0.081 │ 0.080 │ 0.081 │ 0.001 │ 0.008 │ 0.008 │ 0.008 │ 0.039 │ 0.039 │ 0.039 │ 0.089 │ 1 │ │ │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ second_query_set.join.sql │ 0.131 │ 0.131 │ 0.131 │ 0.131 │ 0.131 │ 0.131 │ 0.000 │ 0.014 │ 0.014 │ 0.014 │ 0.082 │ 0.082 │ 0.082 │ 0.146 │ 1 │ │ │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ Sum │ 0.299 │ 0.299 │ 0.299 │ 0.299 │ 0.297 │ 0.299 │ 0.000 │ 0.068 │ 0.068 │ 0.068 │ 0.161 │ 0.161 │ 0.162 │ 0.367 │ 3 │ │ │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ Avg │ 0.100 │ 0.100 │ 0.100 │ 0.100 │ 0.099 │ 0.100 │ 0.000 │ 0.023 │ 0.023 │ 0.023 │ 0.054 │ 0.054 │ 0.054 │ 0.122 │ 3 │ │ │
├───────────────────────────┼──────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼─────────┼─────────┼─────────┼────────────────┼────────────────┼────────────────┼───────────┼──────────────┼────────────┼────────────┤
│ GAvg │ 0.097 │ 0.097 │ 0.097 │ 0.097 │ 0.097 │ 0.097 │ 0.000 │ 0.018 │ 0.018 │ 0.001 │ 0.051 │ 0.051 │ 0.001 │ 0.000 │ 3 │ │ │
└───────────────────────────┴──────────┴─────────┴─────────┴─────────┴─────────┴───────────┴─────────┴─────────┴─────────┴─────────┴────────────────┴────────────────┴────────────────┴───────────┴──────────────┴────────────┴────────────┘
Results saved to results.out
Clean up tables
ydb workload query --path user_path clean