TPC-DS workload
The workload is based on the TPC-DS documentation, with the queries and table schemas adapted for YDB.
This benchmark generates a workload typical for decision support systems.
Common command options
All commands support the common option --path, which specifies the path to the directory containing benchmark tables in the database:
ydb workload tpcds --path tpcds/s1 ...
Available options
Name | Description | Default value |
---|---|---|
--path or -p | Path to the directory with tables. | / |
Initializing the load test
Before running the benchmark, create the tables:
ydb workload tpcds --path tpcds/s1 init
To see the description of the initialization command, run:
ydb workload tpcds init --help
Available parameters
Name | Description | Default value |
---|---|---|
--store <value> | Table storage type. Possible values: row, column, external-s3. | row |
--external-s3-prefix <value> | Relevant only for external tables. Root path to the dataset in S3 storage. | |
--external-s3-endpoint <value> or -e <value> | Relevant only for external tables. Link to the S3 bucket with data. | |
--string | Use the String type for text fields. | Utf8 |
--datetime | Use the Date, Datetime, and Timestamp types for time-related fields. | Date32, Datetime64, Timestamp64 |
--float-mode <value> | Specifies the data type to use for fractional fields. Possible values are float, decimal, and decimal_ydb. float uses the Float type, decimal uses Decimal with dimensions specified by the test standard, and decimal_ydb uses Decimal(22,9), the only Decimal type currently supported by YDB. | float |
--clear | If the table at the specified path already exists, it will be deleted. | |
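As an illustration of how the initialization parameters combine, the following sketch composes an init command that uses column storage and drops existing tables first. The names YDB_BIN and tpcds_init_cmd are hypothetical helpers introduced here, not part of YDB CLI. The function only prints the command so that it can be reviewed before it is run, and connection options are omitted as in the examples above.

```shell
#!/bin/sh
# Sketch: compose a `ydb workload tpcds ... init` command line.
# YDB_BIN and tpcds_init_cmd are hypothetical names used for illustration.
YDB_BIN="${YDB_BIN:-ydb}"

# Usage: tpcds_init_cmd <path> <store> [extra init flags...]
tpcds_init_cmd() {
    path="$1"
    store="$2"
    shift 2
    # Print the command instead of executing it.
    echo "$YDB_BIN" workload tpcds --path "$path" init --store "$store" "$@"
}

# Column storage, deleting tables that already exist at the path.
tpcds_init_cmd tpcds/s1 column --clear
```

To execute the printed command, paste it into the shell or pipe the output to sh.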
Loading data into the table
YDB CLI generates the data and loads it into the tables directly:
ydb workload tpcds --path tpcds/s1 import generator --scale 1
See the command description:
ydb workload tpcds import --help
Available options
Name | Description | Default value |
---|---|---|
--scale <value> | Data scale. Typically, powers of ten are used. | |
--tables <value> | Comma-separated list of tables to generate. Available tables: customer, nation, order_line, part_psupp, region, supplier. | All tables |
--process-count <value> or -C <value> | Specifies the number of processes for parallel data generation. | 1 |
--process-index <value> or -i <value> | Specifies the process number when data generation is split into multiple processes. | 0 |
--state <path> | Path to the state file for resuming generation. If the generation is interrupted, it will resume from the same point when restarted. | |
--clear-state | Relevant if the --state parameter is specified. Clears the state file and restarts generation from the beginning. | |
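The --process-count and --process-index options can be combined to split generation across several processes. The sketch below prints one import command per process, each with its own state file. It assumes that process indexes run from 0 to count minus 1 (inferred from the default value 0); the names YDB_BIN, tpcds_parallel_import_cmds, and the gen_state_N file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print one `import generator` command per generator process.
# Assumption: process indexes are zero-based (0 .. count-1).
# YDB_BIN, tpcds_parallel_import_cmds and gen_state_<i> are hypothetical names.
YDB_BIN="${YDB_BIN:-ydb}"

# Usage: tpcds_parallel_import_cmds <path> <scale> <process-count>
tpcds_parallel_import_cmds() {
    path="$1"
    scale="$2"
    count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        # A separate state file per process lets each one resume on its own.
        echo "$YDB_BIN workload tpcds --path $path import generator" \
             "--scale $scale --process-count $count --process-index $i" \
             "--state gen_state_${i}"
        i=$((i + 1))
    done
}

tpcds_parallel_import_cmds tpcds/s10 10 3
```

Each printed line can be started on a separate machine or as a background job; the commands are independent of each other.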
Common parameters of the import command
Name | Description | Default value |
---|---|---|
--upload-threads <value> or -t <value> | The number of execution threads for data preparation. | The number of available cores on the client. |
--bulk-size <value> | The size of the chunk for sending data, in rows. | 10000 |
--max-in-flight <value> | The maximum number of data chunks that can be processed simultaneously. | 128 |
Run the load test
Run the load:
ydb workload tpcds --path tpcds/s1 run
During the benchmark, load statistics are displayed for each query.
See the command description:
ydb workload tpcds run --help
Common parameters for all load types
Name | Description | Default value |
---|---|---|
--output <value> | The name of the file where the query execution results will be saved. | results.out |
--iterations <value> | The number of times each load query will be executed. | 1 |
--json <name> | The name of the file where query execution statistics will be saved in JSON format. | Not saved by default |
--ministat <name> | The name of the file where query execution statistics will be saved in ministat format. | Not saved by default |
--plan <name> | The name of the file to save the query plan. Files like <name>.<query number>.explain and <name>.<query number>.<iteration number> will be saved in formats: ast, json, svg. | Not saved by default |
--query-settings <setting> | Query execution settings. Each setting is added as a separate line at the beginning of each query. Use multiple times for multiple settings. | Not specified by default |
--include | Query numbers or segments to be executed as part of the load. | All queries executed |
--exclude | Query numbers or segments to be excluded from the load. | None excluded by default |
--executer | Query execution engine. Available values: scan, generic. | generic |
--verbose or -v | Print additional information to the screen during query execution. | |
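For repeated measurements, the --iterations and --json parameters from the table above can be used together. The following sketch composes such a run command; the names YDB_BIN, tpcds_run_cmd, and the statistics file name are hypothetical, and the function prints the command rather than executing it.

```shell
#!/bin/sh
# Sketch: compose a `run` command that repeats every query and saves
# execution statistics as JSON.
# YDB_BIN and tpcds_run_cmd are hypothetical names used for illustration.
YDB_BIN="${YDB_BIN:-ydb}"

# Usage: tpcds_run_cmd <path> <iterations> <stats-file>
tpcds_run_cmd() {
    echo "$YDB_BIN" workload tpcds --path "$1" run \
        --iterations "$2" --json "$3"
}

# Three iterations per query, statistics written to tpcds_stats.json.
tpcds_run_cmd tpcds/s1 3 tpcds_stats.json
```

Several iterations per query tend to give more stable timings than a single pass, at the cost of a proportionally longer run.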
TPC-DS-specific options
Name | Description | Default value |
---|---|---|
--ext-query-dir <name> | Directory with external queries for load execution. Queries should be in files named q[1-99].sql. | |
Test data cleanup
Run cleanup:
ydb workload tpcds --path tpcds/s1 clean
The command has no parameters.
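The four stages described on this page (init, import, run, clean) can be chained in a single script. The sketch below is one possible arrangement, not an official workflow: YDB_BIN, DRY_RUN, run_step, and tpcds_benchmark are hypothetical names. By default the script only prints the commands; setting DRY_RUN=0 executes them and stops at the first failing stage.

```shell
#!/bin/sh
# Sketch: the four benchmark stages in order, with a dry-run switch.
# YDB_BIN, DRY_RUN, run_step and tpcds_benchmark are hypothetical names.
YDB_BIN="${YDB_BIN:-ydb}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = also execute them

# Print the command; execute it as well when DRY_RUN=0.
run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Usage: tpcds_benchmark <path> <scale>
tpcds_benchmark() {
    path="$1"
    scale="$2"
    # `&&` stops the chain as soon as one stage fails.
    run_step "$YDB_BIN" workload tpcds --path "$path" init &&
    run_step "$YDB_BIN" workload tpcds --path "$path" import generator --scale "$scale" &&
    run_step "$YDB_BIN" workload tpcds --path "$path" run &&
    run_step "$YDB_BIN" workload tpcds --path "$path" clean
}

tpcds_benchmark tpcds/s1 1
```

Running the script with DRY_RUN=0 in the environment performs the full cycle, including the final cleanup of the test data.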