TPC-DS workload
The workload is based on the TPC-DS specification, with the queries and table schemas adapted for YDB.
This benchmark generates a workload typical for decision support systems.
Common command options
All commands support the common option --path, which specifies the path to the directory containing benchmark tables in the database:
ydb workload tpcds --path tpcds/s1 ...
Available options
| Name | Description | Default value |
|---|---|---|
| --path or -p | Path to the directory with tables. | / |
Initializing the load test
Before running the benchmark, create the tables:
ydb workload tpcds --path tpcds/s1 init
See the command description:
ydb workload tpcds init --help
Available parameters
| Name | Description | Default value |
|---|---|---|
| --store <value> | Table storage type. Possible values: row, column, external-s3. | column |
| --external-s3-prefix <value> | Relevant only for external tables. Root path to the dataset in S3 storage. | |
| --external-s3-endpoint <value> or -e <value> | Relevant only for external tables. Link to the S3 bucket with data. | |
| --string | Use the String type for text fields. | Utf8 |
| --datetime | Use the Date, Datetime, and Timestamp types for time-related fields. | Date32, Datetime64, Timestamp64 |
| --partition-size | Maximum partition size in megabytes (AUTO_PARTITIONING_PARTITION_SIZE_MB) for row tables. | 2000 |
| --float-mode <value> | Data type to use for fractional fields. Possible values: double and decimal. double uses the Double type; decimal uses Decimal with the dimensions specified by the test standard. | double |
| --scale | Sets the percentage of the benchmark's data size and workload to use, relative to full scale. | 1 |
| --clear | If the tables at the specified path already exist, they will be deleted. | |
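
As an illustration of combining the parameters above, the following sketch creates column-oriented tables with Decimal fractional fields and drops any tables that already exist at the path. The `YDB` variable and the `tpcds_init` helper are hypothetical names introduced here; by default the sketch only prints the command, which makes it safe to try without a database.

```shell
# Dry run by default: prints the command instead of executing it.
# Set YDB=ydb in the environment to run against a real database.
YDB="${YDB:-echo ydb}"

# Hypothetical helper: create column tables with Decimal fractional
# fields, deleting any tables that already exist at the given path.
tpcds_init() {
  $YDB workload tpcds --path "$1" init --store column --float-mode decimal --clear
}

tpcds_init tpcds/s1
```

Because `--clear` deletes existing tables, it is probably best left out when the path may contain data you want to keep.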
Loading data into the table
The data is generated and loaded into the tables directly by the YDB CLI:
ydb workload tpcds --path tpcds/s1 import generator --scale 1
See the command description:
ydb workload tpcds import --help
Available options
| Name | Description | Default value |
|---|---|---|
| --scale <value> | Data scale. Typically, powers of ten are used. Fractional scales, which are not described in the TPC-DS specification, are also supported and can be useful for quickly testing small YDB databases. Examples: 0.1, 0.3. | |
| --tables <value> | Comma-separated list of tables to generate. Available tables: customer, nation, order_line, part_psupp, region, supplier. | All tables |
| --process-count <value> or -C <value> | Number of processes for parallel data generation. | 1 |
| --process-index <value> or -i <value> | Number of the current process when data generation is split into multiple processes. | 0 |
| --state <path> | Path to the state file for resuming generation. If generation is interrupted, it resumes from the same point when restarted. | |
| --clear-state | Relevant if the --state parameter is specified. Clears the state file and restarts generation from the beginning. | |
| --dry-run | Do not execute loading queries, but only display their text. | |
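
The --process-count and --process-index options suggest splitting generation across several CLI processes. A minimal sketch of that idea follows, assuming process indexes are zero-based (consistent with the default of 0) and that each process can be started independently; `YDB` and `tpcds_generate_parallel` are hypothetical names, and the sketch prints the commands unless `YDB=ydb` is set.

```shell
# Dry run by default: prints the commands instead of executing them.
# Set YDB=ydb in the environment to run against a real database.
YDB="${YDB:-echo ydb}"

# Hypothetical helper: start one generator per process index
# (assumed to be 0-based) in the background, then wait for all of them.
tpcds_generate_parallel() {
  db_path="$1"; scale="$2"; procs="$3"
  i=0
  while [ "$i" -lt "$procs" ]; do
    $YDB workload tpcds --path "$db_path" import generator \
      --scale "$scale" --process-count "$procs" --process-index "$i" &
    i=$((i + 1))
  done
  wait
}

tpcds_generate_parallel tpcds/s10 10 4
```

The same split could also be spread over several client machines by running one index per host.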
Common parameters of the import command
| Name | Description | Default value |
|---|---|---|
| --upload-threads <value> or -t <value> | The number of execution threads for data preparation. | The number of available cores on the client. |
| --bulk-size <value> | The size of the chunk for sending data, in rows. | 10000 |
| --max-in-flight <value> | The maximum number of data chunks that can be processed simultaneously. | 128 |
| --file-output-path <path> or -f <path> | If this option is set, the data will not be loaded into the database, but will be saved to the specified directory. | |
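
The sketch below shows two ways these parameters might be used: tuning the upload, and writing the generated data to a local directory instead of the database. It assumes that the common import parameters are placed between `import` and `generator`, which is not stated above; check `ydb workload tpcds import --help` before relying on it. `YDB`, `tpcds_import_tuned`, and `tpcds_export_files` are hypothetical names, and the commands are only printed unless `YDB=ydb` is set.

```shell
# Dry run by default: prints the commands instead of executing them.
# Set YDB=ydb in the environment to run against a real database.
YDB="${YDB:-echo ydb}"

# Hypothetical helper: load data with smaller chunks and fewer chunks in flight.
# Assumption: common import parameters go before the "generator" subcommand.
tpcds_import_tuned() {
  $YDB workload tpcds --path "$1" import \
    --upload-threads 8 --bulk-size 5000 --max-in-flight 64 \
    generator --scale 1
}

# Hypothetical helper: save generated data to a directory instead of loading it.
tpcds_export_files() {
  $YDB workload tpcds --path "$1" import --file-output-path "$2" generator --scale 0.1
}

tpcds_import_tuned tpcds/s1
tpcds_export_files tpcds/s0.1 ./tpcds_data
```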
Run the load test
Run the load:
ydb workload tpcds --path tpcds/s1 run
During the benchmark, load statistics are displayed for each request.
See the command description:
ydb workload tpcds run --help
Common parameters for all load types
| Name | Description | Default value |
|---|---|---|
| --dry-run | Do not execute initialization queries, but only display their text. | |
| --check-canonical or -c | Use a special version of the queries (with deterministic answers) and compare the results with the canonical ones. | |
| --output <value> | The name of the file where the query execution results will be saved. | results.out |
| --iterations <value> | The number of times each load query will be executed. | 1 |
| --json <name> | The name of the file where query execution statistics will be saved in JSON format. | Not saved by default |
| --ministat <name> | The name of the file where query execution statistics will be saved in ministat format. | Not saved by default |
| --csv <name> | The name of the file to save the CSV version of the result table. | Not saved by default |
| --plan <name> | The name of the file to save the query plan. Files like <name>.<query number>.explain and <name>.<query number>.<iteration number> will be saved in formats: ast, json, svg, and table. | Not saved by default |
| --query-prefix <setting> | Query prefix. Each prefix is a line that is added to the beginning of every query. To add multiple prefix lines, use this option several times. | Not specified by default |
| --retries | Maximum number of retries for each request. | 0 |
| --include | Names, numbers, or ranges of query numbers to be executed as part of the load. Specified as a comma-separated list, e.g.: 1,2,4-6. | All queries executed |
| --exclude | Names, numbers, or ranges of query numbers to be excluded from the load. Specified as a comma-separated list, e.g.: 1,2,4-6. | None excluded by default |
| --verbose or -v | Print additional information to the screen during query execution. | |
| --global-timeout <value> | Global timeout for all queries. Supports time units (e.g., '5s', '1m'). A plain number is interpreted as milliseconds. | Not specified by default. The time is unlimited. |
| --request-timeout <value> | Timeout for each iteration of each query. Supports time units (e.g., '5s', '1m'). A plain number is interpreted as milliseconds. | Not specified by default. The time is unlimited. |
| --threads <value> or -t <value> | The number of parallel threads generating the load. Zero means that queries will be executed in the main thread; otherwise, queries will be mixed. | 0 |
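
To illustrate how several of these parameters fit together, the sketch below runs a subset of queries three times each, limits every iteration to one minute, and saves statistics as JSON. The query list, timeout, and file name are arbitrary illustrative values; `YDB` and `tpcds_run_subset` are hypothetical names, and the command is only printed unless `YDB=ydb` is set.

```shell
# Dry run by default: prints the command instead of executing it.
# Set YDB=ydb in the environment to run against a real database.
YDB="${YDB:-echo ydb}"

# Hypothetical helper: run queries 1, 2 and 4 through 6, three iterations each,
# with a one-minute limit per iteration, writing statistics to a JSON file.
tpcds_run_subset() {
  $YDB workload tpcds --path "$1" run \
    --include 1,2,4-6 --iterations 3 --request-timeout 1m --json "$2"
}

tpcds_run_subset tpcds/s1 stats.json
```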
TPC-DS-specific options
| Name | Description | Default value |
|---|---|---|
| --syntax <value> | Syntax of the queries to use. Available values: yql, pg (abbreviation of PostgreSQL). For more information, see the YQL documentation and the PostgreSQL compatibility documentation. | yql |
| --float-mode <value> | Float mode. Possible values: float, decimal, decimal_ydb. With float, the Float type is used; with decimal, Decimal with the canonical size from the TPC-DS specification (Decimal(12, 2)) is used; with decimal_ydb, all fractional values are converted to Decimal(22, 9). For more information about the Decimal type, see the documentation. | float |
| --scale <value> | Scale factor. See the TPC-DS specification, chapter 3. Used in TPC-DS queries. Fractional scales, which are not described in the TPC-DS specification, are also supported and can be useful for quickly testing small YDB databases. Examples: 0.1, 0.3. Canonical answers are specified for scale factors 1, 10, 100, and 1000 (see the --check-canonical option description). | 1 |
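
Since canonical answers are specified only for scale factors 1, 10, 100, and 1000, a correctness check might look like the sketch below. It assumes the data at the path was imported with the same scale that is passed to `run`; `YDB` and `tpcds_check_canonical` are hypothetical names, and the command is only printed unless `YDB=ydb` is set.

```shell
# Dry run by default: prints the command instead of executing it.
# Set YDB=ydb in the environment to run against a real database.
YDB="${YDB:-echo ydb}"

# Hypothetical helper: run the deterministic query versions in YQL syntax
# and compare the results with the canonical answers for the given scale.
tpcds_check_canonical() {
  $YDB workload tpcds --path "$1" run --syntax yql --scale "$2" --check-canonical
}

tpcds_check_canonical tpcds/s1 1
```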
Test data cleanup
Run cleanup:
ydb workload tpcds --path tpcds/s1 clean
The command has no parameters.
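
The four stages described on this page (init, import, run, clean) can be chained into one script. The following is a minimal sketch with default parameters only, stopping at the first failing step; `YDB` and `tpcds_benchmark` are hypothetical names, and the commands are only printed unless `YDB=ydb` is set.

```shell
# Dry run by default: prints the commands instead of executing them.
# Set YDB=ydb in the environment to run against a real database.
YDB="${YDB:-echo ydb}"

# Hypothetical helper: create tables, load data, run the queries, and
# clean up. Each step runs only if the previous one succeeded.
tpcds_benchmark() {
  db_path="$1"; scale="$2"
  $YDB workload tpcds --path "$db_path" init &&
  $YDB workload tpcds --path "$db_path" import generator --scale "$scale" &&
  $YDB workload tpcds --path "$db_path" run --scale "$scale" &&
  $YDB workload tpcds --path "$db_path" clean
}

tpcds_benchmark tpcds/s1 1
```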