TPC-H workload

The workload is based on the TPC-H documentation, and the queries and table schema are adapted for YDB.

The test generates a typical decision support workload.

Initializing a load test

Before running the benchmark, create the tables:

ydb workload tpch init

To view the description of the initialization command, run:

ydb workload tpch init --help

Available parameters

Option name       Option description
--path <value>    Directory to create tables in. Default value: empty string.
--store <value>   Type of table storage. Acceptable values: row, column. Default value: row.
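As an illustration of how the two options combine, the sketch below assembles an init command line and prints it instead of executing it. The helper name `tpch_init_cmd` and the directory name `tpch` are hypothetical, and connection options are left out, as in the commands above.

```shell
# Build (but do not run) an init command from the two documented options.
# $1: directory to create the tables in (hypothetical example value below)
# $2: storage type, which must be "row" or "column"
tpch_init_cmd() {
    case $2 in
        row|column) ;;
        *) echo "unsupported --store value: $2" >&2; return 1 ;;
    esac
    echo ydb workload tpch init --path "$1" --store "$2"
}

# Print the command for column-oriented tables in the "tpch" directory.
tpch_init_cmd tpch column
```

Printing the command first makes it easy to check the options before creating any tables; to execute it, paste the printed line into the shell.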

Uploading data to the table

Download the dataset generator for the TPC-H benchmark and follow the instructions in its README.
In the dss.h file, you can specify the field separator. The default is #define SEPARATOR '|'.
The sample data upload script below expects '\t' as the separator.

for table in region nation supplier customer part partsupp orders lineitem; do
    echo "Start data load to $table"
    ydb import file tsv --header --path "$table" --input-file "$table.tsv" --newline-delimited
    echo "Finish data load to $table"
done
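If the generator was built with the default '|' separator, the files can be converted to tab-separated form afterwards instead of rebuilding it. The following is a minimal sketch under several assumptions: the generator writes files named <table>.tbl, each row ends with a trailing '|', and no field contains a tab. The helper name `tbl_to_tsv` is hypothetical.

```shell
# Convert '|'-separated rows to tab-separated rows.
# Reads from stdin and writes to stdout.
tbl_to_tsv() {
    # Drop the trailing '|' on each row, then turn the remaining '|' into tabs.
    sed 's/|$//' | tr '|' '\t'
}

# Convert every .tbl file in the current directory to a .tsv file.
for f in *.tbl; do
    [ -e "$f" ] || continue   # nothing to do when no .tbl files are present
    tbl_to_tsv < "$f" > "${f%.tbl}.tsv"
done
```

The upload script passes --header, so each .tsv file would still need a first line that names the columns; this sketch does not add one.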

Running a load test

Run the load:

ydb workload tpch run

During this test, workload statistics for each query are displayed on the screen.

To view the description of the run command, execute:

ydb workload tpch run --help

Global parameters for all types of load

Option name           Option description
--path <value>        Directory where the tables were created. Default value: empty string.
--output <value>      Name of the file where query execution results are saved. Default value: results.out.
--iterations <value>  Number of times each query is executed. Default value: 1.
--json                Name of the file where query execution statistics are saved in JSON format. By default, the file is not saved.
--ministat            Name of the file where query execution statistics are saved in ministat format. By default, the file is not saved.
--query-settings      Query execution settings. Not specified by default.
--ext-queries-dir     Name of the directory with external queries used for the workload.
--include             Numbers or ranges of the queries to run, separated by commas, for example, 1,2,4-6. By default, all queries are executed.
--exclude             Numbers or ranges of the queries to skip, separated by commas, for example, 1,2,4-6. By default, no queries are excluded.
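To make the --include and --exclude value format concrete, the sketch below expands a specification such as 1,2,4-6 into individual query numbers. It only illustrates how the format reads and is not the CLI's own parser; the function name `expand_query_spec` is hypothetical.

```shell
# Expand a comma-separated list of query numbers and ranges
# into one query number per line.
# The function body runs in a subshell, so the IFS change stays local.
expand_query_spec() (
    IFS=','
    for part in $1; do
        case $part in
            *-*) seq "${part%-*}" "${part#*-}" ;;  # range, for example 4-6
            *)   echo "$part" ;;                   # single query number
        esac
    done
)

# Queries selected by the example value from the table above.
expand_query_spec "1,2,4-6"

# A matching run command, using only options from the table above
# (commented out here because it needs a configured database):
# ydb workload tpch run --include 1,2,4-6 --iterations 3
```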