# Running parameterized queries

## Overview

The YDB CLI can execute parameterized queries. To use parameters, declare them with the YQL `DECLARE` statement in your query text.

The preferred way to run parameterized queries in the YDB CLI is the `ydb sql` command.

Parameter values can be set via command-line arguments, loaded from JSON files, or read from `stdin` in binary or JSON format. Binary data can be encoded as Base64 or UTF-8. When reading from `stdin` or a file, you can stream multiple parameter values, triggering multiple query executions with batching options.
## Why use parameterized queries?

Using parameterized queries offers several key advantages:

- **Enhanced performance**: Parameterized queries significantly boost performance when you execute many similar queries that differ only in their input parameters. This is achieved through prepared statements: the query is compiled once and cached on the server, so subsequent requests with the same query text skip the compilation phase and execute immediately.
- **Protection against SQL injection**: Parameterized queries also protect against SQL injection attacks. Input parameters are handled as data rather than query text, which eliminates the risk of malicious code execution.
## Executing a single query

To provide parameters for a single query execution, you can use command-line arguments, JSON files, or `stdin`, with the following YDB CLI options:

| Name | Description |
|---|---|
| `-p, --param` | The value of a single query parameter in the `name=value` or `$name=value` format, where `name` is the parameter name and `value` is its value (a valid JSON value). This option can be specified multiple times. All specified parameters must be declared in the query with the `DECLARE` statement; otherwise, you will receive the "Query does not contain parameter" error. If you specify the same parameter multiple times, you will receive the "Parameter value found in more than one source" error. Depending on your operating system, you might need to escape the `$` character or enclose the expression in single quotes (`'`); see the sketch after this table. |
| `--input-file` | The name of a file in JSON format and UTF-8 encoding whose values are matched against the query parameters by key names. Only one input file can be used. If values for the same parameter are found in multiple files or are also set with the `--param` option, you will receive the "Parameter value found in more than one source" error. Keys that are present in the file but not declared in the query are ignored without an error message. |
| `--input-format` | The format of parameter values, applied to all parameter sources (command line, file, or `stdin`). Available options: `json` (the default; values are parsed as JSON), `csv`, `tsv`, and `raw` (the input contains only a value for a single parameter). |
| `--input-binary-strings` | The input binary string encoding format; defines how binary strings in the input should be interpreted. Available options include `base64` (binary strings in the input are decoded from Base64); by default, binary strings are interpreted as UTF-8 text. |
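Because parameter names start with `$`, which POSIX shells also use for variable expansion, quoting matters when passing `--param` values. A minimal sketch of equivalent invocations (assuming bash or a similar shell):

```bash
# Single quotes: the shell passes $a through literally (simplest)
ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a' --param '$a=10'

# Double quotes: each $ must be escaped so the shell does not expand it
ydb -p quickstart sql -s "DECLARE \$a AS Int64; SELECT \$a" --param "\$a=10"

# The name=value form without the $ prefix is also accepted
ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a' --param 'a=10'
```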
If values are specified for all non-optional (that is, `NOT NULL`) parameters in the `DECLARE` clause, the query is executed on the server. If a value is missing for any such parameter, the command fails with the "Missing value for parameter" error, as the example below demonstrates.
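For example, running a query that declares a parameter without supplying its value triggers this error (a minimal sketch):

```bash
# Fails with a "Missing value for parameter" error because no value is supplied for $a
ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a'
```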
### More specific options for input parameters

The following options are not described in the `--help` output. To see their descriptions, use the `-hh` option instead.
| Name | Description |
|---|---|
| `--input-framing` | The input framing format; defines how parameter sets are delimited in the input. Available options: `no-framing` (the default; the whole input is expected to contain a single parameter set) and `newline-delimited` (a newline character separates parameter sets). |
| `--input-param-name` | The parameter name in the input stream; required when the input format contains only values (that is, when `--input-format raw` is used). |
| `--input-columns` | A string with column names that replaces the CSV/TSV header. Relevant only when passing parameters in CSV/TSV format. The file is assumed to contain no header row. |
| `--input-skip-rows` | The number of CSV/TSV header rows to skip in the input data (excluding the row of column names if the `--header` option is used). Relevant only when passing parameters in CSV/TSV format. |
| `--input-batch` | The batch mode applied to parameter sets from `stdin` or `--input-file`. Available options: `iterative` (the default; no batching, each parameter set triggers a separate query execution), `full` (all parameter sets are wrapped into a single list and the query runs once), and `adaptive` (the input is split into batches limited by `--input-batch-max-rows` and `--input-batch-max-delay`). |
| `--input-batch-max-rows` | The maximum size of the list for the adaptive input batching mode (default: `1000`). |
| `--input-batch-max-delay` | The maximum delay before a received parameter set is submitted for processing in the adaptive batch mode. The value is specified as a number with a time unit: `s` (seconds), `ms` (milliseconds), `m` (minutes), and so on. Default value: `1s` (one second). The YDB CLI starts a timer when it receives the first parameter set for a batch and sends the accumulated batch for execution once the timer expires. This parameter enables efficient batching when the arrival rate of new parameter sets is unpredictable. |
### Examples

> **Note**
> The examples use the `quickstart` profile. To learn more, see Creating a profile to connect to a test database.

#### Passing the value of a single parameter

From the command line, using the `--param` option:

```bash
ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a' --param '$a=10'
```

Using a file in JSON format (the default input format):

```bash
echo '{"a":10}' > p1.json
ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a' --input-file p1.json
```

Via `stdin`, passing a JSON string as a set of one parameter:

```bash
echo '{"a":10}' | ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a'
```

Via `stdin`, passing only a parameter value and setting the parameter name with the `--input-param-name` option:

```bash
echo '10' | ydb -p quickstart sql -s 'DECLARE $a AS Int64; SELECT $a' --input-param-name a
```
#### Passing the values of parameters of different types from multiple sources

```bash
# Create a JSON file with fields 'a', 'b', and 'x'; 'x' is not declared in the query and will be ignored
echo '{ "a":10, "b":"Some text", "x":"Ignore me" }' > p1.json

# Run the query, taking 'a' and 'b' from the input file and 'c' from a command-line parameter
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Utf8;
      DECLARE $c AS Int64;
      SELECT $a, $b, $c' \
  --input-file p1.json \
  --param '$c=30'
```

Command output:

```text
┌─────────┬─────────────┬─────────┐
│ column0 │ column1     │ column2 │
├─────────┼─────────────┼─────────┤
│ 10      │ "Some text" │ 30      │
└─────────┴─────────────┴─────────┘
```
#### Passing Base64-encoded binary strings

```bash
ydb -p quickstart sql \
  -s 'DECLARE $a AS String;
      SELECT $a' \
  --input-format json \
  --input-binary-strings base64 \
  --param '$a="SGVsbG8sIHdvcmxkCg=="'
```

Command output:

```text
┌──────────────────┐
│ column0          │
├──────────────────┤
│ "Hello, world\n" │
└──────────────────┘
```
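The `--input-binary-strings` setting applies to all parameter sources, not only `--param`. A sketch based on the examples above, supplying the same Base64-encoded value via `stdin`:

```bash
# The JSON string value is decoded from Base64 before being bound to $a
echo '{"a":"SGVsbG8sIHdvcmxkCg=="}' | ydb -p quickstart sql \
  -s 'DECLARE $a AS String;
      SELECT $a' \
  --input-binary-strings base64
```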
#### Passing raw binary content directly

```bash
curl -Ls http://ydb.tech/docs/en | ydb -p quickstart sql \
  -s 'DECLARE $a AS String;
      SELECT LEN($a)' \
  --input-format raw \
  --input-param-name a
```

Command output (the exact number of bytes may vary):

```text
┌─────────┐
│ column0 │
├─────────┤
│ 66426   │
└─────────┘
```
#### Passing CSV data

```bash
echo '10,Some text' | ydb -p quickstart sql \
  -s 'DECLARE $a AS Int32;
      DECLARE $b AS String;
      SELECT $a, $b' \
  --input-format csv \
  --input-columns 'a,b'
```

Command output:

```text
┌─────────┬─────────────┐
│ column0 │ column1     │
├─────────┼─────────────┤
│ 10      │ "Some text" │
└─────────┴─────────────┘
```
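If the CSV input starts with rows that should not be parsed (for example, an existing header that you replace via `--input-columns`), the `--input-skip-rows` option described above can be combined with it. A hypothetical sketch (the exact interplay of these options may differ in your CLI version):

```bash
# The first input row ('col1,col2') is skipped; column names come from --input-columns
printf 'col1,col2\n10,Some text\n' | ydb -p quickstart sql \
  -s 'DECLARE $a AS Int32;
      DECLARE $b AS String;
      SELECT $a, $b' \
  --input-format csv \
  --input-skip-rows 1 \
  --input-columns 'a,b'
```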
## Iterative streaming processing

The YDB CLI supports executing a query multiple times with different sets of parameter values provided via `stdin` or an input file (but not both). In this case, the database connection is established once and the query execution plan is cached, which significantly improves performance compared to making separate CLI calls.

To use this feature, stream sets of values for the same parameters to the command input (`stdin` or `--input-file`) one after another, specifying a rule by which the YDB CLI can separate the sets.

The query is executed as many times as there are parameter sets in the input. Each set is combined with the parameter values defined using the `--param` options. The command completes once the input stream is closed. Each query is executed within a dedicated transaction.

The rule for separating parameter sets (framing) complements the `--input-format` option:

| Name | Description |
|---|---|
| `--input-framing` | The input framing format; defines how parameter sets are delimited in the input. Available options: `no-framing` (the default; the whole input is expected to contain a single parameter set) and `newline-delimited` (a newline character separates parameter sets). |

> **Warning**
> When using a newline character as the separator between parameter sets, make sure that newlines do not occur inside the parameter sets themselves. Enclosing a text value in quotation marks does not allow newlines within it. Multiline JSON documents are also not supported.
### Example: streaming processing of multiple parameter sets

Suppose you need to run a query three times with the following sets of values for the `a` and `b` parameters:

- `a` = 10, `b` = 20
- `a` = 15, `b` = 25
- `a` = 35, `b` = 48

#### JSON
Let's create a file that contains lines with JSON representations of these sets:

```bash
echo -e '{"a":10,"b":20}\n{"a":15,"b":25}\n{"a":35,"b":48}' | tee par1.txt
```

Command output:

```text
{"a":10,"b":20}
{"a":15,"b":25}
{"a":35,"b":48}
```
Let's execute the query, passing the content of this file to `stdin` and formatting the output as JSON:

```bash
cat par1.txt | \
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-framing newline-delimited \
  --format json-unicode
```

Command output:

```text
{"column0":30}
{"column0":40}
{"column0":83}
```
Or just pass the input file name to the `--input-file` option:

```bash
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-file par1.txt \
  --input-framing newline-delimited \
  --format json-unicode
```

Command output:

```text
{"column0":30}
{"column0":40}
{"column0":83}
```
This output can be passed as input to the next query command if that query declares a `$column0` parameter, as shown below.
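For example, the output above can be piped directly into a second invocation whose query declares `$column0` (a sketch reusing `par1.txt` from this example; the second query multiplies each incoming sum by 10):

```bash
# The first query emits lines such as {"column0":30}; the second consumes
# each line as a parameter set and prints {"column0":300}, and so on
cat par1.txt | \
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-framing newline-delimited \
  --format json-unicode | \
ydb -p quickstart sql \
  -s 'DECLARE $column0 AS Int64;
      SELECT $column0 * 10' \
  --input-framing newline-delimited \
  --format json-unicode
```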
#### CSV
Let's create a file that contains lines with CSV representations of these sets:

```bash
echo -e 'a,b\n10,20\n15,25\n35,48' | tee par1.txt
```

Command output:

```text
a,b
10,20
15,25
35,48
```
Let's execute the query, passing the content of this file to `stdin` and formatting the output as CSV:

```bash
cat par1.txt | \
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-format csv \
  --input-framing newline-delimited \
  --format csv
```

Command output:

```text
30
40
83
```
Or just pass the input file name to the `--input-file` option:

```bash
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-file par1.txt \
  --input-format csv \
  --input-framing newline-delimited \
  --format csv
```

Command output:

```text
30
40
83
```
This output can be passed as input to another command running a different parameterized query.
#### TSV
Let's create a file that contains lines with TSV representations of these sets:

```bash
echo -e 'a\tb\n10\t20\n15\t25\n35\t48' | tee par1.txt
```

Command output:

```text
a	b
10	20
15	25
35	48
```
Let's execute the query, passing the content of this file to `stdin` and formatting the output as TSV:

```bash
cat par1.txt | \
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-format tsv \
  --input-framing newline-delimited \
  --format tsv
```

Command output:

```text
30
40
83
```
Or just pass the input file name to the `--input-file` option:

```bash
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b' \
  --input-file par1.txt \
  --input-format tsv \
  --input-framing newline-delimited \
  --format tsv
```

Command output:

```text
30
40
83
```
This output can be passed as input to the next query command.
### Streaming processing with joining parameter values from different sources

Suppose you need to run your query three times with the following sets of values for the `a` and `b` parameters:

- `a` = 10, `b` = 100
- `a` = 15, `b` = 100
- `a` = 35, `b` = 100

Since `b` has the same value in every set, it can be passed once via `--param`, while the values of `a` are streamed via `stdin`:

```bash
echo -e '10\n15\n35' | \
ydb -p quickstart sql \
  -s 'DECLARE $a AS Int64;
      DECLARE $b AS Int64;
      SELECT $a + $b AS sum1' \
  --param '$b=100' \
  --input-framing newline-delimited \
  --input-param-name a \
  --format json-unicode
```

Command output:

```text
{"sum1":110}
{"sum1":115}
{"sum1":135}
```
## Batched streaming processing

The YDB CLI supports automatic conversion of multiple consecutive parameter sets into a `List<...>`, enabling you to process them in a single request and transaction. This yields a substantial performance gain compared to one-by-one query processing.

Two batch modes are supported:

- Full
- Adaptive

### Full batch mode

The `full` mode is a simplified batch mode in which the query runs only once, with all the parameter sets received from the input (`stdin` or `--input-file`) wrapped into a single `List<...>`. If the resulting request is too large, you will receive an error.

Use this batch mode when you need transaction atomicity, applying all the parameters within a single transaction.
### Adaptive batch mode

In the `adaptive` mode, the input stream is split into multiple transactions, with the batch size determined automatically for each of them.

This mode can handle a broad range of dynamic workloads: unpredictable or unbounded amounts of data, as well as an unpredictable or widely varying rate at which new parameter sets arrive in the input. For example, this scenario is common when piping the output of another command to `stdin` with the `|` operator.

The adaptive mode solves two key issues of dynamic stream processing:

- Limiting the maximum batch size.
- Limiting the maximum data processing delay.
### Syntax

To use the batching capabilities, declare a `List<...>` or `List<Struct<...>>` parameter in the query's `DECLARE` clause and select a batch mode with the following option:

| Name | Description |
|---|---|
| `--input-batch` | The batch mode applied to parameter sets from `stdin` or `--input-file`. Available options: `iterative` (the default; no batching, each parameter set triggers a separate query execution), `full` (all parameter sets are wrapped into a single list and the query runs once), and `adaptive` (the input is split into batches limited by `--input-batch-max-rows` and `--input-batch-max-delay`). |

In the adaptive batch mode, you can use the following additional parameters:

| Name | Description |
|---|---|
| `--input-batch-max-rows` | The maximum number of parameter sets per batch in the adaptive batch mode. The accumulated batch is sent with the query as soon as the number of parameter sets reaches the specified limit. When set to `0`, there is no limit. Default value: `1000`. Parameter values are sent with each query execution without streaming, so the total size of a gRPC request that includes parameter values has an upper limit of about 5 MB. |
| `--input-batch-max-delay` | The maximum delay before a received parameter set is submitted for processing in the adaptive batch mode. The value is specified as a number with a time unit: `s` (seconds), `ms` (milliseconds), `m` (minutes), and so on. Default value: `1s` (one second). The YDB CLI starts a timer when it receives the first parameter set for a batch and sends the accumulated batch for execution once the timer expires. This parameter enables efficient batching when the arrival rate of new parameter sets is unpredictable. |
### Examples: Full batch processing

```bash
echo -e '{"a":10,"b":20}\n{"a":15,"b":25}\n{"a":35,"b":48}' | \
ydb -p quickstart sql \
  -s 'DECLARE $x AS List<Struct<a:Int64,b:Int64>>;
      SELECT ListLength($x), $x' \
  --input-framing newline-delimited \
  --input-param-name x \
  --input-batch full
```

Command output:

```text
┌─────────┬───────────────────────────────────────────────────┐
│ column0 │ column1                                           │
├─────────┼───────────────────────────────────────────────────┤
│ 3       │ [{"a":10,"b":20},{"a":15,"b":25},{"a":35,"b":48}] │
└─────────┴───────────────────────────────────────────────────┘
```
### Examples: Adaptive batch processing

#### Limiting the maximum data processing delay

This example demonstrates adaptive batching triggered by the processing delay. The first line of the command below generates 1,000 rows on `stdout`, one row every 0.2 seconds, and pipes them to `stdin` of the `ydb sql` command. The query execution command then displays the parameter batch of each subsequent query call.

```bash
for i in $(seq 1 1000); do echo "Line$i"; sleep 0.2; done | \
ydb -p quickstart sql \
  -s 'DECLARE $x AS List<Utf8>;
      SELECT ListLength($x), $x' \
  --input-framing newline-delimited \
  --input-format raw \
  --input-param-name x \
  --input-batch adaptive
```

Command output (the actual values may differ):

```text
┌─────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ column0 │ column1                                                                                                                │
├─────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 14      │ ["Line1","Line2","Line3","Line4","Line5","Line6","Line7","Line8","Line9","Line10","Line11","Line12","Line13","Line14"] │
└─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────┐
│ column0 │ column1                                                 │
├─────────┼─────────────────────────────────────────────────────────┤
│ 6       │ ["Line15","Line16","Line17","Line18","Line19","Line20"] │
└─────────┴─────────────────────────────────────────────────────────┘
┌─────────┬─────────────────────────────────────────────────────────┐
│ column0 │ column1                                                 │
├─────────┼─────────────────────────────────────────────────────────┤
│ 6       │ ["Line21","Line22","Line23","Line24","Line25","Line26"] │
└─────────┴─────────────────────────────────────────────────────────┘
^C
```

The first batch includes all the rows accumulated in the input while the database connection was being established, which is why it is larger than the subsequent ones. Each later batch holds roughly the rows generated during the default one-second `--input-batch-max-delay` window, at one row per 0.2 seconds.

You can terminate the command by pressing Ctrl+C or wait 200 seconds until the input generation finishes.
#### Limiting the number of records per batch

This example demonstrates adaptive batching triggered by the number of parameter sets. The first line of the command below generates 200 rows. The command displays the parameter batch of each subsequent query call, applying the `--input-batch-max-rows` limit of 20 (the default limit is 1,000).

This example also demonstrates joining parameters from different sources and producing JSON output.

```bash
for i in $(seq 1 200); do echo "Line$i"; done | \
ydb -p quickstart sql \
  -s 'DECLARE $x AS List<Utf8>;
      DECLARE $p2 AS Int64;
      SELECT ListLength($x) AS count, $p2 AS p2, $x AS items' \
  --input-framing newline-delimited \
  --input-format raw \
  --input-param-name x \
  --input-batch adaptive \
  --input-batch-max-rows 20 \
  --param '$p2=10' \
  --format json-unicode
```

Command output:

```text
{"count":20,"p2":10,"items":["Line1","Line2","Line3","Line4","Line5","Line6","Line7","Line8","Line9","Line10","Line11","Line12","Line13","Line14","Line15","Line16","Line17","Line18","Line19","Line20"]}
{"count":20,"p2":10,"items":["Line21","Line22","Line23","Line24","Line25","Line26","Line27","Line28","Line29","Line30","Line31","Line32","Line33","Line34","Line35","Line36","Line37","Line38","Line39","Line40"]}
...
{"count":20,"p2":10,"items":["Line161","Line162","Line163","Line164","Line165","Line166","Line167","Line168","Line169","Line170","Line171","Line172","Line173","Line174","Line175","Line176","Line177","Line178","Line179","Line180"]}
{"count":20,"p2":10,"items":["Line181","Line182","Line183","Line184","Line185","Line186","Line187","Line188","Line189","Line190","Line191","Line192","Line193","Line194","Line195","Line196","Line197","Line198","Line199","Line200"]}
```
## Deleting multiple records from a YDB table based on primary keys

> **Warning**
> Supported only for row-oriented tables. Support for column-oriented tables is currently under development.

If you attempt to delete a large number of rows from a big table with a simple statement such as `DELETE FROM large_table WHERE id > 10;`, you may get an error because the transaction record limit is exceeded. This example shows how to delete an unlimited number of records from YDB tables without breaching this limitation.

Let's create a test table:

```bash
ydb -p quickstart sql -s 'CREATE TABLE test_delete_1 (id Uint64 NOT NULL, PRIMARY KEY (id))'
```

Add 100,000 records to it:

```bash
for i in $(seq 1 100000); do echo "$i"; done | \
ydb -p quickstart import file csv -p test_delete_1
```

Delete all records with `id` greater than 10:

```bash
ydb -p quickstart sql \
  -s 'SELECT t.id FROM test_delete_1 AS t WHERE t.id > 10' \
  --format json-unicode | \
ydb -p quickstart sql \
  -s 'DECLARE $lines AS List<Struct<id:Uint64>>;
      DELETE FROM test_delete_1 WHERE id IN (SELECT tl.id FROM AS_TABLE($lines) AS tl)' \
  --input-framing newline-delimited \
  --input-param-name lines \
  --input-batch adaptive \
  --input-batch-max-rows 10000
```
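To verify the result, count the remaining rows; only the 10 records with `id` from 1 to 10 should be left. A quick check (a sketch, assuming the table created above):

```bash
# Expect a count of 10 after the deletion completes
ydb -p quickstart sql -s 'SELECT COUNT(*) FROM test_delete_1'
```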
## Processing messages read from a topic
Examples of processing messages read from a topic are provided in Running an SQL query with the transmission of messages from the topic as parameters.