Basic built-in functions

Below are the general-purpose functions. For specialized functions, there are separate articles: aggregate functions, window functions, and functions for lists, dictionaries, structures, data types.

COALESCE

Iterates through the arguments from left to right and returns the first non-empty argument found. To be sure that the result is non-empty (not of an optional type), the rightmost argument must be of this type (often a literal is used for this). With a single argument, returns this argument unchanged.

Lets you pass potentially empty values to functions that can't handle them by themselves.

A short format using the low-priority ?? operator is available (lower than the Boolean operations). You can use the NVL alias.

Examples

SELECT COALESCE(
  maybe_empty_column,
  "it's empty!"
) FROM my_table;
SELECT
  maybe_empty_column ?? "it's empty!"
FROM my_table;
SELECT NVL(
  maybe_empty_column,
  "it's empty!"
) FROM my_table;

(all three examples above are equivalent)

LENGTH

Returns the length of the string in bytes. This function is also available under the LEN name .

Examples

SELECT LENGTH("foo");
SELECT LEN("bar");

Note

To calculate the length of a string in Unicode characters, you can use the function Unicode::GetLength.

To get the number of elements in the list, use the function ListLength.

SUBSTRING

Returns a substring.

Required arguments:

  • Source string;
  • Position: The offset from the beginning of the string in bytes (integer) or NULL meaning "from the beginning".

Optional arguments:

  • Substring length: The number of bytes starting from the specified position (an integer, or the default NULL meaning "up to the end of the source string").

Indexing starts from zero. If the specified position and length are beyond the string, returns an empty string.
If the input string is optional, the result is also optional.

Examples

SELECT SUBSTRING("abcdefg", 3, 1); -- d
SELECT SUBSTRING("abcdefg", 3); -- defg
SELECT SUBSTRING("abcdefg", NULL, 3); -- abc

FIND

Finding the position of a substring in a string.

Required arguments:

  • Source string;
  • The substring being searched for.

Optional arguments:

  • A position in bytes to start the search with (an integer or NULL by default that means "from the beginning of the source string").

Returns the first substring position found or NULL (meaning that the desired substring hasn't been found starting from the specified position).

Examples

SELECT FIND("abcdefg_abcdefg", "abc"); -- 0
SELECT FIND("abcdefg_abcdefg", "abc", 1); -- 8
SELECT FIND("abcdefg_abcdefg", "abc", 9); -- null

RFIND

Reverse finding the position of a substring in a string, from the end to the beginning.

Required arguments:

  • Source string;
  • The substring being searched for.

Optional arguments:

  • A position in bytes to start the search with (an integer or NULL by default, meaning "from the end of the source string").

Returns the first substring position found or NULL (meaning that the desired substring hasn't been found starting from the specified position).

Examples

SELECT RFIND("abcdefg_abcdefg", "bcd"); -- 9
SELECT RFIND("abcdefg_abcdefg", "bcd", 8); -- 1
SELECT RFIND("abcdefg_abcdefg", "bcd", 0); -- null

StartsWith, EndsWith

Checking for a prefix or suffix in a string.

Required arguments:

  • Source string;
  • The substring being searched for.

The arguments can be of the String or Utf8 type and can be optional.

Examples

SELECT StartsWith("abc_efg", "abc") AND EndsWith("abc_efg", "efg"); -- true
SELECT StartsWith("abc_efg", "efg") OR EndsWith("abc_efg", "abc"); -- false
SELECT StartsWith("abcd", NULL); -- null
SELECT EndsWith(NULL, Utf8("")); -- null

IF

Checks the condition: IF(condition_expression, then_expression, else_expression).

It's a simplified alternative for CASE WHEN ... THEN ... ELSE ... END.

You may omit the else_expression argument. In this case, if the condition is false (condition_expression returned false), an empty value is returned with the type corresponding to then_expression and allowing for NULL. Hence, the result will have an optional data type.

Examples

SELECT
  IF(foo > 0, bar, baz) AS bar_or_baz,
  IF(foo > 0, foo) AS only_positive_foo
FROM my_table;

NANVL

Replaces the values of NaN (not a number) in expressions like Float, Double, or Optional.

Arguments:

  1. The expression where you want to make a replacement.
  2. The value to replace NaN.

If one of the arguments is Double, the result isDouble, otherwise, it's Float. If one of the arguments is Optional, then the result is Optional.

Examples

SELECT
  NANVL(double_column, 0.0)
FROM my_table;

Random...

Generates a pseudorandom number:

  • Random(): A floating point number (Double) from 0 to 1.
  • RandomNumber(): An integer from the complete Uint64 range.
  • RandomUuid(): Uuid version 4.

No arguments are used for random number generation: they are only needed to control the time of the call. A new random number is returned at each call. Therefore:

  • If Random is called again within a same query and with a same set of arguments, the same set of random numbers is returned. Keep in mind that we mean the arguments themselves (i.e., the text between parentheses) rather than their values.
  • Calling of Random with the same set of arguments in different queries returns different sets of random numbers.

Warning

If Random is used in named expressions, its one-time calculation is not guaranteed. Depending on the optimizers and runtime environment, it can be counted both once and multiple times. To make sure it's only counted once, materialize a named expression into a table.

Use cases:

  • SELECT RANDOM(1);: Get one random value for the entire query and use it multiple times (to get multiple random values, you can pass various constants of any type).
  • SELECT RANDOM(1) FROM table;: The same random number for each row in the table.
  • SELECT RANDOM(1), RANDOM(2) FROM table;: Two random numbers for each row of the table, all the numbers in each of the columns are the same.
  • SELECT RANDOM(some_column) FROM table;: Different random numbers for each row in the table.
  • SELECT RANDOM(some_column), RANDOM(some_column) FROM table;: Different random numbers for each row of the table, but two identical numbers within the same row.
  • SELECT RANDOM(some_column), RANDOM(some_column + 1) FROM table; or SELECT RANDOM(some_column), RANDOM(other_column) FROM table;: Two columns, with different numbers in both.

Examples

SELECT
    Random(key) -- [0, 1)
FROM my_table;
SELECT
    RandomNumber(key) -- [0, Max<Uint64>)
FROM my_table;
SELECT
    RandomUuid(key) -- Uuid version 4
FROM my_table;
SELECT
    RANDOM(column) AS rand1,
    RANDOM(column) AS rand2, -- same as rand1
    RANDOM(column, 1) AS randAnd1, -- different from rand1/2
    RANDOM(column, 2) AS randAnd2 -- different from randAnd1
FROM my_table;

CurrentUtc...

CurrentUtcDate(), CurrentUtcDatetime() and CurrentUtcTimestamp(): Getting the current date and/or time in UTC. The result data type is specified at the end of the function name.

The arguments are optional and work same as RANDOM.

Examples

SELECT CurrentUtcDate();
SELECT CurrentUtcTimestamp(TableRow()) FROM my_table;

CurrentTz...

CurrentTzDate(), CurrentTzDatetime(), and CurrentTzTimestamp(): Get the current date and/or time in the IANA time zone specified in the first argument. The result data type is specified at the end of the function name.

The arguments that follow are optional and work same as RANDOM.

Examples

SELECT CurrentTzDate("Europe/Moscow");
SELECT CurrentTzTimestamp("Europe/Moscow", TableRow()) FROM my_table;

AddTimezone

Adding the time zone information to the date/time in UTC. In the result of SELECT or after CAST, a String will be subject to the time zone rules used to calculate the time offset.

Arguments:

  1. Date: the type is Date``Datetime/Timestamp.
  2. The IANA name of the time zone.

Result type: TzDate/TzDatetime/TzTimestamp, depending on the input data type.

Examples

SELECT AddTimezone(Datetime("2018-02-01T12:00:00Z"), "Europe/Moscow");

RemoveTimezone

Removing the time zone data and converting the value to date/time in UTC.

Arguments:

  1. Date: the type is TzDate/TzDatetime/TzTimestamp.

Result type: Date/Datetime/Timestamp, depending on the input data type.

Examples

SELECT RemoveTimezone(TzDatetime("2018-02-01T12:00:00,Europe/Moscow"));

MAX_OF, MIN_OF, GREATEST, and LEAST

Returns the minimum or maximum among N arguments. Those functions let you replace the SQL standard statement CASE WHEN a < b THEN a ELSE b END that would be too sophisticated for N more than two.

The argument types must be mutually castable and accept NULL.

GREATEST is a synonym for MAX_OF and LEAST is a synonym for MIN_OF.

Examples

SELECT MIN_OF(1, 2, 3);

AsTuple, AsStruct, AsList, AsDict, AsSet, AsListStrict, AsDictStrict and AsSetStrict

Creates containers of the applicable types. For container literals, operator notation is also supported.

Specifics:

  • The container elements are passed in arguments. Hence, the number of elements in the resulting container is equal to the number of arguments passed, except when the dictionary keys repeat.
  • AsTuple and AsStruct can be called without arguments, and also the arguments can have different types.
  • The field names in AsStruct are set using AsStruct(field_value AS field_name).
  • Creating a list requires at least one argument if you need to output the element types. To create an empty list with the given type of elements, use the function ListCreate. You can create an empty list as an AsList() call without arguments. In this case, this expression will have the EmptyList type.
  • Creating a dictionary requires at least one argument if you need to output the element types. To create an empty dictionary with the given type of elements, use the function DictCreate. You can create an empty dictionary as an AsDict() call without arguments, in this case, this expression will have the EmptyDict type.
  • Creating a set requires at least one argument if you need to output element types. To create an empty set with the given type of elements, use the function SetCreate. You can create an empty set as an AsSet() call without arguments, in this case, this expression will have the EmptySet type.
  • AsList outputs the common type of elements in the list. A type error is raised in the case of incompatible types.
  • AsDict separately outputs the common types for keys and values. A type error is raised in the case of incompatible types.
  • AsSet outputs common types for keys. A type error is raised in the case of incompatible types.
  • AsListStrict, AsDictStrict, AsSetStrict require the same type for their arguments.
  • AsDict and AsDictStrict expect Tuple of two elements as arguments (key and value, respectively). If the keys repeat, only the value for the first key remains in the dictionary.
  • AsSet and AsSetStrict expect keys as arguments.

Examples

SELECT
  AsTuple(1, 2, "3") AS `tuple`,
  AsStruct(
    1 AS a,
    2 AS b,
    "3" AS c
  ) AS `struct`,
  AsList(1, 2, 3) AS `list`,
  AsDict(
    AsTuple("a", 1),
    AsTuple("b", 2),
    AsTuple("c", 3)
  ) AS `dict`,
  AsSet(1, 2, 3) AS `set`

Container literals

Some containers support operator notation for their literal values:

  • Tuple: (value1, value2...);
  • Structure: <|name1: value1, name2: value2...|>;
  • List: [value1, value2,...];
  • Dictionary: {key1: value1, key2: value2...};
  • Set: {key1, key2...}.

In every case, you can use an insignificant trailing comma. For a tuple with one element, this comma is required: (value1,).
For field names in the structure literal, you can use an expression that can be calculated at evaluation time, for example, string literals or identifiers (including those enclosed in backticks).

For nested lists, use AsList, for nested dictionaries, use AsDict, for nested sets, use AsSet, for nested tuples, use AsTuple, for nested structures, use AsStruct.

Examples

$name = "computed " || "member name";
SELECT
  (1, 2, "3") AS `tuple`,
  <|
    `complex member name`: 2.3,
    b: 2,
    $name: "3",
    "inline " || "computed member name": false
  |> AS `struct`,
  [1, 2, 3] AS `list`,
  {
    "a": 1,
    "b": 2,
    "c": 3,
  } AS `dict`,
  {1, 2, 3} AS `set`

Variant, AsVariant

Variant() creates a variant value over a tuple or structure.

Arguments:

  • Value
  • String with a field name or tuple index
  • Variant type

Example

$var_type = Variant<foo: Int32, bar: Bool>;

SELECT
   Variant(6, "foo", $var_type) as Variant1Value,
   Variant(false, "bar", $var_type) as Variant2Value;

AsVariant() creates a value of a variant over a structure including one field. This value can be implicitly converted to any variant over a structure that has a matching data type for this field name and might include more fields with other names.

Arguments:

  • Value
  • A string with the field name

Example

SELECT
   AsVariant(6, "foo") as VariantValue

Enum, AsEnum

Enum() creates an enumeration value.

Arguments:

  • A string with the field name
  • Enumeration type

Example

$enum_type = Enum<Foo, Bar>;
SELECT
   Enum("Foo", $enum_type) as Enum1Value,
   Enum("Bar", $enum_type) as Enum2Value;

AsEnum() creates a value of enumeration including one element. This value can be implicitly cast to any enumeration containing such a name.

Arguments:

  • A string with the name of an enumeration item

Example

SELECT
   AsEnum("Foo");

AsTagged, Untag

Wraps the value in the Tagged data type with the specified tag, preserving the physical data type. Untag: The reverse operation.

Required arguments:

  1. Value of any type.
  2. Tag name.

Returns a copy of the value from the first argument with the specified tag in the data type.

Examples of use cases:

  • Returns to the client's web interface the media files from BASE64-encoded strings.
  • Additional refinements at the level of returned columns types.

TableRow, JoinTableRow

Getting the entire table row as a structure. No arguments. JoinTableRow in case of JOIN always returns a structure with table prefixes.

Example

SELECT TableRow() FROM my_table;

Ensure...

Checking for the user conditions:

  • Ensure(): Checking whether the predicate is true at query execution.
  • EnsureType(): Checking that the expression type exactly matches the specified type.
  • EnsureConvertibleTo(): A soft check of the expression type (with the same rules as for implicit type conversion).

If the check fails, the entire query fails.

Arguments:

  1. An expression that will result from a function call if the check is successful. It's also checked for the data type in the corresponding functions.
  2. Ensure uses a Boolean predicate that is checked for being true. The other functions use the data type that can be obtained using the relevant functions, or a string literal with a text description of the type.
  3. An optional string with an error comment to be included in the overall error message when the query is complete. The data itself can't be used for type checks, since the data check is performed at query validation (or can be an arbitrary expression in the case of Ensure).

To check the conditions based on the final calculation result, it's convenient to combine Ensure with DISCARD SELECT.

Examples

SELECT Ensure(
    value,
    value < 100,
    "value out or range"
) AS value FROM my_table;
SELECT EnsureType(
    value,
    TypeOf(other_value),
    "expected value and other_value to be of same type"
) AS value FROM my_table;
SELECT EnsureConvertibleTo(
    value,
    Double?,
    "expected value to be numeric"
) AS value FROM my_table;

Literals of primitive types

For primitive types, you can create literals based on string literals.

Syntax

<Primitive type>( <string>[, <additional attributes>] )

Unlike CAST("myString" AS MyType):

  • The check for literal's castability to the desired type occurs at validation.
  • The result is non-optional.

For the data types Date, Datetime, Timestamp, and Interval, literals are supported only in the format corresponding to ISO 8601. Interval has the following differences from the standard:

  • It supports the negative sign for shifts to the past.
  • Microseconds can be expressed as fractional parts of seconds.
  • You can't use units of measurement exceeding one week.
  • The options with the beginning/end of the interval and with repetitions, are not supported.

For the data types TzDate, TzDatetime, TzTimestamp, literals are also set in the format meeting ISO 8601, but instead of the optional Z suffix, they specify the IANA name of the time zone, separated by comma (for example, GMT or Europe/Moscow).

For the Decimal parametric data type, two additional arguments are specified:

  • Total number of decimal places (up to 35, inclusive).
  • Number of places after the decimal point (out of the total number, meaning it can't be larger than the previous argument).

Examples

SELECT
  Bool("true"),
  Uint8("0"),
  Int32("-1"),
  Uint32("2"),
  Int64("-3"),
  Uint64("4"),
  Float("-5"),
  Double("6"),
  Decimal("1.23", 5, 2), -- up to 5 decimal digits, with 2 after the decimal point
  String("foo"),
  Utf8("Hello"),
  Yson("<a=1>[3;%false]"),
  Json(@@{"a":1,"b":null}@@),
  Date("2017-11-27"),
  Datetime("2017-11-27T13:24:00Z"),
  Timestamp("2017-11-27T13:24:00.123456Z"),
  Interval("P1DT2H3M4.567890S"),
  TzDate("2017-11-27,Europe/Moscow"),
  TzDatetime("2017-11-27T13:24:00,America/Los_Angeles"),
  TzTimestamp("2017-11-27T13:24:00.123456,GMT"),
  Uuid("f9d5cc3f-f1dc-4d9c-b97e-766e57ca4ccb");

ToBytes and FromBytes

Conversion of primitive data types to a string with their binary representation and back. Numbers are represented in the little endian format.

Examples

SELECT
    ToBytes(123), -- "\u0001\u0000\u0000\u0000"
    FromBytes(
        "\xd2\x02\x96\x49\x00\x00\x00\x00",
        Uint64
    ); -- 1234567890ul

ByteAt

Getting the byte value inside a string at an index counted from the beginning of the string. If an invalid index is specified, NULL is returned.

Arguments:

  1. String: String or Utf8.
  2. Index: Uint32.

Examples

SELECT
    ByteAt("foo", 0), -- 102
    ByteAt("foo", 1), -- 111
    ByteAt("foo", 9); -- NULL

...Bit

TestBit(), ClearBit(), SetBit() and FlipBit(): Test, clear, set, or flip a bit in an unsigned number using the specified bit sequence number.

Arguments:

  1. An unsigned number that's subject to the operation. TestBit is also implemented for strings.
  2. Number of the bit.

TestBit returns true/false. The other functions return a copy of their first argument with the corresponding conversion.

Examples:

SELECT
    TestBit(1u, 0), -- true
    SetBit(8u, 0); -- 9

Abs

The absolute value of the number.

Examples

SELECT Abs(-123); -- 123

Just, Unwrap, Nothing

Just(): Change the value's data type to optional from the current data type (i.e.,T is converted to T?).

The reverse operation is Unwrap.

Examples

SELECT
  Just("my_string"); --  String?

Unwrap(): Converting the optional value of the data type to the relevant non-optional type, raising a runtime error if the data is NULL. This means that T? becomes T.

If the value isn't optional, then the function returns its first argument unchanged.

Arguments:

  1. Value to be converted.
  2. An optional string with a comment for the error text.

Reverse operation is Just.

Examples

$value = Just("value");

SELECT Unwrap($value, "Unexpected NULL for $value");

Nothing(): Create an empty value for the specified Optional data type.

Examples

SELECT
  Nothing(String?); -- an empty (NULL) value with the String? type

Learn more about ParseType and other functions for data types.

Callable

Create a callable value with the specified signature from a lambda function. It's usually used to put callable values into containers.

Arguments:

  1. Type.
  2. Lambda function.

Examples:

$lambda = ($x) -> {
    RETURN CAST($x as String)
};

$callables = AsTuple(
    Callable(Callable<(Int32)->String>, $lambda),
    Callable(Callable<(Bool)->String>, $lambda),
);

SELECT $callables.0(10), $callables.1(true);

Pickle, Unpickle

Pickle() and StablePickle() serialize an arbitrary object into a sequence of bytes, if possible. Typical non-serializable objects are Callable and Resource. The serialization format is not versioned and can be used within a single query. For the Dict type, the StablePickle function pre-sorts the keys, and for Pickle, the order of dictionary elements in the serialized representation isn't defined.

Unpickle() is the inverse operation (deserialization), where with the first argument being the data type of the result and the second argument is the string with the result of Pickle() or StablePickle().

Examples:

SELECT *
FROM my_table
WHERE Digest::MurMurHash32(
        Pickle(TableRow())
    ) %10 ==0; -- actually, it is better to use TABLESAMPLE

$buf = Pickle(123);
SELECT Unpickle(Int32, $buf);

StaticMap

Transforms a structure or tuple by applying a lambda function to each item.

Arguments:

  • Structure or tuple.
  • Lambda for processing items.

Result: a structure or tuple with the same number and naming of items as in the first argument, and with item data types determined by lambda results.

Examples:

SELECT *
FROM (
    SELECT
        StaticMap(TableRow(), ($item) -> {
            return CAST($item AS String);
        })
    FROM my_table
) FLATTEN COLUMNS; -- converting all columns to rows

StaticZip

Merges structures or tuples element-by-element. All arguments (one or more) must be either structures with the same set of fields or tuples of the same length.
The result will be a structure or tuple, respectively.
Each item of the result is a tuple comprised of items taken from arguments.

Examples:

$one = <|k1:1, k2:2.0|>;
$two = <|k1:3.0, k2:4|>;

-- Adding two structures item-by-item
SELECT StaticMap(StaticZip($one, $two), ($tuple)->($tuple.0 + $tuple.1)) AS sum;

AggregationFactory

Create a factory for aggregation functions to separately describe the methods of aggregation and data types subject to aggregation.

Arguments:

  1. A string in double quotes with the name of an aggregate function, for example "MIN".
  2. Optional parameters of the aggregate function that are data-independent. For example, the percentile value in PERCENTILE.

The resulting factory can be used as the second parameter of the function AGGREGATE_BY.
If the aggregate function is applied to two columns instead of one, as, for example, MIN_BY, then in AGGREGATE_BY, the first argument passes a Tuple of two values. See more details in the description of the applicable aggregate function.

Examples:

$factory = AggregationFactory("MIN");
SELECT
    AGGREGATE_BY (value, $factory) AS min_value -- apply the MIN aggregation to the "value" column
FROM my_table;

AggregateTransform...

AggregateTransformInput() converts an aggregation factory, for example, obtained using the AggregationFactory function, to other factory, in which the specified transformation of input items is performed before starting aggregation.

Arguments:

  1. Aggregation factory.
  2. A lambda function with one argument that converts an input item.

Examples:

$f = AggregationFactory("sum");
$g = AggregateTransformInput($f, ($x) -> (cast($x as Int32)));
$h = AggregateTransformInput($f, ($x) -> ($x * 2));
select ListAggregate([1,2,3], $f); -- 6
select ListAggregate(["1","2","3"], $g); -- 6
select ListAggregate([1,2,3], $h); -- 12

AggregateTransformOutput() converts an aggregation factory, for example, obtained using the AggregationFactory function, to other factory, in which the specified transformation of the result is performed after ending aggregation.

Arguments:

  1. Aggregation factory.
  2. A lambda function with one argument that converts the result.

Examples:

$f = AggregationFactory("sum");
$g = AggregateTransformOutput($f, ($x) -> ($x * 2));
select ListAggregate([1,2,3], $f); -- 6
select ListAggregate([1,2,3], $g); -- 12

AggregateFlatten

Adapts a factory for aggregation functions, for example, obtained using the AggregationFactory function in a way that allows aggregation of list input items. This operation is similar to FLATTEN LIST BY: Each list item is aggregated.

Arguments:

  1. Aggregation factory.

Examples:

$i = AggregationFactory("AGGREGATE_LIST_DISTINCT");
$j = AggregateFlatten($i);
select AggregateBy(x, $j) from (
   select [1,2] as x
   union all
   select [2,3] as x
); -- [1, 2, 3]