Lexical structure

The query in the YQL language is a valid UTF-8 text consisting of commands (statements) separated by semicolons (;).
The last semicolon can be omitted.
Each command is a sequence of tokens that are valid for this command.
Tokens can be keywords, IDs, literals, and so on.
Tokens are separated by whitespace characters (space, tab, line feed) or comments. The comment is not a part of the command and is syntactically equivalent to a space character.

Syntax compatibility modes

Two syntax compatibility modes are supported:

  • Advanced C++ (default)
  • ANSI SQL

ANSI SQL mode is enabled with a special comment --!ansi-lexer, which must be in the beginning of the query.

Specifics of interpretation of lexical elements in different compatibility modes are described below.

Comments

The following types of comments are supported:

  • Single-line comment: starts with -- (two minus characters following one another) and continues to the end of the line
  • Multiline comment: starts with /* and ends with */
SELECT 1; -- A single-line comment
/*
   Some multi-line comment
*/

In C++ syntax compatibility mode (default), a multiline comment ends with the nearest */.
The ANSI SQL syntax compatibility mode accounts for nesting of multiline comments:

--!ansi_lexer
SELECT * FROM T; /* this is a comment /* this is a nested comment, without ansi_lexer it raises an error  */ */

Keywords and identifiers

Keywords are tokens that have a fixed value in the YQL language. Examples of keywords: SELECT, INSERT, FROM, ACTION, and so on. Keywords are case-insensitive, that is, SELECT and SeLEcT are equivalent to each other.
The list of keywords is not fixed and is going to expand as the language develops. A keyword can't contain numbers and begin or end with an underscore.

Identifiers are tokens that identify the names of tables, columns, and other objects in YQL. Identifiers in YQL are always case-sensitive.
An identifier can be written in the body of the program without any special formatting, if the identifier:

  • Is not a keyword
  • Begins with a Latin letter or underscore
  • Is followed by a Latin letter, an underscore, or a number
SELECT my_column FROM my_table; -- my_column and my_table are identifiers

To include an arbitrary ID in the body of a query, the ID is enclosed in backticks:

SELECT `column with space` from T;
SELECT * FROM `my_dir/my_table`

IDs in backticks are never interpreted as keywords:

SELECT `select` FROM T; -- select - Column name in the T table

When using backticks, you can use the standard C escaping:

SELECT 1 as `column with\n newline, \x0a newline and \` backtick `;

In ANSI SQL syntax compatibility mode, arbitrary IDs can also be enclosed in double quotes. To include a double quote in a quoted ID, use two double quotes:

--!ansi_lexer
SELECT 1 as "column with "" double quote"; -- column name will be: column with " double quote

SQL hints

SQL hints are special settings with which a user can modify a query execution plan
(for example, enable/disable specific optimizations or force the JOIN execution strategy).
Unlike PRAGMA, SQL hints act locally – they are linked to a specific point in the YQL query (normally, after the keyword)
and affect only the corresponding statement or even a part of it.
SQL hints are a set of settings "name-value list" and defined inside special comments —
comments with SQL hints must have + as the first character:

--+ Name1(Value1 Value2 Value3) Name2(Value4) ...

An SQL hint name must be comprised of ASCII alphanumeric characters and start with a letter. Hint names are case insensitive.
A hint name must be followed by a custom number of space-separated values. A value can be a custom set of characters.
If there's a space or parenthesis in a set of characters, single quotation marks must be used:

--+ foo('value with space and paren)')
--+ foo('value1' value2)
-- equivalent to
--+ foo(value1 value2)

To escape a single quotation within a value, double it:

--+ foo('value with single quote '' inside')

If there're two or more hints with the same name in the list, the latter is used:

--+ foo(v1 v2) bar(v3) foo()
-- equivalent to
--+ bar(v3) foo()

Unknown SQL hint names (or syntactically incorrect hints) never result in errors, they're simply ignored:

--+ foo(value1) bar(value2  baz(value3)
-- due to a missing closing parenthesis in bar, is equivalent to
--+ foo(value1)

Thanks to this behavior, previous valid YQL queries with comments that look like hints remain intact.
Syntactically correct SQL hints in a place unexpected for YQL result in a warning:

-- presently, hints after SELECT are not supported
SELECT /*+ foo(123) */ 1; -- warning 'Hint foo will not be used'

What's important is that SQL hints are hints for an optimizer, so:

  • Hints never affect search results.
  • As YQL optimizers improve, a situation is possible when a hint becomes outdated and is ignored (for example, the algorithm based on a given hint completely changes or the optimizer becomes so sophisticated that it can be expected to choose the best solution, so some manual settings are likely to interfere).

String literals

A string literal (constant) is expressed as a sequence of characters enclosed in single quotes. Inside a string literal, you can use the C-style escaping rules:

SELECT 'string with\n newline, \x0a newline and \' backtick ';

In the C++ syntax compatibility mode (default), you can use double quotes instead of single quotes:

SELECT "string with\n newline, \x0a newline and \" backtick ";

In ASNI SQL compatibility mode, double quotes are used for IDs, and the only escaping that can be used for string literals is a pair of single quotes:

--!ansi_lexer
SELECT 'string with '' quote'; -- result: string with ' quote

Based on string literals, simple literals can be obtained.

Multi-line string literals

A multiline string literal is expressed as an arbitrary set of characters enclosed in double at signs @@:

$text = @@some
multiline
text@@;
SELECT LENGTH($text);

If you need to use double at signs in your text, duplicate them:

$text = @@some
multiline with double at: @@@@
text@@;
SELECT $text;

Typed string literals

  • For string literals, including multi-string ones, the String type is used by default (see also PRAGMA UnicodeLiterals).
  • You can use the following suffixes to explicitly control the literal type:
    • sString;
    • uUtf8;
    • yYson;
    • jJson.

Example:

SELECT "foo"u, '[1;2]'y, @@{"a":null}@@j;

Numeric literals

  • Integer literals have the default type Int32, if they fit within the Int32 range. Otherwise, they automatically expand to Int64.
  • You can use the following suffixes to explicitly control the literal type:
    • l: Int64.
    • s: Int16.
    • t: Int8.
  • Add the suffix u to convert a type to its corresponding unsigned type:
    • ul: Uint64.
    • u: Uint32.
    • us: Uint16.
    • ut: Uint8.
  • You can also use hexadecimal, octal, and binary format for integer literals using the prefixes 0x, 0o and 0b, respectively. You can arbitrarily combine them with the above-mentioned suffixes.
  • Floating point literals have the Double type by default, but you can use the suffix f to narrow it down to Float.
SELECT
  123l AS `Int64`,
  0b01u AS `Uint32`,
  0xfful AS `Uint64`,
  0o7ut AS `Uint8`,
  456s AS `Int16`,
  1.2345f AS `Float`;