Returns the result of evaluating the expressions specified after
It can be used in combination with other operations to obtain other effect.
SELECT "Hello, world!";
SELECT 2 + 2;
Data source for
SELECT. The argument can accept the table name, the result of another
SELECT, or a named expression. Between
FROM, list the comma-separated column names from the source (or
* to select all columns).
SELECT key FROM my_table;
SELECT * FROM (SELECT value FROM my_table);
$table_name = "my_table"; SELECT * FROM $table_name;
To make a
SELECT by secondary index statement, use the following:
SELECT * FROM TableName VIEW IndexName WHERE …
Select all the fields from the
seriestable using the
views_indexindex with the
SELECT series_id, title, info, release_date, views, uploaded_user_id FROM series VIEW views_index WHERE views >= someValue
userstables on the
userNamefield using the
SELECT t1.series_id, t1.title FROM series VIEW users_index AS t1 INNER JOIN users VIEW name_index AS t2 ON t1.uploaded_user_id == t2.user_id WHERE t2.name == userName;
It's set after the data source in
FROM and is used for additional hints for tables. You can't use hints for subqueries and named expressions.
The following values are supported:
INFER_SCHEMA: Sets the flag for output of the table schema. The behavior is similar to the yt.inferSchema pragma, but for a specific data source. You can specify the number of rows to output (from 1 to 1000).
FORCE_INFER_SCHEMA: Sets the flag for table schema output. The behavior is similar to the yt.ForceInferSchema pragma, but for a specific data source. You can specify the number of rows to output (from 1 to 1000).
DIRECT_READ: Suppresses certain optimizers and enforces accessing table contents as is. The behavior is similar to the debug pragma DirectRead, but for a specific data source.
INLINE: Hints that the table contents is small and you need to use its in-memory view to process the query. The actual size of the table is not controlled in this case, and if it's large, the query might fail with an out-of-memory error.
UNORDERED: Suppresses original table sorting.
XLOCK: Hints that you need to lock the table exclusively. It's useful when you read a table at the stage of processing the query metaprogram, and then update its contents in the main query. Avoids data loss if an external process managed to change the table between executing a metaprogram phase and the main part of the query.
SCHEMAtype: Hints that the specified table schema must be used entirely, ignoring the schema in the metadata.
COLUMNStype: Hints that the specified types should be used for columns whose names match the table's column names in the metadata, as well as which columns are additionally present in the table.
IGNORE_TYPE_V3: Sets the flag to ignore type_v3 types in the table. The behavior is similar to the yt.IgnoreTypeV3 pragma, but for a specific data source.
When setting the
COLUMNS hints, the type must be a structure.
SELECT key FROM my_table WITH INFER_SCHEMA; SELECT key FROM my_table WITH FORCE_INFER_SCHEMA="42";
$s = (SELECT COUNT(*) FROM my_table WITH XLOCK); INSERT INTO my_table WITH TRUNCATE SELECT EvaluateExpr($s) AS a;
SELECT key, value FROM my_table WITH SCHEMA Struct<key:String, value:Int32>;
SELECT key, value FROM my_table WITH COLUMNS Struct<value:Int32?>;
SELECT key, value FROM EACH($my_tables) WITH SCHEMA Struct<key:String, value:List<Int32>>;
Filtering rows in the
SELECT result based on a condition.
SELECT key FROM my_table WHERE value > 0;
SELECT result using a comma-separated list of sorting criteria. As a criteria, you can use a column value or an expression on columns. Ordering by column sequence number is not supported (
ORDER BY Nwhere
N is a number).
Each criteria can be followed by the sorting direction:
ASC: Sorting in the ascending order. Applied by default.
DESC: Sorting in the descending order.
Multiple sorting criteria will be applied left-to-right.
SELECT key, string_column FROM my_table ORDER BY key DESC, LENGTH(string_column) ASC;
You can also use
ORDER BY for window functions.
LIMIT and OFFSET
LIMIT: limits the output to the specified number of rows. By default, the output is not restricted.
OFFSET: specifies the offset from the beginning (in rows). By default, it's zero.
SELECT key FROM my_table LIMIT 7;
SELECT key FROM my_table LIMIT 7 OFFSET 3;
SELECT key FROM my_table LIMIT 3, 7; -- equivalent to the previous example
ASSUME ORDER BY
Checking that the
SELECT result is sorted by the value in the specified column or multiple columns. The result of such a
SELECT statement is treated as sorted, but without actually running a sort. Sort check is performed at the query execution stage.
As in case of
ORDER BY, it supports setting the sort order using the keywords
ASC (ascending order) and
DESC (descending order). Expressions are not supported in
ASSUME ORDER BY.
SELECT key || "suffix" as key, -CAST(subkey as Int32) as subkey FROM my_table ASSUME ORDER BY key, subkey DESC;
TABLESAMPLE and SAMPLE
Building a random sample from the data source specified in
TABLESAMPLE is part of the SQL standard and works as follows:
- The operating mode is specified:
BERNOULLImeans "slowly, straightforwardly going through all the data, but in a truly random way".
SYSTEMuses knowledge about the physical data storage of data to avoid full data scans, but somewhat sacrificing randomness of the sample.
The data is split into sufficiently large blocks, and the whole data blocks are sampled. For applied calculations on sufficiently large tables, the result may well be consistent.
- The size of the random sample is indicated as a percentage after the operating mode, in parentheses.
- To manage the block size in the
SYSTEMmode, use the
- Optionally, it can be followed by the
REPEATABLEkeyword and an integer in parentheses to be used as a seed for a pseudorandom number generator.
SAMPLE is a shorter alias without sophisticated settings and sample size specified as a fraction. It currently corresponds to the
BERNOULLI mode, if the
REPEATABLE keyword is added, the seed is mixed with the chunk ID for each chunk in the table. That's why sampling from different tables with the same content might produce different results.
SELECT * FROM my_table TABLESAMPLE BERNOULLI(1.0) REPEATABLE(123); -- one percent of the table
SELECT * FROM my_table TABLESAMPLE SYSTEM(1.0); -- about one percent of the table
SELECT * FROM my_table SAMPLE 1.0 / 3; -- one-third of the table
Selecting unique rows.
DISTINCT to calculated values is not currently implemented. For this purpose, use a subquery or the clause
GROUP BY ... AS ....
SELECT DISTINCT value -- only unique values from the table FROM my_table;
SELECT execution procedure
SELECT query result is calculated as follows:
- Determine the set of input tables by evaluating the FROM clauses.
- Apply SAMPLE/TABLESAMPLE to input tables.
- Execute FLATTEN COLUMNS or FLATTEN BY; aliases set in
FLATTEN BYbecome visible after this point.
- Execute every JOIN.
- Add to (or replace in) the data the columns listed in GROUP BY ... AS ....
- Execute WHERE — Discard all the data mismatching the predicate.
- Execute GROUP BY, evaluate aggregate functions.
- Apply the filter HAVING.
- Evaluate window functions;
- Evaluate expressions in
- Assign names set by aliases to expressions in
- Apply top-level DISTINCT to the resulting columns.
- Execute similarly every subquery inside UNION ALL, combine them (see PRAGMA AnsiOrderByLimitInUnionAll).
- Perform sorting with ORDER BY.
- Apply OFFSET and LIMIT to the result.
Column order in YQL
The standard SQL is sensitive to the order of columns in projections (that is, in
SELECT). While the order of columns must be preserved in the query results or when writing data to a new table, some SQL constructs use this order.
This applies, for example, to UNION ALL and positional ORDER BY (ORDER BY ordinal).
The column order is ignored in YQL by default:
- The order of columns in the output tables and query results is undefined
- The data scheme of the
UNION ALLresult is output by column names rather than positions
If you enable
PRAGMA OrderedColumns;, the order of columns is preserved in the query results and is derived from the order of columns in the input tables using the following rules:
SELECT: an explicit column enumeration dictates the result order.
SELECTwith an asterisk (
SELECT * FROM ...) inherits the order from its input.
- The order of columns after JOIN: First output the left-hand columns, then the right-hand ones. If the column order in any of the sides in the
JOINoutput is undefined, the column order in the result is also undefined.
- The order in
UNION ALLdepends on the UNION ALL execution mode.
- The column order for AS_TABLE is undefined.
In the YT table schema, key columns always precede non-key columns. The order of key columns is determined by the order of the composite key.
PRAGMA OrderedColumns; is enabled, non-key columns preserve their output order.
Concatenating results of multiple
SELECT statements (or subqueries).
UNION ALL modes are supported: by column names (the default mode) and by column positions (corresponds to the ANSI SQL standard and is enabled by the PRAGMA).
In the "by name" mode, the output of the resulting data schema uses the following rules:
- The resulting table includes all columns that were found in at least one of the input tables.
- If a column wasn't present in all the input tables, then it's automatically assigned the optional data type (that can accept
- If a column in different input tables had different types, then the shared type (the broadest one) is output.
- If a column in different input tables had a heterogeneous type, for example, string and numeric, an error is raised.
The order of output columns in this mode is equal to the largest common prefix of the order of inputs, followed by all other columns in the alphabetic order.
If the largest common prefix is empty (for example, if the order isn't specified for one of the inputs), then the output order is undefined.
In the "by position" mode, the output of the resulting data schema uses the following rules:
- All inputs must have equal number of columns
- The order of columns must be defined for all inputs
- The names of the resulting columns must match the names of columns in the first table
- The type of the resulting columns is output as a common (widest) type of input column types having the same positions
The order of the output columns in this mode is the same as the order of columns in the first input.
SELECT 1 AS x UNION ALL SELECT 2 AS y UNION ALL SELECT 3 AS z;
In the default mode, this query returns a selection with three columns x, y, and z. When
PRAGMA PositionalUnionAll; is enabled, the selection only includes the x column.
PRAGMA PositionalUnionAll; SELECT 1 AS x, 2 as y UNION ALL SELECT * FROM AS_TABLE([<|x:3, y:4|>]); -- error: the order of columns in AS_TABLE is undefined
By default, the entire YQL query is executed within a single transaction, and independent parts inside it are executed in parallel, if possible.
COMMIT; keyword you can add a barrier to the execution process to delay execution of expressions that follow until all the preceding expressions have completed.
To commit in the same way automatically after each expression in the query, you can use
INSERT INTO result1 SELECT * FROM my_table; INSERT INTO result2 SELECT * FROM my_table; COMMIT; -- result2 will already include the SELECT contents from the second line: INSERT INTO result3 SELECT * FROM result2;
Excluding columns from the result of
SELECT * WITHOUT foo, bar FROM my_table;
PRAGMA simplecolumns; SELECT * WITHOUT t.foo FROM my_table AS t CROSS JOIN (SELECT 1 AS foo) AS v;
FROM ... SELECT ...
An inverted format, first specifying the data source and then the operation.
FROM my_table SELECT key, value;
FROM a_table AS a JOIN b_table AS b USING (key) SELECT *;
Accessing named expressions as tables using the
AS_TABLE($variable) lets you use the value of
$variable as the data source for the query. In this case, the variable
$variable must have the type
$data = AsList( AsStruct(1u AS Key, "v1" AS Value), AsStruct(2u AS Key, "v2" AS Value), AsStruct(3u AS Key, "v3" AS Value)); SELECT Key, Value FROM AS_TABLE($data);