YSON
This section describes YSON, a JSON-like data format developed at Yandex.
Note
SQL functions for working with YSON are documented here.
Introduction
How YSON differs from JSON:
- Binary encoding is supported for scalar types (numbers, strings, and booleans);
- Attributes: an arbitrary dictionary that can be attached to a literal of any type (including scalars).
Syntactic differences:
- List items are separated by semicolons instead of commas;
- In maps, keys are separated from values with = rather than :;
- String literals do not always need quotes (only when parsing would otherwise be ambiguous).
The following scalar types exist:
- Strings (string);
- Signed and unsigned 64-bit integers (int64 and uint64);
- Double-precision floating-point numbers (double);
- Boolean (boolean);
- The special entity type with a single literal (#).
Scalar types usually have both text and binary representations.
There are two composite types: lists (list) and maps (map).
Scalar types
Strings
String tokens come in three forms:
- Identifiers match the regular expression [A-Za-z_][A-Za-z0-9_.\-]* (the first character is a letter or underscore; from the second character onward, digits, -, and . are also allowed). An identifier denotes a string equal to its text and is mainly a shorthand (no quotes).
  Examples: abc123; _; a-b.
- Text strings are C-escaped strings in double quotes.
  Examples: "abc123"; ""; "quotation-mark: \", backslash: \\, tab: \t, unicode: \xEA".
- Binary strings: \x01 + length (protobuf sint32 wire format) + data (<length> bytes).
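The three string forms above can be sketched in Python. This is a minimal illustration, not a reference implementation: the helper names (`text_string`, `binary_string`) are hypothetical, the text form escapes only backslashes and double quotes, and the length prefix assumes the usual protobuf ZigZag-plus-varint layout for sint32.

```python
import re

# Identifier pattern quoted from the text above.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def zigzag32(n):
    """Map a signed 32-bit integer to an unsigned one (protobuf sint32)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def text_string(s):
    """Hypothetical helper: bare identifier when possible, quoted otherwise.

    Assumption: only backslash and double quote are escaped here; a real
    writer would also escape control and non-printable characters.
    """
    if IDENTIFIER.fullmatch(s):
        return s
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def binary_string(data):
    """Binary form: marker \\x01, ZigZag varint length, then the raw bytes."""
    return b"\x01" + varint(zigzag32(len(data))) + data
```

For instance, `text_string("a-b")` stays unquoted, while `text_string("38 parrots")` needs quotes because of the space.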
Signed 64-bit integers (int64)
Two encodings:
- Text: 0, 123, -123, +123;
- Binary: \x02 + value (protobuf sint64 wire format).
Unsigned 64-bit integers (uint64)
Two encodings:
- Text: 10000000000000, 123u;
- Binary: \x06 + value (protobuf uint64 wire format).
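The difference between the two binary integer encodings can be sketched as follows. The function names are hypothetical, and the sketch assumes the standard protobuf conventions: sint64 is ZigZag-encoded before being written as a varint, while uint64 is written as a plain varint.

```python
def varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int64(value):
    """Marker \\x02 followed by the ZigZag varint of a signed 64-bit value."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit into int64")
    zigzag = ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF
    return b"\x02" + varint(zigzag)


def encode_uint64(value):
    """Marker \\x06 followed by the plain varint of an unsigned 64-bit value."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit into uint64")
    return b"\x06" + varint(value)
```

ZigZag keeps small negative numbers short: `encode_int64(-1)` takes two bytes in total, whereas a plain varint of the two's-complement value would take ten bytes for the payload alone.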
Floating-point numbers (double)
Two encodings:
- Text: 0.0, -1.0, 1e-9, 1.5E+9, 32E1, %inf, %-inf, %nan;
- Binary: \x03 + protobuf double wire format.
Warning
Text encoding of floating-point numbers involves rounding; parsing the text back may yield a different value. Use binary encoding when exact values matter.
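The rounding issue can be reproduced with a short Python sketch. It assumes that the protobuf double wire format is the 8-byte little-endian IEEE 754 representation, and it uses a hypothetical text writer that emits six significant digits; the precision an actual YSON writer uses is not specified here.

```python
import struct


def encode_double(value):
    """Marker \\x03 followed by 8 bytes of little-endian IEEE 754."""
    return b"\x03" + struct.pack("<d", value)


def decode_double(data):
    """Inverse of encode_double for a single 9-byte value."""
    if len(data) != 9 or data[:1] != b"\x03":
        raise ValueError("not a binary YSON double")
    return struct.unpack("<d", data[1:])[0]


x = 0.1 + 0.2

# Hypothetical text writer with 6 significant digits: the value changes.
short_text = f"{x:.6g}"
print(float(short_text) == x)                # False

# The binary form round-trips bit for bit.
print(decode_double(encode_double(x)) == x)  # True
```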
Warning
The values %inf, %-inf, and %nan are not valid in JSON, so calling Yson::SerializeJson on YSON that contains them results in an error.
Boolean literals (boolean)
Two encodings:
- Text: %false, %true;
- Binary: \x04, \x05.
Entity (entity)
Entity is an atomic scalar value with no payload of its own. It is useful in many scenarios; for example, an entity often represents null. An entity may still carry attributes.
Lexically, entity is written as #.
Reserved literals
Special tokens:
;, =, #, [, ], {, }, <, >, ), /, @, !, +, ^, :, ,, ~.
Not all of these are used in YSON; some appear in YPath.
Composite types
List (list)
Written as [value; ...; value], where each value is a literal of any scalar or composite type.
Example: [1; "hello"; {a=1; b=2}].
Map (map)
Written as {key = value; ...; key = value}. Here each key is a string literal, and each value is a literal of any scalar or composite type.
Example: {a = "hello"; "38 parrots" = [38]}.
Attributes
Attributes can be attached to any YSON literal. Syntax: <key = value; ...; key = value> value. Inside the angle brackets the syntax matches that of a map. For example, <a = 10; b = [7; 7; 8]>"some-string" or <"44" = 44>44. Attributes on entity literals are common, for example <id="aaad6921-b5704588-17990259-7b88bad3">#.
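The text syntax for lists, maps, and attributes can be illustrated with a small Python writer. This is a sketch under stated assumptions, not the format's official serializer: `Attributed` and `dumps` are hypothetical names, Python `None` stands in for the entity, finite floats are written with Python's `repr`, and string escaping covers only backslashes and double quotes.

```python
import math
import re
from dataclasses import dataclass
from typing import Any

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


@dataclass
class Attributed:
    """Hypothetical wrapper: a value together with its attribute map."""
    attributes: dict
    value: Any


def quote(s):
    """Write a string bare if it is an identifier, quoted otherwise."""
    if IDENTIFIER.fullmatch(s):
        return s
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def pairs(mapping):
    """key = value pairs separated by semicolons (shared by maps and attributes)."""
    return "; ".join(f"{quote(k)} = {dumps(v)}" for k, v in mapping.items())


def dumps(node):
    if isinstance(node, Attributed):
        return f"<{pairs(node.attributes)}> {dumps(node.value)}"
    if node is None:
        return "#"                      # entity
    if isinstance(node, bool):          # must precede the int check
        return "%true" if node else "%false"
    if isinstance(node, int):
        return f"{node}u" if node >= 2**63 else str(node)
    if isinstance(node, float):
        if math.isnan(node):
            return "%nan"
        if math.isinf(node):
            return "%inf" if node > 0 else "%-inf"
        return repr(node)
    if isinstance(node, str):
        return quote(node)
    if isinstance(node, list):
        return "[" + "; ".join(dumps(v) for v in node) + "]"
    if isinstance(node, dict):
        return "{" + pairs(node) + "}"
    raise TypeError(f"unsupported type: {type(node).__name__}")
```

With this sketch, `dumps({"a": "hello", "38 parrots": [38]})` produces a map whose second key is quoted because it contains a space.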
Grammar
YSON data can be one of three kinds:
- Node (a single tree; in the grammar below, <tree>);
- ListFragment (values separated by ;; in the grammar below, <list-fragment>);
- MapFragment (key-value pairs separated by ;; in the grammar below, <map-fragment>).
The grammar below is defined up to whitespace, which may be inserted or removed freely between tokens:
<tree> = [ <attributes> ], <object>;
<object> = <scalar> | <map> | <list> | <entity>;
<scalar> = <string> | <int64> | <uint64> | <double> | <boolean>;
<list> = "[", <list-fragment>, "]";
<map> = "{", <map-fragment>, "}";
<entity> = "#";
<attributes> = "<", <map-fragment>, ">";
<list-fragment> = { <list-item>, ";" }, [ <list-item> ];
<list-item> = <tree>;
<map-fragment> = { <key-value-pair>, ";" }, [ <key-value-pair> ];
<key-value-pair> = <string>, "=", <tree>; % Key cannot be empty
The trailing ; after the last element inside <list-fragment> and <map-fragment> may be omitted. The following forms are valid when reading:
| With trailing ; | Short form |
| <a = b;> c | <a = b> c |
| {a = b;} | {a = b} |
| [1; 2; 3;] | [1; 2; 3] |
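The grammar above maps naturally onto a recursive-descent reader. The Python sketch below covers only a subset of the text form, as an illustration: quoted strings without escape sequences, identifiers, integers (with an optional u suffix), simple decimal doubles, booleans, the entity, lists, maps, and attributes. All names (`loads`, `Parser`, `Attributed`) are hypothetical, and %inf, %-inf, %nan, and binary encodings are not handled.

```python
import re
from collections import namedtuple

Attributed = namedtuple("Attributed", ["attributes", "value"])

TOKEN = re.compile(
    r'\s*(?:(?P<punct>[\[\]{}<>;=#])'
    r'|(?P<string>"[^"\\]*")'
    r'|(?P<number>[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?u?)'
    r'|(?P<bool>%true|%false)'
    r'|(?P<ident>[A-Za-z_][A-Za-z0-9_.\-]*))'
)


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, expected=None):
        kind, value = self.peek()
        if kind is None or (expected is not None and value != expected):
            raise ValueError(f"expected {expected!r}, got {value!r}")
        self.pos += 1
        return kind, value

    def tree(self):  # <tree> = [ <attributes> ], <object>
        attrs = None
        if self.peek() == ("punct", "<"):
            self.take("<")
            attrs = self.map_fragment(">")
            self.take(">")
        obj = self.object()
        return obj if attrs is None else Attributed(attrs, obj)

    def object(self):
        kind, value = self.take()
        if (kind, value) == ("punct", "["):
            items = self.list_fragment()
            self.take("]")
            return items
        if (kind, value) == ("punct", "{"):
            pairs = self.map_fragment("}")
            self.take("}")
            return pairs
        if (kind, value) == ("punct", "#"):
            return None  # entity
        if kind == "punct":
            raise ValueError(f"unexpected token {value!r}")
        if kind == "string":
            return value[1:-1]
        if kind == "ident":
            return value
        if kind == "bool":
            return value == "%true"
        if value.endswith("u"):
            return int(value[:-1])
        return float(value) if any(c in value for c in ".eE") else int(value)

    def list_fragment(self):  # the trailing ";" is optional
        items = []
        while self.peek()[1] not in ("]", None):
            items.append(self.tree())
            if self.peek()[1] != ";":
                break
            self.take(";")
        return items

    def map_fragment(self, closer):  # the trailing ";" is optional
        pairs = {}
        while self.peek()[1] not in (closer, None):
            kind, key = self.take()
            if kind not in ("string", "ident"):
                raise ValueError(f"map key must be a string, got {key!r}")
            key = key[1:-1] if kind == "string" else key
            if not key:
                raise ValueError("key cannot be empty")
            self.take("=")
            pairs[key] = self.tree()
            if self.peek()[1] != ";":
                break
            self.take(";")
        return pairs


def loads(text):
    parser = Parser(text)
    result = parser.tree()
    if parser.pos != len(parser.tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

Both `loads("[1; 2; 3;]")` and `loads("[1;2;3]")` yield the same Python list, which reflects the optional trailing separator in the fragment rules.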
Examples
- Map (Node)
{ performance = 1 ; precision = 0.78 ; recall = 0.21 }
- Map (Node)
{ cv-precision = [ 0.85 ; 0.24 ; 0.71 ; 0.70 ] }
- List (Node)
[ 1; 2; 3; 4; 5 ]
- String (Node)
foobar
"hello world"
- Int64 (Node)
42
- Double (Node)
3.1415926
- ListFragment
{ key = a; value = 0 };
{ key = b; value = 1 };
{ key = c; value = 2; unknown_value = [] }
- MapFragment
do = create; type = table; scheme = {}
- HomeDirectory (Node)
{ home = { sandello = { mytable = <type = table> # ; anothertable = <type = table> # } ; monster = { } } }
YPath
This section describes YPath, a language for addressing objects inside YSON.
YPath expresses paths that identify nodes in YSON. It supports navigating the tree and attaching annotations useful for operations such as reading and writing properties.
Examples:
- /0-25-3ec012f-406daf5c/@type: path to the type attribute of the object with id 0-25-3ec012f-406daf5c.
There are several YPath variants. In the simplest case, YPath is a string that encodes a path.
Lexical structure
A simple YPath string is split into tokens as follows:
- Special characters: slash (/), at sign (@), ampersand (&), asterisk (*);
- Literals: maximal non-empty sequences of non-special characters. Literals may use escaping of the form \<escape-sequence>, where <escape-sequence> is one of \, /, @, &, *, [, {, or x<hex1><hex2> with hexadecimal digits <hex1> and <hex2>.
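The tokenization rules above can be sketched in Python as follows. `tokenize_ypath` is a hypothetical name, and decoding \x<hex1><hex2> to the character with that code is an assumption of this sketch (an implementation may treat it as a raw byte instead).

```python
import string

SPECIAL = "/@&*"
SIMPLE_ESCAPES = "\\/@&*[{"


def tokenize_ypath(path):
    """Split a simple YPath string into ("special", c) and ("literal", s) tokens."""
    tokens = []
    literal = None  # characters of the literal being built, or None
    i = 0
    while i < len(path):
        ch = path[i]
        if ch in SPECIAL:
            if literal is not None:
                tokens.append(("literal", "".join(literal)))
                literal = None
            tokens.append(("special", ch))
            i += 1
            continue
        if literal is None:
            literal = []
        if ch != "\\":
            literal.append(ch)
            i += 1
            continue
        nxt = path[i + 1:i + 2]
        hex_part = path[i + 2:i + 4]
        if nxt and nxt in SIMPLE_ESCAPES:
            literal.append(nxt)  # escaped special character, taken literally
            i += 2
        elif nxt == "x" and len(hex_part) == 2 and all(c in string.hexdigits for c in hex_part):
            literal.append(chr(int(hex_part, 16)))  # assumption: code point
            i += 4
        else:
            raise ValueError(f"bad escape sequence at offset {i}")
    if literal is not None:
        tokens.append(("literal", "".join(literal)))
    return tokens
```

An escaped slash stays inside the literal, so `tokenize_ypath(r"/a\/b")` yields one slash token and one literal.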
Syntax and semantics
Structurally, YPath looks like /<relative-path>. <relative-path> is parsed left to right into steps:
- Child step: a / token followed by a literal. Applies to maps and lists. For a map, the literal is the child name. Example: /child is the child named child. For a list, the literal is a decimal integer index (zero-based). Negative indices count from the end. Examples: /1 is the second item; /-1 is the last item.
- Attribute step: /@ followed by a literal. Can be used anywhere; moves to the attribute with that name. Example: /@attr.
Note
In YPath, relative paths start with a slash. The slash is not a separator like in file systems but part of the navigation command. Concatenating two YPaths is therefore plain string concatenation. This may look unusual but is convenient in many places and easy to get used to.
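The child and attribute steps, and the concatenation property from the note, can be sketched over plain Python values. This is an illustration only: `resolve` and `Attributed` are hypothetical names, maps and lists are modeled as `dict` and `list`, the path is split on / without escape handling, and attributes are addressed in the /@attr form described above.

```python
from collections import namedtuple

# Hypothetical wrapper: a value together with its attribute map.
Attributed = namedtuple("Attributed", ["attributes", "value"])


def resolve(node, path):
    """Follow a simple YPath through nested dicts, lists, and attributes."""
    if path and not path.startswith("/"):
        raise ValueError("a YPath must start with a slash")
    for step in path.split("/")[1:]:  # no escape handling in this sketch
        if step.startswith("@"):
            # Attribute step: look the name up in the node's attributes.
            attrs = node.attributes if isinstance(node, Attributed) else {}
            node = attrs[step[1:]]
            continue
        if isinstance(node, Attributed):
            node = node.value
        if isinstance(node, list):
            node = node[int(step)]  # negative indices count from the end
        elif isinstance(node, dict):
            node = node[step]
        else:
            raise KeyError(step)
    return node


data = {
    "a": Attributed({"x": "y"}, [{"abc": 123}, {"abc": 234}]),
    "b": 2,
}

print(resolve(data, "/a/0/abc"))       # 123
print(resolve(data, "/a/-1/abc"))      # 234
print(resolve(data, "/a/@x"))          # y
print(resolve(data, "/a" + "/1"))      # concatenated paths: {'abc': 234}
```

Because every step carries its own leading slash, joining "/a" and "/1" needs no separator logic.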
Examples
$data = Yson(@@{"0-25-3ec012f-406daf5c" = {a=<why="I can just do it">1;b=2}}@@);
SELECT Yson::SerializeJson($data), Yson::SerializeJson(Yson::YPath($data, "/0-25-3ec012f-406daf5c/a/@/why"));
Result:
| column0 | column1 |
$data = Yson(@@{
a = <a=z;x=y>[
{abc=123; def=456};
{abc=234; xyz=789; entity0123 = #};
];
b = {str = <it_is_string=%true>"hello"; "38 parrots" = [38]};
entity0 = <here_you_can_store=something>#;
}
@@);
SELECT Yson::ConvertToStringDict(Yson::YPath($data, "/a/@")) AS attrs_root,
Yson::SerializeJson(Yson::YPath($data, "/b/str/@")) AS attrs_b_str,
Yson::SerializeJson(Yson::YPath($data, "/b/str/@/it_is_string")) AS attr_exact,
Yson::SerializeJson(Yson::YPath($data, "/a/0")) AS array_index0,
Yson::SerializeJson(Yson::YPath($data, "/a/-1")) AS array_last,
Yson::SerializeJson(Yson::YPath($data, "/entity0")) AS entity,
Yson::SerializeJson(Yson::YPath($data, "/a/#entity0123/abc")) AS entity1,
Yson::SerializeJson(Yson::YPath($data, "/a")) AS whole_a,
Yson::SerializeJson($data) AS whole_data;
Result:
| attrs_root | attrs_b_str | attr_exact | array_index0 | array_last | entity | entity1 | whole_a | whole_data |