Unicode
Functions for Unicode strings.
List of functions
-
Unicode::IsUtf(String) -> Bool
Checks whether a string is a valid UTF-8 sequence. For example, the string
"\xF0"
isn't a valid UTF-8 sequence, but the string"\xF0\x9F\x90\xB1"
correctly describes a UTF-8 cat emoji. -
Unicode::GetLength(Utf8{Flags:AutoMap}) -> Uint64
Returns the length of a utf-8 string in unicode code points. Surrogate pairs are counted as one character.
SELECT Unicode::GetLength("жніўня"); -- 6
-
Unicode::Find(string:Utf8{Flags:AutoMap}, subString:Utf8, [pos:Uint64?]) -> Uint64?
-
Unicode::RFind(string:Utf8{Flags:AutoMap}, subString:Utf8, [pos:Uint64?]) -> Uint64?
Finding the first (
RFind
- the last) occurrence of a substring in a string starting from thepos
position. Returns the position of the first character from the found substring. In case of failure, returns Null.SELECT Unicode::Find("aaa", "bb"); -- Null
-
Unicode::Substring(string:Utf8{Flags:AutoMap}, from:Uint64?, len:Uint64?) -> Utf8
Returns a
string
substring starting withfrom
that islen
characters long. If thelen
argument is omitted, the substring is taken to the end of the source string.If
from
exceeds the length of the original string, an empty string""
is returned.SELECT Unicode::Substring("0123456789abcdefghij", 10); -- "abcdefghij"
-
The
Unicode::Normalize...
functions convert the passed UTF-8 string to a normalization form:Unicode::Normalize(Utf8{Flags:AutoMap}) -> Utf8
-- NFCUnicode::NormalizeNFD(Utf8{Flags:AutoMap}) -> Utf8
Unicode::NormalizeNFC(Utf8{Flags:AutoMap}) -> Utf8
Unicode::NormalizeNFKD(Utf8{Flags:AutoMap}) -> Utf8
Unicode::NormalizeNFKC(Utf8{Flags:AutoMap}) -> Utf8
-
Unicode::Translit(string:Utf8{Flags:AutoMap}, [lang:String?]) -> Utf8
Transliterates with Latin letters the words from the passed string, consisting entirely of characters of the alphabet of the language passed by the second argument. If no language is specified, the words are transliterated from Russian. Available languages: "kaz", "rus", "tur", and "ukr".
SELECT Unicode::Translit("Тот уголок земли, где я провел"); -- "Tot ugolok zemli, gde ya provel"
-
Unicode::LevensteinDistance(stringA:Utf8{Flags:AutoMap}, stringB:Utf8{Flags:AutoMap}) -> Uint64
Calculates the Levenshtein distance for the passed strings.
-
Unicode::Fold(Utf8{Flags:AutoMap}, [ Language:String?, DoLowerCase:Bool?, DoRenyxa:Bool?, DoSimpleCyr:Bool?, FillOffset:Bool? ]) -> Utf8
Performs case folding on the passed string.
Parameters:
Language
is set according to the same rules as inUnicode::Translit()
.DoLowerCase
converts a string to lowercase letters, defaults totrue
.DoRenyxa
converts diacritical characters to similar Latin characters, defaults totrue
.DoSimpleCyr
converts diacritical Cyrillic characters to similar Latin characters, defaults totrue
.FillOffset
parameter is not used.
SELECT Unicode::Fold("Kongreßstraße", false AS DoSimpleCyr, false AS DoRenyxa); -- "kongressstrasse" SELECT Unicode::Fold("ҫурт"); -- "сурт" SELECT Unicode::Fold("Eylül", "Turkish" AS Language); -- "eylul"
-
Unicode::ReplaceAll(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
-
Unicode::ReplaceFirst(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
-
Unicode::ReplaceLast(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
Replaces all/first/last occurrences of the
find
string in theinput
withreplacement
. -
Unicode::RemoveAll(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
-
Unicode::RemoveFirst(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
-
Unicode::RemoveLast(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
Deletes all/first/last occurrences of characters in the
symbols
set from theinput
. The second argument is interpreted as an unordered set of characters to be removed.SELECT Unicode::ReplaceLast("absence", "enc", ""); -- "abse" SELECT Unicode::RemoveAll("abandon", "an"); -- "bdo"
-
Unicode::ToCodePointList(Utf8{Flags:AutoMap}) -> List<Uint32>
Splits a string into a Unicode sequence of codepoints.
-
Unicode::FromCodePointList(List<Uint32>{Flags:AutoMap}) -> Utf8
Generates a Unicode string from codepoints.
SELECT Unicode::ToCodePointList("Щавель"); -- [1065, 1072, 1074, 1077, 1083, 1100] SELECT Unicode::FromCodePointList(AsList(99,111,100,101,32,112,111,105,110,116,115,32,99,111,110,118,101,114,116,101,114)); -- "code points converter"
-
Unicode::Reverse(Utf8{Flags:AutoMap}) -> Utf8
Reverses a string.
-
Unicode::ToLower(Utf8{Flags:AutoMap}) -> Utf8
-
Unicode::ToUpper(Utf8{Flags:AutoMap}) -> Utf8
-
Unicode::ToTitle(Utf8{Flags:AutoMap}) -> Utf8
Converts a string to UPPER, lower, or Title case.
-
Unicode::SplitToList( string:Utf8?, separator:Utf8, [ DelimeterString:Bool?, SkipEmpty:Bool?, Limit:Uint64? ]) -> List<Utf8>
Splits a string into substrings by separator.
string
-- Source string.separator
-- Separator. Parameters:- DelimeterString:Bool? — treating a delimiter as a string (true, by default) or a set of characters "any of" (false)
- SkipEmpty:Bool? - whether to skip empty strings in the result, is false by default
- Limit:Uint64? - Limits the number of fetched components (unlimited by default); if the limit is exceeded, the raw suffix of the source string is returned in the last item
-
Unicode::JoinFromList(List<Utf8>{Flags:AutoMap}, separator:Utf8) -> Utf8
Concatenates a list of strings via a
separator
into a single string.SELECT Unicode::SplitToList("One, two, three, four, five", ", ", 2 AS Limit); -- ["One", "two", "three, four, five"] SELECT Unicode::JoinFromList(["One", "two", "three", "four", "five"], ";"); -- "One;two;three;four;five"
-
Unicode::ToUint64(string:Utf8{Flags:AutoMap}, [prefix:Uint16?]) -> Uint64
Converts a string to a number.
The second optional argument sets the number system. By default, 0 (automatic detection by prefix).
Supported prefixes:0x(0X)
- base-16,0
- base-8. Defaults to base-10.
The-
sign before a number is interpreted as in C unsigned arithmetic. For example,-0x1
-> UI64_MAX.
If there are incorrect characters in a string or a number goes beyond ui64, the function terminates with an error. -
Unicode::TryToUint64(string:Utf8{Flags:AutoMap}, [prefix:Uint16?]) -> Uint64?
Similar to the
Unicode::ToUint64()
function, except that it returnsNULL
instead of an error.SELECT Unicode::ToUint64("77741"); -- 77741 SELECT Unicode::ToUint64("-77741"); -- 18446744073709473875 SELECT Unicode::TryToUint64("asdh831"); -- Null