YDB Quick Start
Normally, YDB stores data on multiple SSD/NVMe or HDD raw disk devices without any filesystem. However, for simplicity, this guide emulates disks in RAM or using a file in a regular filesystem. Thus, this setup is unsuitable for any production usage or even benchmarks. See the cluster management documentation to learn how to run YDB in a production environment.
Install and start YDB
The recommended environment to run YDB is x86_64 Linux. If you don't have access to one, feel free to switch to the instructions on the "Docker" tab.
Create a directory for YDB and use it as the current working directory:
mkdir ~/ydbd && cd ~/ydbd
Download and run the installation script:
curl https://install.ydb.tech | bash
This will download and unpack the archive containing the
ydbdexecutable, libraries, configuration files, and scripts needed to start and stop the local cluster.
The script is executed entirely with the current user privileges (notice the lack of
sudo). Therefore, it can't do much on the system. You can check which exactly commands it runs by opening the same URL in your browser.
Start the cluster in one of the following storage modes:
In this case, all data is stored only in RAM, it will be lost when the cluster is stopped.
Data on disk:
When you run this command an 80GB
ydb.datafile will be created in the working directory if it weren't there before. Make sure there's enough disk space available to create it. This file will be used to emulate a raw disk device, which would have been used in production environments.
Data on a real disk drive:
./start.sh drive "/dev/$DRIVE_NAME"
/dev/$DRIVE_NAMEwith an actual device name that is not used for anything else, for example
/dev/sdb. The first time you run this command, the specified disk drive will be fully wiped and then used for YDB data storage. It is recommended to use a NVMe or SSD drive with at least 800Gb data volume. Such setup can be used for single-node performance testing or other environments that do not have any fault-tolerance requirements.
Starting storage process... Initializing storage ... Registering database ... Starting database process... Database started. Connection options for YDB CLI: -e grpc://localhost:2136 -d /Root/test
Create a directory for YDB and use it as the current working directory:
mkdir ~/ydbd && cd ~/ydbd
Pull the current version of the Docker image:
docker pull cr.yandex/yc/yandex-docker-local-ydb:latest
Run the Docker container:
docker run -d --rm --name ydb-local -h localhost \ -p 2135:2135 -p 2136:2136 -p 8765:8765 \ -v $(pwd)/ydb_certs:/ydb_certs -v $(pwd)/ydb_data:/ydb_data \ -e GRPC_TLS_PORT=2135 -e GRPC_PORT=2136 -e MON_PORT=8765 \ cr.yandex/yc/yandex-docker-local-ydb:latest
If the container starts successfully, you'll see the container's ID. The container might take a few minutes to initialize. The database will not be available until container initialization is complete.
Run your first "Hello, world!" query
The simplest way to launch your first YDB query is via the built-in web interface. It is launched by default on port 8765 of the YDB server. If you have launched it locally, open localhost:8765 in your web browser. If not, replace
localhost with your server's hostname in this URL or use
ssh -L 8765:localhost:8765 my-server-hostname-or-ip.example.com to set up port forwarding and still open localhost:8765. You'll see a page like this:
YDB is designed to be a multi-tenant system, with potentially thousands of users working with the same cluster simultaneously. Hence, most logical entities inside a YDB cluster reside in a flexible hierarchical structure more akin to Unix's virtual filesystem rather than a fixed-depth schema you might be familiar with from other database management systems. As you can see, the first level of hierarchy consists of databases running inside a single YDB process that might belong to different tenants.
/Root is for system purposes, while
/local (depending on the chosen installation method) is a playground created during installation in the previous step. Click on either
/local, enter your first query, and hit the "Run" button:
SELECT "Hello, world!"u;
The query returns the greeting, as it is supposed to:
Did you notice the odd
u suffix? YDB and its query language, YQL, are strongly typed. Regular strings in YDB can contain any binary data, while this suffix indicates that this string literal is of the
Utf8 data type, which can only contain valid UTF-8 sequences. Learn more about YDB's type system.
The second simplest way to run a SQL query with YDB is the command line interface (CLI), while most real-world applications will likely communicate with YDB via one of the available software development kits (SDK). Feel free to follow the rest of the guide using either the CLI or one of the SDKs instead of the web UI if you feel comfortable doing so.
Create your first table
The main purpose of database management systems is to store data for later retrieval. As an SQL-based system, YDB's primary abstraction for data storage is a table. To create our first one, run the following query:
CREATE TABLE example ( key UInt64, value String, PRIMARY KEY (key) );
As you can see, it is a simple key-value table. Let's walk through the query step-by-step:
- Each SQL statement kind like
CREATE TABLEhas more detailed explanation in YQL reference.
exampleis the table name identifier, while
valueare column name identifiers. It is recommended to use simple names for identifiers like these, but if you need one that contains non-trivial symbols, wrap the name in backticks.
Stringare data type names.
Stringrepresents a binary string, and
UInt64is a 64-bit unsigned integer. Thus, our example table stores string values identified by unsigned integer keys. More details about data types.
PRIMARY KEYis one of the fundamental concepts of SQL that has a significant impact on both application logic and performance. Following the SQL standard, the primary key also implies an unique constraint, meaning the table cannot have multiple rows with equal primary keys. In this example table, it's quite straightforward which column should be chosen as the primary key, which we specify as
(key)in round brackets after the respective keyword. In real-world scenarios, tables often have dozens of columns, and primary keys can be compound (consisting of multiple columns in a specified order), making choosing the right primary key more of an art. If you are interested in this topic, there's a guide on choosing the primary key for maximizing performance. YDB tables are required to have a primary key.
Add sample data
Now let's fill our table with some data. The simplest way is to just use literals:
INSERT INTO example (key, value) VALUES (123, "hello"), (321, "world");
INSERT INTOis the classic SQL statement for adding new rows to a table. However, it is not the most performant, as according to the SQL standard, it has to check whether the table already has rows with the given primary key values, and raise an error if they exist. Thus, if you run this query multiple times, all attempts except the first will return an error. If your application logic doesn't require this behavior, it is better to use
UPSERT INTOinstead of
INSERT INTO. Upsert (which stands for "update or insert") will blindly write the provided values, overwriting existing rows if there were any. The rest of the syntax will be the same.
(key, value)specifies the names of the columns we're inserting and their order. The values provided next need to match this specification, both in the number of columns and their data types.
- After the
VALUESkeyword, there's a list of tuples, each representing a table row. In this example, we have two rows identified by 123 and 321 in the
keycolumn, and "hello" and "world" values in the
To double-check that the rows were indeed added to the table, there's a common query that should return
2 in this case:
SELECT COUNT(*) FROM example;
A few notable details in this one:
FROMclause specifies a table to retrieve data from.
COUNTis an aggregate function that counts the number of values. By default, when there are no other special clauses around, the presence of any aggregate function collapses the result to one row containing aggregates over the whole input data (the
exampletable in this case).
*is a placeholder that normally means "all columns"; thus,
COUNTwill return the overall row count.
Another common way to fill a table with data is by combining
INSERT INTO (or
UPSERT INTO) and
SELECT. In this case, values to be stored are calculated inside the database instead of being provided by the client as literals. We'll use a slightly more realistic query to demonstrate this:
$subquery = SELECT ListFromRange(1000, 10000) AS keys; UPSERT INTO example SELECT key, CAST(RandomUuid(key) AS String) AS value FROM $subquery FLATTEN LIST BY keys AS key
There's quite a lot going on in this query; let's dig into it:
$subqueryis a named expression. This syntax is YQL's extension to the SQL standard that allows making complex queries more readable. It behaves the same as if you wrote that first
$subqueryis later used on the last row, but it allows comprehending what's going on piece by piece, like variables in regular programming languages.
ListFromRangeis a function that produces a list of consecutive integers, starting from the value provided in the first argument and ending with the value provided in the second argument. There's also a third optional argument that can allow skipping integers with a specified step, but we omit it in our example, which defaults to returning all integers in the given range.
Listis one of the most common container data types.
ASis a keyword used to give a name to the value we're returning from
SELECT; in this example,
FROM ... FLATTEN LIST BY ... AS ...has a few notable things happening:
SELECTused in the
FROMclause is called a subquery. That's why we chose this name for our
$subquerynamed expression, but we could have chosen something more meaningful to explain what it is. Subqueries normally aren't materialized; they just pass the output of one
SELECTto the input of another on the fly. They can be used as a means to produce arbitrarily complex execution graphs, especially if used in conjunction with other YQL features.
FLATTEN LIST BYclause modifies input passed via
FROMin the following way: for each row in the input data, it takes a column of list data type and produces multiple rows according to the number of elements in that list. Normally, that list column is replaced by the column with the current single element, but the
ASkeyword in this context allows access to both the whole list (under the original name) and the current element (under the name specified after
AS), or just to make it more clear what is what, like in this example.
RandomUuidis a function that returns a pseudorandom UUID version 4. Unlike most other functions, it doesn't actually use what is passed as an argument (the
keycolumn); instead, it indicates that we need to call the function on each row. See the reference for more examples of how this works.
CAST(... AS ...)is a common function for converting values to a specified data type. In this context, the type specification is expected after
AS(in this case,
String), not an arbitrary name.
UPSERT INTOwill blindly write the values to the specified tables, as we discussed previously. Note that it didn't require
(key, value)column names specification when used in conjunction with
SELECT, as now columns can just be matched by names returned from
What will the
SELECT COUNT(*) FROM example; query return now?
Stop the cluster
Stop the local YDB cluster after you have finished experimenting:
To stop the local cluster, run the following command:
To stop the Docker container with the local cluster, run the following command:
docker kill ydb-local
Optionally, you can then clean up your filesystem by removing your working directory with the
rm -rf ~/ydbd command. All data inside the local YDB cluster will be lost.
Done! What's next?
After getting a hold of some basics demonstrated in this guide, you should be ready to jump into more advanced topics. Choose what looks the most relevant depending on your use case and role: