Migrating YDB data schemas with the Liquibase migration tool

Introduction

Liquibase is an open-source library for tracking, managing, and applying changes to database schemas. It is extended with dialects for different database management systems (DBMS), including YDB.

A dialect is the component of the Liquibase framework that generates SQL statements for a specific database, taking the particular features of that DBMS into account.

Features of the YDB Dialect

Liquibase's main functionality is the abstract description of database schemas in .xml, .json, or .yaml format. This abstraction keeps migrations portable when switching between different DBMSs.

The YDB dialect supports the following basic constructs in the standard migration description (changeset).

Creating a table

The createTable changeset creates a table. Type descriptions from the SQL standard are mapped to YDB primitive types. For example, the bigint type is converted to Int64.

Note

You can also explicitly specify the original type name, such as Int32, Json, JsonDocument, Bytes, or Interval. However, in this case, the schema won't be portable.
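As an illustration, a createTable changeset might mix a standard SQL type with a native YDB type; the table and column names here are hypothetical:

```xml
<createTable tableName="users">
    <!-- standard SQL type: the dialect maps bigint to Int64 -->
    <column name="id" type="bigint">
        <constraints primaryKey="true"/>
    </column>
    <!-- native YDB type: applied as-is, but not portable to other DBMSs -->
    <column name="profile" type="JsonDocument"/>
</createTable>
```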

Comparison of Liquibase type descriptions with YDB types:

| Liquibase types | YDB type |
| --- | --- |
| boolean, java.sql.Types.BOOLEAN, java.lang.Boolean, bit, bool | Bool |
| blob, longblob, longvarbinary, String, java.sql.Types.BLOB, java.sql.Types.LONGBLOB, java.sql.Types.LONGVARBINARY, java.sql.Types.VARBINARY, java.sql.Types.BINARY, varbinary, binary, image, tinyblob, mediumblob, long binary, long varbinary | Bytes (synonym String) |
| java.sql.Types.DATE, smalldatetime, date | Date |
| decimal, java.sql.Types.DECIMAL, java.math.BigDecimal | Decimal(22,9) |
| double, java.sql.Types.DOUBLE, java.lang.Double | Double |
| float, java.sql.Types.FLOAT, java.lang.Float, real, java.sql.Types.REAL | Float |
| int, integer, java.sql.Types.INTEGER, java.lang.Integer, int4, int32 | Int32 |
| bigint, java.sql.Types.BIGINT, java.math.BigInteger, java.lang.Long, integer8, bigserial, long | Int64 |
| java.sql.Types.SMALLINT, int2, smallserial, smallint | Int16 |
| java.sql.Types.TINYINT, tinyint | Int8 |
| char, java.sql.Types.CHAR, bpchar, character, nchar, java.sql.Types.NCHAR, nchar2, text, varchar, java.sql.Types.VARCHAR, java.lang.String, varchar2, character varying, nvarchar, java.sql.Types.NVARCHAR, nvarchar2, national, clob, longvarchar, longtext, java.sql.Types.LONGVARCHAR, java.sql.Types.CLOB, nclob, longnvarchar, ntext, java.sql.Types.LONGNVARCHAR, java.sql.Types.NCLOB, tinytext, mediumtext, long varchar, long nvarchar | Text (synonym Utf8) |
| timestamp, java.sql.Types.TIMESTAMP, java.sql.Timestamp | Timestamp |
| datetime, time, java.sql.Types.TIME, java.sql.Time | Datetime |

Warning

In YDB, the Timestamp data type stores dates with microsecond precision, while java.sql.Timestamp and java.time.Instant store timestamps with nanosecond precision. Keep this in mind: values written to YDB may lose sub-microsecond precision.

The type names are case insensitive.

The dropTable changeset deletes a table. For example: <dropTable tableName="episodes"/>

Changing the table structure

addColumn - add a column to a table. For example:

XML:

<addColumn tableName="seasons">
    <column name="is_deleted" type="bool"/>
</addColumn>

JSON:

"changes": [
    {
      "addColumn": {
        "tableName": "seasons",
        "columns": [
          {
            "column": {
              "name": "is_deleted",
              "type": "bool"
            }
          }
        ]
      }
    }
  ]

YAML:

changes:
- addColumn:
    tableName: seasons
    columns:
    - column:
        name: is_deleted
        type: bool

createIndex - create a secondary index. For example:

XML:

<createIndex tableName="episodes" indexName="episodes_index" unique="false">
    <column name="title"/>
</createIndex>

JSON:

"changes": [
    {
      "createIndex": {
        "tableName": "episodes",
        "indexName": "episodes_index",
        "unique": "false",
        "columns": [
          {
            "column": {
              "name": "title"
            }
          }
        ]
      }
    }
]

YAML:

changes:
- createIndex:
    tableName: episodes
    indexName: episodes_index
    unique: false
    columns:
    - column:
        name: title

Warning

YDB doesn't support unique secondary indexes.

Note

Asynchronous indexes should be created using native SQL migrations.
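As a sketch of such a native SQL migration, an asynchronous index could be added with YQL's GLOBAL ASYNC index syntax; the changeset author and index name below are illustrative, so check the YQL documentation for the exact syntax:

```sql
--liquibase formatted sql

--changeset my-author:episodes-async-index
ALTER TABLE episodes ADD INDEX episodes_async_index GLOBAL ASYNC ON (title);
```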

dropIndex - drop a secondary index. For example:

XML:

<dropIndex tableName="series" indexName="series_index"/>

JSON:

"changes": [
  {
    "dropIndex": {
      "tableName": "series",
      "indexName": "series_index"
    }
  }
]

YAML:

changes:
- dropIndex:
    tableName: series
    indexName: series_index

Ingesting data into a table

loadData, loadUpdateData - upload data from a CSV file into a table. loadUpdateData loads the data using the UPSERT INTO command.
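As an illustrative sketch, a changeset using these constructs might look like the following; the CSV file name, changeset id, and author are hypothetical, and loadUpdateData additionally requires the primaryKey attribute:

```xml
<changeSet id="load-episodes" author="my-author">
    <!-- plain load: rows from the CSV file are inserted -->
    <loadData tableName="episodes" file="episodes.csv" relativeToChangelogFile="true"/>

    <!-- upsert-style load: rows matched by the primary key are updated -->
    <loadUpdateData tableName="episodes" file="episodes.csv"
                    relativeToChangelogFile="true"
                    primaryKey="series_id,season_id,episode_id"/>
</changeSet>
```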

insert is a changeset that performs a single insert into a table using the INSERT INTO command. For example:

XML:

<insert tableName="episodes">
    <column name="series_id" valueNumeric="1"/>
    <column name="season_id" valueNumeric="1"/>
    <column name="episode_id" valueNumeric="1"/>
    <column name="title" value="Yesterday's Jam"/>
    <column name="air_date" valueDate="2023-04-03T08:46:23.456"/>
</insert>

JSON:

"changes": [
  {
    "insert": {
      "tableName": "episodes",
      "columns": [
        {
          "column": {
            "name": "series_id",
            "valueNumeric": "1"
          }
        },
        {
          "column": {
            "name": "season_id",
            "valueNumeric": "1"
          }
        },
        {
          "column": {
            "name": "episode_id",
            "valueNumeric": "1"
          }
        },
        {
          "column": {
            "name": "title",
            "value": "Yesterday's Jam"
          }
        },
        {
          "column": {
            "name": "air_date",
            "valueDate": "2023-04-03T08:46:23.456"
          }
        }
      ]
    }
  }
]

YAML:

changes:
- insert:
    tableName: episodes
    columns:
      - column:
          name: series_id
          valueNumeric: 1
      - column:
          name: season_id
          valueNumeric: 1
      - column:
          name: episode_id
          valueNumeric: 1
      - column:
          name: title
          value: Yesterday's Jam
      - column:
          name: air_date
          valueDate: 2023-04-03T08:46:23.456

You can also specify any value in the value field. Values from the value field of an insert changeset, as well as data from CSV files, are automatically converted to the required types, taking YDB's strict typing into account.

Value formats for loading data, by YDB type:

| YDB type | Value format |
| --- | --- |
| Bool | true or false |
| Int8, Int16, Int32, Int64 | A signed integer |
| Uint8, Uint16, Uint32, Uint64 | An unsigned integer |
| Text, Bytes, Json, JsonDocument | Represented as text |
| Float, Double, Decimal(22, 9) | A real number |
| Interval | An ISO 8601 duration; corresponds to java.time.Duration in Java |
| Date | A YYYY-MM-DD calendar date from ISO 8601 |
| Datetime | YYYY-MM-DDThh:mm:ss; the timezone is set to UTC |
| Timestamp | An ISO 8601 timestamp; corresponds to java.time.Instant in Java; the timezone is set to UTC (microsecond precision is a restriction of the YDB Timestamp type) |

Example CSV file:

id,bool,bigint,smallint,tinyint,float,double,decimal,uint8,uint16,uint32,uint64,text,binary,json,jsondocument,date,datetime,timestamp,interval
2,true,123123,13000,112,1.123,1.123123,1.123123,12,13,14,15,kurdyukov-kir,binary,{"asd": "asd"},{"asd": "asd"},2014-04-06,2023-09-16T12:30,2023-07-31T17:00:00.00Z,PT10S

Warning

To understand which SQL statements YDB can perform and what are the restrictions on data types, read the documentation for the query language YQL.

Note

Note that custom YQL statements can be applied via native SQL migrations.

How to use it?

There are two ways to use the dialect:

  • From Java or Kotlin: the project's README describes this in detail, and there is also an example of a Spring Boot application using it.
  • From the liquibase command-line utility.

For the command-line utility, first install Liquibase itself using one of the recommended methods. Then place the .jar archives of the YDB JDBC driver and the Liquibase YDB dialect into the internal/lib folder:

# $(which liquibase)
cd ./internal/lib/

# you may need sudo
# set actual versions of .jar files
curl -L -o ydb-jdbc-driver.jar https://repo.maven.apache.org/maven2/tech/ydb/jdbc/ydb-jdbc-driver-shaded/2.0.7/ydb-jdbc-driver-shaded-2.0.7.jar
curl -L -o liquibase-ydb-dialect.jar https://repo.maven.apache.org/maven2/tech/ydb/dialects/liquibase-ydb-dialect/1.0.0/liquibase-ydb-dialect-1.0.0.jar

For a more detailed description, see the Manual library management in Liquibase documentation.

Now the liquibase command line utility can be used with YDB.

Liquibase usage scenarios

Initializing Liquibase on an empty YDB cluster

The main command is liquibase update, which applies migrations if the current schema in YDB lags behind the user-defined description.
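A minimal invocation might look like this; the changelog path and connection URL are placeholders for your own setup:

```bash
liquibase update \
    --changelog-file=changelog.xml \
    --url="jdbc:ydb:grpc://localhost:2136/local"
```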

Let's apply this changeset to an empty database:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                      http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

    <changeSet id="episodes" author="kurdyukov-kir">
        <comment>Table episodes.</comment>

        <createTable tableName="episodes">
            <column name="series_id" type="bigint">
                <constraints primaryKey="true"/>
            </column>
            <column name="season_id" type="bigint">
                <constraints primaryKey="true"/>
            </column>
            <column name="episode_id" type="bigint">
                <constraints primaryKey="true"/>
            </column>

            <column name="title" type="text"/>
            <column name="air_date" type="timestamp"/>
        </createTable>
        <rollback>
            <dropTable tableName="episodes"/>
        </rollback>
    </changeSet>
    <changeSet id="index_episodes_title" author="kurdyukov-kir">
        <createIndex tableName="episodes" indexName="index_episodes_title" unique="false">
            <column name="title"/>
        </createIndex>
    </changeSet>
</databaseChangeLog>

After executing the liquibase update command, Liquibase will print the following log:

UPDATE SUMMARY
Run:                          2
Previously run:               0
Filtered out:                 0
-------------------------------
Total change sets:            2

Liquibase: Update has been successful. Rows affected: 2
Liquibase command 'update' was executed successfully.

After applying migrations, the data schema now looks like this:

_assets/liquibase-step-1.png

You can see that Liquibase has created two service tables: DATABASECHANGELOG, which is the migration log, and DATABASECHANGELOGLOCK, which is a table for acquiring a distributed lock.

Example contents of the DATABASECHANGELOG table:

| AUTHOR | COMMENTS | CONTEXTS | DATEEXECUTED | DEPLOYMENT_ID | DESCRIPTION | EXECTYPE | FILENAME | ID | LABELS | LIQUIBASE | MD5SUM | ORDEREXECUTED | TAG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| kurdyukov-kir | Table episodes. | | 12:53:27 | 1544007500 | createTable tableName=episodes | EXECUTED | migration/episodes.xml | episodes | | 4.25.1 | 9:4067056a5ab61db09b379a93625870ca | 1 | |
| kurdyukov-kir | "" | | 12:53:28 | 1544007500 | createIndex indexName=index_episodes_title, tableName=episodes | EXECUTED | migration/episodes.xml | index_episodes_title | | 4.25.1 | 9:49b8b0b22d18c7fd90a3d6b2c561455d | 2 | |

Database schema evolution

Let's say we need to create a YDB topic and disable the table's AUTO_PARTITIONING_BY_SIZE setting. This can be done with a native SQL script:

--liquibase formatted sql

--changeset kurdyukov-kir:create-a-topic
CREATE TOPIC `my_topic` (
    CONSUMER my_consumer
) WITH (
     retention_period = Interval('P1D')
);

--changeset kurdyukov-kir:auto-partitioning-disabled
ALTER TABLE episodes SET (AUTO_PARTITIONING_BY_SIZE = DISABLED);

Also, let's add a new column is_deleted and remove the index_episodes_title index:

<changeSet id="alter-episodes" author="kurdyukov-kir">
    <comment>Alter table episodes.</comment>

    <dropIndex tableName="episodes" indexName="index_episodes_title"/>

    <addColumn tableName="episodes">
        <column name="is_deleted" type="bool"/>
    </addColumn>
</changeSet>
<include file="/migration/sql/yql.sql" relativeToChangelogFile="true"/>

After executing liquibase update, the database schema will be successfully updated with all of these changes:

UPDATE SUMMARY
Run:                          3
Previously run:               2
Filtered out:                 0
-------------------------------
Total change sets:            5

Liquibase: Update has been successful. Rows affected: 3
Liquibase command 'update' was executed successfully.

The result will be deleting the index, adding the is_deleted column, disabling the auto partitioning setting, and creating a topic:

_assets/liquibase-step-2.png

Initializing Liquibase in a project with a non-empty data schema

Let's suppose there's an existing project with the following database schema:

_assets/liquibase-step-3.png

In this case, to start using Liquibase, run:

liquibase generate-changelog --changelog-file=changelog.xml

The contents of the generated changelog.xml:

<changeSet author="kurdyukov-kir (generated)" id="1711556283305-1">
    <createTable tableName="all_types_table">
        <column name="id" type="INT32">
            <constraints nullable="false" primaryKey="true"/>
        </column>
        <column name="bool_column" type="BOOL"/>
        <column name="bigint_column" type="INT64"/>
        <column name="smallint_column" type="INT16"/>
        <column name="tinyint_column" type="INT8"/>
        <column name="float_column" type="FLOAT"/>
        <column name="double_column" type="DOUBLE"/>
        <column name="decimal_column" type="DECIMAL(22, 9)"/>
        <column name="uint8_column" type="UINT8"/>
        <column name="uint16_column" type="UINT16"/>
        <column name="unit32_column" type="UINT32"/>
        <column name="unit64_column" type="UINT64"/>
        <column name="text_column" type="TEXT"/>
        <column name="binary_column" type="BYTES"/>
        <column name="json_column" type="JSON"/>
        <column name="jsondocument_column" type="JSONDOCUMENT"/>
        <column name="date_column" type="DATE"/>
        <column name="datetime_column" type="DATETIME"/>
        <column name="timestamp_column" type="TIMESTAMP"/>
        <column name="interval_column" type="INTERVAL"/>
    </createTable>
</changeSet>
<changeSet author="kurdyukov-kir (generated)" id="1711556283305-2">
    <createTable tableName="episodes">
        <column name="series_id" type="INT64">
            <constraints nullable="false" primaryKey="true"/>
        </column>
        <column name="season_id" type="INT64">
            <constraints nullable="false" primaryKey="true"/>
        </column>
        <column name="episode_id" type="INT64">
            <constraints nullable="false" primaryKey="true"/>
        </column>
        <column name="title" type="TEXT"/>
        <column name="air_date" type="DATE"/>
    </createTable>
</changeSet>
<changeSet author="kurdyukov-kir (generated)" id="1711556283305-3">
    <createIndex indexName="title_index" tableName="episodes">
        <column name="title"/>
    </createIndex>
</changeSet>

Then you need to synchronize the generated changelog.xml file, which is done with the command:

liquibase changelog-sync --changelog-file=changelog.xml

The result is a synchronized Liquibase setup in your project:

_assets/liquibase-step-4.png

Connecting to YDB

In the examples above, a Docker container was used, which didn't require any additional authentication settings.

List of different authentication options through URL parameters:

  • Local or remote Docker (anonymous authentication):
    jdbc:ydb:grpc://localhost:2136/local
  • Self-hosted cluster:
    jdbc:ydb:grpcs://<host>:2135/Root/testdb?secureConnectionCertificate=file:~/myca.cer
  • Connect with token to the cloud instance:
    jdbc:ydb:grpcs://<host>:2135/path/to/database?token=file:~/my_token
  • Connect with service account to the cloud instance:
    jdbc:ydb:grpcs://<host>:2135/path/to/database?saFile=file:~/sa_key.json

Also, if your cluster is configured with username and password authentication, the credentials are passed via Liquibase parameters.
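For example, the credentials can be supplied in a liquibase.properties file; all values below are placeholders for your own setup:

```properties
# liquibase.properties (values are placeholders)
changelog-file: changelog.xml
url: jdbc:ydb:grpcs://<host>:2135/path/to/database
username: my_user
password: my_password
```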

For more information about authentication settings, refer to the corresponding section of the documentation.