Retrying

Note

The article is being updated.

YDB is a distributed database management system with automatic load scaling.
Routine maintenance can be carried out on the server side, with server racks or entire data centers temporarily shut down.
This may result in errors arising from YDB operation.
There are different response scenarios depending on the error type.
YDB
To ensure high database availability, SDKs provide built-in tools for retries,
accounting for error types and responses to them.

Below are code examples showing the YDB SDK built-in tools for retries:

In the YDB Go SDK, correct error handling is implemented by several programming interfaces:

General-purpose repeat function

The basic logic of error handling is implemented by the helper retry.Retry function
The details of repeat query execution are mostly hidden.
The user can affect the logic of the retry.Retry function in two ways:

  • Via the context (where you can set the deadline and cancel)
  • Via the operation's idempotency flag retry.WithIdempotent(). By default, the operation is considered non-idempotent.

The user passes a custom function to retry.Retry that returns an error by its signature.
If the custom function returns nil, then repeat queries stop.
If the custom function returns an error, the YDB Go SDK tries to identify this error and executes retries depending on it.

Example of the code that uses the retry.Retry function:

package main

import (
    "context"
    "time"

    "github.com/ydb-platform/ydb-go-sdk/v3"
    "github.com/ydb-platform/ydb-go-sdk/v3/retry"
)

func main() {
    db, err := ydb.Open(ctx,
        os.Getenv("YDB_CONNECTION_STRING"),
    )
    if err != nil {
        panic(err)
    }
    defer db.Close(ctx)
    var cancel context.CancelFunc
    // fix deadline for retries
    ctx, cancel := context.WithTimeout(ctx, time.Second)
    err = retry.Retry(
        ctx,
        func(ctx context.Context) error {
            whoAmI, err := db.Discovery().WhoAmI(ctx)
            if err != nil {
                return err
            }
            fmt.Println(whoAmI)
        },
        retry.WithIdempotent(true),
    )
    if err != nil {
        panic(err)
    }
}
Repeat attempts in case of failed YDB session objects

For repeat error handling at the level of a YDB table service session, you can use the db.Table().Do(ctx, op) function, which provides a prepared session for query execution.
db.Table().Do(ctx, op) uses the retry package and tracks the lifetime of the YDB sessions.
Based on its signature, the user's operation op should return an error or nil so that the driver can "decide" what to do based on the error type: repeat the operation or not, with delay or without, and in this session or a new one.
The user can affect the logic of repeat queries using the context and the idempotence flag, while the YDB Go SDK interprets errors returned by op.

Example of the code that uses the db.Table().Do(ctx, op) function:

err := db.Table().Do(ctx, func(ctx context.Context, s table.Session) (err error) {
    desc, err = s.DescribeTableOptions(ctx)
    return
}, table.WithIdempotent())
if err != nil {
    return err
}
Repeat attempts in case of failed YDB interactive transaction objects

For repeat error handling at the level of a YDB table service interactive transaction, you can use the db.Table().DoTx(ctx, txOp) function, which provides a YDB prepared session transaction for query execution.
db.Table().DoTx(ctx, txOp) uses the retry package and tracks the lifetime of the YDB sessions.
Based on its signature, the user's operation txOp should return an error or nil so that the driver can "decide" what to do based on the error type: repeat the operation or not, with delay or without, and in this transaction or a new one.
The user can affect the logic of repeat queries using the context and the idempotence flag, while the YDB Go SDK interprets errors returned by op.

Example of the code that uses the db.Table().DoTx(ctx, op) function:

err := db.Table().DoTx(ctx, func(ctx context.Context, tx table.TransactionActor) error {
    _, err := tx.Execute(ctx,
        "DECLARE $id AS Int64; INSERT INTO test (id, val) VALUES($id, 'asd')",
        table.NewQueryParameters(table.ValueParam("$id", types.Int64Value(100500))),
    )
    return err
}, table.WithIdempotent())
if err != nil {
    return err
}
Queries to other YDB services

(db.Scripting(), db.Scheme(), db.Coordination(), db.Ratelimiter(), db.Discovery()) also use the retry.Retry function inside to execute repeat queries and don't require external auxiliary functions for repeats.

The standard database/sql package uses the internal logic of repeats based on the errors a specific driver implementation returns.
For example, the database/sql code frequently shows the three-attempt repeats policy:

  • Two attempts at a present connection or new one (if the database/sql connection pool is empty).
  • One attempt at a new connection.

This repeat policy is mostly enough to survive temporary unavailability of YDB nodes or issues with a YDB session.

The YDB Go SDK provides special functions to ensure execution of a user's operation:

Repeat attempts in case of failed *sql.Conn connection objects:

For repeat error handling at *sql.Conn connection objects, you can use the auxiliary retry.Do(ctx, db, op) function, which provides a prepared *sql.Conn session for query execution.
You need to pass the context, database object, and the user's operation for execution to the retry.Do function.
The user's code can affect the logic of repeat queries using the context and the idempotence flag, while the YDB Go SDK, in turn, interprets errors returned by op.

The user's op operation must return an error or nil:

  • If the custom function returns nil, then repeat queries stop.
  • If the custom function returns an error, the YDB Go SDK tries to identify this error and performs retries depending on it.

Example of the code that uses the retry.Do function:

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    "github.com/ydb-platform/ydb-go-sdk/v3/retry"
)

func main() {
    ...
    err = retry.Do(ctx, db, func(ctx context.Context, cc *sql.Conn) (err error) {
        row = cc.QueryRowContext(ctx, `
                PRAGMA TablePathPrefix("/local");
                DECLARE $seriesID AS Uint64;
                DECLARE $seasonID AS Uint64;
                DECLARE $episodeID AS Uint64;
                SELECT views FROM episodes WHERE series_id = $seriesID AND season_id = $seasonID AND episode_id = $episodeID;
            `,
            sql.Named("seriesID", uint64(1)),
            sql.Named("seasonID", uint64(1)),
            sql.Named("episodeID", uint64(1)),
        )
        var views sql.NullFloat64
        if err = row.Scan(&views); err != nil {
            return fmt.Errorf("cannot scan views: %w", err)
        }
        if views.Valid {
            return fmt.Errorf("unexpected valid views: %v", views.Float64)
        }
        log.Printf("views = %v", views)
        return row.Err()
    }, retry.WithDoRetryOptions(retry.WithIdempotent(true)))
    if err != nil {
        log.Printf("retry.Do failed: %v\n", err)
    }
}
Repeat attempts in case of failed *sql.Tx interactive transaction objects:

For repeat error handling at *sql.Tx interactive transaction objects, you can use the auxiliary retry.DoTx(ctx, db, op) function, which provides a prepared *sql.Tx transaction for query execution.
You need to pass the context, database object, and the user's operation for execution to the retry.DoTx function.
The function is passed a prepared *sql.Tx transaction, where queries to YDB should be executed.
The user's code can affect the logic of repeat queries using the context and the operation idempotence flag, while the YDB Go SDK, in turn, interprets errors returned by op.

The user's op operation must return an error or nil:

  • If the custom function returns nil, then repeat queries stop.
  • If the custom function returns an error, the YDB Go SDK tries to identify this error and performs retries depending on it.

By default, retry.DoTx uses the read-write isolation mode of the sql.LevelDefault transaction and you can change it using the retry.WithTxOptions parameter.

Example of the code that uses the retry.Do function:

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    "github.com/ydb-platform/ydb-go-sdk/v3/retry"
)

func main() {
    ...
    err = retry.DoTx(ctx, db, func(ctx context.Context, tx *sql.Tx) error {
        row := tx.QueryRowContext(ctx,`
                PRAGMA TablePathPrefix("/local");
                DECLARE $seriesID AS Uint64;
                DECLARE $seasonID AS Uint64;
                DECLARE $episodeID AS Uint64;
                SELECT views FROM episodes WHERE series_id = $seriesID AND season_id = $seasonID AND episode_id = $episodeID;
            `,
            sql.Named("seriesID", uint64(1)),
            sql.Named("seasonID", uint64(1)),
            sql.Named("episodeID", uint64(1)),
        )
        var views sql.NullFloat64
        if err = row.Scan(&views); err != nil {
            return fmt.Errorf("cannot select current views: %w", err)
        }
        if !views.Valid {
            return fmt.Errorf("unexpected invalid views: %v", views)
        }
        t.Logf("views = %v", views)
        if views.Float64 != 1 {
            return fmt.Errorf("unexpected views value: %v", views)
        }
        return nil
    }, retry.WithDoTxRetryOptions(retry.WithIdempotent(true)), retry.WithTxOptions(&sql.TxOptions{
        Isolation: sql.LevelSnapshot,
        ReadOnly:  true,
    }))
    if err != nil {
        log.Printf("do tx failed: %v\n", err)
    }
}

In the YDB Java SDK, repeat queries are implemented by the com.yandex.ydb.table.SessionRetryContext helper class. This class is constructed with the SessionRetryContext.create method to which you pass the SessionSupplier interface implementation (usually an instance of the TableClient class).
Additionally, the user can specify some other options

  • maxRetries(int maxRetries): The maximum number of operation retries, not counting the first execution. Default value: 10
  • retryNotFound(boolean retryNotFound): The option to retry operations that returned the NOT_FOUND status. Enabled by default.
  • idempotent(boolean idempotent): Indicates idempotency of operations. Idempotent operations will be retried for a broader range of errors. Disabled by default.

The SessionRetryContext class provides two methods to run operations with retries.

  • CompletableFuture<Status> supplyStatus: Executing the operation that returns the status. As an argument, it accepts the lambda Function<Session, CompletableFuture<Status>> fn
  • CompletableFuture<Result<T>> supplyResult: Executing the operation that returns data. As an argument, it accepts the lambda Function<Session, CompletableFuture<Result<T>>> fn

When using the SessionRetryContext class, make sure that the operation will be retried in the following cases

  • The lambda function returned a retryable error code

  • The lambda function invoked an UnexpectedResultException with a retryable error code

    Sample code using SessionRetryContext.supplyStatus:
    private void createTable(TableClient tableClient, String database, String tableName) {
        SessionRetryContext retryCtx = SessionRetryContext.create(tableClient).build();
        TableDescription pets = TableDescription.newBuilder()
                .addNullableColumn("species", PrimitiveType.utf8())
                .addNullableColumn("name", PrimitiveType.utf8())
                .addNullableColumn("color", PrimitiveType.utf8())
                .addNullableColumn("price", PrimitiveType.float32())
                .setPrimaryKeys("species", "name")
                .build();
    
        String tablePath = database + "/" + tableName;
        retryCtx.supplyStatus(session -> session.createTable(tablePath, pets))
                .join().expect("ok");
    }
    
    Sample code using SessionRetryContext.supplyResult:
    private void selectData(TableClient tableClient, String tableName) {
        SessionRetryContext retryCtx = SessionRetryContext.create(tableClient).build();
        String selectQuery
                = "DECLARE $species AS Utf8;"
                + "DECLARE $name AS Utf8;"
                + "SELECT * FROM " + tableName + " "
                + "WHERE species = $species AND name = $name;";
    
        Params params = Params.of(
                "$species", PrimitiveValue.utf8("cat"),
                "$name", PrimitiveValue.utf8("Tom")
        );
    
        DataQueryResult data = retryCtx
                .supplyResult(session -> session.executeDataQuery(selectQuery, TxControl.onlineRo(), params))
                .join().expect("ok");
    
        ResultSetReader rsReader = data.getResultSet(0);
        logger.info("Result of select query:");
        while (rsReader.next()) {
            logger.info("  species: {}, name: {}, color: {}, price: {}",
                    rsReader.getColumn("species").getUtf8(),
                    rsReader.getColumn("name").getUtf8(),
                    rsReader.getColumn("color").getUtf8(),
                    rsReader.getColumn("price").getFloat32()
            );
        }
    }