Handling errors
You need to handle errors properly when using the YDB SDK.
Errors can be divided into three categories:
- Temporary failures (retryable). Such errors include a short-term loss of network connectivity, temporary unavailability or overload of a YDB subsystem, or a failure of YDB to respond to a query within the set timeout. If one of these errors occurs, retrying the failed query after some time is likely to succeed.
- Errors that cannot be fixed with a retry (non-retryable). Such errors are caused by incorrectly written queries, YDB internal errors, or queries that do not match the data schema. Retrying such queries will not resolve the issue; the situation requires developer attention.
- Errors that may be fixed with a retry after the client application responds (conditionally retryable). Such errors include no response within the set timeout or an authentication request. Because the original request may already have been applied (for example, after a timeout), only idempotent operations should be retried.
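The three categories reduce to a simple retry decision. The following is a minimal Python sketch of that decision; the `ErrorCategory` enum and the `should_retry` helper are hypothetical illustrations, not part of any YDB SDK, which makes this decision internally based on status codes.

```python
from enum import Enum


class ErrorCategory(Enum):
    """Hypothetical classification mirroring the three error categories."""
    RETRYABLE = "retryable"                # temporary failure
    NON_RETRYABLE = "non_retryable"        # needs developer attention
    CONDITIONALLY_RETRYABLE = "conditionally_retryable"


def should_retry(category: ErrorCategory, idempotent: bool) -> bool:
    """Decide whether a failed operation may be retried."""
    if category is ErrorCategory.RETRYABLE:
        return True
    if category is ErrorCategory.CONDITIONALLY_RETRYABLE:
        # The first attempt may already have been applied,
        # so only operations that are safe to repeat are retried.
        return idempotent
    # Non-retryable: retrying will not fix the problem.
    return False
```

For example, a timed-out non-idempotent write (`should_retry(ErrorCategory.CONDITIONALLY_RETRYABLE, idempotent=False)`) is not retried.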
Handling retryable errors
The YDB SDK provides a built-in mechanism for handling temporary failures. By default, the SDK uses the recommended retry policy, which can be changed to meet the requirements of the client application. YDB returns status codes that let you determine whether a retry is appropriate and which interval to select.
Retry an operation only if the error indicates a temporary failure. Do not retry invalid operations, such as inserting a row whose primary key value already exists in the table or inserting data that does not match the table schema.
It is important to tune both the number of retries and the interval between them. Too many retries or intervals that are too short create excessive load, while too few retries may prevent the operation from completing.
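As a worked example of this trade-off, suppose, purely for illustration, that the first retry waits an interval t_0 and each subsequent interval doubles. The total waiting time T for n retries is then a geometric sum:

```latex
T(n) = \sum_{k=1}^{n} t_0 \, 2^{k-1} = t_0 \left( 2^{n} - 1 \right)
```

For example, t_0 = 10 ms and n = 5 give T = 310 ms, while t_0 = 1 s and n = 5 give T = 31 s. Because each additional retry roughly doubles T, the retry count and the initial interval have to be chosen together.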
The built-in retry mechanisms in YDB SDKs use the following backoff strategies depending on the returned status code:
- Instant retry – Retries are made immediately.
- Fast exponential backoff – The initial interval is several milliseconds. For each subsequent attempt, the interval increases exponentially.
- Slow exponential backoff – The initial interval is several seconds. For each subsequent attempt, the interval increases exponentially.
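The three built-in strategies differ only in their starting interval. The sketch below computes the delay before a given retry attempt under each strategy. It is an illustration, not SDK code: the base intervals (5 ms for fast, 1 s for slow), the doubling factor, and the 30 s cap are assumptions standing in for "several milliseconds" and "several seconds". Which strategy applies to which status code is defined by the SDK; see Status codes from the YDB server.

```python
from enum import Enum


class Backoff(Enum):
    INSTANT = "instant"
    FAST = "fast"
    SLOW = "slow"


# Illustrative values (assumptions, not SDK defaults).
FAST_BASE_S = 0.005   # "several milliseconds"
SLOW_BASE_S = 1.0     # "several seconds"
MAX_DELAY_S = 30.0    # cap so the interval does not grow without bound


def backoff_delay(strategy: Backoff, attempt: int) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if strategy is Backoff.INSTANT:
        return 0.0
    base = FAST_BASE_S if strategy is Backoff.FAST else SLOW_BASE_S
    # The interval doubles with each subsequent attempt, up to the cap.
    return min(base * 2 ** (attempt - 1), MAX_DELAY_S)
```

With these assumptions, `[backoff_delay(Backoff.SLOW, a) for a in range(1, 7)]` returns `[1.0, 2.0, 4.0, 8.0, 16.0, 30.0]`; the last value is limited by the cap.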
When selecting an interval manually, the following strategies are usually used:
- Exponential backoff – For each subsequent attempt, the interval increases exponentially.
- Intervals in increments – For each subsequent attempt, the interval increases in certain increments.
- Constant intervals – Retries are made at the same intervals.
- Instant retry – Retries are made immediately.
- Random selection – Retries are made after a randomly selected time interval.
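Each of the manual strategies above maps the attempt number to a delay. The functions below are a minimal sketch of that idea; their names and all default values are arbitrary assumptions chosen for illustration, not recommendations for YDB.

```python
import random
from typing import Optional


def exponential(attempt: int, base: float = 0.1, factor: float = 2.0) -> float:
    """Interval grows exponentially: base, base*factor, base*factor**2, ..."""
    return base * factor ** (attempt - 1)


def incremental(attempt: int, start: float = 0.1, step: float = 0.1) -> float:
    """Interval grows by a fixed step on every attempt."""
    return start + step * (attempt - 1)


def constant(attempt: int, interval: float = 0.5) -> float:
    """Same interval for every attempt."""
    return interval


def instant(attempt: int) -> float:
    """Retry immediately."""
    return 0.0


def random_interval(attempt: int, low: float = 0.05, high: float = 1.0,
                    rng: Optional[random.Random] = None) -> float:
    """Interval chosen uniformly at random from [low, high]."""
    source = rng if rng is not None else random
    return source.uniform(low, high)
```

All functions share the same `attempt` argument so that a retry loop can switch strategies without other changes. In practice, random selection is often combined with exponential backoff so that many clients do not retry at the same moment.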
When selecting an interval and the number of retries, consider the YDB status codes.
Do not use endless retries, as this may result in excessive load.
Do not make more than one instant retry.
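The rules above (bounded attempts, at most one instant retry) can be combined in a single retry loop. This is a minimal sketch, assuming a hypothetical `RetryableError` exception and a hypothetical `retry` helper; neither is a YDB SDK API, and in real code the SDK's built-in retry mechanism should be preferred. Any exception other than `RetryableError` is treated as non-retryable and propagates immediately.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical temporary failure; `instant` marks errors suited to an immediate retry."""

    def __init__(self, message: str, instant: bool = False):
        super().__init__(message)
        self.instant = instant


def retry(op: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.01,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying temporary failures a bounded number of times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    instant_used = False
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except RetryableError as err:
            if attempt == max_attempts:
                raise  # bounded: give up instead of retrying forever
            if err.instant and not instant_used:
                instant_used = True  # allow at most one instant retry
                continue
            # Otherwise wait with exponential backoff before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the loop can be tested without real delays.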
For code samples, see Retrying.
Status codes
When an error occurs, the YDB SDK returns an error object that includes status codes. The returned status code may come from the YDB server, gRPC transport, or the SDK itself.
Status codes within the range of 400000-400999 are YDB server codes that are identical for all YDB SDKs. Refer to Status codes from the YDB server.
Status codes within the range of 401000-401999 are SDK-specific. For more information about SDK-specific codes, refer to the corresponding SDK documentation.
For more information about gRPC status codes, see the gRPC documentation.
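Because the ranges above are fixed, the origin of a numeric status code can be determined with a simple range check. The helper below is hypothetical: its name and return labels are illustrative and not part of any YDB SDK.

```python
def status_code_source(code: int) -> str:
    """Classify a status code by the ranges described above."""
    if 400000 <= code <= 400999:
        return "ydb_server"    # identical for all YDB SDKs
    if 401000 <= code <= 401999:
        return "sdk_specific"  # see the documentation of the SDK in use
    return "other"             # outside both documented ranges
```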
Logging errors
When using the SDK, we recommend logging all errors and exceptions:
- Log the number of retries made. A growing number of retries during normal operation often indicates a problem.
- Log all errors, including their types, status codes, and causes.
- Log the total execution time of each operation, including operations that complete after one or more retries.
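The three recommendations can be applied in one wrapper around an operation. The following is a minimal sketch using Python's standard `logging` module; the logger name `ydb_app`, the `run_logged` helper, and the `status_code` attribute read from error objects are hypothetical (real SDK error objects may expose their status under a different name).

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("ydb_app")  # hypothetical logger name


def run_logged(op: Callable[[], T],
               retryable: Tuple[Type[BaseException], ...],
               max_attempts: int = 3,
               base_delay: float = 0.05,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op` with bounded retries, logging every error and the total time."""
    start = time.monotonic()
    retries = 0
    try:
        while True:
            try:
                return op()
            except Exception as err:
                # Log every error: type, status code (hypothetical attribute,
                # None if absent), message, and chained cause.
                log.warning("attempt %d failed: type=%s code=%s message=%s cause=%r",
                            retries + 1, type(err).__name__,
                            getattr(err, "status_code", None), err, err.__cause__)
                if not isinstance(err, retryable) or retries + 1 >= max_attempts:
                    raise
                retries += 1
                sleep(base_delay * 2 ** (retries - 1))
    finally:
        # Total time covers all attempts, whether the operation succeeded or not.
        log.info("operation finished: retries=%d elapsed=%.3fs",
                 retries, time.monotonic() - start)
```

Logging the retry count together with the elapsed time in one record makes it easy to spot operations that succeed only after many retries.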