Overloaded errors
YDB returns OVERLOADED errors in the following cases:
- Overloaded table partitions with over 15000 queries in their queue.
- The outbound CDC queue exceeds the limit of 10000 elements or 125 MB.
- Table partitions in states other than normal, for example partitions in the process of splitting or merging.
- The number of sessions with a YDB node has reached the limit of 1000.
Diagnostics
- Open the DB overview Grafana dashboard.
- In the API details section, see if the Soft errors (retriable) chart shows any spikes in the rate of queries with the OVERLOADED status.
- To check if the spikes in overloaded errors were caused by exceeding the limit of 15000 queries in table partition queues:
  - In the Embedded UI, go to the Databases tab and click on the database.
  - On the Navigation tab, ensure the required database is selected.
  - Open the Diagnostics tab.
  - Open the Top shards tab.
  - In the Immediate and Historical tabs, sort the shards by the InFlightTxCount column and see if the top values reach the 15000 limit.
- To check if the spikes in overloaded errors were caused by tablet splits and merges, see Excessive tablet splits and merges.
- To check if the spikes in overloaded errors were caused by exceeding the limit of 1000 open sessions, see the Session count by host chart in the Grafana DB status dashboard.
- See the overloaded shards issue.
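If the Top shards data is copied out of the Embedded UI for offline review, the InFlightTxCount check can be scripted. The following is a minimal sketch, not a YDB API: it assumes each shard is represented as a dictionary, the `TabletId` key and the `shards_near_limit` helper are hypothetical names, and the 90% threshold is an arbitrary choice. Only the InFlightTxCount column name and the 15000 limit come from this article.

```python
# Limit quoted in this article for queries queued on one table partition.
PARTITION_QUEUE_LIMIT = 15000


def shards_near_limit(rows, threshold=0.9, limit=PARTITION_QUEUE_LIMIT):
    """Return (shard id, in-flight count) pairs at or above threshold * limit.

    `rows` is an iterable of dicts with the hypothetical key "TabletId" and
    the "InFlightTxCount" value taken from the Top shards tab.
    The result is sorted with the busiest shard first.
    """
    cutoff = threshold * limit
    hot = [
        (row["TabletId"], row["InFlightTxCount"])
        for row in rows
        if row["InFlightTxCount"] >= cutoff
    ]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)
```

An empty result suggests that partition queue length is not the cause of the spikes, so the remaining checks in the list are worth following.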
Recommendations
If a YQL query returns an OVERLOADED error, retry the query using a randomized exponential back-off strategy. The YDB SDK provides a built-in mechanism for handling temporary failures. For more information, see Handling errors.
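To illustrate what a randomized exponential back-off looks like, here is a minimal generic sketch. It does not use the YDB SDK: `OverloadedError`, `backoff_delay`, and `retry_overloaded` are hypothetical names, and the base delay, cap, and attempt count are assumed values chosen only for the example. In a real application, the SDK's built-in retry mechanism is the preferred option.

```python
import random
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for the error raised on an OVERLOADED status."""


def backoff_delay(attempt, base=0.05, cap=5.0, rng=random.random):
    """Pick a random delay in [0, min(cap, base * 2**attempt)) seconds."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_overloaded(operation, max_attempts=6, sleep=time.sleep, **backoff):
    """Call operation(), retrying only on OverloadedError.

    The upper bound of the delay doubles after every failed attempt, and the
    actual delay is drawn at random below that bound so that many clients do
    not retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(backoff_delay(attempt, **backoff))
```

A call such as `retry_overloaded(lambda: run_query(), max_attempts=4)` would wrap a hypothetical `run_query` function. Any other exception type propagates immediately, because only overload is treated as retriable in this sketch.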
Exceeding the limit of open sessions per node may indicate a problem in the application logic.
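One common application-logic problem of this kind is a session leak, where sessions are opened on every request and never returned. The sketch below shows one generic way to guard against it, assuming sessions can be reused: a pool that caps the number of sessions handed out and returns each one when the `with` block exits. `BoundedSessionPool` and `create_session` are hypothetical names and not part of any YDB SDK.

```python
import queue
import threading
from contextlib import contextmanager


class BoundedSessionPool:
    """Hands out at most `size` sessions at a time and reuses idle ones."""

    def __init__(self, create_session, size):
        self._create_session = create_session  # hypothetical session factory
        self._slots = threading.BoundedSemaphore(size)
        self._idle = queue.SimpleQueue()

    @contextmanager
    def session(self, timeout=None):
        # Wait for a free slot instead of opening one more session.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free session slot")
        try:
            try:
                current = self._idle.get_nowait()  # reuse an idle session
            except queue.Empty:
                current = self._create_session()  # or create one lazily
            try:
                yield current
            finally:
                self._idle.put(current)  # always hand the session back
        finally:
            self._slots.release()
```

With a pool size well below the limit of 1000 sessions per node, a burst of requests waits for a free slot rather than opening new sessions.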