Overloaded shards

Data shards serving row-oriented tables may become overloaded for the following reasons:

  • A table is created without the AUTO_PARTITIONING_BY_LOAD clause.

    In this case, YDB does not split overloaded shards.

    Data shards are single-threaded and process queries sequentially. Each data shard can accept up to 10,000 operations. Accepted queries wait for their turn to be executed. So the longer the queue, the higher the latency.

    If a data shard already has 10000 operations in its queue, new queries will return an "overloaded" error. Retry such queries using a randomized exponential back-off strategy. For more information, see Overloaded errors.

  • A table was created with the AUTO_PARTITIONING_MAX_PARTITIONS_COUNT setting and has already reached its partition limit.

  • An inefficient primary key that causes an imbalance in the distribution of queries across shards. A typical example is ingestion with a monotonically increasing primary key, which may lead to the overloaded "last" partition. For example, this could occur with an autoincrementing primary key using the serial data type.

Diagnostics

  1. Use the Embedded UI or Grafana to see if the YDB nodes are overloaded:

    • In the DB overview Grafana dashboard, analyze the Overloaded shard count chart.

      The chart indicates whether the YDB cluster has overloaded shards, but it does not specify which table's shards are overloaded.

      Tip

      Use Grafana to set up alert notifications when YDB data shards get overloaded.

    • In the Embedded UI:

      1. Go to the Databases tab and click on the database.

      2. On the Navigation tab, ensure the required database is selected.

      3. Open the Diagnostics tab.

      4. Open the Top shards tab.

      5. In the Immediate and Historical tabs, sort the shards by the CPUCores column and analyze the information.

      Additionally, the information about overloaded shards is provided as a system table. For more information, see History of overloaded partitions.

  2. To pinpoint the schema issue, use the Embedded UI or YDB CLI:

    • In the Embedded UI:

      1. On the Databases tab, click on the database.

      2. On the Navigation tab, select the required table.

      3. Open the Diagnostics tab.

      4. On the Describe tab, navigate to root > PathDescription > Table > PartitionConfig > PartitioningPolicy.

        Describe

      5. Analyze the PartitioningPolicy values:

        • SizeToSplit
        • SplitByLoadSettings
        • MaxPartitionsCount

        If the table does not have these options, see Recommendations for table configuration.

      Note

      You can also find this information on the Diagnostics > Info tab.

    • In the YDB CLI:

      1. To retrieve information about the problematic table, run the following command:

        ydb scheme describe <table_name>
        
      2. In the command output, analyze the Auto partitioning settings:

        • Partitioning by size
        • Partitioning by load
        • Max partitions count

        If the table does not have these options, see Recommendations for table configuration.

  3. Analyze whether primary key values increment monotonically:

    • Check the data type of the primary key column. Serial data types are used for autoincrementing values.

    • Check the application logic.

    • Calculate the difference between the minimum and maximum values of the primary key column. Then compare this value to the number of rows in a given table. If these values match, the primary key might be incrementing monotonically.

    If primary key values do increase monotonically, see Recommendations for the imbalanced primary key.

Recommendations

For table configuration

Consider the following solutions to address shard overload:

  • If the problematic table is not partitioned by load, enable partitioning by load.

    Tip

    A table is not partitioned by load, if you see the Partitioning by load: false line on the Diagnostics > Info tab in the Embedded UI or the ydb scheme describe command output.

  • If the table has reached the maximum number of partitions, increase the partition limit.

    Tip

    To determine the number of partitions in the table, see the PartCount value on the Diagnostics > Info tab in the Embedded UI.

Both operations can be performed by executing an ALTER TABLE ... SET query.

For the imbalanced primary key

Consider modifying the primary key to distribute the load evenly across table partitions. You cannot change the primary key of an existing table. To do that, you will have to create a new table with the modified primary key and then migrate the data to the new table.

Note

Also, consider changing your application logic for generating primary key values for new rows. For example, use hashes of values instead of values themselves.