Frequent tablet moves between nodes

YDB automatically balances the load by moving tablets from overloaded nodes to other nodes. This process is managed by Hive. When Hive moves tablets, queries affecting those tablets might experience increased latencies while they wait for the tablet to get initialized on the new node.

YDB considers usage of the following hardware resources for balancing nodes:

  • CPU
  • Memory
  • Network
  • Count

Autobalancing occurs in the following cases:

  • Imbalance in hardware resource usage

    YDB uses the scatter metric to evaluate the balance in hardware resource usage. This metric is calculated for each resource using the following formula:

    $$Scatter = \frac{MaxUsage - MinUsage}{MaxUsage},$$

    where:

    • MaxUsage is the maximum hardware resource usage among all of the nodes.
    • MinUsage is the minimum hardware resource usage among all of the nodes.

    To distribute the load, YDB considers the hardware resources available to each node. Under low loads, the scatter value may vary significantly across nodes; however, the minimum value for this formula is set to never fall below 30%.

  • Overloaded nodes (CPU and memory usage)

    Hive starts the autobalancing procedure when the highest load on a node exceeds 90%, while the lowest load on a node is below 70%.

  • Uneven distribution of database objects

    YDB uses the ObjectImbalance metric to monitor the distribution of tablets utilizing the counter resource across YDB nodes. When YDB nodes restart, these tablets may not distribute evenly, prompting Hive to initiate the autobalancing procedure.
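
The scatter formula and the overloaded-node condition described above can be sketched in a few lines of Python. This is an illustration only, not Hive's implementation: the function names `scatter` and `overload_triggers_balancing` and the sample usage values are hypothetical, usage is expressed as a fraction of node capacity, and the 30% floor mentioned above is deliberately not modeled.

```python
def scatter(usages):
    """Return (MaxUsage - MinUsage) / MaxUsage for per-node usage fractions."""
    max_usage = max(usages)
    min_usage = min(usages)
    if max_usage == 0:
        return 0.0  # no load on any node, so there is nothing to balance
    return (max_usage - min_usage) / max_usage


def overload_triggers_balancing(usages, high=0.9, low=0.7):
    """Overloaded-node condition: highest load above 90%, lowest below 70%."""
    return max(usages) > high and min(usages) < low


# Hypothetical CPU usage of four nodes, as fractions of capacity.
cpu_usage = [0.95, 0.60, 0.55, 0.50]

print(scatter(cpu_usage))
print(overload_triggers_balancing(cpu_usage))
```

With these sample values, the scatter is (0.95 - 0.50) / 0.95, roughly 0.47, and the overload condition holds because one node is above 90% while another is below 70%.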

Diagnostics

  1. See if the Tablets moved by Hive chart in the DB status Grafana dashboard shows any spikes.

     This chart displays the time-series data for the number of tablets moved per second.
    
  2. See the Hive balancer stats.

    1. Open Embedded UI.

    2. Click Developer UI in the upper right corner of the Embedded UI.

    3. In the Developer UI, navigate to Tablets > Hive > App.

      See the balancer stats in the upper right corner.

    4. Additionally, to see the recently moved tablets, click the Balancer button.

      The Balancer window will appear. The list of recently moved tablets is displayed in the Latest tablet moves section.
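
If the chart data is exported as a list of per-second samples, spikes can also be flagged programmatically. The sketch below shows one possible heuristic and is not part of YDB or Grafana: the function name `find_spikes`, the factor of 3 over the median, and the sample values are all assumptions chosen for illustration.

```python
from statistics import median


def find_spikes(moves_per_second, factor=3.0):
    """Return indexes of samples that exceed `factor` times the series median."""
    baseline = median(moves_per_second)
    # If the median is zero, the threshold is zero too,
    # so any non-zero sample is reported as a spike.
    threshold = baseline * factor
    return [i for i, value in enumerate(moves_per_second) if value > threshold]


# Hypothetical "tablets moved per second" samples.
samples = [2, 3, 2, 40, 3, 2, 35, 2]
print(find_spikes(samples))
```

A median-based baseline is used here because a few large spikes barely shift it, unlike a mean.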

Recommendations

Adjust Hive balancer settings:

  1. Open Embedded UI.

  2. Click Developer UI in the upper right corner of the Embedded UI.

  3. In the Developer UI, navigate to Tablets > Hive > App.

  4. Click Settings.

  5. To reduce the likelihood of overly frequent balancing, increase the following Hive balancer thresholds:

    | Parameter | Description | Default value |
    |---|---|---|
    | MinCounterScatterToBalance | The threshold for the counter scatter value. When this value is reached, Hive starts balancing the load. | 0.02 |
    | MinCPUScatterToBalance | The threshold for the CPU scatter value. When this value is reached, Hive starts balancing the load. | 0.5 |
    | MinMemoryScatterToBalance | The threshold for the memory scatter value. When this value is reached, Hive starts balancing the load. | 0.5 |
    | MinNetworkScatterToBalance | The threshold for the network scatter value. When this value is reached, Hive starts balancing the load. | 0.5 |
    | MaxNodeUsageToKick | The threshold for the node resource usage. When this value is reached, Hive starts emergency balancing. | 0.9 |
    | ObjectImbalanceToBalance | The threshold for the database object imbalance metric. | 0.02 |

    Note

    These parameters use relative values, where 1.0 represents 100% and effectively disables balancing. If the total hardware resource value can exceed 100%, adjust the ratio accordingly.
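
To see why raising a threshold reduces how often balancing starts, compare the same node usage against the default and a raised value of MinCPUScatterToBalance. This is a simplified model, assuming balancing starts once the scatter reaches the threshold, as the parameter descriptions state. The function `should_balance` and the usage values are hypothetical, and nothing here reads or changes settings on a real Hive.

```python
def should_balance(usages, min_scatter_to_balance):
    """True when the scatter of per-node usage reaches the given threshold."""
    max_usage = max(usages)
    if max_usage == 0:
        return False  # idle cluster: scatter is undefined, nothing to move
    scatter = (max_usage - min(usages)) / max_usage
    return scatter >= min_scatter_to_balance


# Hypothetical CPU usage fractions; scatter is (0.80 - 0.35) / 0.80, about 0.56.
cpu_usage = [0.80, 0.35, 0.50]

print(should_balance(cpu_usage, 0.5))  # default MinCPUScatterToBalance
print(should_balance(cpu_usage, 0.7))  # raised threshold
```

In this model the default threshold of 0.5 triggers balancing for the sample load, while a threshold of 0.7 does not, so the same usage pattern no longer causes tablet moves.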