Frequent tablet moves between nodes
YDB automatically balances the load by moving tablets from overloaded nodes to other nodes. This process is managed by Hive. When Hive moves tablets, queries affecting those tablets might experience increased latencies while they wait for the tablets to be initialized on the new node.
When balancing nodes, YDB considers the usage of the following resources:
- CPU
- Memory
- Network
- Count
Autobalancing occurs in the following cases:
- Imbalance in hardware resource usage
YDB uses the scatter metric to evaluate the balance in hardware resource usage. This metric is calculated for each resource using the following formula:
$$
Scatter = \frac{MaxUsage - MinUsage}{MaxUsage},
$$
where:
- MaxUsage is the maximum hardware resource usage among all of the nodes.
- MinUsage is the minimum hardware resource usage among all of the nodes.
To distribute the load, YDB considers the hardware resources available to each node. Under low loads, the scatter value may vary significantly across nodes; however, the minimum value for this formula is set to never fall below 30%.
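To make the formula concrete, here is a minimal Python sketch that computes the scatter of one resource from per-node usage values expressed as fractions (1.0 = 100%). The function name `scatter` is hypothetical. The sketch does not model the 30% lower bound described above, because the text does not state which term of the formula it applies to, and treating all-idle nodes as perfectly balanced is also an assumption.

```python
def scatter(usages: list[float]) -> float:
    """Return (MaxUsage - MinUsage) / MaxUsage for one resource.

    Hypothetical helper: `usages` holds one relative usage value per node
    (1.0 == 100%). The 30% lower bound applied by YDB is NOT modeled here.
    """
    if not usages:
        raise ValueError("at least one node usage value is required")
    max_usage = max(usages)
    min_usage = min(usages)
    if max_usage == 0:
        # Assumption: if every node is idle, treat the load as balanced.
        return 0.0
    return (max_usage - min_usage) / max_usage


# Three nodes at 80%, 60%, and 40% CPU usage: (0.8 - 0.4) / 0.8 = 0.5
print(scatter([0.8, 0.6, 0.4]))
```

With the default MinCPUScatterToBalance value of 0.5, a CPU scatter of 0.5 like the one in this example would reach the balancing threshold.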
- Overloaded nodes (CPU and memory usage)
Hive starts the autobalancing procedure when the highest load on a node exceeds 90% while the lowest load on a node is below 70%.
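The trigger condition above fits in a short predicate. In this minimal sketch, the 90% upper bound matches the default MaxNodeUsageToKick value listed under Recommendations, and the 70% lower bound is taken from the sentence above. The function name and parameters are hypothetical, and the sketch assumes both comparisons are strict, as the wording ("exceeds", "is below") suggests.

```python
def overload_triggers_balancing(
    usages: list[float], high: float = 0.9, low: float = 0.7
) -> bool:
    """Hypothetical check: True if the busiest node exceeds `high`
    while the least loaded node is below `low` (1.0 == 100%)."""
    if not usages:
        return False
    return max(usages) > high and min(usages) < low


print(overload_triggers_balancing([0.95, 0.60]))  # True: one hot node, one with headroom
print(overload_triggers_balancing([0.95, 0.85]))  # False: no node is below 70%
```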
- Uneven distribution of database objects
YDB uses the ObjectImbalance metric to monitor the distribution of tablets that use the counter (Count) resource across YDB nodes. When YDB nodes restart, these tablets may not be distributed evenly, prompting Hive to initiate the autobalancing procedure.
Diagnostics
- Check whether the Tablets moved by Hive chart in the DB status Grafana dashboard shows any spikes. This chart displays the number of tablets moved per second over time.
- See the Hive balancer stats:
  - Open Embedded UI.
  - Click Developer UI in the upper right corner of the Embedded UI.
  - In the Developer UI, navigate to Tablets > Hive > App. The balancer stats are displayed in the upper right corner.
  - Additionally, to see the recently moved tablets, click the Balancer button. In the Balancer window that appears, the Latest tablet moves section lists the recently moved tablets.
Recommendations
Adjust the Hive balancer settings:
- Open Embedded UI.
- Click Developer UI in the upper right corner of the Embedded UI.
- In the Developer UI, navigate to Tablets > Hive > App.
- Click Settings.
- To reduce the likelihood of overly frequent balancing, increase the following Hive balancer thresholds:
| Parameter | Description | Default value |
|---|---|---|
| MinCounterScatterToBalance | The threshold for the counter scatter value. When this value is reached, Hive starts balancing the load. | 0.02 |
| MinCPUScatterToBalance | The threshold for the CPU scatter value. When this value is reached, Hive starts balancing the load. | 0.5 |
| MinMemoryScatterToBalance | The threshold for the memory scatter value. When this value is reached, Hive starts balancing the load. | 0.5 |
| MinNetworkScatterToBalance | The threshold for the network scatter value. When this value is reached, Hive starts balancing the load. | 0.5 |
| MaxNodeUsageToKick | The threshold for node resource usage. When this value is reached, Hive starts emergency balancing. | 0.9 |
| ObjectImbalanceToBalance | The threshold for the database object imbalance metric. | 0.02 |
Note
These parameters use relative values, where 1.0 represents 100%; setting a parameter to 1.0 effectively disables balancing. If the total value of a hardware resource can exceed 100%, adjust the ratio accordingly.
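The following minimal Python sketch shows how the scatter thresholds in the table relate to measured scatter values: a resource is flagged for balancing once its scatter reaches the corresponding threshold. The dictionary keys and the function name are hypothetical, the `>=` comparison is an assumption based on the wording "when this value is reached", and the sketch ignores MaxNodeUsageToKick and ObjectImbalanceToBalance.

```python
# Default thresholds from the table above (relative values, 1.0 == 100%).
# The short resource keys are hypothetical labels for this sketch.
DEFAULT_THRESHOLDS = {
    "counter": 0.02,  # MinCounterScatterToBalance
    "cpu": 0.5,       # MinCPUScatterToBalance
    "memory": 0.5,    # MinMemoryScatterToBalance
    "network": 0.5,   # MinNetworkScatterToBalance
}


def resources_to_balance(
    scatter_by_resource: dict[str, float],
    thresholds: dict[str, float] = DEFAULT_THRESHOLDS,
) -> list[str]:
    """Return, sorted by name, the resources whose scatter reached its threshold."""
    return sorted(
        name
        for name, value in scatter_by_resource.items()
        if value >= thresholds[name]
    )


observed = {"counter": 0.01, "cpu": 0.6, "memory": 0.3, "network": 0.5}
print(resources_to_balance(observed))  # ['cpu', 'network']

# Raising the CPU and network thresholds makes balancing less likely.
relaxed = {**DEFAULT_THRESHOLDS, "cpu": 0.7, "network": 0.7}
print(resources_to_balance(observed, relaxed))  # []
```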
Count is a virtual resource for distributing tablets of the same type evenly between nodes.