Managing a cluster in bridge mode
Feature of Yandex Enterprise Database
This functionality is available only in Yandex Enterprise Database. It is not available in the open-source version of YDB.
Below are typical operations for a cluster in bridge mode using the corresponding YDB CLI commands.
View current state
Shows the current state of each pile configured on the YDB cluster.
ydb admin cluster bridge list
Example output:
pile-a: PRIMARY
pile-b: SYNCHRONIZED
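For automation, the output of the list command can be turned into a mapping from pile name to state. The following is a minimal sketch, assuming the output keeps the `name: STATE` line format shown in the example above; the helper names `parse_bridge_list` and `get_pile_states` are hypothetical, and connection options for the `ydb` CLI (endpoint, credentials) are omitted and would need to be added for a real cluster.

```python
import subprocess
from typing import Dict


def parse_bridge_list(output: str) -> Dict[str, str]:
    """Parse lines such as 'pile-a: PRIMARY' into {'pile-a': 'PRIMARY'}."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not 'name: STATE'.
        if not line or ":" not in line:
            continue
        name, _, state = line.rpartition(":")
        states[name.strip()] = state.strip()
    return states


def get_pile_states() -> Dict[str, str]:
    """Run the list command and return the parsed states.

    Assumption: the `ydb` binary is on PATH and already configured
    to reach the cluster.
    """
    result = subprocess.run(
        ["ydb", "admin", "cluster", "bridge", "list"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_bridge_list(result.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to check the parsing logic without access to a cluster.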
Planned PRIMARY change (switchover)
If maintenance is planned for the data center or the equipment that hosts the current PRIMARY pile, switch the PRIMARY role to another pile in advance. Choose a pile that is in the SYNCHRONIZED state and make it PRIMARY with the following command:
ydb admin cluster bridge switchover --new-primary <pile>
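The switchover is performed smoothly: the pair of piles first passes through the PRIMARY/PROMOTED states and ends in the SYNCHRONIZED/PRIMARY states, that is, the former PRIMARY becomes SYNCHRONIZED and the selected pile becomes PRIMARY.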
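The precondition for a switchover (the target pile must be SYNCHRONIZED) can be checked before the command is issued. The sketch below is only an illustration: `build_switchover_command` is a hypothetical helper, and `states` is assumed to be a mapping from pile name to state obtained from the list command.

```python
from typing import Dict, List


def build_switchover_command(states: Dict[str, str], new_primary: str) -> List[str]:
    """Return the switchover command, or raise if the target is not eligible."""
    state = states.get(new_primary)
    if state is None:
        raise ValueError(f"unknown pile: {new_primary}")
    if state != "SYNCHRONIZED":
        # Only a SYNCHRONIZED pile may be selected as the new PRIMARY.
        raise ValueError(f"pile {new_primary} is {state}, expected SYNCHRONIZED")
    return [
        "ydb", "admin", "cluster", "bridge", "switchover",
        "--new-primary", new_primary,
    ]
```

The returned list can be passed to `subprocess.run` as is; connection options for the CLI are left out of this sketch.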
Planned pile disconnection (takedown)
If planned maintenance will make one of the piles unavailable, take that pile out of the cluster before the maintenance starts using the following command:
ydb admin cluster bridge takedown --pile <pile>
# if disconnecting the current PRIMARY:
ydb admin cluster bridge takedown --pile <current-primary> --new-primary <synchronized-pile>
During the operation, the pile first moves to the SUSPENDED state and then to DISCONNECTED; after that, the cluster continues operating without the disconnected pile.
If the pile being disconnected is the current PRIMARY and the PRIMARY role could not be switched in advance, the two operations can be combined by specifying the new PRIMARY in the --new-primary argument. The pile named there must be in the SYNCHRONIZED state.
Warning
Before starting planned maintenance, always use the list command to verify that the pile disconnection has completed successfully and that all piles are in the expected state.
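This verification step can be automated. The sketch below is one possible reading of "expected state" and is an assumption rather than a rule stated by the product: the pile taken down must be DISCONNECTED, and exactly one of the remaining piles must be PRIMARY. The function name `ready_for_maintenance` is hypothetical, and `states` is assumed to come from the list command.

```python
from typing import Dict


def ready_for_maintenance(states: Dict[str, str], pile: str) -> bool:
    """Check that `pile` is DISCONNECTED and one remaining pile is PRIMARY."""
    if states.get(pile) != "DISCONNECTED":
        # Takedown has not finished (for example, the pile is still SUSPENDED).
        return False
    remaining = [state for name, state in states.items() if name != pile]
    # Assumption: a healthy cluster has exactly one PRIMARY among the rest.
    return remaining.count("PRIMARY") == 1
```

A maintenance script could refuse to proceed until this check returns `True`.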
Emergency disconnection of an unavailable pile (failover)
Because replication between piles is synchronous, the cluster stops operating by default when one of them fails unexpectedly, and a decision must be made whether to continue operating without that pile. This decision can be made by a person (for example, an on-call DevOps engineer) or by automation external to the YDB cluster.
If the decision is to continue cluster operation, run the following command:
ydb admin cluster bridge failover --pile <unavailable-pile>
If the current PRIMARY is unavailable, you must add the --new-primary parameter with the name of a pile in the SYNCHRONIZED state. If the parameter is not specified or is specified incorrectly, the command will fail with an error without any changes to the cluster.
# if the current PRIMARY is unavailable:
ydb admin cluster bridge failover --pile <unavailable-primary> --new-primary <synchronized-pile>
The unavailable pile moves to the DISCONNECTED state, and if a new PRIMARY is specified, the PRIMARY role is switched to it. Emergency disconnection can also be performed when other piles are in states other than SYNCHRONIZED. Valid transitions depend on the current pair of states and are shown on the state diagram and in the transition table.
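External automation that issues a failover has to apply the --new-primary rule described above. The following sketch shows one way to do that; `build_failover_command` is a hypothetical helper, and `states` is assumed to hold the last known pile states from the list command. The CLI itself still validates the arguments, so this check only avoids sending a command that is known to be rejected.

```python
from typing import Dict, List, Optional


def build_failover_command(
    states: Dict[str, str],
    unavailable: str,
    new_primary: Optional[str] = None,
) -> List[str]:
    """Return the failover command for `unavailable`, validating --new-primary."""
    if unavailable not in states:
        raise ValueError(f"unknown pile: {unavailable}")
    if states[unavailable] == "PRIMARY" and new_primary is None:
        # Losing the PRIMARY requires naming its replacement.
        raise ValueError("--new-primary is required when the PRIMARY is unavailable")

    cmd = ["ydb", "admin", "cluster", "bridge", "failover", "--pile", unavailable]
    if new_primary is not None:
        if new_primary == unavailable or states.get(new_primary) != "SYNCHRONIZED":
            raise ValueError(f"pile {new_primary} is not a SYNCHRONIZED candidate")
        cmd += ["--new-primary", new_primary]
    return cmd
```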
Return a pile to the cluster (rejoin)
After planned maintenance is complete or the cause of the failure has been resolved, a previously disconnected pile must be explicitly brought back into operation with the following command:
ydb admin cluster bridge rejoin --pile <pile>
Immediately after the operation starts, the pile transitions to the NOT_SYNCHRONIZED state and a background data synchronization process starts; when synchronization completes, the pile automatically becomes SYNCHRONIZED. After waiting for this state, you can switch the PRIMARY role to this pile if needed.
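Waiting for the SYNCHRONIZED state can be scripted as a polling loop. This is a minimal sketch under stated assumptions: `wait_until_synchronized` and its parameters are hypothetical, and `fetch_states` stands for any callable that returns a mapping from pile name to state, for example one built on the list command. The timeout and polling interval are placeholder values, not recommendations.

```python
import time
from typing import Callable, Dict


def wait_until_synchronized(
    pile: str,
    fetch_states: Callable[[], Dict[str, str]],
    timeout_s: float = 3600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `pile` is SYNCHRONIZED; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_states().get(pile) == "SYNCHRONIZED":
            return True
        if time.monotonic() >= deadline:
            return False
        # The pile is still catching up (for example, NOT_SYNCHRONIZED).
        sleep(poll_s)
```

Once the function returns `True`, the PRIMARY role can be switched to the pile if needed. Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.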