Restarting YDB clusters deployed with Ansible

YDB clusters provide strong availability guarantees, so any maintenance, including a cluster restart, must take the cluster's fault tolerance model into account. Two kinds of nodes might need to be restarted:

  • Database nodes (also known as dynamic nodes) are stateless, so the primary consideration is keeping enough of them running to handle each database's load. A basic rolling restart with a short delay is usually sufficient for dynamic nodes.
  • Storage nodes (also known as static nodes) are stateful and responsible for safely persisting data, so they require special handling to ensure data availability. Each YDB cluster has a dedicated component that tracks all outages and maintenance and can tell whether it is currently safe to stop or restart a particular node. Asking for its permission before each operation is essential, which is why a complete restart of storage nodes often takes a while.

Restart via Ansible playbook

The ydb-ansible repository contains a playbook called ydb_platform.ydb.restart that can be used to restart a YDB cluster. Run it from the same directory used for the initial deployment.

Restart all nodes

By default, the ydb_platform.ydb.restart playbook restarts all cluster nodes: static nodes first, then dynamic nodes. To run it:

ansible-playbook ydb_platform.ydb.restart

Filter by node type

Tasks in the ydb_platform.ydb.restart playbook are tagged with node types, so you can use Ansible's tags functionality to filter nodes by their kind.

These two commands are equivalent and will restart all storage nodes:

ansible-playbook ydb_platform.ydb.restart --tags storage
ansible-playbook ydb_platform.ydb.restart --tags static

These two commands are equivalent and will restart all database nodes:

ansible-playbook ydb_platform.ydb.restart --tags database
ansible-playbook ydb_platform.ydb.restart --tags dynamic
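
Because the tags select node types, the default order (storage nodes first, then database nodes) can also be run as two separate phases, for example to inspect the cluster in between. The following is a minimal sketch, not part of ydb-ansible: the restart_phase and restart_in_order function names and the RUNNER variable are hypothetical and exist only so that the commands can be printed instead of executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the playbook; not shipped with ydb-ansible.
# Set RUNNER=echo to print each command instead of running it.
RUNNER="${RUNNER:-}"

restart_phase() {
  # $1 is one of the node-type tags accepted by the playbook,
  # for example "storage" or "database".
  $RUNNER ansible-playbook ydb_platform.ydb.restart --tags "$1"
}

restart_in_order() {
  # Run the storage phase first; skip the database phase if it fails.
  restart_phase storage && restart_phase database
}
```

Setting RUNNER=echo before calling restart_in_order prints both commands without touching the cluster, which is a cheap way to check the sequence before a real run.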

Filter by hostname

To restart a specific host or subset of hosts, use the --limit argument:

ansible-playbook ydb_platform.ydb.restart --limit='<hostname>'
ansible-playbook ydb_platform.ydb.restart --limit='<hostname-1>,<hostname-2>'

The --limit argument can be combined with tags, too:

ansible-playbook ydb_platform.ydb.restart --tags database --limit='<hostname>'
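
When these filters are combined often, a small wrapper can assemble the command from an optional node type and an optional host list. This is a sketch under stated assumptions: the restart_nodes function and the DRY_RUN variable are hypothetical helpers, not part of ydb-ansible; only the --tags and --limit arguments come from the commands shown above.

```shell
#!/bin/sh
# Hypothetical helper; not shipped with ydb-ansible.
# Usage: restart_nodes [node_type] [comma-separated-hosts]
# With DRY_RUN=1 the command is printed instead of executed.
restart_nodes() {
  node_type="${1:-}"   # storage, static, database, dynamic, or empty for all
  hosts="${2:-}"       # e.g. "host-1,host-2", or empty for every host

  # Rebuild the positional parameters as the command to run.
  set -- ansible-playbook ydb_platform.ydb.restart
  if [ -n "$node_type" ]; then
    set -- "$@" --tags "$node_type"
  fi
  if [ -n "$hosts" ]; then
    set -- "$@" --limit "$hosts"
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, with DRY_RUN=1 set, calling restart_nodes database host-1,host-2 prints the full ansible-playbook command line so it can be reviewed before a real restart.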

Restart nodes manually

The ydbops tool correctly implements various YDB cluster maintenance operations, including restarts. The ydb_platform.ydb.restart playbook described above uses it behind the scenes, but it can also be run manually.

For more guidelines and details on how this works, see the Maintenance without downtime article.