Replacing Node FQDN
This procedure describes how to replace the fully qualified domain name (FQDN) of a YDB cluster node without cluster downtime.
Prerequisites
Note
A YDB cluster is fault-tolerant. Temporarily shutting down a node does not make the cluster unavailable. For more details, see YDB cluster topology.
Warning
Performing the steps out of order or making configuration errors can make the YDB cluster unavailable.
Procedure Overview
The FQDN replacement process involves:
- Preparation: Verify cluster health and prepare a new node configuration
- Node shutdown: Gracefully stop the node to be replaced
- Configuration update: Update the cluster configuration with the new FQDN
- Node restart: Start the node with the new FQDN
- Verification: Confirm that the FQDN change succeeded
Step-by-Step Instructions
Step 1: Verify Cluster Health
Before starting the replacement, ensure the cluster is healthy:
ydb monitoring healthcheck
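The health check can also gate a script so that the procedure stops early on an unhealthy cluster. The sketch below assumes that the command prints a line of the form `Healthcheck status: GOOD` in its default text output and that connection settings come from a CLI profile or the environment; the function name `healthcheck_is_good` is hypothetical.

```shell
# Hypothetical helper: succeeds only if the healthcheck output read from
# standard input reports the GOOD status.
# Assumption: the text output contains a "Healthcheck status: <STATUS>" line.
healthcheck_is_good() {
  grep -q 'Healthcheck status: GOOD'
}
```

A possible use is `ydb monitoring healthcheck | healthcheck_is_good || exit 1` at the top of a maintenance script.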
Step 2: Prepare New Node Configuration
- Update DNS records so that the new FQDN resolves to the node's existing IP address
- Reissue TLS certificates if the node's hostname appears in the certificate subject or Subject Alternative Name (SAN)
- Prepare updated configuration files with the new FQDN
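The DNS part of this preparation can be checked from the command line before anything is stopped. This is a minimal sketch that assumes a Linux host where `getent` is available; the function names and the hostnames in the usage note are placeholders, not values from a real cluster.

```shell
# Print the first address the system resolver returns for a name.
# Assumption: getent is available (glibc-based Linux).
resolve_ip() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Succeed only if both names resolve and point to the same address.
same_address() {
  old_ip=$(resolve_ip "$1")
  new_ip=$(resolve_ip "$2")
  [ -n "$old_ip" ] && [ "$old_ip" = "$new_ip" ]
}
```

For example, `same_address old-hostname.example.com new-hostname.example.com || echo "DNS is not ready"` reports a mismatch or a name that does not resolve yet.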
Step 3: Stop the Target Node
Gracefully stop the node that needs FQDN replacement:
# For systemd-managed nodes
sudo systemctl stop ydbd-storage
# For manually started nodes
kill -TERM <ydbd_pid>
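After sending the stop signal, it can help to wait until the process has actually exited before changing any configuration. The helper below is a sketch, not part of YDB; `wait_for_exit` and its default timeout are assumptions.

```shell
# Hypothetical helper: poll until the process with the given PID is gone,
# or give up after a timeout in seconds (default 60).
# Note: kill -0 also fails for processes owned by another user, so run this
# as the user that owns the ydbd process, or as root.
wait_for_exit() {
  pid=$1
  timeout=${2:-60}
  elapsed=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

A call such as `wait_for_exit <ydbd_pid> 120 || echo "node is still running"` makes a slow shutdown visible instead of silently proceeding.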
Step 4: Update Cluster Configuration
Update the cluster configuration to reflect the new FQDN:
# Example configuration update
hosts:
- host: new-hostname.example.com # Updated FQDN
  host_config_id: 1
  port: 19001
  location:
    unit: "1"
    data_center: "DC1"
    rack: "1"
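One way to produce the updated file is a literal search-and-replace on a copy of the current configuration, leaving the original untouched for comparison. The function name and file names below are hypothetical.

```shell
# Hypothetical helper: write a copy of a config file with every occurrence
# of the old FQDN replaced by the new one.
replace_fqdn() {
  old=$1
  new=$2
  src=$3
  dst=$4
  # Escape dots so that they match literally rather than as regex wildcards.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  sed "s/$old_re/$new/g" "$src" > "$dst"
}
```

Running `replace_fqdn old-hostname.example.com new-hostname.example.com config.yaml updated-config.yaml` followed by `diff config.yaml updated-config.yaml` shows exactly which lines changed. The match is a plain substring match, so the diff is worth reviewing when several hostnames share a common suffix.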
Step 5: Apply Configuration Changes
Apply the updated configuration to the cluster:
ydb admin config replace --config-file updated-config.yaml
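The apply step can be wrapped so that the configuration currently in effect is saved first. This sketch reuses only the two `ydb admin config` commands that appear in this procedure and assumes that `fetch` writes the configuration to standard output and that connection settings come from a CLI profile or the environment. The function name and backup file name are hypothetical.

```shell
# Hypothetical wrapper: save the current configuration, then apply the new one.
apply_with_backup() {
  new_config=$1
  backup=${2:-config-backup-$(date +%Y%m%d-%H%M%S).yaml}
  # Abort before changing anything if the current config cannot be saved.
  ydb admin config fetch > "$backup" || return 1
  [ -s "$backup" ] || return 1
  ydb admin config replace --config-file "$new_config"
}
```

For example, `apply_with_backup updated-config.yaml` leaves a timestamped copy of the previous configuration in the working directory.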
Step 6: Start Node with New FQDN
Start the node using the new FQDN:
# Update hostname if necessary
sudo hostnamectl set-hostname new-hostname.example.com
# Start the node
sudo systemctl start ydbd-storage
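For systemd-managed nodes, a short polling loop can confirm that the unit reaches the active state after the start command returns. The sketch relies on `systemctl is-active --quiet`; the function name and default timeout are assumptions.

```shell
# Hypothetical helper: wait until a systemd unit is active,
# or give up after a timeout in seconds (default 60).
wait_until_active() {
  unit=$1
  timeout=${2:-60}
  elapsed=0
  until systemctl is-active --quiet "$unit"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

A call such as `wait_until_active ydbd-storage 120 || sudo journalctl -u ydbd-storage -n 50` prints recent unit logs when the node does not come up in time.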
Step 7: Verify the Change
Confirm the FQDN change was successful:
# Check node status
ydb monitoring healthcheck
# Verify that the cluster configuration contains the new FQDN
ydb admin config fetch | grep -F 'new-hostname.example.com'
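A slightly stricter check confirms that the new FQDN is present and that the old one is gone. The helper below reads the fetched configuration from standard input; its name and the hostnames in the usage note are placeholders.

```shell
# Hypothetical helper: succeed only if the config on standard input
# mentions the new FQDN and no longer mentions the old one.
verify_fqdn_swap() {
  old=$1
  new=$2
  config=$(cat)
  printf '%s\n' "$config" | grep -qF "$new" || return 1
  if printf '%s\n' "$config" | grep -qF "$old"; then
    return 1
  fi
  return 0
}
```

It can be used as `ydb admin config fetch | verify_fqdn_swap old-hostname.example.com new-hostname.example.com`.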
Troubleshooting
Common Issues
- DNS resolution problems: Ensure the new FQDN resolves correctly from every cluster node
- Certificate validation errors: Reissue certificates if the old hostname appears in the certificate subject or Subject Alternative Name (SAN)
- Node registration failures: Check network connectivity and firewall rules
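For the certificate case, the SAN entries of a node certificate can be inspected with `openssl x509 -noout -text`. The helper below is a sketch that filters that text output for an exact DNS entry; the function name and the certificate path in the usage note are hypothetical.

```shell
# Hypothetical helper: succeed if the certificate text on standard input
# lists the given hostname as an exact "DNS:<name>" SAN entry.
cert_lists_host() {
  # Put each SAN entry on its own line, then require a whole-line match
  # so that longer names sharing the same prefix do not count.
  tr ', ' '\n\n' | grep -qxF "DNS:$1"
}
```

For example, `openssl x509 -in /path/to/node.crt -noout -text | cert_lists_host new-hostname.example.com || echo "certificate does not cover the new FQDN"`.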
Recovery Procedures
If the FQDN replacement fails:
- Revert DNS changes to the original FQDN
- Restore the original configuration
- Restart the node with the original settings
- Investigate and resolve the underlying issue
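The configuration and restart parts of this recovery can be sketched as a single function, assuming a copy of the original configuration was saved before the change and the node is managed by systemd. The function name and default unit name are assumptions; reverting DNS remains a separate manual step.

```shell
# Hypothetical helper: re-apply a saved configuration and restart the node.
rollback_fqdn_change() {
  backup=$1
  unit=${2:-ydbd-storage}
  if [ ! -s "$backup" ]; then
    echo "backup config $backup is missing or empty" >&2
    return 1
  fi
  # Restore the original configuration before restarting the node.
  ydb admin config replace --config-file "$backup" || return 1
  sudo systemctl restart "$unit"
}
```

A call such as `rollback_fqdn_change config-backup.yaml` refuses to run when the backup file is absent, which avoids applying an empty configuration by mistake.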
Best Practices
- Test in staging: Always test FQDN replacement in a non-production environment first
- Back up configurations: Keep backups of working configurations before making changes
- Monitor during change: Watch cluster health metrics during the replacement process
- Document changes: Maintain records of FQDN changes for future reference
- Coordinate with the team: Ensure all team members are aware of the planned change