Cluster configuration
The cluster configuration is specified in the YAML file passed in the --yaml-config
parameter when the cluster nodes are run.
This article describes the main groups of configurable parameters in this file.
host_configs: Typical host configurations
A YDB cluster consists of multiple nodes, and one or more typical server configurations are usually used for their deployment. To avoid repeating the same description for each node, the configuration file has a host_configs section that lists these configurations and assigns each one an ID.
Syntax
host_configs:
- host_config_id: 1
drive:
- path: <path_to_device>
type: <type>
- path: ...
- host_config_id: 2
...
The host_config_id attribute specifies a numeric configuration ID. The drive attribute contains a collection of descriptions of connected drives. Each description consists of two attributes:
- path: Path to the mounted block device, for example, /dev/disk/by-partlabel/ydb_disk_ssd_01
- type: Type of the device's physical media: ssd, nvme, or rot (rotational, HDD)
Examples
One configuration with ID 1 and one SSD disk accessible via /dev/disk/by-partlabel/ydb_disk_ssd_01:
host_configs:
- host_config_id: 1
drive:
- path: /dev/disk/by-partlabel/ydb_disk_ssd_01
type: SSD
Two configurations with IDs 1 (two SSD disks) and 2 (three SSD disks):
host_configs:
- host_config_id: 1
drive:
- path: /dev/disk/by-partlabel/ydb_disk_ssd_01
type: SSD
- path: /dev/disk/by-partlabel/ydb_disk_ssd_02
type: SSD
- host_config_id: 2
drive:
- path: /dev/disk/by-partlabel/ydb_disk_ssd_01
type: SSD
- path: /dev/disk/by-partlabel/ydb_disk_ssd_02
type: SSD
- path: /dev/disk/by-partlabel/ydb_disk_ssd_03
type: SSD
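To make the constraints on this section concrete, the following Python sketch (a hypothetical helper, not an official tool; the dictionaries simply mirror the YAML structure above) checks that host_config_id values are unique and that every drive entry has a device path and a supported media type:

```python
# Sketch: validate a host_configs section represented as plain Python data.
# The structure mirrors the YAML examples above; this is illustrative only.
VALID_TYPES = {"SSD", "NVME", "ROT"}

def validate_host_configs(host_configs):
    seen_ids = set()
    for cfg in host_configs:
        cfg_id = cfg["host_config_id"]
        if cfg_id in seen_ids:
            raise ValueError(f"duplicate host_config_id: {cfg_id}")
        seen_ids.add(cfg_id)
        for drive in cfg["drive"]:
            # Device paths in the examples live under /dev/
            if not drive["path"].startswith("/dev/"):
                raise ValueError(f"unexpected device path: {drive['path']}")
            if drive["type"].upper() not in VALID_TYPES:
                raise ValueError(f"unknown drive type: {drive['type']}")
    return True

host_configs = [
    {"host_config_id": 1,
     "drive": [{"path": "/dev/disk/by-partlabel/ydb_disk_ssd_01",
                "type": "SSD"}]},
]
print(validate_host_configs(host_configs))  # True
```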
Kubernetes features
The YDB Kubernetes operator mounts NBS disks for Storage nodes at the path /dev/kikimr_ssd_00. To use them, the following host_configs configuration must be specified:
host_configs:
- host_config_id: 1
drive:
- path: /dev/kikimr_ssd_00
type: SSD
The example configuration files provided with the YDB Kubernetes operator contain this section, and it does not need to be changed.
hosts: Static cluster nodes
This group lists the static cluster nodes on which the Storage processes run and specifies their main characteristics:
- Numeric node ID
- DNS host name and port that can be used to connect to a node on the IP network
- ID of the standard host configuration
- Placement in a specific availability zone, rack
- Server inventory number (optional)
Syntax
hosts:
- host: <DNS host name>
host_config_id: <numeric ID of the standard host configuration>
port: <port> # 19001 by default
location:
unit: <string with the server serial number>
data_center: <string with the availability zone ID>
rack: <string with the rack ID>
- host: <DNS host name>
...
Examples
hosts:
- host: hostname1
host_config_id: 1
node_id: 1
port: 19001
location:
unit: '1'
data_center: '1'
rack: '1'
- host: hostname2
host_config_id: 1
node_id: 2
port: 19001
location:
unit: '1'
data_center: '1'
rack: '1'
Kubernetes features
When deploying YDB with the Kubernetes operator, the entire hosts section is generated automatically, replacing any user-specified content in the configuration passed to the operator. All Storage nodes use host_config_id = 1, for which the correct configuration must be specified.
domains_config: Cluster domain
This section contains the configuration of the YDB cluster root domain, including the Blob Storage (binary object storage), State Storage, and authentication configurations.
Syntax
domains_config:
domain:
- name: <root domain name>
storage_pool_types: <Blob Storage configuration>
state_storage: <State Storage configuration>
security_config: <authentication configuration>
Blob Storage configuration
This section defines one or more types of storage pools available in the cluster for the data in the databases with the following configuration options:
- Storage pool name
- Device properties (for example, disk type)
- Data encryption (on/off)
- Fault tolerance mode
The following fault tolerance modes are available:
Mode | Description
---|---
none | There is no redundancy. Applies for testing.
block-4-2 | Redundancy factor of 1.5; applies to single data center clusters.
mirror-3-dc | Redundancy factor of 3; applies to multi-data center clusters.
mirror-3dc-3-nodes | Redundancy factor of 3. Applies for testing.
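The redundancy factors follow from simple arithmetic: block-4-2 erasure coding writes 4 data fragments plus 2 parity fragments per block, while the mirror modes write 3 full copies. A hedged sketch (the fragment counts reflect the scheme names, not values read from this file):

```python
# Rough storage-overhead arithmetic for the fault tolerance modes above.
# block-4-2: 4 data fragments + 2 parity fragments -> 6/4 = 1.5x
# mirror-3-dc / mirror-3dc-3-nodes: 3 full copies -> 3x
# none: a single copy -> 1x
def redundancy_factor(data_parts, total_parts):
    """Bytes stored per byte of user data."""
    return total_parts / data_parts

print(redundancy_factor(4, 6))  # 1.5 (block-4-2)
print(redundancy_factor(1, 3))  # 3.0 (mirror-3-dc)
print(redundancy_factor(1, 1))  # 1.0 (none)
```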
Syntax
storage_pool_types:
- kind: <storage pool name>
pool_config:
box_id: 1
encryption: <optional, specify 1 to encrypt data on the disk>
erasure_species: <fault tolerance mode name - none, block-4-2, or mirror-3-dc>
kind: <storage pool name - specify the same value as above>
pdisk_filter:
- property:
- type: <device type to be compared with the one specified in host_configs.drive.type>
vdisk_kind: Default
- kind: <storage pool name>
...
Each database in the cluster is assigned at least one of the available storage pools, selected in the database creation operation. The names of the assigned storage pools can be used in the DATA attribute when defining column groups in the YQL CREATE TABLE/ALTER TABLE operators.
State Storage configuration
State Storage is an independent in-memory storage for variable data that supports internal YDB processes. It stores data replicas on multiple assigned nodes.
State Storage usually does not need scaling for better performance, so the number of nodes in it must be kept as small as possible taking into account the required level of fault tolerance.
State Storage availability is key for a YDB cluster because it affects all databases, regardless of which storage pools they use. To ensure fault tolerance of State Storage, its nodes must be selected to guarantee a working majority in case of expected failures.
The following guidelines can be used to select State Storage nodes:
Cluster type | Min number of nodes | Selection guidelines
---|---|---
Without fault tolerance | 1 | Select one random node.
Within a single availability zone | 5 | Select five nodes in different block-4-2 storage pool failure domains to ensure that a majority of 3 working nodes (out of 5) remains when two domains fail.
Geo-distributed | 9 | Select three nodes in different failure domains within each of the three mirror-3-dc storage pool availability zones to ensure that a majority of 5 working nodes (out of 9) remains when an availability zone plus a failure domain fail.
When deploying State Storage on clusters that use multiple storage pools with a possible combination of fault tolerance modes, consider increasing the number of nodes and spreading them across different storage pools because unavailability of State Storage results in unavailability of the entire cluster.
Syntax
state_storage:
- ring:
node: <StateStorage node array>
nto_select: <number of data replicas in StateStorage>
ssid: 1
Each State Storage client (for example, a DataShard tablet) uses nto_select nodes to write copies of its data to State Storage. If State Storage consists of more than nto_select nodes, different nodes can be used for different clients, so you must ensure that any subset of nto_select nodes within State Storage meets the fault tolerance criteria.
Odd numbers must be used for nto_select because even numbers do not improve fault tolerance compared to the nearest smaller odd number.
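The odd-number guidance follows from majority arithmetic: a replica set of n nodes needs a quorum of n // 2 + 1, so it tolerates (n - 1) // 2 failures, and an even n tolerates no more failures than n - 1 does. A small sketch:

```python
# Majority arithmetic behind the odd-nto_select recommendation.
def majority(n):
    """Smallest number of nodes that forms a working majority of n."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many node failures still leave a working majority."""
    return n - majority(n)  # equals (n - 1) // 2

print(tolerated_failures(5))  # 2 (single-AZ guideline: 5 nodes survive 2 failures)
print(tolerated_failures(9))  # 4 (geo-distributed: 9 nodes survive 4 failures)
# An even count buys nothing over the next smaller odd count:
print(tolerated_failures(6) == tolerated_failures(5))  # True
```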
Authentication configuration
The authentication mode in the YDB cluster is configured in the domains_config.security_config section.
Syntax
domains_config:
...
security_config:
enforce_user_token_requirement: Bool
...
Key | Description
---|---
enforce_user_token_requirement | Require a user token. Acceptable values: true (requests must carry a valid authentication token) and false (anonymous access is allowed).
Examples
Configuration with block-4-2 fault tolerance:
domains_config:
domain:
- name: Root
storage_pool_types:
- kind: ssd
pool_config:
box_id: 1
erasure_species: block-4-2
kind: ssd
pdisk_filter:
- property:
- type: SSD
vdisk_kind: Default
state_storage:
- ring:
node: [1, 2, 3, 4, 5, 6, 7, 8]
nto_select: 5
ssid: 1
Configuration with block-4-2 fault tolerance and authentication enabled:
domains_config:
domain:
- name: Root
storage_pool_types:
- kind: ssd
pool_config:
box_id: 1
erasure_species: block-4-2
kind: ssd
pdisk_filter:
- property:
- type: SSD
vdisk_kind: Default
state_storage:
- ring:
node: [1, 2, 3, 4, 5, 6, 7, 8]
nto_select: 5
ssid: 1
security_config:
enforce_user_token_requirement: true
Configuration with mirror-3-dc fault tolerance:
domains_config:
domain:
- name: global
storage_pool_types:
- kind: ssd
pool_config:
box_id: 1
erasure_species: mirror-3-dc
kind: ssd
pdisk_filter:
- property:
- type: SSD
vdisk_kind: Default
state_storage:
- ring:
node: [1, 2, 3, 4, 5, 6, 7, 8, 9]
nto_select: 9
ssid: 1
Configuration without fault tolerance (none):
domains_config:
domain:
- name: Root
storage_pool_types:
- kind: ssd
pool_config:
box_id: 1
erasure_species: none
kind: ssd
pdisk_filter:
- property:
- type: SSD
vdisk_kind: Default
state_storage:
- ring:
node:
- 1
nto_select: 1
ssid: 1
Configuration with multiple storage pools, including encrypted ones:
domains_config:
domain:
- name: Root
storage_pool_types:
- kind: ssd
pool_config:
box_id: '1'
erasure_species: block-4-2
kind: ssd
pdisk_filter:
- property:
- {type: SSD}
vdisk_kind: Default
- kind: rot
pool_config:
box_id: '1'
erasure_species: block-4-2
kind: rot
pdisk_filter:
- property:
- {type: ROT}
vdisk_kind: Default
- kind: rotencrypted
pool_config:
box_id: '1'
encryption_mode: 1
erasure_species: block-4-2
kind: rotencrypted
pdisk_filter:
- property:
- {type: ROT}
vdisk_kind: Default
- kind: ssdencrypted
pool_config:
box_id: '1'
encryption_mode: 1
erasure_species: block-4-2
kind: ssdencrypted
pdisk_filter:
- property:
- {type: SSD}
vdisk_kind: Default
state_storage:
- ring:
node: [1, 16, 31, 46, 61, 76, 91, 106]
nto_select: 5
ssid: 1
Actor system
CPU resources are mainly used by the actor system. Depending on its type, each actor runs in one of the pools (the name parameter). Configuring the actor system means allocating the node's CPU cores across these pools. When allocating them, keep in mind that PDisks and the gRPC API run outside the actor system and require separate resources.
You can set up your actor system either automatically or manually. In the actor_system_config
section, specify:
- For automatic configuring: the node type and the number of CPU cores allocated to the ydbd process.
- For manual configuring: the number of CPU cores for each YDB cluster subsystem.
Automatic configuring adapts to the current system workload. It is recommended in most cases.
You might opt for manual configuring when a certain pool in your actor system is overwhelmed and undermines the overall database performance. You can track the workload on your pools on the Embedded UI monitoring page.
Automatic configuring
Example of the actor_system_config
section for automatic configuring of the actor system:
actor_system_config:
use_auto_config: true
node_type: STORAGE
cpu_count: 10
Parameter | Description
---|---
use_auto_config | Enables automatic configuring of the actor system.
node_type | Node type; determines the expected workload and the vCPU ratio between the pools. Possible values: STORAGE, COMPUTE, HYBRID.
cpu_count | Number of vCPUs allocated to the node.
Manual configuring
Example of the actor_system_config
section for manual configuring of the actor system:
actor_system_config:
executor:
- name: System
spin_threshold: 0
threads: 2
type: BASIC
- name: User
spin_threshold: 0
threads: 3
type: BASIC
- name: Batch
spin_threshold: 0
threads: 2
type: BASIC
- name: IO
threads: 1
time_per_mailbox_micro_secs: 100
type: IO
- name: IC
spin_threshold: 10
threads: 1
time_per_mailbox_micro_secs: 100
type: BASIC
scheduler:
progress_threshold: 10000
resolution: 256
spin_threshold: 0
Parameter | Description
---|---
executor | Pool configuration. You should only change the number of CPU cores (the threads parameter) in the pool configs.
name | Pool name that indicates its purpose. Possible values: System, User, Batch, IO, IC.
spin_threshold | The number of CPU cycles before going to sleep when there are no messages. Sleep mode reduces power consumption but may increase request latency under low loads.
threads | The number of CPU cores allocated to the pool. Make sure the total number of cores assigned to the System, User, Batch, and IC pools does not exceed the number of available system cores.
max_threads | The maximum number of vCPUs that can be allocated to the pool from idle cores of other pools. Setting this parameter enables expanding the pool at full utilization, provided that idle vCPUs are available; the system checks current utilization and reallocates vCPUs once per second.
max_avg_ping_deviation | An additional condition for expanding the pool's vCPUs: in addition to more than 90% of the pool's vCPUs being utilized, SelfPing must deviate by more than max_avg_ping_deviation microseconds from the expected 10 milliseconds.
time_per_mailbox_micro_secs | The number of messages per actor to be handled before switching to a different actor.
type | Pool type. Possible values: BASIC, IO.
scheduler | Scheduler configuration. The actor system scheduler is responsible for delivering deferred messages exchanged by actors. We do not recommend changing the default scheduler parameters.
progress_threshold | The actor system supports scheduling messages to be sent at a later point in time. If the system fails to send all scheduled messages at some point, it starts sending them in "virtual time": in each loop it handles message sending over a period that does not exceed progress_threshold microseconds and shifts the virtual time forward by progress_threshold until it reaches real time.
resolution | When making a schedule for sending messages, discrete time slots are used. The slot duration is set by the resolution parameter in microseconds.
blob_storage_config: Static cluster group
Specify the configuration of the static cluster group. A static group is necessary for the operation of the basic cluster tablets, including Hive, SchemeShard, and BlobStorageController.
As a rule, these tablets do not store a lot of data, so we don't recommend creating more than one static group.
For the static group, specify the disks and nodes that it will be placed on. For example, a configuration for the erasure: none model can be as follows:
blob_storage_config:
service_set:
groups:
- erasure_species: none
rings:
- fail_domains:
- vdisk_locations:
- node_id: 1
path: /dev/disk/by-partlabel/ydb_disk_ssd_02
pdisk_category: SSD
....
For a configuration located in 3 availability zones, specify 3 rings. For a configuration within a single availability zone, specify exactly one ring.
Sample cluster configurations
You can find sample cluster configurations for deployment in the repository. Review them before deploying a cluster.