Setting up monitoring for a YDB cluster

This page provides instructions on how to set up monitoring for a YDB cluster.

YDB exposes multiple system health metrics. Instantaneous metric values are available in the web interface:

http://<ydb-server-address>:<ydb-port>/counters/
  • <ydb-server-address>: YDB server address.

    For a local YDB cluster deployed using Quick start, use the localhost address.

  • <ydb-port>: YDB port. Default value: 8765.

Related metrics are grouped into subgroups (such as counters auth). To view metrics for a particular subgroup only, follow a URL like:

http://<ydb-server-address>:<ydb-port>/counters/counters=<servicename>/
  • <servicename>: metrics subgroup name.

For example, data about the utilization of server hardware resources is available at the URL:

http://<ydb-server-address>:<ydb-port>/counters/counters=utils
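The URL patterns above can be assembled programmatically. The following is a minimal Python sketch; the helper name `counters_url` is hypothetical and is not part of YDB, and the default port 8765 is the default value stated above.

```python
from typing import Optional


def counters_url(host: str, port: int = 8765, service: Optional[str] = None) -> str:
    """Build a URL for the YDB metrics web interface.

    `counters_url` is a hypothetical helper for illustration, not a YDB API.
    """
    base = f"http://{host}:{port}/counters/"
    if service is None:
        # No subgroup given: URL that shows all instant metrics.
        return base
    # Subgroup view, following the /counters/counters=<servicename>/ pattern.
    return f"{base}counters={service}/"


print(counters_url("localhost"))
print(counters_url("localhost", service="utils"))
```

Keeping the subgroup optional lets the same helper produce both the top-level URL and a subgroup URL.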

You can collect metrics using Prometheus, a popular open-source observability tool, or any other system compatible with its format. YDB metrics in Prometheus format are available at a URL in the following format:

http://<ydb-server-address>:<ydb-port>/counters/counters=<servicename>/prometheus
  • <servicename>: metrics subgroup name.
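To check what a Prometheus-format endpoint returns before configuring a collector, the output can be fetched and parsed by hand. The sketch below uses only the Python standard library. The function names and the sample metric names (`example_requests_total`, `example_queue_size`) are made up for illustration and are not real YDB metrics, and the parser handles only simple `name{labels} value` lines, not the full exposition format.

```python
import urllib.request
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse simple Prometheus text-format lines into {series: value}."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Series with labels: everything up to the last "}" is the key.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            parts = line.split(None, 1)
            series, rest = parts[0], parts[1] if len(parts) > 1 else ""
        fields = rest.split()
        if fields:
            # The first field is the value; an optional timestamp is ignored.
            samples[series] = float(fields[0])
    return samples


def fetch_metrics(host: str, service: str, port: int = 8765,
                  timeout: float = 5.0) -> Dict[str, float]:
    """Fetch one metrics subgroup in Prometheus format and parse it."""
    url = f"http://{host}:{port}/counters/counters={service}/prometheus"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))


# Hypothetical sample input, so the parser can be tried without a cluster.
SAMPLE = """\
# HELP example_requests_total Hypothetical counter.
# TYPE example_requests_total counter
example_requests_total{host="ydb-s1", kind="read"} 42
example_queue_size 3.5
"""

samples = parse_prometheus_text(SAMPLE)
print(samples)
```

With a running local cluster, `fetch_metrics("localhost", "utils")` would request the subgroup used in the example above.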

To visualize data, use any system that supports Prometheus, such as Grafana, Zabbix, or AWS CloudWatch.

Setting up monitoring with Prometheus and Grafana

To set up monitoring for a YDB cluster using Prometheus and Grafana:

  1. Install Prometheus.

  2. Edit the Prometheus configuration file:

    1. In the targets section, specify the address of every server in the YDB cluster together with the port of each storage and database node running on that server.

      For example, for a YDB cluster of three servers, each running one storage node on port 8765 and two database nodes on ports 8766 and 8767, specify nine addresses for all metrics subgroups except the disk subgroups (for the disk subgroups, specify only the storage node addresses):

      static_configs:
      - targets:
        - ydb-s1.example.com:8765
        - ydb-s1.example.com:8766
        - ydb-s1.example.com:8767
        - ydb-s2.example.com:8765
        - ydb-s2.example.com:8766
        - ydb-s2.example.com:8767
        - ydb-s3.example.com:8765
        - ydb-s3.example.com:8766
        - ydb-s3.example.com:8767
      

      For a local single-node YDB cluster, specify one address in the targets section:

      - targets: ["localhost:8765"]
      
    2. If necessary, in the tls_config section, specify the certificate authority (CA) certificate used to sign the other TLS certificates of the YDB cluster:

      tls_config:
          ca_file: '<ydb-ca-file>'
      
  3. Run Prometheus using the edited configuration file.

  4. Install and start Grafana.

  5. Create a data source of the prometheus type in Grafana, and attach it to the running Prometheus instance.

  6. Upload YDB dashboards to Grafana.
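For larger clusters, the targets list from step 2 can be generated rather than typed by hand. The sketch below reproduces the three-server example; the helper names `build_targets` and `render_static_config` are hypothetical, and the host names and ports are the ones used in the example above.

```python
from typing import Iterable, List


def build_targets(hosts: Iterable[str], ports: Iterable[int]) -> List[str]:
    """Return one "host:port" entry per node, grouped by host."""
    ports = list(ports)
    return [f"{host}:{port}" for host in hosts for port in ports]


def render_static_config(targets: Iterable[str]) -> str:
    """Render the targets as a static_configs fragment for Prometheus."""
    lines = ["static_configs:", "- targets:"]
    lines += [f"  - {target}" for target in targets]
    return "\n".join(lines)


hosts = ["ydb-s1.example.com", "ydb-s2.example.com", "ydb-s3.example.com"]

# All nodes: one storage node (8765) and two database nodes (8766, 8767).
all_nodes = build_targets(hosts, [8765, 8766, 8767])

# Disk metrics subgroups: storage nodes only.
storage_only = build_targets(hosts, [8765])

print(render_static_config(all_nodes))
print(render_static_config(storage_only))
```

Generating both lists from the same host list keeps the disk subgroup targets consistent with the full list when servers are added or removed.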

To upload dashboards, use the Grafana UI Import tool or run a script. Note that the script authenticates to Grafana using basic authentication. For other authentication methods, modify the script.
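As an illustration of what a dashboard upload with basic authentication involves, the following sketch builds a request for Grafana's `POST /api/dashboards/db` HTTP API endpoint. It is not the script mentioned above. The function name `build_dashboard_upload`, the Grafana address, and the credentials are assumptions chosen for the example.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_dashboard_upload(grafana_url: str, user: str, password: str,
                           dashboard: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a dashboard upload request with basic auth."""
    payload = {
        # Clearing the id asks Grafana to treat the dashboard as new.
        "dashboard": dict(dashboard, id=None),
        "overwrite": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Assumed local Grafana address and credentials, for illustration only.
request = build_dashboard_upload(
    "http://localhost:3000", "admin", "admin", {"title": "Example", "id": 5}
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Switching to another authentication method, such as an API token, would mean changing only the `Authorization` header.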

Review the dashboard metric reference.