Log records collection in a Kubernetes cluster using FluentBit and YDB

This section presents the implementation of integration between the Kubernetes cluster log shipping tool - FluentBit, with subsequent saving for viewing or analysis in YDB.

Introduction

FluentBit is a tool that can collect text data, manipulate it (change, transform, merge) and send it to various storage facilities for further processing.

To deploy a scheme for delivering logs of running applications to Kubernetes using FluentBit and then saving them in YDB, you need to:

The work diagram looks like this:

FluentBit in Kubernetes cluster
Figure 1 — Interaction diagram between FluentBit and YDB in the Kubernetes cluster

In this diagram:

  • Application pods write logs to stdout/stderr

  • Text from stdout/stderr is saved as files on Kubernetes worker nodes

  • Pod with FluentBit

    • Mounts a folder with log files for itself

    • Reads the contents from them

    • Enriches posts with additional metadata

    • Saves records to YDB cluster

Creating a table in YDB

On the selected YDB cluster, you need to run the following query:

CREATE TABLE `fluent-bit/log` (
    `timestamp`         Timestamp NOT NULL,
    `file`              Text NOT NULL,
    `pipe`              Text NOT NULL,
    `message`           Text NULL,
    `datahash`          Uint64 NOT NULL,
    `message_parsed`    JSON NULL,
    `kubernetes`        JSON NULL,

    PRIMARY KEY (
         `timestamp`, `file`, `datahash`
    )
)

Column purpose:

  • timestamp – the log timestamp

  • file – name of the source from which the log was read. In the case of Kubernetes, this will be the name of the file on the worker node in which the logs of a specific pod are written

  • pipe – stdout or stderr stream where application-level writing was done

  • message – the log message

  • datahash – the CityHash64 hash code calculated over the log message (required to avoid overwriting messages from the same source and with the same timestamp)

  • message_parsed – a structured log message, if it could be parsed using the fluent-bit parsers

  • kubernetes – information about the pod, for example: name, namespace, logs and annotations

Optionally, you can set TTL for table rows

FluentBit configuration

It is necessary to replace the repository and image version:

image:
  repository: ghcr.io/ydb-platform/fluent-bit-ydb
  tag: latest

In this image, a plugin library has been added that implements YDB support. Source code is available here

The following lines define the rules for mounting log folders in FluentBit pods:

volumeMounts:
  - name: config
    mountPath: /fluent-bit/etc/conf

daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibcontainers
    hostPath:
      path: /var/lib/containerd/containers
  - name: etcmachineid
    hostPath:
      path: /etc/machine-id
      type: File

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibcontainers
    mountPath: /var/lib/containerd/containers
    readOnly: true
  - name: etcmachineid
    mountPath: /etc/machine-id
    readOnly: true

Also, you need to redefine the command and launch arguments:

command:
  - /fluent-bit/bin/fluent-bit

args:
  - --workdir=/fluent-bit/etc
  - --plugin=/fluent-bit/lib/out_ydb.so
  - --config=/fluent-bit/etc/conf/fluent-bit.conf

And the pipeline itself for collecting, converting and delivering logs:

config:
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Keep_Log On
        Merge_Log On
        Merge_Log_Key log_parsed
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

    [FILTER]
        Name modify
        Match kube.*
        Remove time
        Remove _p

  outputs: |
    [OUTPUT]
        Name ydb
        Match kube.*
        TablePath fluent-bit/log
        Columns {".timestamp":"timestamp",".input":"file",".hash":"datahash","log":"message","log_parsed":"message_structured","stream":"pipe","kubernetes":"metadata"}
        ConnectionURL ${OUTPUT_YDB_CONNECTION_URL}
        CredentialsToken ${OUTPUT_YDB_CREDENTIALS_TOKEN}

Blocks description:

  • Inputs. This block specifies where to read and how to parse logs. In this case, *.log files will be read from the /var/log/containers/ folder, which was mounted from the host

  • Filters. This block specifies how logs will be processed. In this case: for each log the corresponding metadata will be found (using the kubernetes filter), and unused fields (_p, time) will be cut out

  • Outputs. This block specifies where the logs will be sent. In this case, to the fluent-bit/log table in the YDB cluster. Cluster connection parameters (ConnectionURL, CredentialsToken) are set using the corresponding environment variables – OUTPUT_YDB_CONNECTION_URL, OUTPUT_YDB_CREDENTIALS_TOKEN

Environment variables are defined as follows:

env:
  - name: OUTPUT_YDB_CONNECTION_URL
    value: grpc://ydb-endpoint:2135/path/to/database
  - name: OUTPUT_YDB_CREDENTIALS_TOKEN
    valueFrom:
      secretKeyRef:
        key: token
        name: fluent-bit-ydb-plugin-token

The secret authorization token must be created in advance in the cluster. For example, using the command:

kubectl create secret -n ydb-fluent-bit-integration generic fluent-bit-ydb-plugin-token --from-literal=token=<YDB TOKEN>

FluentBit deployment

HELM is a way to package and install applications in a Kubernetes cluster. To deploy FluentBit, you need to add a chart repository using the command:

helm repo add fluent https://fluent.github.io/helm-charts

Installing FluentBit on a Kubernetes cluster is done using the following command:

helm upgrade --install fluent-bit fluent/fluent-bit \
  --version 0.37.1 \
  --namespace ydb-fluent-bit-integration \
  --create-namespace \
  --values values.yaml

Verify the installation

Check that fluent-bit has started by reading its logs (there should be no [error] level entries):

kubectl logs -n ydb-fluent-bit-integration -l app.kubernetes.io/instance=fluent-bit

Check that there are records in the YDB table (they will appear approximately a few minutes after launching FluentBit):

SELECT * FROM `fluent-bit/log` LIMIT 10 ORDER BY `timestamp` DESC

Resource cleanup

It is enough to remove the namespace with fluent-bit:

kubectl delete namespace ydb-fluent-bit-integration

And a table with logs:

DROP TABLE `fluent-bit/log`