Deploying connectors to external data sources
Warning
This functionality is in "Experimental" mode.
Connectors are special microservices that provide YDB with a universal abstraction for accessing external data sources. Connectors act as extension points for the YDB federated query processing system. This guide covers the specifics of deploying connectors in an on-premises environment.
fq-connector-go
The fq-connector-go connector is implemented in Go; its source code is hosted on GitHub. It provides access to the following data sources:
The connector can be installed using a binary distribution or a Docker image.
Running from a binary distribution
Use binary distributions to install the connector on a physical or virtual Linux server without container virtualization.
- On the releases page of the connector, select the latest release and download the archive for your platform and architecture. The following commands download version v0.2.4 of the connector for the Linux platform and the amd64 architecture:
mkdir /tmp/connector && cd /tmp/connector
wget https://github.com/ydb-platform/fq-connector-go/releases/download/v0.2.4/fq-connector-go-v0.2.4-linux-amd64.tar.gz
tar -xzf fq-connector-go-v0.2.4-linux-amd64.tar.gz
- If YDB nodes have not yet been deployed on the server, create directories for the executable and configuration files:
sudo mkdir -p /opt/ydb/bin /opt/ydb/cfg
- Place the extracted executable and configuration files of the connector into the newly created directories:
sudo cp fq-connector-go /opt/ydb/bin
sudo cp fq-connector-go.yaml /opt/ydb/cfg
- In the recommended usage mode, the connector is deployed on the same servers as the YDB dynamic nodes, so network connections between them do not need to be encrypted. However, if you need to enable encryption, prepare a TLS key pair and specify the paths to the certificate (public key) and the private key in the connector_server.tls.cert and connector_server.tls.key fields of the fq-connector-go.yaml configuration file:
connector_server:
  # ...
  tls:
    cert: "/opt/ydb/certs/fq-connector-go.crt"
    key: "/opt/ydb/certs/fq-connector-go.key"
- If external data sources use TLS, the connector needs the root or intermediate Certificate Authority (CA) certificate that signed the sources' certificates in order to establish encrypted connections. Linux servers usually come with a set of CA root certificates pre-installed. On Ubuntu, the list of trusted CAs can be displayed with the following command:
awk -v cmd='openssl x509 -noout -subject' '/BEGIN/{close(cmd)};{print | cmd}' < /etc/ssl/certs/ca-certificates.crt
If the server lacks the required CA certificate, copy it to a special system directory and update the certificate list:
sudo cp root_ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
- You can start the service manually or using systemd.
To start the service manually, run the following command in the console:
/opt/ydb/bin/fq-connector-go server -c /opt/ydb/cfg/fq-connector-go.yaml
To start the service using systemd, use the sample unit file for the systemd init system that ships with the fq-connector-go binary distribution. Copy the unit to the /etc/systemd/system directory, then enable and start the service:
cd /tmp/connector
sudo cp fq-connector-go.service /etc/systemd/system/
sudo systemctl enable fq-connector-go.service
sudo systemctl start fq-connector-go.service
If successful, the service should enter the active (running) state. Check it with the following command:
sudo systemctl status fq-connector-go
● fq-connector-go.service - YDB FQ Connector Go
     Loaded: loaded (/etc/systemd/system/fq-connector-go.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-02-29 17:51:42 MSK; 2s ago
Service logs can be read using the command:
sudo journalctl -u fq-connector-go.service
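The download step above hard-codes release v0.2.4 for Linux on amd64. If you need a different release, the asset URL can be assembled from the same pattern. The sketch below is a hypothetical helper (connector_release_url and download_connector are illustrative names, not part of the connector) and assumes that other releases follow the same file naming as v0.2.4; check the actual file name on the releases page before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the release-asset URL of fq-connector-go,
# ASSUMING every release follows the naming pattern of the v0.2.4 example:
#   .../releases/download/<version>/fq-connector-go-<version>-<os>-<arch>.tar.gz
connector_release_url() {
    local version="$1" os="${2:-linux}" arch="${3:-amd64}"
    printf 'https://github.com/ydb-platform/fq-connector-go/releases/download/%s/fq-connector-go-%s-%s-%s.tar.gz\n' \
        "$version" "$version" "$os" "$arch"
}

# Hypothetical helper: downloads and unpacks the chosen release into
# /tmp/connector, mirroring the manual download step. The subshell keeps
# the caller's working directory unchanged.
download_connector() {
    local url
    url="$(connector_release_url "$@")" || return 1
    (
        mkdir -p /tmp/connector && cd /tmp/connector &&
            wget "$url" && tar -xzf "${url##*/}"
    )
}
```

For example, download_connector v0.2.4 linux amd64 repeats the manual download step; the extracted files can then be copied to /opt/ydb/bin and /opt/ydb/cfg as described above.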
Running in Docker
- To run the connector, use the official Docker image, which already contains the service's configuration file. Start the service with the default settings using the following command:
docker run -d \
    --name=fq-connector-go \
    -p 2130:2130 \
    ghcr.io/ydb-platform/fq-connector-go:latest
The connector's gRPC service will listen on port 2130 of your host's public network interface. YDB must then connect to the connector at this network address.
- If configuration changes are needed, prepare a configuration file based on the sample and mount it into the container:
docker run -d \
    --name=fq-connector-go \
    -p 2130:2130 \
    -v /path/to/config.yaml:/opt/ydb/cfg/fq-connector-go.yaml \
    ghcr.io/ydb-platform/fq-connector-go:latest
- In the recommended usage mode, the connector is deployed on the same servers as the YDB dynamic nodes, so network connections between them do not need to be encrypted. However, if you need to enable encryption between YDB and the connector, prepare a TLS key pair and specify the paths to the certificate (public key) and the private key in the connector_server.tls.cert and connector_server.tls.key fields of the configuration file:
connector_server:
  # ...
  tls:
    cert: "/opt/ydb/certs/fq-connector-go.crt"
    key: "/opt/ydb/certs/fq-connector-go.key"
When starting the container, mount the directory containing the TLS key pair into it so that the keys are accessible to the fq-connector-go process at the paths specified in the configuration file:
docker run -d \
    --name=fq-connector-go \
    -p 2130:2130 \
    -v /path/to/config.yaml:/opt/ydb/cfg/fq-connector-go.yaml \
    -v /path/to/keys/:/opt/ydb/certs/ \
    ghcr.io/ydb-platform/fq-connector-go:latest
- If external data sources use TLS, the connector needs the root or intermediate Certificate Authority (CA) certificate that signed the sources' certificates in order to establish encrypted connections. The connector's Docker image is based on the Alpine Linux image, which already contains some CA certificates. Check whether the required CA is in the pre-installed list with the following commands:
docker run -it --rm ghcr.io/ydb-platform/fq-connector-go sh
# then, in the console inside the container:
apk add openssl
awk -v cmd='openssl x509 -noout -subject' '/BEGIN/{close(cmd)};{print | cmd}' < /etc/ssl/certs/ca-certificates.crt
If the sources' TLS certificates are issued by a CA that is not in the trusted list, add the CA certificate to the system trust store of the connector container. For example, you can build a custom Docker image based on the existing one. Prepare the following Dockerfile:
FROM ghcr.io/ydb-platform/fq-connector-go:latest
USER root
RUN apk --no-cache add ca-certificates openssl
COPY root_ca.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates
Place the Dockerfile and the CA root certificate in the same folder, navigate to it, and build the image with the following command:
docker build -t fq-connector-go_custom_ca .
The new fq-connector-go_custom_ca image can then be used in place of the official image in the commands above.
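Both deployment options ask for a TLS key pair when YDB and the connector must communicate over an encrypted channel. For a test environment, a self-signed pair can be produced with openssl; the sketch below writes it under the file names used in the configuration examples. The helper name, host name, and validity period are assumptions, and a self-signed certificate must also be trusted on the YDB side; in production, use keys issued by your own CA.

```shell
#!/usr/bin/env bash
# Hypothetical helper for TEST environments: creates a self-signed certificate
# and an unencrypted private key named like the files in the configuration
# examples (fq-connector-go.crt / fq-connector-go.key).
make_connector_tls_pair() {
    local dir="$1" host="${2:-localhost}" days="${3:-365}"
    mkdir -p "$dir" || return 1
    # -x509 produces a self-signed certificate instead of a signing request;
    # -nodes leaves the key unencrypted so the service can start unattended;
    # -addext (OpenSSL 1.1.1+) adds a subjectAltName, which modern TLS
    # clients check for the host name instead of the CN.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$dir/fq-connector-go.key" \
        -out "$dir/fq-connector-go.crt" \
        -days "$days" \
        -subj "/CN=$host" \
        -addext "subjectAltName=DNS:$host" || return 1
    # Restrict the private key to its owner.
    chmod 600 "$dir/fq-connector-go.key"
}
```

For example, make_connector_tls_pair /path/to/keys connector.example.internal (a placeholder host name) creates the pair in a directory that can be mounted at /opt/ydb/certs/ with -v, as in the Docker commands above.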
Configuration
A current example of the fq-connector-go service configuration file can be found in the repository.
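For orientation, a minimal configuration can be assembled from the parameters described in the table below. The sketch writes it with a shell heredoc; the listen address 0.0.0.0, the port 2130 (taken from the Docker examples), and the value formats (bytes_per_page as a plain byte count, log_level as a level name) are assumptions, and the sample file in the repository remains the authoritative reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a minimal fq-connector-go configuration.
# Values are ASSUMPTIONS for illustration: listen on 0.0.0.0:2130 and use
# the documented defaults for the logger and paging sections.
write_connector_config() {
    local path="$1"
    cat > "$path" <<'EOF'
connector_server:
  endpoint:
    host: "0.0.0.0"
    port: 2130
  # Uncomment to enable TLS between YDB and the connector:
  # tls:
  #   cert: "/opt/ydb/certs/fq-connector-go.crt"
  #   key: "/opt/ydb/certs/fq-connector-go.key"
logger:
  log_level: INFO
  enable_sql_query_logging: false
paging:
  bytes_per_page: 4194304  # 4 MiB
  prefetch_queue_capacity: 2
EOF
}
```

For example, write_connector_config "$PWD/config.yaml" produces a file that can be mounted into the container with -v "$PWD/config.yaml":/opt/ydb/cfg/fq-connector-go.yaml.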
Parameter | Description |
---|---|
connector_server | Required section. Contains the settings of the main gRPC server that provides access to the data. |
connector_server.endpoint.host | Hostname or IP address on which the service's listening socket runs. |
connector_server.endpoint.port | Port number on which the service's listening socket runs. |
connector_server.tls | Optional section. Fill it in if the main gRPC service of fq-connector-go must accept TLS connections. By default, the service runs without TLS. |
connector_server.tls.key | Full path to the private key. |
connector_server.tls.cert | Full path to the certificate (public key). |
logger | Optional section. Contains logging settings. |
logger.log_level | Logging level. Valid values: TRACE, DEBUG, INFO, WARN, ERROR, FATAL. Default value: INFO. |
logger.enable_sql_query_logging | Enables query logging for data sources that support SQL. Valid values: true, false. IMPORTANT: enabling this option may result in confidential user data being printed to the logs. Default value: false. |
paging | Optional section. Contains settings for the algorithm that splits the data stream extracted from the source into Arrow blocks. For each request, the connector creates a queue of blocks prepared for sending to YDB. Arrow block allocation contributes significantly to the memory consumption of the fq-connector-go process. The minimum memory required for the connector's operation can be roughly estimated by the formula $2 \cdot N \cdot M \cdot (P + 1)$, where $N$ is the number of concurrent requests, $M$ is the paging.bytes_per_page parameter, and $P$ is the paging.prefetch_queue_capacity parameter. |
paging.bytes_per_page | Maximum number of bytes in one block. Recommended values range from 4 to 8 MiB; the maximum is 48 MiB. Default value: 4 MiB. |
paging.prefetch_queue_capacity | Number of pre-read data blocks stored in the connector's address space until YDB requests the next data block. In some scenarios, larger values can increase throughput but also raise the memory consumption of the process. Recommended value: at least 2. Default value: 2. |
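To see how the paging parameters translate into memory, the rough estimate $2 \cdot N \cdot M \cdot (P + 1)$ ($N$ concurrent requests, $M$ = paging.bytes_per_page, $P$ = paging.prefetch_queue_capacity) can be evaluated directly. The hypothetical helper below takes $M$ in whole MiB for readability and only reproduces that arithmetic; it is not a measurement of actual process memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough lower bound on connector memory, in MiB,
# ASSUMING the estimate  memory ~ 2 * N * M * (P + 1), where
#   N = number of concurrent requests
#   M = paging.bytes_per_page, expressed here in whole MiB
#   P = paging.prefetch_queue_capacity
estimate_min_memory_mib() {
    local n="$1" m_mib="$2" p="$3"
    echo $(( 2 * n * m_mib * (p + 1) ))
}
```

With the defaults ($M$ = 4 MiB, $P$ = 2) and 10 concurrent requests, estimate_min_memory_mib 10 4 2 prints 240, i.e. about 240 MiB; raising paging.bytes_per_page to 8 MiB doubles the estimate to 480 MiB.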