Working with ClickHouse databases

This section describes the basic information about working with the external ClickHouse database ClickHouse.

To work with the external ClickHouse database, the following steps must be completed:

  1. Create a secret containing the password to connect to the database.

    CREATE OBJECT clickhouse_datasource_user_password (TYPE SECRET) WITH (value = "<password>");
    
  2. Create an external data source describing the target database inside the ClickHouse cluster. To connect to ClickHouse, you can use either the native TCP protocol (PROTOCOL="NATIVE") or the HTTP protocol (PROTOCOL="HTTP"). To enable encryption for connections to the external database, use the USE_TLS="TRUE" parameter.

    CREATE EXTERNAL DATA SOURCE clickhouse_datasource WITH (
        SOURCE_TYPE="ClickHouse", 
        LOCATION="<host>:<port>", 
        DATABASE_NAME="<database>",
        AUTH_METHOD="BASIC",
        LOGIN="<login>",
        PASSWORD_SECRET_NAME="clickhouse_datasource_user_password",
        PROTOCOL="NATIVE",
        USE_TLS="TRUE"
    );
    
  3. Deploy the connector and configure the YDB dynamic nodes to interact with it. Additionally, ensure network access from the YDB dynamic nodes to the external data source (at the address specified in the LOCATION parameter of the CREATE EXTERNAL DATA SOURCE request). If network connection encryption to the external source was enabled in the previous step, the connector will use the system's root certificates (more details on TLS configuration can be found in the guide on deploying the connector).

  4. Execute a query to the database.

Query syntax

To work with ClickHouse, use the following SQL query form:

SELECT * FROM clickhouse_datasource.<table_name>

Where:

  • clickhouse_datasource is the identifier of the external data source;
  • <table_name> is the table's name within the external data source.

Limitations

There are several limitations when working with ClickHouse clusters:

  1. External sources are available only for reading data through SELECT queries. The federated query processing engine currently does not support queries that modify tables in external sources.

  2. If the date value stored in the external data source is outside the allowed range for YDB (all dates used must be later than 1970-01-01 but earlier than 2105-12-31), such a value in YDB will be converted to NULL.

  3. The YDB federated query processing system is capable of delegating the execution of certain parts of a query to the system acting as the data source. Query fragments are passed through YDB directly to the external system and processed within it. This optimization, known as "predicate pushdown", significantly reduces the volume of data transferred from the source to the federated query processing engine. This reduces network load and saves computational resources for YDB.

    A specific case of predicate pushdown, where filtering expressions specified after the WHERE keyword are passed down, is called "filter pushdown". Filter pushdown is possible when using:

    Description Example
    Filters like IS NULL/IS NOT NULL WHERE column1 IS NULL or WHERE column1 IS NOT NULL
    Logical conditions OR, NOT, AND WHERE column IS NULL OR column2 is NOT NULL
    Comparison conditions =, <>, <, <=, >, >= with other columns or constants WHERE column3 > column4 OR column5 <= 10

    Supported data types for filter pushdown:

    YDB Data Type
    Bool
    Int8
    Uint8
    Int16
    Uint16
    Int32
    Uint32
    Int64
    Uint64
    Float
    Double

Supported data types

By default, ClickHouse columns cannot physically contain NULL values. However, users can create tables with columns of optional or nullable types. The column types displayed in YDB when extracting data from the external ClickHouse database will depend on whether primitive or optional types are used in the ClickHouse table. Due to the previously discussed limitations of YDB types used to store dates and times, all similar ClickHouse types are displayed in YDB as optional.

Below are the mapping tables for ClickHouse and YDB types. All other data types, except those listed, are not supported.

Primitive data types

ClickHouse data type YDB data type Notes
Bool Bool
Int8 Int8
UInt8 Uint8
Int16 Int16
UInt16 Uint16
Int32 Int32
UInt32 Uint32
Int64 Int64
UInt64 Uint64
Float32 Float
Float64 Double
Date Date
Date32 Optional<Date> Valid date range from 1970-01-01 to 2105-12-31. Values outside this range return NULL.
DateTime Optional<DateTime> Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return NULL.
DateTime64 Optional<Timestamp> Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return NULL.
String String
FixedString String Null bytes in FixedString are transferred to String unchanged.

Optional data types

ClickHouse data type YDB data type Notes
Nullable(Bool) Optional<Bool>
Nullable(Int8) Optional<Int8>
Nullable(UInt8) Optional<Uint8>
Nullable(Int16) Optional<Int16>
Nullable(UInt16) Optional<Uint16>
Nullable(Int32) Optional<Int32>
Nullable(UInt32) Optional<Uint32>
Nullable(Int64) Optional<Int64>
Nullable(UInt64) Optional<Uint64>
Nullable(Float32) Optional<Float>
Nullable(Float64) Optional<Double>
Nullable(Date) Optional<Date>
Nullable(Date32) Optional<Date> Valid date range from 1970-01-01 to 2105-12-31. Values outside this range return NULL.
Nullable(DateTime) Optional<DateTime> Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return NULL.
Nullable(DateTime64) Optional<Timestamp> Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return NULL.
Nullable(String) Optional<String>
Nullable(FixedString) Optional<String> Null bytes in FixedString are transferred to String unchanged.