Working with PostgreSQL Databases
This section provides basic information on working with external PostgreSQL databases.
To work with an external PostgreSQL database, you need to follow these steps:
-
Create a secret containing the password for connecting to the database.
CREATE OBJECT postgresql_datasource_user_password (TYPE SECRET) WITH (value = "<password>"); -
Create an external data source that describes a specific database within the PostgreSQL cluster. By default, the namespace
publicis used for reading, but this value can be changed using the optionalSCHEMAparameter. The network connection is made using the standard (Frontend/Backend Protocol) over TCP transport (PROTOCOL="NATIVE"). You can enable encryption of connections to the external database using theUSE_TLS="TRUE"parameter.CREATE EXTERNAL DATA SOURCE postgresql_datasource WITH ( SOURCE_TYPE="PostgreSQL", LOCATION="<host>:<port>", DATABASE_NAME="<database>", AUTH_METHOD="BASIC", LOGIN="user", PASSWORD_SECRET_NAME="postgresql_datasource_user_password", PROTOCOL="NATIVE", USE_TLS="TRUE", SCHEMA="<schema>" ); -
Deploy the connector and configure the YDB dynamic nodes to interact with it. Additionally, ensure network access from the YDB dynamic nodes to the external data source (at the address specified in the
LOCATIONparameter of theCREATE EXTERNAL DATA SOURCErequest). If network connection encryption to the external source was enabled in the previous step, the connector will use the system's root certificates. More details on TLS configuration can be found in the guide on deploying the connector. -
Execute a query to the database.
Query Syntax
The following SQL query format is used to work with PostgreSQL:
SELECT * FROM postgresql_datasource.<table_name>
where:
postgresql_datasource- identifier of the external data source;<table_name>- table name within the external data source.
Limitations
When working with PostgreSQL clusters, there are a number of limitations:
-
External sources are available only for reading data through
SELECTqueries. The federated query processing engine currently does not support queries that modify tables in external sources. -
If the date value stored in the external data source is outside the allowed range for YDB (all dates used must be later than 1970-01-01 but earlier than 2105-12-31), such a value in YDB will be converted to
NULL. -
The YDB federated query processing system is capable of delegating the execution of certain parts of a query to the system acting as the data source. Query fragments are passed through YDB directly to the external system and processed by them. This optimization, known as "predicate pushdown", significantly reduces the amount of data transferred from the source to the federated query processing engine. This reduces network load and saves computational resources for the federated YDB.
A specific case of predicate pushdown, when the filtering expressions are specified after the
WHEREkeyword, are passed to the data source, is called "filter pushdown". Filter pushdown is possible when using:Description Example NULLchecksWHERE column1 IS NULLorWHERE column1 IS NOT NULLLogical conditions OR,NOT,ANDand parentheses for controlling calculation priority.WHERE column1 IS NULL OR (column2 IS NOT NULL AND column3 > 10).Comparison operators with other columns or constants. WHERE column1 > column2 OR column3 <= 10,WHERE column1 + column2 > 10,WHERE column1 = (10 + 10)When using other types of filters, pushdown to the data source is not performed: filtering of the external table rows will be executed by the federated YDB, which means that YDB will perform a full scan of the external table when processing the query.
Supported data types for filter pushdown:
YDB Data Type BoolInt8Int16Int32Int64FloatDoubleDecimal
Supported Data Types
In the PostgreSQL database, the optionality of column values (whether a column can contain NULL values) is not part of the data type system. The NOT NULL constraint for each column is implemented as the attnotnull attribute in the system catalog pg_attribute, i.e., at the metadata level of the table. Therefore, all basic PostgreSQL types can contain NULL values by default, and in the YDB type system, they should be mapped to optional types.
Below is a correspondence table between PostgreSQL and YDB types. All other data types, except those listed, are not supported.
| PostgreSQL Data Type | YDB Data Type | Notes |
|---|---|---|
boolean |
Optional<Bool> |
|
smallint |
Optional<Int16> |
|
int2 |
Optional<Int16> |
|
integer |
Optional<Int32> |
|
int |
Optional<Int32> |
|
int4 |
Optional<Int32> |
|
serial |
Optional<Int32> |
|
serial4 |
Optional<Int32> |
|
bigint |
Optional<Int64> |
|
int8 |
Optional<Int64> |
|
bigserial |
Optional<Int64> |
|
serial8 |
Optional<Int64> |
|
real |
Optional<Float> |
|
float4 |
Optional<Float> |
|
double precision |
Optional<Double> |
|
float8 |
Optional<Double> |
|
date |
Optional<Date> |
Valid date range from 1970-01-01 to 2105-12-31. Values outside this range return NULL. |
timestamp |
Optional<Timestamp> |
Valid time range from 1970-01-01 00:00:00 to 2105-12-31 23:59:59. Values outside this range return NULL. |
bytea |
Optional<String> |
|
character |
Optional<Utf8> |
Default collation rules, string padded with spaces to the required length. |
character varying |
Optional<Utf8> |
Default collation rules. |
text |
Optional<Utf8> |
Default collation rules. |
json |
Optional<Json> |
|
numeric(p,s) |
Optional<Decimal(p,s)> |
p - total number of digits in the number, s - number of digits after the decimal point. Unconstrained numbers (numeric without parameters) are mapped into <Optional<Decimal(35,0)>>. numeric types with p > 35 or s < 0 are not supported. |