Reading data from an external table pointing to S3 (Object Storage)

Sometimes, the same data queries need to be executed regularly. To avoid specifying all the details of working with this data every time a query is called, use the mode with external tables. In this case, the query looks like a regular query to YDB tables.

Example query for reading data:

SELECT
    *
FROM
    `s3_test_data`
WHERE
    version > 1

Creating an external table pointing to an S3 bucket (Object Storage)

To create an external table describing the S3 bucket (Object Storage), execute the following SQL query. The query creates an external table named s3_test_data, containing files in the CSV format with string fields key and value, located inside the bucket at the path test_folder, using the connection credentials specified by the external data source object bucket:

CREATE EXTERNAL TABLE `s3_test_data` (
  key Utf8 NOT NULL,
  value Utf8 NOT NULL
) WITH (
  DATA_SOURCE="bucket",
  LOCATION="folder",
  FORMAT="csv_with_names",
  COMPRESSION="gzip"
);

Where:

Data model

Reading data using external tables from S3 (Object Storage) is done with regular SQL queries as if querying a normal table.

SELECT
    <expression>
FROM
    `s3_test_data`
WHERE
    <filter>;

Limitations

There are a number of limitations when working with S3 buckets (Object Storage).

Limitations:

  1. Only data read requests - SELECT and INSERT are supported; other requests are not.
  2. If the date value stored in the external data source is outside the allowed range for YDB (all dates used must be later than 1970-01-01 but earlier than 2105-12-31), such a value in YDB will be converted to NULL.