Vector index

Warning

Vector indexes are supported only for row-oriented tables. Support for column-oriented tables is currently under development.

Alert

Vector indexes are available in test mode in main. They will be fully available in version 25.1.

The following features are not supported:

  • Index update: the main table can be modified, but the existing index will not be updated. A new index must be built to reflect the changes. If necessary, the existing index can be atomically replaced with the newly built one.
  • Building an index for vectors with bit quantization.

These limitations may be removed in future versions.

Warning

Creating an empty table with a vector index makes no sense, because mutations are currently not allowed in tables with vector indexes.

Use the ALTER TABLE ... ADD INDEX command to add a vector index to an existing table.
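As a minimal sketch of that command, assume the user_articles table from the example in this section already exists without the index and already contains embeddings. The index name and parameter values are illustrative, and the clause order is assumed to mirror the CREATE TABLE syntax described below.

```sql
-- Sketch only: assumes user_articles already exists and holds data.
-- The index name and all parameter values are illustrative.
ALTER TABLE user_articles
    ADD INDEX emb_cosine_idx
    GLOBAL USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2
    );
```

Because an existing index is not updated when the table changes, the same command with a new index name can be used to build a fresh index over the current data.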

A vector index in a row-oriented table is created using the same syntax as a secondary index, by specifying vector_kmeans_tree as the index type. The subset of the syntax available for vector indexes:

CREATE TABLE `<table_name>` (
    ...
    INDEX `<index_name>`
        GLOBAL
        [SYNC]
        USING vector_kmeans_tree
        ON ( <index_columns> )
        [COVER ( <cover_columns> )]
        [WITH ( <parameter_name> = <parameter_value>[, ...])]
    [,   ...]
)

Where:

  • <index_name> - unique index name for data access
  • SYNC - indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.
  • <index_columns> - comma-separated list of table columns used for index searches (the last column is used as the embedding, the others as filtering columns)
  • <cover_columns> - list of additional table columns stored in the index to enable retrieval without accessing the main table
  • <parameter_name> and <parameter_value> - list of key-value parameters:
    • common parameters for all vector indexes:
      • vector_dimension - embedding vector dimensionality (16384 or less)
      • vector_type - vector value type (float, uint8, int8, or bit)
      • distance - distance function (cosine, manhattan, or euclidean), mutually exclusive with similarity
      • similarity - similarity function (inner_product or cosine), mutually exclusive with distance
    • specific parameters for vector_kmeans_tree (see Vector Index Type `vector_kmeans_tree`):
      • clusters - number of centroids for k-means algorithm (values greater than 1000 may degrade performance)
      • levels - number of levels in the tree

Warning

Vector indexes with vector_type=bit are not currently supported.

Example

CREATE TABLE user_articles (
    article_id Uint64,
    user String,
    title String,
    text String,
    embedding String,
    INDEX emb_cosine_idx GLOBAL SYNC USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2
    ),
    PRIMARY KEY (article_id)
)
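To show how such an index might be queried, here is a hedged sketch of a nearest-neighbor search. It is not taken from this section: it assumes the VIEW clause used for reading through secondary indexes and the Knn::CosineDistance function from the YQL Knn module. The parameters $user and $target are hypothetical, and $target is assumed to be serialized in the same binary format as the embedding column.

```sql
-- Sketch only: $user and $target are hypothetical query parameters.
-- $target is assumed to use the same serialized format as the embedding column.
DECLARE $user AS String;
DECLARE $target AS String;

SELECT title, text
FROM user_articles VIEW emb_cosine_idx
WHERE user = $user                                  -- filtering column from the index
ORDER BY Knn::CosineDistance(embedding, $target)    -- matches distance="cosine"
LIMIT 10;
```

The query selects only title and text, which are listed in COVER, so in principle it can be served from the index without reading the main table.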