Vector index

Warning

Vector indexes are supported only for row-oriented tables. Support for column-oriented tables is under development.

Alert

The following features are not supported:

  • Index updates: the main table can be modified, but an existing index is not updated. To reflect the changes, a new index must be built; if necessary, the existing index can then be atomically replaced with the newly built one.
  • Building an index for vectors with bit quantization.

These limitations may be removed in future versions.

Warning

Creating an empty table with a vector index is pointless, because tables with vector indexes currently do not accept data mutations.

Use the ALTER TABLE ... ADD INDEX command to add a vector index to an existing table.

A vector index in a row-oriented table is created with the same syntax as a secondary index, with vector_kmeans_tree specified as the index type. The subset of the syntax available for vector indexes is:

CREATE TABLE `<table_name>` (
    ...
    INDEX `<index_name>`
        GLOBAL
        [SYNC]
        USING vector_kmeans_tree
        ON ( <index_columns> )
        [COVER ( <cover_columns> )]
        [WITH ( <parameter_name> = <parameter_value>[, ...])]
    [,   ...]
)

Where:

  • <index_name> - a unique index name, used to access data through the index
  • SYNC - data is written to the index synchronously. This is currently the only available mode, and it is the default.
  • <index_columns> - comma-separated list of table columns used for index searches; the last column holds the embedding, and any preceding columns are used for filtering
  • <cover_columns> - list of additional table columns stored in the index so they can be retrieved without accessing the main table
  • <parameter_name> and <parameter_value> - list of key-value parameters:
    • common parameters for all vector indexes:
      • vector_dimension - embedding vector dimensionality (at most 16384)
      • vector_type - vector value type (float, uint8, int8, or bit)
      • distance - distance function (cosine, manhattan, or euclidean); mutually exclusive with similarity
      • similarity - similarity function (inner_product or cosine); mutually exclusive with distance
    • parameters specific to vector_kmeans_tree (see Vector index type vector_kmeans_tree):
      • clusters - number of centroids for the k-means algorithm (values greater than 1000 may degrade performance)
      • levels - number of levels in the tree
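
To make the distance and similarity options concrete, the following minimal Python sketch computes each listed function on plain float lists. It assumes the textbook definitions, in particular that cosine distance equals 1 minus cosine similarity; the exact formulas YDB uses internally are not stated in this section, and the function names are illustrative only. Note the opposite orderings: for a distance, smaller values mean closer vectors, while for a similarity, larger values mean closer vectors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _check_same_dimension(a: Vector, b: Vector) -> None:
    # Both vectors must have the same number of components (vector_dimension).
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def inner_product(a: Vector, b: Vector) -> float:
    """similarity=inner_product: larger means more similar."""
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """similarity=cosine: cosine of the angle between a and b, in [-1, 1]."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return inner_product(a, b) / norm


def cosine_distance(a: Vector, b: Vector) -> float:
    """distance=cosine, assumed here to be 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def manhattan_distance(a: Vector, b: Vector) -> float:
    """distance=manhattan: sum of absolute coordinate differences (L1)."""
    _check_same_dimension(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    """distance=euclidean: straight-line distance (L2)."""
    _check_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    q, d = [1.0, 2.0], [4.0, 6.0]
    print(manhattan_distance(q, d))  # 7.0
    print(euclidean_distance(q, d))  # 5.0
    print(inner_product(q, d))       # 16.0
```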

Warning

Vector indexes with vector_type=bit are not currently supported.

Example

CREATE TABLE user_articles (
    article_id Uint64,
    user String,
    title String,
    text String,
    embedding String,
    INDEX emb_cosine_idx GLOBAL SYNC USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2
    ),
    PRIMARY KEY (article_id)
)
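
The constraints on the WITH parameters can be checked on the client before the statement is sent. The Python sketch below is hypothetical: check_vector_index_params and leaf_cluster_count are not part of any YDB SDK, and the checker encodes only the rules listed in this section. leaf_cluster_count additionally assumes that each tree level splits every cluster into clusters children, so the example above (clusters=128, levels=2) would produce up to 128 ** 2 = 16384 leaf clusters.

```python
from typing import Any, Dict, List

# Allowed values, copied from the parameter list in this section.
VECTOR_TYPES = {"float", "uint8", "int8", "bit"}
DISTANCES = {"cosine", "manhattan", "euclidean"}
SIMILARITIES = {"inner_product", "cosine"}
MAX_DIMENSION = 16384
CLUSTERS_WARN_THRESHOLD = 1000


def check_vector_index_params(params: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check of WITH (...) parameters.

    Raises ValueError for values the documentation rules out and returns
    a list of non-fatal warnings.
    """
    warnings: List[str] = []

    dim = params.get("vector_dimension")
    if not isinstance(dim, int) or not 1 <= dim <= MAX_DIMENSION:
        raise ValueError(f"vector_dimension must be an integer from 1 to {MAX_DIMENSION}")

    vector_type = params.get("vector_type")
    if vector_type not in VECTOR_TYPES:
        raise ValueError(f"unknown vector_type: {vector_type!r}")
    if vector_type == "bit":
        raise ValueError("vector indexes with vector_type=bit are not currently supported")

    # distance and similarity are mutually exclusive.
    if "distance" in params and "similarity" in params:
        raise ValueError("distance and similarity are mutually exclusive")
    if "distance" in params and params["distance"] not in DISTANCES:
        raise ValueError(f"unknown distance: {params['distance']!r}")
    if "similarity" in params and params["similarity"] not in SIMILARITIES:
        raise ValueError(f"unknown similarity: {params['similarity']!r}")

    clusters = params.get("clusters")
    if isinstance(clusters, int) and clusters > CLUSTERS_WARN_THRESHOLD:
        warnings.append("clusters greater than 1000 may degrade performance")
    return warnings


def leaf_cluster_count(clusters: int, levels: int) -> int:
    # Assumption: each level splits every cluster into `clusters` children,
    # so a tree with `levels` levels has clusters ** levels leaf clusters.
    return clusters ** levels


if __name__ == "__main__":
    example = {"distance": "cosine", "vector_type": "float",
               "vector_dimension": 512, "clusters": 128, "levels": 2}
    print(check_vector_index_params(example))  # []
    print(leaf_cluster_count(example["clusters"], example["levels"]))  # 16384
```

Raising on hard errors while returning warnings for the clusters threshold mirrors the documentation, which rules out some values but only cautions against large cluster counts.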