Vector index

A vector index on a row-oriented table is created with the same syntax as a secondary index, with vector_kmeans_tree specified as the index type. The following subset of that syntax is available for vector indexes:

CREATE TABLE `<table_name>` (
    ...
    INDEX `<index_name>`
        GLOBAL
        [SYNC]
        USING vector_kmeans_tree
        ON ( <index_columns> )
        [COVER ( <cover_columns> )]
        [WITH ( <parameter_name> = <parameter_value>[, ...])]
    [,   ...]
)

Where:

  • <index_name> - unique index name for data access
  • SYNC - indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.
  • <index_columns> - comma-separated list of table columns used for index searches (the last column holds the embedding, any preceding columns are used for filtering)
  • <cover_columns> - list of additional table columns stored in the index to enable retrieval without accessing the main table
  • <parameter_name> and <parameter_value> - key-value parameters of the index:
    • parameters common to all vector indexes:
      • vector_dimension - embedding vector dimensionality (should be between 1 and 16384)
      • vector_type - vector value type (float, uint8, or int8)
      • distance - distance function (cosine, manhattan, or euclidean), mutually exclusive with similarity
      • similarity - similarity function (inner_product or cosine), mutually exclusive with distance
    • parameters specific to vector_kmeans_tree (read more about the index type):
      • clusters - number of centroids for the k-means algorithm (should be between 2 and 2048)
      • levels - number of levels in the tree (should be between 1 and 16)
      • overlap_clusters - number of nearest clusters that each vector is added to (default 1)
      • the total number of nodes in the tree, calculated as clusters raised to the power of levels, should be no more than 1073741824 (2^30)
      • the product of vector_dimension and clusters should be no more than 4194304 (2^22)
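
As a quick check of the two combined limits, the arithmetic for the parameters used in the example at the end of this section (clusters=128, levels=2, vector_dimension=512) can be written as a plain query. This is only an illustrative sketch, and the column aliases are made up here.

```yql
-- Sanity check for clusters=128, levels=2, vector_dimension=512.
SELECT
    128 * 128 AS total_nodes,              -- clusters ^ levels = 16384, limit is 1073741824
    512 * 128 AS dimension_times_clusters; -- vector_dimension * clusters = 65536, limit is 4194304
```

Both values are far below their limits, so this parameter combination is acceptable.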

Warning

It is recommended to create a vector index after loading data into the table, as an index created on an empty table will have only one cluster and will not speed up the search at all. For more details, see Updating Vector Indexes.
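
One way to follow this recommendation is to create the table without the INDEX clause, load the data, and add the index afterwards. The sketch below assumes that ALTER TABLE ... ADD INDEX accepts the same vector index clauses as CREATE TABLE, and it reuses the table and index names from the example in this section. Check the ALTER TABLE reference for your version before relying on it.

```yql
-- Assumes user_articles already exists without the index and is already populated.
ALTER TABLE user_articles
    ADD INDEX emb_cosine_idx
    GLOBAL SYNC USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2
    );
```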

Warning

Vector indexes are supported only for row-oriented tables. Support for column-oriented tables is currently under development.

Example

CREATE TABLE user_articles (
    article_id Uint64,
    user String,
    title String,
    text String,
    embedding String,
    INDEX emb_cosine_idx GLOBAL SYNC USING vector_kmeans_tree
    ON (user, embedding) COVER (title, text)
    WITH (
        distance="cosine",
        vector_type="float",
        vector_dimension=512,
        clusters=128,
        levels=2,
        overlap_clusters=3
    ),
    PRIMARY KEY (article_id)
)
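
A search over this index might look as follows. The sketch assumes the VIEW clause for selecting an index and the Knn::CosineDistance function, neither of which is described in this section. $user, $target, and $k are hypothetical query parameters, and $target is assumed to be a vector serialized in the same format as the embedding column.

```yql
DECLARE $user AS String;
DECLARE $target AS String;  -- query embedding, serialized like the embedding column (assumption)
DECLARE $k AS Uint64;

SELECT article_id, title, text   -- title and text are covered columns of the index
FROM user_articles VIEW emb_cosine_idx
WHERE user = $user               -- filtering column of the index
ORDER BY Knn::CosineDistance(embedding, $target)
LIMIT $k;
```

Because the index is declared with distance="cosine", the query orders by the matching cosine distance in ascending order, so the nearest articles come first.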