Vector index
A vector index on a row-oriented table is created with the same syntax as a secondary index, by specifying vector_kmeans_tree as the index type. The following subset of the syntax is available for vector indexes:
CREATE TABLE `<table_name>` (
...
INDEX `<index_name>`
GLOBAL
[SYNC]
USING vector_kmeans_tree
ON ( <index_columns> )
[COVER ( <cover_columns> )]
[WITH ( <parameter_name> = <parameter_value>[, ...])]
[, ...]
)
Where:
- <index_name> - unique index name for data access
- SYNC - indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.
- <index_columns> - comma-separated list of table columns used for index searches (the last column is used as the embedding, the others as filtering columns)
- <cover_columns> - list of additional table columns stored in the index to enable retrieval without accessing the main table
- <parameter_name> and <parameter_value> - list of key-value parameters:
  - common parameters for all vector indexes:
    - vector_dimension - embedding vector dimensionality (should be between 1 and 16384)
    - vector_type - vector value type (float, uint8, or int8)
    - distance - distance function (cosine, manhattan, or euclidean), mutually exclusive with similarity
    - similarity - similarity function (inner_product or cosine), mutually exclusive with distance
  - specific parameters for vector_kmeans_tree (read more about the index type):
    - clusters - number of centroids for the k-means algorithm (should be between 2 and 2048)
    - levels - number of levels in the tree (should be between 1 and 16)
    - overlap_clusters - the number of nearest clusters to add each vector to (default 1)
    - the total number of nodes in the tree, calculated as clusters raised to the power of levels, should be no more than 1073741824
    - the product of vector_dimension and clusters should be no more than 4194304
Warning
It is recommended to create a vector index after loading data into the table, as an index created on an empty table will have only one cluster and will not speed up the search at all. For more details, see Updating Vector Indexes.
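One way to follow this recommendation is to create the table without the index, load the data, and add the index afterwards. The statement below is a minimal sketch, not a confirmed form: it assumes that ALTER TABLE ... ADD INDEX accepts the same USING / ON / COVER / WITH clause as CREATE TABLE, and it reuses the table, column, and index names from the example on this page. Check it against the ALTER TABLE reference for your version before relying on it.

```sql
-- Assumption: ADD INDEX takes the same vector index clause as CREATE TABLE.
-- Run this only after user_articles has been populated with data.
ALTER TABLE user_articles
  ADD INDEX emb_cosine_idx
  GLOBAL USING vector_kmeans_tree
  ON (user, embedding) COVER (title, text)
  WITH (
    distance="cosine",
    vector_type="float",
    vector_dimension=512,
    clusters=128,
    levels=2
  );
```

SYNC and overlap_clusters are omitted here because, per the parameter descriptions, SYNC is used by default and overlap_clusters defaults to 1.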
Warning
Supported only for row-oriented tables. Support for column-oriented tables is currently under development.
Example
CREATE TABLE user_articles (
article_id Uint64,
user String,
title String,
text String,
embedding String,
INDEX emb_cosine_idx GLOBAL SYNC USING vector_kmeans_tree
ON (user, embedding) COVER (title, text)
WITH (
distance="cosine",
vector_type="float",
vector_dimension=512,
clusters=128,
levels=2,
overlap_clusters=3
),
PRIMARY KEY (article_id)
)
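The parameters in this example can be checked against the two combined limits from the parameter list. The query below is only an illustrative arithmetic check with the example's values (clusters=128, levels=2, vector_dimension=512) written out as literals; the column aliases are made up for this sketch.

```sql
-- Combined limits from the parameter list, applied to the example's values.
SELECT
    128 * 128 <= 1073741824 AS tree_size_ok,      -- clusters ^ levels = 128^2 = 16384
    512 * 128 <= 4194304    AS centroid_size_ok;  -- vector_dimension * clusters = 65536
```

Both comparisons hold, so the example stays well inside the limits.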