Vector indexes

YDB supports vector indexes to efficiently find the top k rows with vector values closest to a query vector. Unlike secondary indexes that optimize equality or range queries, vector indexes enable similarity search based on distance or similarity functions.

Vector indexes are particularly useful for:

recommendation systems (finding similar items/users)
semantic search (matching text embeddings)
image similarity search
anomaly detection (finding outliers)
classification systems (finding nearest labeled examples)

Vector index characteristics

Vector indexes in YDB:

Solve nearest neighbor search problems using similarity or distance functions
Support several functions: "inner_product" and "cosine" similarity, and "cosine", "euclidean", and "manhattan" distance
Currently implement a single index type: vector_kmeans_tree
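
The functions listed above can be sketched in plain Python to show what each one measures. These are the textbook definitions, given here as an illustration only; the helper names are hypothetical, and the sketch assumes non-zero vectors of equal length.

```python
import math

def inner_product(a, b):
    # Similarity: larger values mean the vectors are closer.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Similarity in [-1, 1]; undefined for zero vectors (division by zero).
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def cosine_distance(a, b):
    # Distance: smaller values mean the vectors are closer.
    return 1.0 - cosine_similarity(a, b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))
```

Note the difference in direction: with a similarity function the best matches have the largest values, while with a distance function they have the smallest.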

Vector index `vector_kmeans_tree` type

The vector_kmeans_tree index implements a hierarchical clustering structure. Its organization includes:

Hierarchical clustering:
- The index builds multiple levels of k-means clusters
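- At each level, the total number of clusters equals the configured cluster count raised to the power of the level number (for example, with clusters=128 the second level holds up to 128^2 = 16384 clusters)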
- First level clusters the entire dataset
- Subsequent levels recursively cluster each parent cluster's contents
Search process:
- During queries, the index examines only the most promising clusters
- This search space pruning avoids exhaustive search through all vectors
Parameters:
- levels: The number of tree levels (typically 1-3). Controls search depth
- clusters: The number of clusters on each level (typically 64-512). Determines search breadth at each level
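
The structure described above can be illustrated with a small pure-Python sketch. It is a simplified model and not YDB's implementation: it uses plain Lloyd's k-means with squared Euclidean distance, and the `probes` parameter (how many clusters to descend into per level) is a hypothetical knob introduced only for this example.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def assign(vectors, centroids):
    # Put every vector into the group of its nearest centroid.
    groups = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        groups[nearest].append(v)
    return groups

def kmeans(vectors, k, rng, iterations=10):
    # Plain Lloyd's k-means; returns (centroid, members) pairs for non-empty clusters.
    centroids = rng.sample(vectors, min(k, len(vectors)))
    for _ in range(iterations):
        groups = assign(vectors, centroids)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    groups = assign(vectors, centroids)
    return [(c, g) for c, g in zip(centroids, groups) if g]

def build_tree(vectors, levels, clusters, rng):
    # Inner nodes are lists of (centroid, subtree); leaves are lists of vectors.
    if levels == 0:
        return vectors
    return [(c, build_tree(g, levels - 1, clusters, rng))
            for c, g in kmeans(vectors, clusters, rng)]

def collect(node, levels, query, probes):
    # Descend only into the `probes` clusters whose centroids are closest to the query.
    if levels == 0:
        return list(node)
    closest = sorted(node, key=lambda child: sq_dist(query, child[0]))[:probes]
    found = []
    for _, subtree in closest:
        found.extend(collect(subtree, levels - 1, query, probes))
    return found

def search(tree, levels, query, top_k, probes=1):
    candidates = collect(tree, levels, query, probes)
    return sorted(candidates, key=lambda v: sq_dist(query, v))[:top_k]

# Usage: 200 random 2-D points, 2 levels of 4 clusters (up to 4**2 = 16 leaves).
rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(200)]
tree = build_tree(data, levels=2, clusters=4, rng=rng)
query = (0.5, 0.5)
approx = search(tree, 2, query, top_k=5, probes=1)
exact = sorted(data, key=lambda v: sq_dist(query, v))[:5]
```

With `probes=1` the search visits a single leaf, so `approx` may miss some of the true nearest neighbors in `exact`. Visiting every cluster at every level makes the result exact, at the cost of scanning all vectors.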

Vector index types

Basic vector index

The simplest form indexes vectors without additional filtering capabilities. For example:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (embedding)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Vector index with covered columns

Stores additional columns in the index so that queries can avoid reading from the main table:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (embedding) COVER (data)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Prefixed vector index

Allows filtering by the prefix columns before the vector search is performed:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (user, embedding)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
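
One way to picture prefix filtering is to imagine the vectors grouped by prefix value, with the search confined to a single group. The sketch below is a conceptual model only and does not describe YDB's storage layout; `partition_by_prefix` and the sample rows are hypothetical.

```python
from collections import defaultdict

def partition_by_prefix(rows, prefix_column, vector_column):
    # Group row vectors by prefix value; a search then looks at one group only.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[prefix_column]].append(row[vector_column])
    return dict(partitions)

rows = [
    {"user": "alice", "embedding": (1.0, 0.0)},
    {"user": "bob", "embedding": (0.0, 1.0)},
    {"user": "alice", "embedding": (0.9, 0.1)},
]
partitions = partition_by_prefix(rows, "user", "embedding")
```

A query restricted to one user then only has to consider that user's vectors instead of the whole table.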

Prefixed vector index with covered columns

Combines prefix filtering with covered columns, so queries can narrow the search by prefix and avoid reading from the main table:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (user, embedding) COVER (data)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Creating vector indexes

Vector indexes can be created:

when creating a table, with the YQL CREATE TABLE statement
for an existing table, with the YQL ALTER TABLE statement

For more information about vector index parameters, see the CREATE TABLE statement documentation.

Using vector indexes

Query vector indexes using the VIEW syntax in YQL. For prefixed indexes, include the prefix columns in the WHERE clause:

SELECT user, data
FROM my_table VIEW my_index
WHERE user = "..."
ORDER BY Knn::CosineSimilarity(embedding, ...) DESC
LIMIT 10;
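
To make the semantics of this query concrete, here is a brute-force Python equivalent: filter by the prefix column, order by cosine similarity in descending order, and keep the first rows. It is a sketch under assumptions, not what the index executes; `exact_query` is a hypothetical helper, and rows are modeled as dictionaries whose keys mirror the columns in the query.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_query(rows, user, target, limit=10):
    # WHERE user = ...: keep only rows with the requested prefix value.
    matching = [r for r in rows if r["user"] == user]
    # ORDER BY similarity DESC: the most similar rows come first.
    matching.sort(key=lambda r: cosine_similarity(r["embedding"], target), reverse=True)
    # LIMIT and SELECT user, data.
    return [(r["user"], r["data"]) for r in matching[:limit]]

rows = [
    {"user": "alice", "data": "b", "embedding": (0.6, 0.8)},
    {"user": "bob", "data": "x", "embedding": (1.0, 0.0)},
    {"user": "alice", "data": "a", "embedding": (1.0, 0.0)},
    {"user": "alice", "data": "c", "embedding": (0.0, 1.0)},
]
top = exact_query(rows, "alice", (1.0, 0.0), limit=2)
```

Because the index examines only the most promising clusters, its results may differ from this exhaustive version. Note also that similarity functions are ordered with DESC, whereas distance functions would be ordered ascending.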

Limitations

Currently not supported:

modifying rows in indexed tables
bit vector type