Vector indexes

YDB supports vector indexes to efficiently find the top k rows with vector values closest to a query vector. Unlike secondary indexes that optimize equality or range queries, vector indexes enable similarity search based on distance or similarity functions.

Vector indexes are particularly useful for:

  • recommendation systems (finding similar items/users)
  • semantic search (matching text embeddings)
  • image similarity search
  • anomaly detection (finding outliers)
  • classification systems (finding nearest labeled examples)

Vector index characteristics

Vector indexes in YDB:

  • Solve nearest neighbor search problems using similarity or distance functions
  • Support multiple similarity functions ("inner_product", "cosine") and distance functions ("cosine", "euclidean", "manhattan")
  • Currently implement a single index type: vector_kmeans_tree
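To make the difference between similarity and distance functions concrete, the following is a minimal Python sketch of the five functions named above, using their common textbook definitions. The function names are illustrative and are not YDB identifiers; cosine distance is assumed here to be 1 minus cosine similarity.

```python
import math

def inner_product(a, b):
    # Similarity: larger values mean "closer".
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Similarity in [-1, 1]: inner product of the normalized vectors.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

def cosine_distance(a, b):
    # Distance: assumed to be 1 - cosine similarity, so smaller means "closer".
    return 1.0 - cosine_similarity(a, b)

def euclidean_distance(a, b):
    # Distance: straight-line (L2) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # Distance: sum of absolute coordinate differences (L1).
    return sum(abs(x - y) for x, y in zip(a, b))
```

With a similarity function the best matches have the largest values, so results are sorted in descending order; with a distance function the best matches have the smallest values, so results are sorted in ascending order.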

Vector index vector_kmeans_tree type

The vector_kmeans_tree index implements a hierarchical clustering structure. Its organization includes:

  1. Hierarchical clustering:

    • The index builds multiple levels of k-means clusters
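    • At each level, vectors are partitioned into the specified number of clusters raised to the power of the level number (for example, with clusters=128, the second level holds up to 128 × 128 = 16384 clusters)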
    • First level clusters the entire dataset
    • Subsequent levels recursively cluster each parent cluster's contents
  2. Search process:

    • During queries, the index examines only the most promising clusters
    • This search space pruning avoids exhaustive search through all vectors
  3. Parameters:

    • levels: The number of tree levels (typically 1-3). Controls search depth
    • clusters: The number of clusters on each level (typically 64-512). Determines search breadth at each level
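The structure and search process described above can be sketched in Python. This is an illustrative toy model under stated assumptions, not YDB's implementation: it uses plain Lloyd's k-means with squared Euclidean distance, stores the tree as nested dictionaries, and introduces a hypothetical probes parameter for how many clusters are examined at each level. The helper names (kmeans, build, search) are likewise made up for this sketch.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    # Index of the centroid closest to v (lowest index wins ties).
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))

def assign(vectors, ids, centroids):
    groups = [[] for _ in centroids]
    for i in ids:
        groups[nearest(centroids, vectors[i])].append(i)
    return groups

def kmeans(vectors, ids, clusters, rng, iterations=10):
    # Plain Lloyd's k-means over the vectors selected by ids.
    k = min(clusters, len(ids))
    centroids = [list(vectors[i]) for i in rng.sample(ids, k)]
    for _ in range(iterations):
        for c, members in enumerate(assign(vectors, ids, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(centroids[c])
                centroids[c] = [sum(vectors[i][d] for i in members) / len(members)
                                for d in range(dim)]
    # Final assignment against the centroids that are actually stored.
    return centroids, assign(vectors, ids, centroids)

def build(vectors, ids, levels, clusters, rng):
    # Each level recursively clusters the contents of its parent cluster.
    if levels == 0 or len(ids) <= 1:
        return {"ids": ids}  # leaf: row ids of the vectors in this cluster
    centroids, groups = kmeans(vectors, ids, clusters, rng)
    children = [build(vectors, g, levels - 1, clusters, rng) for g in groups]
    return {"centroids": centroids, "children": children}

def search(tree, vectors, query, k, probes=1):
    # Descend level by level, keeping only the `probes` closest clusters per node.
    frontier = [tree]
    while any("children" in node for node in frontier):
        next_frontier = []
        for node in frontier:
            if "children" not in node:
                next_frontier.append(node)
                continue
            order = sorted(range(len(node["centroids"])),
                           key=lambda c: sq_dist(node["centroids"][c], query))
            next_frontier.extend(node["children"][c] for c in order[:probes])
        frontier = next_frontier
    # Rank only the vectors found in the surviving leaf clusters.
    candidates = [i for leaf in frontier for i in leaf["ids"]]
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], query))[:k]
```

In this model, a tree built with levels=2 and clusters=4 has up to 4 × 4 = 16 leaf clusters, and a search with probes=1 ranks only the vectors in one of them. That is where the pruning comes from, and also why the result can miss a true nearest neighbor that fell into a neighboring cluster. Raising probes widens the search at the cost of more comparisons.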

Vector index types

Basic vector index

The simplest form indexes vectors without any additional filtering capabilities. For example:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (embedding)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Vector index with covered columns

Includes additional columns to avoid reading from the main table during queries:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (embedding) COVER (data)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Prefixed vector index

Allows filtering by prefix columns before performing vector search:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (user, embedding)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Prefixed vector index with covered columns

Combines prefix filtering with covered columns for optimal performance:

ALTER TABLE my_table
  ADD INDEX my_index
  GLOBAL USING vector_kmeans_tree
  ON (user, embedding) COVER (data)
  WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);

Creating vector indexes

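Vector indexes can be created:

  • when creating a table, with the CREATE TABLE statement
  • on an existing table, with the ALTER TABLE statement, as in the examples above

For more information about vector index parameters, see the CREATE TABLE statement.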

Using vector indexes

Query vector indexes using the VIEW syntax in YQL. For prefixed indexes, include the prefix columns in the WHERE clause:

SELECT user, data
FROM my_table VIEW my_index
WHERE user = "..."
ORDER BY Knn::CosineSimilarity(embedding, ...) DESC
LIMIT 10;
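As a reference for what this query computes, here is a small Python sketch of the same logic done exhaustively: filter by the prefix column, rank by cosine similarity in descending order, and keep the first k rows. The row layout, the sample data, and the top_k_for_user helper are hypothetical. A vector index is expected to return similar results while examining only the most promising clusters rather than every row.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_for_user(rows, user, query, k=10):
    # WHERE user = ...: keep only the rows with the requested prefix value.
    matching = [row for row in rows if row["user"] == user]
    # ORDER BY similarity DESC: the most similar rows come first.
    matching.sort(key=lambda row: cosine_similarity(row["embedding"], query),
                  reverse=True)
    # LIMIT k, projecting the selected columns.
    return [(row["user"], row["data"]) for row in matching[:k]]

# Hypothetical sample rows with two-dimensional embeddings.
rows = [
    {"user": "alice", "data": "a1", "embedding": [1.0, 0.0]},
    {"user": "alice", "data": "a2", "embedding": [0.6, 0.8]},
    {"user": "alice", "data": "a3", "embedding": [0.0, 1.0]},
    {"user": "bob", "data": "b1", "embedding": [1.0, 0.0]},
]
result = top_k_for_user(rows, "alice", [1.0, 0.1], k=2)
```

Here result contains the rows a1 and a2 for alice, in that order; the row belonging to bob is excluded by the prefix filter even though its embedding matches the query just as well as a1.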

Limitations

Currently not supported:

  • modifying rows in indexed tables
  • bit vector type