Vector indexes
YDB supports vector indexes to efficiently find the top k rows whose vector values are closest to a query vector. Unlike ordinary secondary indexes, which optimize equality or range queries, vector indexes enable similarity search based on distance or similarity functions.
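Without an index, finding the top k closest rows means scoring every row against the query vector. The following Python sketch shows that exact, full-scan baseline, which is the work a vector index tries to avoid. The dict-of-vectors table layout and the helper names `euclidean` and `top_k_exact` are illustrative assumptions, not part of YDB.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_exact(rows, query, k, distance):
    """Exact k-nearest-neighbor search over `rows` (a dict: row id -> vector).

    Every row is scored, so the cost grows linearly with the table size.
    """
    return heapq.nsmallest(k, rows, key=lambda row_id: distance(rows[row_id], query))


rows = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(top_k_exact(rows, [0.9, 0.9], k=2, distance=euclidean))  # ['b', 'a']
```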
Vector indexes are particularly useful for:
- recommendation systems (finding similar items/users)
- semantic search (matching text embeddings)
- image similarity search
- anomaly detection (finding outliers)
- classification systems (finding nearest labeled examples)
Vector index characteristics
Vector indexes in YDB:
- Solve nearest neighbor search problems using similarity or distance functions
- Support several similarity and distance functions: "inner_product" and "cosine" similarity, and "cosine", "euclidean", and "manhattan" distance
- Currently implement a single index type: vector_kmeans_tree
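For reference, the following Python sketch gives the textbook definitions of the five functions listed above. A larger similarity means two vectors are closer, while a smaller distance means they are closer. It assumes that cosine distance is 1 minus cosine similarity. YDB's exact numeric behavior, such as its handling of zero vectors or its precision for uint8 vectors, is not described here.

```python
import math


def inner_product_similarity(a, b):
    # Larger value = more similar.
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]; undefined for zero vectors (division by zero).
    norms = math.sqrt(inner_product_similarity(a, a)) * math.sqrt(inner_product_similarity(b, b))
    return inner_product_similarity(a, b) / norms


def cosine_distance(a, b):
    # Assumed definition: 1 - cosine similarity, so smaller value = closer.
    return 1.0 - cosine_similarity(a, b)


def euclidean_distance(a, b):
    # Straight-line (L2) distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))
```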
Vector index type vector_kmeans_tree
The vector_kmeans_tree index implements a hierarchical clustering structure. Its organization includes:
- Hierarchical clustering:
  - The index builds multiple levels of k-means clusters
  - At each level, vectors are partitioned into the specified number of clusters raised to the power of the level number (for example, with clusters=128 there are 128 clusters on the first level and 128² = 16,384 on the second)
  - The first level clusters the entire dataset
  - Subsequent levels recursively cluster the contents of each parent cluster
- Search process:
  - During queries, the index examines only the most promising clusters
  - This pruning of the search space avoids an exhaustive scan of all vectors
- Parameters:
  - levels: The number of tree levels (typically 1-3). Controls search depth.
  - clusters: The number of clusters on each level (typically 64-512). Determines search breadth at each level.
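The following Python sketch is a toy model of the build and search steps described above. It is not YDB's implementation. It assumes plain Lloyd's k-means with a fixed seed, uses Euclidean distance for both clustering and routing, and introduces a hypothetical `probes` parameter for how many clusters are kept at each level. Because pruning skips clusters, the result can differ from an exact search. Keeping every cluster (a very large `probes`) degenerates into an exhaustive scan.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=10, seed=0):
    """Lloyd's k-means. Returns centroids and, per centroid, indices into `points`."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    groups = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            groups[nearest].append(i)
        # Update step: move each non-empty centroid to the mean of its members.
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*(points[i] for i in members))]
    return centroids, groups


@dataclass
class Node:
    centroid: Optional[List[float]] = None
    children: List["Node"] = field(default_factory=list)
    rows: List[int] = field(default_factory=list)  # row ids, filled only in leaves


def build(data, row_ids, levels, clusters):
    """Recursively split row_ids into `clusters` k-means clusters, `levels` deep."""
    node = Node()
    if levels == 0 or len(row_ids) <= clusters:
        node.rows = list(row_ids)
        return node
    centroids, groups = kmeans([data[r] for r in row_ids], clusters)
    for centroid, members in zip(centroids, groups):
        if members:
            child = build(data, [row_ids[i] for i in members], levels - 1, clusters)
            child.centroid = centroid
            node.children.append(child)
    return node


def search(root, data, query, k, probes=1):
    """Descend level by level, keeping only the `probes` closest clusters,
    then rank the rows of the reached leaves exactly."""
    frontier, candidates = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if node.children:
                next_frontier.extend(node.children)
            else:
                candidates.extend(node.rows)
        next_frontier.sort(key=lambda n: euclidean(query, n.centroid))
        frontier = next_frontier[:probes]
    candidates.sort(key=lambda r: euclidean(query, data[r]))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random(), rng.random()] for _ in range(1000)]
    # levels=2, clusters=8: at most 8 first-level and 8**2 = 64 leaf clusters.
    root = build(data, list(range(len(data))), levels=2, clusters=8)
    print(search(root, data, [0.5, 0.5], k=5, probes=2))
```

Raising `probes` trades speed for recall, in the same way that a larger `clusters` value widens the tree at each level.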
Vector index types
Basic vector index
The simplest form indexes only the vector column, without additional filtering capabilities. For example:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (embedding)
WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
Vector index with covered columns
Stores additional columns in the index so that queries can return them without reading the main table:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (embedding) COVER (data)
WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
Prefixed vector index
Allows filtering by prefix columns before performing vector search:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (user, embedding)
WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
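In terms of results, a query on a prefixed index behaves like filtering by the prefix column first and then ranking only the matching rows. The following Python sketch models only those result semantics, not how YDB lays out the index. The `(user, row_id, embedding)` tuple layout and the `prefixed_top_k` helper are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prefixed_top_k(rows, prefix, query, k, distance):
    """rows: (user, row_id, embedding) tuples, a hypothetical layout.

    Only rows whose prefix column equals `prefix` are ranked, so a closer
    vector belonging to another user can never appear in the result.
    """
    matching = [(row_id, emb) for user, row_id, emb in rows if user == prefix]
    best = heapq.nsmallest(k, matching, key=lambda item: distance(item[1], query))
    return [row_id for row_id, _ in best]


rows = [
    ("alice", 1, [0.0, 0.0]),
    ("bob", 2, [0.1, 0.1]),
    ("alice", 3, [1.0, 1.0]),
    ("alice", 4, [5.0, 5.0]),
]
print(prefixed_top_k(rows, "alice", [0.0, 0.0], k=2, distance=euclidean))  # [1, 3]
```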
Prefixed vector index with covered columns
Combines prefix filtering with covered columns, so that filtered queries can also avoid reading the main table:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (user, embedding) COVER (data)
WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
Creating vector indexes
Vector indexes can be created:
- when creating a table, with the YQL CREATE TABLE statement
- on an existing table, with the YQL ALTER TABLE statement
For more information about vector index parameters, see the CREATE TABLE statement.
Using vector indexes
Query vector indexes using the VIEW syntax in YQL. For prefixed indexes, include the prefix columns in the WHERE clause:
SELECT user, data
FROM my_table VIEW my_index
WHERE user = "..."
ORDER BY Knn::CosineSimilarity(embedding, ...) DESC
LIMIT 10;
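The query above sorts by Knn::CosineSimilarity in descending order because a higher similarity means a closer vector. With a distance function, the order would be ascending instead. The following Python sketch assumes cosine distance is 1 minus cosine similarity and checks that both orderings return the same rows. The helper names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def top_k_by_similarity(rows, query, k):
    # Larger similarity = closer, so sort descending (like ORDER BY ... DESC).
    return sorted(rows, key=lambda r: cosine_similarity(rows[r], query), reverse=True)[:k]


def top_k_by_distance(rows, query, k):
    # Cosine distance = 1 - similarity: smaller = closer, so sort ascending.
    return sorted(rows, key=lambda r: 1.0 - cosine_similarity(rows[r], query))[:k]


rows = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
query = [1.0, 0.2]
print(top_k_by_similarity(rows, query, 2))  # ['a', 'b']
print(top_k_by_distance(rows, query, 2))    # ['a', 'b']
```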
Limitations
Currently not supported:
- modifying rows in indexed tables
- bit vector type