Vector indexes
Vector indexes are specialized data structures that enable efficient similarity search in high-dimensional spaces. Unlike traditional indexes, which optimize exact lookups, vector indexes find the items most similar to a query vector according to a distance or similarity measure.
Data in a YDB table is stored and sorted by a primary key, enabling efficient point lookups and range scans. Vector indexes provide similar efficiency for nearest neighbor searches in vector spaces, which is particularly valuable for working with embeddings and other high-dimensional data representations.
This article describes practical operations with vector indexes. For conceptual information about vector index types and their characteristics, see Vector indexes in the Concepts section.
Creating vector indexes
A vector index is created with the ALTER TABLE ... ADD INDEX statement in YQL. The following example creates a prefixed vector index with a covered column:
ALTER TABLE my_table
ADD INDEX my_index
GLOBAL USING vector_kmeans_tree
ON (user, embedding) COVER (data)
WITH (distance=cosine, type="uint8", dimension=512, levels=2, clusters=128);
Key parameters for vector_kmeans_tree:
distance/similarity: Metric function ("cosine", "euclidean", etc.)
type: Data type ("float", "int8", "uint8")
dimension: Number of dimensions (<= 16384)
levels: Tree depth
clusters: Number of clusters per level (values > 1000 may impact performance)
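As a rough sizing aid, the following Python sketch estimates how many leaf clusters a given combination of levels and clusters can produce. It assumes that every node of the tree splits into exactly clusters children, which may not match the actual index layout. The helper names leaf_cluster_count and avg_rows_per_leaf are illustrative and are not part of YDB.

```python
def leaf_cluster_count(levels: int, clusters: int) -> int:
    """Upper bound on the number of leaf clusters.

    Assumption: a full tree in which every node has `clusters` children,
    so the bound is clusters ** levels.
    """
    if levels < 1 or clusters < 1:
        raise ValueError("levels and clusters must be positive integers")
    return clusters ** levels


def avg_rows_per_leaf(row_count: int, levels: int, clusters: int) -> float:
    """Average number of rows per leaf cluster under the same assumption."""
    return row_count / leaf_cluster_count(levels, clusters)


# Parameters from the index example: levels=2, clusters=128.
leaves = leaf_cluster_count(levels=2, clusters=128)

# Hypothetical table size, chosen only for illustration.
rows_per_leaf = avg_rows_per_leaf(10_000_000, levels=2, clusters=128)
```

With levels=2 and clusters=128, the bound is 128 * 128 = 16384 leaf clusters, so a hypothetical table of 10 million rows would average about 610 rows per leaf.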
Since building a vector index requires processing existing data, index creation on populated tables may take significant time. This operation runs in the background, allowing continued table access during construction. The index becomes available automatically when ready.
Using vector indexes for similarity search
To perform similarity searches, explicitly specify the index name in the VIEW clause. For prefixed indexes, include prefix column conditions in the WHERE clause:
DECLARE $query_vector AS List<Uint8>;
SELECT user, data
FROM my_table VIEW my_index
WHERE user = "john_doe"
ORDER BY Knn::CosineSimilarity(embedding, $query_vector) DESC
LIMIT 10;
Without the VIEW clause, the query would perform a full table scan with brute-force vector comparison.
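To make the cost of the brute-force path concrete, the following Python sketch shows what an exhaustive cosine-similarity search amounts to: every stored vector is compared with the query vector, and only then are the top results selected. This is a conceptual illustration, not YDB's implementation. The function names and the sample rows are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    rows: List[Tuple[str, Sequence[float]]],
    query: Sequence[float],
    k: int,
) -> List[str]:
    """Score every row against the query, then keep the k most similar keys.

    The work grows linearly with the number of rows, which is the cost
    that an index-backed search is meant to avoid.
    """
    scored = [(cosine_similarity(vec, query), key) for key, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical two-dimensional data, for illustration only.
sample_rows = [("a", [1, 0]), ("b", [0, 1]), ("c", [1, 1])]
top = brute_force_top_k(sample_rows, query=[1, 0], k=2)
```

Here every one of the three rows is scored before the best two are returned. On a large table, the same pattern means reading and comparing every vector for each query.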
Checking the cost of queries
Check every query in a transactional application for the number of I/O operations it performs in the database and the amount of CPU it uses. Also make sure that these figures don't keep growing as the database volume grows. YDB returns the statistics required for this analysis after running each query.
If you use the YDB CLI, specify the --stats option to print statistics after the yql command is executed. All YDB SDKs also contain structures with the statistics returned after running a query. If you run a query in the UI, a tab with statistics appears next to the results tab.
Warning
Vector indexes currently don't support data modification operations. Any attempt to modify rows in a table that has a vector index will fail. This limitation will be removed in future releases.