VIEW (Vector index)
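To select data from a row-oriented table using a vector index, use one of the following statements. With a distance function, sort in ascending order; with a similarity function, sort in descending order (DESC):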
PRAGMA ydb.KMeansTreeSearchTopSize = "10";
SELECT ...
FROM TableName VIEW IndexName
WHERE ...
ORDER BY Knn::SomeDistance(...)
LIMIT ...
PRAGMA ydb.KMeansTreeSearchTopSize = "10";
SELECT ...
FROM TableName VIEW IndexName
WHERE ...
ORDER BY Knn::SomeSimilarity(...) DESC
LIMIT ...
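The two statements above differ only in sort direction. The following Python sketch is an illustration rather than YDB code. It assumes the usual definition of cosine distance as 1 minus cosine similarity, and shows that sorting by distance in ascending order yields the same ranking as sorting by similarity in descending order. The helper functions and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical sample data: row id -> embedding.
target = (1.0, 0.0)
rows = {"a": (1.0, 0.1), "b": (0.0, 1.0), "c": (1.0, 1.0)}

# ORDER BY distance (ascending) ...
by_distance = sorted(rows, key=lambda r: cosine_distance(rows[r], target))
# ... ranks rows the same way as ORDER BY similarity DESC.
by_similarity = sorted(
    rows, key=lambda r: cosine_similarity(rows[r], target), reverse=True
)
```

Omitting DESC when ordering by a similarity function would return the least similar rows first, which is why the second statement specifies it.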
Principles of operation and settings of the vector index are described in detail in Vector Indexes.
Note
A vector index supports the distance or similarity function from the Knn extension that was specified when the index was built.
A vector index isn't automatically selected by the optimizer and must be specified explicitly using the VIEW IndexName expression.
If the VIEW expression is not used, the query performs a full table scan with pairwise comparison of vectors. Check the query plan to verify that the query is efficient; in particular, make sure there is no full scan of the main table.
Warning
Indexed vector search completeness or performance may decrease after updating a large amount of data in a table with a vector index. For more details, see Updating Vector Indexes.
KMeansTreeSearchTopSize
Indexed vector search is based on an approximate algorithm (ANN, Approximate Nearest Neighbors). This means that an indexed search may return results that differ from those of an equivalent full-scan nearest neighbor search.
Completeness of the indexed vector search is controlled by the following parameter: PRAGMA ydb.KMeansTreeSearchTopSize.
This parameter controls the maximum number of scanned clusters nearest to the requested search vector at every level of the search tree.
The parameter should be set explicitly for every search query.
The default value is 1. This means that, by default, only the single nearest cluster is scanned at every level of the search tree. This value maximizes search performance but gives the lowest possible recall. To increase search recall (at the expense of slightly reduced search performance), increase the PRAGMA value, for example:
PRAGMA ydb.KMeansTreeSearchTopSize="10";
SELECT *
FROM TableName VIEW IndexName
ORDER BY Knn::CosineDistance(embedding, $target)
LIMIT 10
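To make the trade-off concrete, here is a toy Python sketch of a search over a two-level cluster tree. It is not YDB's implementation. The tree layout, the helper names, the Euclidean metric, and the sample points are all assumptions chosen for illustration, and the tree is assumed to be balanced. The `top_size` argument plays the role of KMeansTreeSearchTopSize: at every level, only that many clusters nearest to the query are kept.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def all_points(node):
    """Every vector stored under a node (leaf or inner)."""
    if "points" in node:
        return list(node["points"])
    return [p for child in node["children"] for p in all_points(child)]


def make_leaf(points):
    return {"centroid": mean(points), "points": points}


def make_inner(children):
    node = {"children": children}
    node["centroid"] = mean(all_points(node))
    return node


def tree_search(root, query, k, top_size):
    """Approximate k-NN: keep only the top_size nearest clusters per level."""
    frontier = [root]
    # Descend level by level (assumes all leaves are at the same depth).
    while all("children" in n for n in frontier):
        candidates = [c for n in frontier for c in n["children"]]
        candidates.sort(key=lambda c: math.dist(c["centroid"], query))
        frontier = candidates[:top_size]
    found = [p for n in frontier for p in n["points"]]
    return sorted(found, key=lambda p: math.dist(p, query))[:k]


def full_scan(root, query, k):
    """Exact k-NN: compare the query with every stored vector."""
    return sorted(all_points(root), key=lambda p: math.dist(p, query))[:k]


# Hypothetical data: four leaf clusters grouped into two upper-level clusters.
tree = make_inner([
    make_inner([make_leaf([(0.0, 0.0), (1.0, 0.0)]),
                make_leaf([(2.0, 0.0), (4.0, 0.0)])]),
    make_inner([make_leaf([(5.0, 0.0), (9.0, 0.0)]),
                make_leaf([(11.0, 0.0), (12.0, 0.0)])]),
])
query = (4.6, 0.0)

exact = full_scan(tree, query, k=1)                 # [(5.0, 0.0)]
narrow = tree_search(tree, query, k=1, top_size=1)  # [(4.0, 0.0)]
wide = tree_search(tree, query, k=1, top_size=2)    # [(5.0, 0.0)]
```

With `top_size=1` the search follows only the cluster whose centroid is closest to the query and misses the true nearest neighbor, which sits just across a cluster boundary. With `top_size=2` the neighboring cluster is also scanned and the result matches the full scan, at the cost of comparing more vectors.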
Examples
- Select all the fields from the series row-oriented table using the views_index vector index created for embedding and cosine similarity:
  PRAGMA ydb.KMeansTreeSearchTopSize="10";
  SELECT series_id, title, info, release_date, views, uploaded_user_id, Knn::CosineSimilarity(embedding, $target) as similarity
  FROM series VIEW views_index
  ORDER BY similarity DESC
  LIMIT 10
- Select all the fields from the series row-oriented table using the views_filtered_index filtered vector index created for embedding and optimized for efficient filtering by release_date:
  PRAGMA ydb.KMeansTreeSearchTopSize="10";
  SELECT series_id, title, info, release_date, views, uploaded_user_id, Knn::CosineSimilarity(embedding, $target) as similarity
  FROM series VIEW views_filtered_index
  WHERE release_date = "2025-03-31"
  ORDER BY similarity DESC
  LIMIT 10