VIEW (Vector index)

Warning

Supported only for row-oriented tables. Support for column-oriented tables is currently under development.

Alert

Vector indexes are available in test mode in the main branch. This functionality will be fully available in version 25.1.

The following features are not supported:

  • Index update: the main table can be modified, but the existing index is not updated. To reflect the changes, a new index must be built. If necessary, the existing index can be atomically replaced with the newly built one.
  • Building an index for vectors with bit quantization.

These limitations may be removed in future versions.

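To select data from a row-oriented table using a vector index, use one of the following statements. With a distance function, sort in ascending order; with a similarity function, sort in descending order (DESC):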

SELECT ...
    FROM TableName VIEW IndexName
    WHERE ...
    ORDER BY Knn::SomeDistance(...)
    LIMIT ...

SELECT ...
    FROM TableName VIEW IndexName
    WHERE ...
    ORDER BY Knn::SomeSimilarity(...) DESC
    LIMIT ...

Note

A vector index supports the distance or similarity function from the Knn extension that was specified when the index was built.

A vector index isn't automatically selected by the optimizer and must be specified explicitly using the VIEW IndexName expression.
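The two statement forms differ only in sort direction. For a distance, a smaller value means a closer match; for a similarity, a larger value does. The Python sketch below is a conceptual illustration, not YDB code. It assumes cosine distance is defined as 1 minus cosine similarity. The function names and sample vectors are hypothetical. Under that assumption, ordering by distance ascending and by similarity descending rank the rows identically.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (larger means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Assumed definition: 1 - cosine similarity (smaller means closer)."""
    return 1.0 - cosine_similarity(a, b)


def top_k_by_distance(rows, target, k):
    # Mirrors ORDER BY <distance> LIMIT k: ascending order, nearest rows first.
    ranked = sorted(rows, key=lambda r: cosine_distance(r[1], target))
    return [name for name, _ in ranked[:k]]


def top_k_by_similarity(rows, target, k):
    # Mirrors ORDER BY <similarity> DESC LIMIT k: descending order, most similar rows first.
    ranked = sorted(rows, key=lambda r: cosine_similarity(r[1], target), reverse=True)
    return [name for name, _ in ranked[:k]]


rows = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0]), ("d", [-1.0, 0.2])]
target = [1.0, 0.1]

print(top_k_by_distance(rows, target, 2))    # ['a', 'b']
print(top_k_by_similarity(rows, target, 2))  # ['a', 'b']
```

For other function pairs the two rankings need not coincide. An index can only serve the function it was built with.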

KMeansTreeSearchTopSize

Indexed vector search is based on an approximate algorithm (ANN, Approximate Nearest Neighbors). This means that an indexed search may return results that differ from those of an exact nearest-neighbor search performed with a full table scan.
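An approximate result is usually judged against the exact answer a full scan would return. A common measure is recall@k: the fraction of the exact k nearest neighbors that the indexed search also returned. The following Python sketch computes both. It is a conceptual illustration with hypothetical names and data and uses Euclidean distance. It does not describe how YDB evaluates queries.

```python
import math


def exact_knn(vectors: dict[str, list[float]], target: list[float], k: int) -> list[str]:
    """Full scan: measure the distance to every vector and keep the ids of the k nearest."""
    return sorted(vectors, key=lambda vid: math.dist(vectors[vid], target))[:k]


def recall_at_k(approx_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the exact k nearest neighbors that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0], "d": [3.0, 3.0]}
exact = exact_knn(vectors, [0.1, 0.1], k=2)
print(exact)                           # ['a', 'b']
# Suppose an approximate search returned "a" and "d": one of the two true neighbors was found.
print(recall_at_k(["a", "d"], exact))  # 0.5
```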

The completeness of an indexed vector search is controlled by PRAGMA ydb.KMeansTreeSearchTopSize.

This parameter sets the maximum number of clusters nearest to the query vector that are scanned at each level of the search tree.
It should be set explicitly in every search query.

The default value is 1, meaning that only the single nearest cluster is scanned at each level of the search tree. This value maximizes search performance and gives good search quality for vectors close to the center of a cluster. However, it may be insufficient for vectors that are roughly equidistant from several clusters. To improve search quality for such vectors, at the cost of slightly lower search performance, increase the PRAGMA value, for example:

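PRAGMA ydb.KMeansTreeSearchTopSize="10";
-- Hypothetical query vector; its format and dimension must match the indexed embedding column.
$target = Knn::ToBinaryStringFloat([1.2f, 2.3f, 3.4f, 4.5f]);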
SELECT *
    FROM TableName VIEW IndexName
    ORDER BY Knn::CosineDistance(embedding, $target)
    LIMIT 10
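The effect of KMeansTreeSearchTopSize can be illustrated with a toy k-means tree. This hypothetical Python sketch is not YDB's implementation. At each level it keeps only the top_size clusters whose centroids are nearest to the query, then scans only the points in the selected leaf clusters. The tree is hand-built and balanced, distances are Euclidean, and all names and values are assumptions chosen for illustration. The query vector lies slightly closer to the centroid of cluster_a, yet its true nearest neighbor b1 belongs to cluster_b. It is therefore found only when two clusters are scanned.

```python
import math
from dataclasses import dataclass, field

Vector = tuple[float, ...]


@dataclass
class Node:
    """A cluster in a toy k-means tree: internal nodes have children, leaves hold points."""
    centroid: Vector = ()
    children: list["Node"] = field(default_factory=list)
    points: list[tuple[str, Vector]] = field(default_factory=list)


def tree_search(root: Node, target: Vector, k: int, top_size: int) -> list[str]:
    """Approximate k-NN over a balanced tree; top_size must be at least 1."""
    frontier = [root]
    # Descend level by level, keeping only the top_size clusters nearest to the target.
    while frontier[0].children:
        children = [child for node in frontier for child in node.children]
        children.sort(key=lambda c: math.dist(c.centroid, target))
        frontier = children[:top_size]
    # Leaf level: scan only the points of the selected clusters and keep the k nearest.
    candidates = [p for leaf in frontier for p in leaf.points]
    candidates.sort(key=lambda p: math.dist(p[1], target))
    return [pid for pid, _ in candidates[:k]]


cluster_a = Node(centroid=(0.0, 1.0), points=[("a1", (0.0, 0.0)), ("a2", (0.0, 2.0))])
cluster_b = Node(centroid=(4.0, 0.0), points=[("b1", (3.0, 0.0)), ("b2", (5.0, 0.0))])
root = Node(children=[cluster_a, cluster_b])

target = (1.8, 1.2)  # nearer to cluster_a's centroid, but its nearest point b1 is in cluster_b
print(tree_search(root, target, k=1, top_size=1))  # ['a2']
print(tree_search(root, target, k=1, top_size=2))  # ['b1']
```

With top_size=2, every cluster in this tiny example is scanned and the search becomes exact. In a large tree, a higher value scans more clusters per level, which is where the extra cost comes from.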

Examples

  • Select all fields of the row-oriented series table using the views_index vector index, built on the embedding column with cosine similarity:

    SELECT series_id, title, info, release_date, views, uploaded_user_id, Knn::CosineSimilarity(embedding, $target) as similarity
        FROM series VIEW views_index
        ORDER BY similarity DESC
        LIMIT 10
    
  • Select all fields of the row-oriented series table using the views_filtered_index filtered vector index, built on the embedding column and optimized for efficient filtering by release_date:

    SELECT series_id, title, info, release_date, views, uploaded_user_id, Knn::CosineSimilarity(embedding, $target) as similarity
        FROM series VIEW views_filtered_index
        WHERE release_date = "2025-03-31"
        ORDER BY similarity DESC
        LIMIT 10
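One way to picture a filtered vector index is as a separate group of vectors for each value of the filter column. A query with WHERE release_date = ... then needs to scan only the matching group. The following Python sketch illustrates that idea under this assumption. It is not YDB's implementation, and the class and method names are hypothetical. For brevity, each group is scanned exactly rather than through a k-means tree.

```python
import math
from collections import defaultdict


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (larger means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class PrefixedVectorIndex:
    """Toy filtered index: vectors are grouped by the value of a filter (prefix) column."""

    def __init__(self) -> None:
        self.groups: dict[str, list[tuple[int, list[float]]]] = defaultdict(list)

    def add(self, row_id: int, prefix: str, embedding: list[float]) -> None:
        self.groups[prefix].append((row_id, embedding))

    def search(self, prefix: str, target: list[float], k: int) -> list[int]:
        # Only the group matching the filter value is scanned; other groups are skipped.
        group = self.groups.get(prefix, [])
        ranked = sorted(group, key=lambda row: cosine_similarity(row[1], target), reverse=True)
        return [row_id for row_id, _ in ranked[:k]]


index = PrefixedVectorIndex()
index.add(1, "2025-03-31", [1.0, 0.0])
index.add(2, "2025-03-31", [0.6, 0.8])
index.add(3, "2025-04-01", [1.0, 0.1])
print(index.search("2025-03-31", [1.0, 0.2], k=2))  # [1, 2]
```

Row 3 is the most similar vector overall. It is excluded because its release_date does not match the filter, and its group is never scanned.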