Bloom skip indexes
Bloom skip indexes are local auxiliary structures based on a Bloom filter that speed up selective queries by skipping data fragments that cannot contain the requested value. Unlike global secondary indexes, they act as read-time filters on the main table and reduce the amount of data that must be actually read.
Types
YDB supports two types:
bloom_filter: a filter over exact values of the indexed column; use for equality andIN(see when to use which).bloom_ngram_filter: a filter over n-grams of a string column (String,Utf8); use for substring andLIKEpattern search on column-oriented tables.
Parameters and defaults
Full list of WITH (...) parameters and defaults:
bloom_filterfalse_positive_probability: Target false-positive rate of the filter: the fraction of fragments that are not skipped even though the requested value is not there (range(0, 1)). A lower value reduces extra reads but increases index size.- If omitted, the default is
0.0001for row-oriented tables and0.1for column-oriented tables. The stricter default on row-oriented tables matches OLTP point lookups; on column-oriented tables it reflects analytical scans, where a larger trade-off between index size and fragment skipping is acceptable.
- If omitted, the default is
- Indexed columns: YQL types that support equality comparison, except
Yson,Json, andJsonDocument(see comparison operators). One column in a column-oriented table; multiple columns in a row-oriented table.
bloom_ngram_filter(StringandUtf8columns; column-oriented tables only)ngram_size: N-gram length, an integer from3to8(default3).false_positive_probability: Target false-positive rate (range(0, 1); default0.1).case_sensitive: Whether n-grams respect character case:trueorfalse(defaulttrue).
Creation syntax: CREATE TABLE, ALTER TABLE ADD INDEX.
To change parameters after creation, use ALTER INDEX.
Examples
Create a table with a bloom_filter index:
CREATE TABLE events (
id Uint64,
resource_id Utf8,
message Utf8,
PRIMARY KEY (id),
INDEX idx_bloom LOCAL USING bloom_filter
ON (resource_id)
WITH (false_positive_probability = 0.01)
);
Add a bloom_ngram_filter index to an existing table:
ALTER TABLE events
ADD INDEX idx_ngram LOCAL USING bloom_ngram_filter
ON (message)
WITH (
ngram_size = 3,
false_positive_probability = 0.01,
case_sensitive = true
);
Alter index parameters:
ALTER TABLE events ALTER INDEX idx_ngram SET (
ngram_size = 4,
false_positive_probability = 0.005,
case_sensitive = false
);
Parameter tuning
If queries still read too much data after deploying the index, try lowering false_positive_probability. If the index uses too much space, try raising it.
Use ALTER INDEX to change values, as in the example above.
Tips:
- Start with
false_positive_probability = 0.01, then adjust based on real read metrics and index size. - For
bloom_ngram_filter,ngram_sizetypically starts at3; increasing it can make filtering stricter for longer substrings. - Change one parameter at a time and compare results under the same workload.
Behavior and limitations
- The index is always local (
LOCAL); there is no global variant. - Queries do not use the
VIEW <index>syntax (unlike, for example, fulltext indexes). - The filter is applied on read only to data fragments where an index block was already stored during a write or portion merge, other fragments are not skipped by this index until merge.
Limitations
COVER (...)and extra index columns are not supported.- Column-oriented tables allow only one indexed column. Row-oriented tables allow multiple indexed columns.
bloom_ngram_filteris not supported on row-oriented tables.