Bloom skip index quickstart

Creating a table with a bloom_filter index

Below is a minimal example: a column-oriented table with a primary key and a local bloom_filter index on a column that is frequently used in filters.

CREATE TABLE events (
    id Uint64 NOT NULL,
    resource_id Utf8 NOT NULL,
    payload String,
    message Utf8,
    PRIMARY KEY (id),
    INDEX idx_res LOCAL USING bloom_filter
        ON (resource_id)
        WITH (false_positive_probability = 0.01)
)
WITH (
    STORE = COLUMN
);

Extending the example: n-gram index

You can add bloom_ngram_filter on a string column to the same table from the example above (column-oriented tables):

ALTER TABLE events
  ADD INDEX idx_msg LOCAL USING bloom_ngram_filter
  ON (message)
  WITH (
    ngram_size = 3,
    false_positive_probability = 0.01,
    case_sensitive = true
  );

Queries and the effect

After you load data, selective queries that filter on indexed columns may read less data: while scanning storage, the Bloom skip index skips fragments that cannot contain the requested value (compared to reading the full column without this filter).

Sample data and queries for the table above:

INSERT INTO events (id, resource_id, payload, message) VALUES
    (1, "res-1", "{}", "started"),
    (2, "res-42", "{}", "error: timeout"),
    (3, "res-2", "{}", "done");

Equality on the column protected by bloom_filter — the engine can prune irrelevant fragments when reading resource_id and related columns:

SELECT id, message
FROM events
WHERE resource_id = "res-42";

Substring search on the column with bloom_ngram_filter — the n-gram index helps drop fragments that cannot contain matching n-grams in message:

SELECT id, message
FROM events
WHERE message LIKE '%timeout%';

How to verify index effectiveness

To check that the Bloom skip index actually helps, run the same selective query on a table with enough data before and after creating the index and compare execution time.

Was the article helpful?

Local indexes

Transfer