Fulltext index

Fulltext indexes in row-oriented tables are created using the same syntax as secondary indexes, by specifying fulltext_plain or fulltext_relevance as the index type. Subset of syntax available for fulltext indexes:

CREATE TABLE `<table_name>` (
    ...
    INDEX `<index_name>`
        GLOBAL
        [SYNC]
        USING fulltext_plain | fulltext_relevance
        ON ( <text_column> )
        [COVER ( <cover_columns> )]
        [WITH ( <parameter_name> = <parameter_value>[, ...])]
    [,   ...]
)

Where:

  • <index_name> - unique index name for data access
  • SYNC - indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.
  • <text_column> - a single table column with text content (currently only one indexed column is supported)
  • <cover_columns> - list of additional table columns stored in the index to enable retrieval without accessing the main table
  • <parameter_name> and <parameter_value> - list of key-value parameters:
  • common parameters for all fulltext indexes:
    • tokenizer - tokenizer type (standard, whitespace, or keyword)
    • use_filter_lowercase - lowercase filter (true or false)
    • use_filter_length - token length filter (true or false); when true, tokens shorter than filter_length_min or longer than filter_length_max are not indexed and are ignored during search
    • filter_length_min - minimum token length (positive integer); only applied when use_filter_length=true
    • filter_length_max - maximum token length (positive integer); only applied when use_filter_length=true
    • use_filter_snowball - Snowball stemmer filter (true or false)
    • language - language for the Snowball stemmer (for example, english, russian)
    • use_filter_ngram - n-gram filter (true or false)
    • use_filter_edge_ngram - edge n-gram filter (true or false)
    • filter_ngram_min_length - minimum n-gram length (positive integer)
    • filter_ngram_max_length - maximum n-gram length (positive integer, (\ge) filter_ngram_min_length)

Warning

Supported only for row-oriented tables. Support for column-oriented tables is currently under development.

Examples

Basic example

CREATE TABLE articles (
    id Uint64,
    title String,
    body String,
    INDEX ft_idx GLOBAL USING fulltext_relevance
    ON (body) COVER (title)
    WITH (
        tokenizer=standard,
        use_filter_lowercase=true
    ),
    PRIMARY KEY (id)
)

Example with n-grams

N-grams allow finding substrings within words. When using the n-gram filter, each word is split into all possible substrings of specified length. For example, the word "search" with parameters filter_ngram_min_length=3 and filter_ngram_max_length=5 will be split into: "sea", "ear", "arc", "rch" (3-grams), "sear", "earc", "arch" (4-grams), "searc", "earch" (5-grams). This enables finding documents by partial matches anywhere within a word.

CREATE TABLE products (
    id Uint64,
    name String,
    description String,
    INDEX ft_ngram_idx GLOBAL USING fulltext_plain
    ON (name)
    WITH (
        tokenizer=standard,
        use_filter_lowercase=true,
        use_filter_ngram=true,
        filter_ngram_min_length=3,
        filter_ngram_max_length=5
    ),
    PRIMARY KEY (id)
)

Example with edge n-grams

Edge n-grams create substrings only from the beginning of a word. For example, the word "search" with parameters filter_ngram_min_length=2 and filter_ngram_max_length=10 will be split into: "se", "sea", "sear", "searc", "search". This is particularly useful for implementing autocomplete functionality, when you need to find words that start with a user-entered prefix.

CREATE TABLE users (
    id Uint64,
    username String,
    email String,
    INDEX ft_edge_ngram_idx GLOBAL USING fulltext_plain
    ON (username)
    WITH (
        tokenizer=standard,
        use_filter_lowercase=true,
        use_filter_edge_ngram=true,
        filter_ngram_min_length=2,
        filter_ngram_max_length=10
    ),
    PRIMARY KEY (id)
)
Previous
Next