Fulltext index
Fulltext indexes in row-oriented tables are created using the same syntax as secondary indexes, by specifying fulltext_plain or fulltext_relevance as the index type. Subset of syntax available for fulltext indexes:
CREATE TABLE `<table_name>` (
...
INDEX `<index_name>`
GLOBAL
[SYNC]
USING fulltext_plain | fulltext_relevance
ON ( <text_column> )
[COVER ( <cover_columns> )]
[WITH ( <parameter_name> = <parameter_value>[, ...])]
[, ...]
)
Where:
<index_name>- unique index name for data accessSYNC- indicates synchronous data writing to the index. This is the only currently available option, and it is used by default.<text_column>- a single table column with text content (currently only one indexed column is supported)<cover_columns>- list of additional table columns stored in the index to enable retrieval without accessing the main table<parameter_name>and<parameter_value>- list of key-value parameters:
- common parameters for all fulltext indexes:
tokenizer- tokenizer type (standard,whitespace, orkeyword)use_filter_lowercase- lowercase filter (trueorfalse)use_filter_length- token length filter (trueorfalse); whentrue, tokens shorter thanfilter_length_minor longer thanfilter_length_maxare not indexed and are ignored during searchfilter_length_min- minimum token length (positive integer); only applied whenuse_filter_length=truefilter_length_max- maximum token length (positive integer); only applied whenuse_filter_length=trueuse_filter_snowball- Snowball stemmer filter (trueorfalse)language- language for the Snowball stemmer (for example,english,russian)use_filter_ngram- n-gram filter (trueorfalse)use_filter_edge_ngram- edge n-gram filter (trueorfalse)filter_ngram_min_length- minimum n-gram length (positive integer)filter_ngram_max_length- maximum n-gram length (positive integer, (\ge)filter_ngram_min_length)
Warning
Supported only for row-oriented tables. Support for column-oriented tables is currently under development.
Examples
Basic example
CREATE TABLE articles (
id Uint64,
title String,
body String,
INDEX ft_idx GLOBAL USING fulltext_relevance
ON (body) COVER (title)
WITH (
tokenizer=standard,
use_filter_lowercase=true
),
PRIMARY KEY (id)
)
Example with n-grams
N-grams allow finding substrings within words. When using the n-gram filter, each word is split into all possible substrings of specified length. For example, the word "search" with parameters filter_ngram_min_length=3 and filter_ngram_max_length=5 will be split into: "sea", "ear", "arc", "rch" (3-grams), "sear", "earc", "arch" (4-grams), "searc", "earch" (5-grams). This enables finding documents by partial matches anywhere within a word.
CREATE TABLE products (
id Uint64,
name String,
description String,
INDEX ft_ngram_idx GLOBAL USING fulltext_plain
ON (name)
WITH (
tokenizer=standard,
use_filter_lowercase=true,
use_filter_ngram=true,
filter_ngram_min_length=3,
filter_ngram_max_length=5
),
PRIMARY KEY (id)
)
Example with edge n-grams
Edge n-grams create substrings only from the beginning of a word. For example, the word "search" with parameters filter_ngram_min_length=2 and filter_ngram_max_length=10 will be split into: "se", "sea", "sear", "searc", "search". This is particularly useful for implementing autocomplete functionality, when you need to find words that start with a user-entered prefix.
CREATE TABLE users (
id Uint64,
username String,
email String,
INDEX ft_edge_ngram_idx GLOBAL USING fulltext_plain
ON (username)
WITH (
tokenizer=standard,
use_filter_lowercase=true,
use_filter_edge_ngram=true,
filter_ngram_min_length=2,
filter_ngram_max_length=10
),
PRIMARY KEY (id)
)