Working with Full-Text Search

Full-text search allows searching for words or phrases in a large body of text through an inverted index. The search can be exact or fuzzy and performed over the text types JSON, CHAR, VARCHAR, TEXT, and LONGTEXT.

SingleStore supports an updated full-text search process referred to as VERSION 2. For information on Version 1, refer to Legacy (Version 1) SingleStore Process.

The following are supported in VERSION 2:

  • Full-text indexes are only supported on columnstore tables.

  • Full-text indexes can be created with a CREATE TABLE statement using the FULLTEXT index type, or can be added to existing tables via an ALTER TABLE ADD FULLTEXT statement.

  • Only one full-text index per table is supported.

  • Full-text indexes can be dropped via DROP INDEX or ALTER TABLE DROP INDEX statements.

  • VERSION 2 uses BM25 scoring.

  • To add a column to a full-text index, drop and recreate the index.

The USING VERSION 2 syntax must be used in the CREATE TABLE command to utilize the VERSION 2 process.

CREATE TABLE <table_name> FULLTEXT USING VERSION 2 <fts_index_name> (<fts_col>) | (<fts_col1>,..., <fts_coln>))

Note

The legacy full-text search version is deprecated. SingleStore recommends using VERSION 2 for new development of applications that use full-text search.

VERSION 2 does not allow the column names of full-text columns to contain a $ character. An error will be generated if a full-text column name contains a $ character.

The full-text search version (from legacy to version 2) can be changed on an existing table. First, the existing full-text index must be dropped using the DROP INDEX command:

Note

If an index name was not designated when creating the table, the full-text index key_name must be used when dropping the index. The full-text index key_name is displayed when the SHOW INDEXES FROM <table_name> command is executed.

DROP INDEX <fts_index_name> | <index_key_name> ON <table_name>;

Then, ALTER TABLE command is used with the FULLTEXT USING VERSION 2 argument :

ALTER TABLE <table_name> ADD FULLTEXT USING VERSION 2 fts_index_name (<fts_col>) | (<fts_col1>,..., <fts_coln>);

Content in columns that are full-text indexed can be searched using MATCH , BM25, or BM25_GLOBAL functions. Each MATCH , BM25, or BM25_GLOBAL clause applies to only one table. To search against multiple tables, specify multiple MATCH, BM25, or BM25_GLOBAL clauses.

Note

New inserts and updates into columnstore tables may initially be stored in a hidden rowstore table before being flushed to a segment file. The affected segment is re-indexed when the background flusher runs.

In that case, the full-text index in the columnstore will be updated asynchronously for new inserts and updates. Inserts and updates from this rowstore table can be force-pushed  to the columnstore table by using the OPTIMIZE TABLE <table_name> FULL command.

Since an index is created for each segment file, the distribution of words within the segment may affect the score of full-text queries, especially when the segments have very few rows and the columns have very few words.

Note

Full-text search may not be used inside a CTE (WITH (Common Table Expressions)) because the CTE produces a dynamic table which does not have a full-text index. Similarly, full-text search cannot be used on derived tables. These restrictions apply to MATCH, BM25, and BM25_GLOBAL.

For more information on columnstore tables, see How the Columnstore Works.

Version 2 SingleStore Process

SingleStore's VERSION 2 full-text search system uses a JLucene service. The JLucene service is a Java process that provides an interface for the SingleStore engine. This interface allows the engine to perform full-text searches and create full-text search indexes for future use. The JLucene service uses software from the Apache Lucene project. The SingleStore engine and the JLucene service run on the same machine. Communication between the two processes occurs through domain sockets and shared memory. Typically, communication occurs on a per-segment basis.

The default analyzer for VERSION 2 is the StandardAnalyzer from Apache Lucene configured to use the StandardTokenizer from Apache Lucene, a lower-case filter, and a set of stopwords. The StandardTokenizer is a grammar-based tokenizer which uses the word break rules from the Unicode Text Segmentation, as specified in Unicode Standard Annex #29.

Version 2 Installation for Self-Managed Users

Full-text version 2 requires Java 21 or later to be installed on all nodes. Refer to https://www.oracle.com/java/technologies/downloads/ for more information on how to install Java.

On each node, the following engine variables must be set. These are not sync variables and must be set individually on each node.

  • fts2_java_home: This is the path to the top-level directory of the Java installation which is stored in the $JAVA_HOME environment variable.

  • fts2_java_path: This is the path to the Java executable which is typically located in $JAVA_HOME/bin/java.

For example, assuming all of the hosts have the same Java paths as reflected below, run the following commands to set the fts2_java_home and fts2_java_path engine variables to the same Java path values on all nodes in the cluster:

sdb-admin update-config --all --set-global --key "fts2_java_home" --value "/usr/lib/jvm/jre-21-openjdk"
sdb-admin update-config --all --set-global --key "fts2_java_path" --value "/usr/lib/jvm/jre-21-openjdk/bin/java"

If the locations for the Java home directory and Java executable differ from those in the preceding commands, change the path names in the --value arguments to match your environment

To restart your cluster with the updated configurations set, run the following command:

sdb-admin restart-node --all --yes

Refer to sdb-admin update-config for more information about updating nodes in your cluster.

Terms in Full-Text Searches

There are two types of search terms: single terms and phrases. A single term is a single word such as test or hello which does not require quotes. A phrase is a group of words surrounded by double quotes such as "hello SingleStore". Multiple search terms can be combined with Boolean operators to form more complex queries.

Search terms can be modified to provide a wide range of search options as described below.

Wildcard Support

Single and multiple character wildcard searches within single terms are supported, but not within search phrases.

  • Use the ? symbol to perform a single character wildcard search.

  • Use the * symbol to perform a multi character wildcard search.

Important

Neither ? or * are supported at the beginning of a term. For example, searching for ?ello or *ello will generate an error.

A single character wildcard search matches words based on a single-character. For example, to search for “text” or “test”, use the search term: te?t.

A multiple character wildcard search matches words based on zero or more characters. For example, to search for “test”, “tests”, or “tester”, use the search term test*. Wildcard searches in the middle of a search term can also be used, such as te*t.

The following examples demonstrate how these wildcard searches appear in sample queries:

SELECT * FROM wilsearch1 WHERE MATCH (TABLE wilsearch1) AGAINST ('col1:te?t');
SELECT * FROM wilsearch1 WHERE MATCH (TABLE wilsearch1) AGAINST ('col1:te*t');

Boosting a Term

Boosting a search term means increasing the relevance or importance of that search term in the search results. To boost a search term use the caret ("^") symbol with a boost factor (a number) at the end of the search term. The higher the boost factor, the more relevant the search term will be.

For example, if you are searching for Single Store and you want the term Store to be more relevant, boost it using the ^ symbol along with the boost factor next to the term, such as you could type Single Store^4. This will make rows with the term Store appear more relevant.

You can also boost phrase as in the example: "Single Store"^4 "MySQL".

Note

The boost factor is 1 by default. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2), which leads to the term or phrase have less relevance.

The following examples demonstrate how these wildcard searches appear in sample queries:

SELECT * FROM bstsearch1 WHERE MATCH (TABLE bstsearch1) AGAINST ('col1:"Single Store"^4 "MySQL"');
SELECT * FROM bssearch1 WHERE MATCH (TABLE bstsearch1) AGAINST ('col1:SingleStore MySQL^0.02');

Full-text version 2 supports operators listed on the Java Lucene full-text search string syntax page.

Grouping Terms in a Query

Grouping Single Terms

Parentheses can be used to group terms to form subqueries. This can be useful for controlling the boolean logic for a query.

For example, to search for either "Single" or "Store" and "MemSQL", use the query (Single OR Store) AND MemSQL. This ensures that "MemSQL" exists with either the"Single" or "Store" search terms.

Grouping Multiple Terms into a Single Field

Parentheses can be used to group multiple clauses into a single field. The following query can be used to search for "SingleStore" in the col1 full-text column or for "MemSQL" in the col2 full-text column.

SELECT * FROM grpsearch1 WHERE MATCH (TABLE grpsearch1) AGAINST ('col1:SingleStore OR col2:MemSQL');

Escaping Special Characters

Special characters that are part of the query syntax must be escaped to directly match them. These special characters are:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

Use the \ character before the special character to escape it. For example, to search for (1+1):2 ,use the query \(1\+1\)\:2.

Regular Expression Searches

Regular expression searches that match a pattern between forward slashes ("/") are supported. For example, to find rows containing "moat" or "boat", use /[mb]oat/:

SELECT * FROM rexsearch1 WHERE MATCH (TABLE rexsearch1) AGAINST ('col1:/[mb]oat/');

The following example searches for dates with the format of month/day/year where the month and day can have one or two digits and year can have two or four digits.

SELECT * FROM rexsearch1 WHERE MATCH (TABLE rexsearch1) AGAINST ('col1:/\d{1,2}\/\d{1,2}\/\d{2,4}/');

Fuzzy Searches

Fuzzy searches based on the Damerau-Levenshtein Distance are supported. To perform a fuzzy search use the tilde ("~") symbol at the end of a single term. For example, to search for a term similar in spelling to "roam" use the fuzzy search: roam~, which will find terms like "foam" and "roams".

An edit is a change made to a search term to transform it into another string. An optional “edit” parameter can be used after the tilde to specify the maximum number of edits permitted. This value can be 0, 1, or 2. If not specified, the default value is 2 edits.

Note

Only integer (non-fractional) values are permitted.

The following is an example of a fuzzy search of the term "roam" without the optional parameter:

SELECT * FROM fuzsearch1 WHERE MATCH (TABLE fuzsearch1) AGAINST ('col1:roam~');

The following is an example of a fuzzy search of the term "roam" with the optional parameter:

SELECT * FROM fuzsearch1 WHERE MATCH (TABLE fuzsearch1) AGAINST ('col1:roam~1');

Fuzzy Search OPTIONS Clause

Fuzzy searches can be augmented by using an optional OPTIONS clause within the MATCH syntax. The syntax for adding options is as follows:

MATCH (TABLE <table_name>) AGAINST (<col_name$key.path:search_term> OPTIONS <json_options>);

The set of supported options are:

  • fuzzy_prefix_length: This is the number of characters at the start of a search term that must be identical (not fuzzy) to the query term if the query is to match the search term. The default value is 0.

  • fuzzy_max_expansions : This is the maximum number of terms to match against. The default value is 50.

  • fuzzy_transpositions: This allows transpositions with adjacent characters. This can capture some errors more efficiently, particularly mistyped adjacent characters. When transposition is allowed, matching against "ca~" will be returned "ac" also. The default value is TRUE.

All of the options are case-sensitive, optional, and can be combined as required. For example:

SELECT * FROM articles WHERE MATCH (TABLE articles) AGAINST ('body:roam~' OPTIONS '{"fuzzy_prefix_length": 2}');
SELECT * FROM articles
WHERE MATCH (TABLE articles) AGAINST ('body:roam~' OPTIONS '{"fuzzy_prefix_length": 1, "fuzzy_transpositions": false}');

Proximity Searches

Finding words within a specified distance away is supported. To perform a proximity search, use the tilde ("~") symbol at the end of a search phrase. For example, to search for a "Single" and "Store" within 10 words of each other in a row, use the search "Single Store"~10:

SELECT * FROM prxsearch1 WHERE MATCH (TABLE prxsearch1) AGAINST ('col1:"Single Store"~10');

Range Searches

Range searches match rows where the column(s) values are between the lower and upper bound specified by the query. Range searches can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically or by value depending on whether the search is over string or numeric values.

Square brackets ("[]") indicate an inclusive search, curly brackets ("{}") indicate an exclusive search. A range clause of [123 TO 345] implies a numeric range search, a range clause of [abc TO def] implies a lexicographic range search, and [$123$ TO $345$] forces a lexicographic range despite the numeric range values. Specific examples are provided below.

An inclusive range search matches all rows where the values are between the search terms, including values equal to the search terms. Inclusive range queries are denoted by square brackets (“[ ]”). For example, the following query searches for rows where the numeric value of col1 is 2008 through 2020:

SELECT * FROM rngsearch1
WHERE MATCH (TABLE rngsearch1) AGAINST ('col1:[2008 TO 2020]');

An exclusive range search matches all rows where the values are between the search terms, but are not equal to the search terms. Exclusive range queries are denoted by curly brackets (“{ }”). For example, the following query searches for rows where the numeric value of col1 is between 2008 and 2020, but is not equal to 2008 or 2020:

SELECT * FROM rngsearch1
WHERE MATCH (TABLE rngsearch1) AGAINST ('col1:{2008 TO 2020}');

Range searches may be lexicographic or numeric. Typically searches over strings are expected to order lexicographically while searches over numeric values are expected to order by numeric value. The type of range search to use (lexicographic or numeric) is determined by the values entered in the range clause in the query. If both the upper and lower bounds of the range are numeric, then a numeric range search will be performed. If either value is not numeric, then a lexicographic query will be used.

The following query will use a numeric search as both the values 2008 and 2020 are numeric.

SELECT * FROM rngsearch1
WHERE MATCH (TABLE rngsearch1) AGAINST ('col1:{2008 TO 2020}');

The following query will use a lexicographic search as both of the range values are strings that do not contain numeric values. Note that this query is inclusive of A, but exclusive of B.

SELECT * FROM rngsearch1
WHERE MATCH (TABLE rngsearch1) AGAINST ('col1:[A TO B}');

This query will use a numeric search as both the values "2008" and "2020" can be parsed numerically.

SELECT * FROM rngsearch1
WHERE MATCH (TABLE rngsearch1) AGAINST ('col1:{"2008" TO "2020"}');

It is possible to force a lexicographic search on a numeric-style range clause. To do so, surround the numeric upper and lower bounds with the '$' character. The following query will use a lexicographic search.

SELECT * FROM rngsearch1
WHERE MATCH (TABLE rngsearch1) AGAINST ('col1:[$2008$ TO $2020$]'

ORDER BY … LIMIT Optimization

Full-text search queries can take advantage of an optimization that pushes down the value of LIMIT in an ORDER BY … LIMIT query into the full-text index search. This optimization will reduce the number of results returned from the full-text index search which can increase query performance.

For this optimization to work, the query must follow these rules:

  • The WHERE clause must either be empty or the expression in the WHERE clause must be the same full-text expression that is in the ORDER BY clause.

  • The ORDER BY clause must use DESC sort order.

  • There must be only one full-text search function (MATCH, BM25, or BM25_GLOBAL) in the query.

Relevancy Score

The relevancy score of an expression in a MATCH statement denotes the ranking of the expression based on the following factors:

  • Number of times an expression appears in a column. More occurrences of an expression in the matched column(s) increases its relevancy score.

  • Rarity of the expression. Rare words have a higher relevancy score than commonly used words.

  • The length of the column containing the expression. A column with a short expression has a higher relevancy score than a column with a long expression.

BM25 Scoring

Full-text search version 2 supports BM25 scoring for. Refer to BM25 for more information.

Status

The Alloc_fts2_svc status variables is the number of allocated bytes (out of the total max_memory) that is attributable to the next-generation (VERSION 2) full-text subprocess. This status variable will be exposed in the SHOW STATUS EXTENDED command.

Configurations

A set of global variables is available to configure full-text search version 2. Refer to the full-text variables sections of List of Engine Variables for details. Also, refer to Configuring Full Text and Vector Indexes for information on configuring the engine for high performance for full-text indexes.

Note

If your system is experiencing high load due to full-text index builds, SingleStore recommends reducing the value of fts2_max_connections to 16 or 8. This change will reduce load, but will slow down indexing, but will reduce load.

Examples

Create a Version 2 Full-Text Search Index and Query Using MATCH

This example creates a FULLTEXT index for both the title column and the body column. Either column can be queried separately using MATCH (TABLE <table_name>) AGAINST (<expression>), and the index on the column will be applied.

CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SORT KEY (id),
FULLTEXT USING VERSION 2 art_ft_index (title, body));
INSERT INTO articles (id, year, title, body) VALUES
(1, 2021, 'Introduction to SQL', 'SQL is a standard language for accessing and manipulating databases.'),
(2, 2022, 'Advanced SQL Techniques', 'Explore advanced techniques and functions in SQL for better data manipulation.'),
(3, 2020, 'Database Optimization', 'Learn about various optimization techniques to improve database performance.'),
(4, 2023, 'SQL in Web Development', 'Discover how SQL is used in web development to interact with databases.'),
(5, 2019, 'Data Security in SQL', 'An overview of best practices for securing data in SQL databases.'),
(6, 2021, 'SQL and Data Analysis', 'Using SQL for effective data analysis and reporting.'),
(7, 2022, 'Introduction to Database Design', 'Fundamentals of designing a robust and scalable database.'),
(8, 2020, 'SQL Performance Tuning', 'Tips and techniques for tuning SQL queries for better performance.'),
(9, 2023, 'Using SQL with Python', 'Integrating SQL with Python for data science and automation tasks.'),
(10, 2019, 'NoSQL vs SQL', 'A comparison of NoSQL and SQL databases and their use cases.'),
(11, 2020, 'Real-time Data Analysis', 'An introduction to real-time analytics.'),
(12, 2021, 'Analysis for Beginners', 'Simple examples of real time analytics.'),
(13, 2023, 'Data-Dictionary Design', 'Create and maintain effective data dictionaries.'),
(14, 2024, 'Scalable Performance', 'Designing for scalability.');
OPTIMIZE TABLE articles FULL;

Search for a Single Word

Search for rows with the word database in the body column.

SELECT *
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('body:database');
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| id | year | title                           | body                                                                        |
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| 7  | 2022 | Introduction to Database Design |	Fundamentals of designing a robust and scalable database.                   |
| 3  | 2020 | Database Optimization	      | Learn about various optimization techniques to improve database performance.|
+----+------+---------------------------------+-----------------------------------------------------------------------------+

Search for rows with the word Database in the title column or the phrase Business Intelligence in the body column.

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:Database OR body:("Business Intelligence")');
+---------------------------------+
| title                           |     
+---------------------------------+
| Introduction to Database Design |
| Database Optimization           |
+---------------------------------+

Search for rows with the word SQL in the title column and the phrase "Data Security" in the title column.

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:SQL AND title:("Data Security")');
+----------------------+
| title                |     
+----------------------+
| Data Security in SQL |
+----------------------+

Multiple MATCH Clauses

Use two MATCH clauses to search for rows with the word SQL in the title or the phrase Business Intelligence in the title, and that also have the word development in the body. In this example, the + indicates that the search term is required.

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:SQL OR body:("Business Intelligence")')
AND MATCH (TABLE articles) AGAINST ('body:web+');
+------------------------+
| title                  |     
+------------------------+
| SQL in Web Development |
+------------------------+

Use +, *, and ?

Search for rows that have a word starting with Data and followed by an arbitrary number of characters in the title, or words like function followed by a single character (e.g. functions, but not functional) in the title. In this example, the + indicates that Data should appear at the beginning of the word.

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:(+Data*) OR title:function?');
+---------------------------------+
| title                           |     
+---------------------------------+
| Introduction to Database Design |
| SQL and Data Analysis           |
| Database Optimization           |
| Data Security in SQL            |
| Data-Dictionary Design          |
| Real-time Data Analysis         |
+---------------------------------+

Create a Version 2 Full-Text Index over JSON

A full-text index can be created over a JSON column in the same manner it can be created over any other text-type column.

CREATE TABLE ft_records (
id INT UNSIGNED,
title VARCHAR(200),
records JSON,
SORT KEY(id),
FULLTEXT USING VERSION 2 rec_ft_index (title, records));

The full-text index is created over a JSON column by concatenating all leaf string values in the JSON as a multi-valued field. The engine variable fts2_position_increment_gap defines the logical spacing between concatenated leaf string values to prevent matching across different leaf string values. The default value is 100. In addition to having a field for each column in the full-text index, there will also be additional fields for each unique keypath in the JSON document.

INSERT INTO ft_records VALUES (
1,
'document',
'{
"k1": "cucumber",
"k2": ["dragonfruit", "eggplant"],
"k3": [
{"k3_1": "fig", "k3_2": "grape"},
{"k3_1": ["huckleberry", "iceberg lettuce"]},
"jicama"
]
}');
OPTIMIZE TABLE ft_records FULL;
SELECT title, records
FROM ft_records WHERE id = 1;
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+ 
| title	   | records                                                                                                                                    |
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+
| document | {"k1":"cucumber","k2":["dragonfruit","eggplant"],"k3":[{"k3_1":"fig","k3_2":"grape"},{"k3_1":["huckleberry","iceberg lettuce"]},"jicama"]} | 
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+

The JSON document for the records column for the row inserted above is shown below.

{
"title": "document",
"records": ["cucumber", "dragonfruit", "eggplant",
"fig", "grape", "huckleberry",
"iceberg lettuce", "jicama"],
"records$k1": "cucumber",
"records$k2": ["dragonfruit", "eggplant"],
"records$k3": "jicama",
"records$k3.k3_1": ["fig", "huckleberry", "iceberg lettuce"],
"records$k3.k3_2": "grape"
}

Here the records column is a multi-valued field of all leaf string values in JSON. This field can be queried like any other full-text indexed field. An exception is that matching will not occur over separate values unless the maximum number of positions allowed between matching phrases, or the “slop,” exceeds the value of fts2_position_increment_gap.

Along with records, other fields like records$k1 and records$k3.k3_1 are created to allow searching at each unique keypath present in the JSON document. The dollar sign ("$") is used as a delimiter between the SQL column name and the JSON keypath.

Query over a JSON column

Querying over the entire JSON column can be performed in the same way as with any other column that is part of the full-text index.

SELECT (MATCH (TABLE ft_records) AGAINST ('records:/.*cumber/')) AS cumber
FROM ft_records;
+----------+ 
| cumber   |
+----------+ 
| 1        | 
+----------+

Query over a JSON keypath

The following example shows how you can search for the string fig at the keypath k3.k3_1 in the records column using the field grouping syntax.

SELECT (MATCH (TABLE ft_records) AGAINST ('records$k3.k3_1:fig')) AS fig
FROM ft_records;
+------------------------+ 
| fig                    |
+------------------------+ 
| 0.13076457381248474    | 
+------------------------+

Query for Two Terms

Enclose the terms cucumber and raspberry in parentheses to do a boolean OR search for the terms (cucumber and raspberry) in the document. The document matches because the term cucumber appears in the document.

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:(cucumber raspberry)');
+------+ 
| id   |
+------+ 
| 1    | 
+------+

In the example below, quotes are placed around the words cucumber and dragonfruit to search for the phrase "cucumber dragonfruit".

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:"cucumber dragonfruit"');
Empty set (0.008 sec)

The phrase "cucumber dragonfruit" does not match the JSON document in the ft_records table because the words cucumber and dragonfruit belong to different leaf strings in that document.

When a proximity search query for the phrase "cucumber dragonfruit" with a slop of ~100 is used, the JSON document matches.

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:"cucumber dragonfruit"~100');
+------+
| id   | 
+------+ 
| 1    | 
+------+

Slop indicates the maximum number of words allowed between words in a phrase for the phrase to be considered a match. That is, a slop of ~100 means if there are 100 words or less between cucumber and dragonfruit in the document, the document will be considered a match.

Further, a proximity search is only done on multi-valued fields such as JSON (or BSON) columns when the slop value is greater than or equal to fts2_position_increment_gap.

In this example slop is 100 and fts2_position_increment_gap=100 (100 is the default value of fts2_position_increment_gap). Since slop is equal to fts2_position_increment_gap matching across separate values occurs. And since the words cucumber and dragonfruit are within 100 words of each other in the document, a match is returned.

As described above, the ~ symbol is used for both proximity searches and fuzzy searches. If the ~ appears after a phrase (multiple words delimited by quotes "), this indicates that a proximity search with a slop value of the number appearing after the ~ should be performed. If the ~ appears after a single word, this indicates that an edit distance (Levenshtein) comparison should be performed on that word (also called a fuzzy search).

The following example shows a fuzzy search.

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:dronfruit~2');
+------+
| id   |
+------+
|    1 |
+------+

A match is returned because dronfruit, which is intentionally misspelled, is within an edit distance of 2 of dragonfruit.

It is important to understand the difference between a phrase search and a boolean term search as it relates to JSON fields.

Words in quotes (like "cucumber dragonfruit")  are searched for as a single phrase in a single JSON field. Words that appear in parentheses (like (cucumber dragonfruit) are searched as if there is a boolean OR between the words. When terms are separated by logical operators, they may be matched in different JSON fields.

In the following example, cucumber and dragonfruit appear in fields k1 and k2 respectively, and thus a match occurs.

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:(cucumber dragonfruit)');
+------+
| id   |
+------+
|    1 |
+------+

Example - ORDER BY … LIMIT

The query below will take advantage of the ORDER BY...LIMIT pushdown. In this query there is one MATCH function, named match_res, the columns in the WHERE and ORDER BY clauses are the same, and the sort order is DESC.

SELECT id, title,
MATCH (TABLE articles) AGAINST ('body:database') AS match_res
FROM articles
WHERE match_res
ORDER BY match_res DESC LIMIT 25;

Legacy (Version 1) SingleStore Process

SingleStore's legacy full-text search system uses a CLucene service which is embedded in the SingleStore database engine.

The following example illustrates how to create a table with a legacy version full-text search index and how to query from that table. The USING VERSION 1 syntax is optional.

CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SORT KEY (id),
FULLTEXT USING VERSION 1 (title, body));
SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database');

Refer to the MATCH page for more examples.

Index Repair

Full-text index creation failure is rare. However, if full-text index creation fails, you will receive the error ER_FTS_INDEX_NEEDS_REPAIR_ON_SEGMENT. The index can be repaired by running OPTIMIZE TABLE <tablename> FIX_FULLTEXT.

Last modified: April 11, 2025

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK