MATCH

For columnstore tables created with a FULLTEXT index, the text columns in that table can be searched by using the MATCH AGAINST syntax.

The result of the MATCH statement is a relevancy score generally greater than or equal to 0 which indicates the quality of the match. Higher scores indicate higher quality matches, and lower scores indicate lower quality matches.

Important

To include recent inserts/updates from the hidden rowstore table in with the results, run OPTIMIZE TABLE tbl_name FLUSH before running your query.

Refer to Working with Full-Text Search for more conceptual information on this feature.

Syntax

The full-text search version 2 MATCH syntax is:

MATCH (TABLE <table_name>) AGAINST (<expression>)

The legacy full-text search MATCH syntax is:

MATCH (<column1>,<column2>,...) AGAINST (<expression>)

The VERSION 2 refers to SingleStore's next generation full-text search process. The USING VERSION 2 syntax must be used in the CREATE TABLE command to utilize the full-text search version 2 process. Refer to the Working with Full-Text Search page for more information.

The columns specified in a MATCH clause must be from the same table in the legacy full-text search process. Specify multiple MATCH clauses to search against multiple tables using the version 2 process. Additionally, full-text search works best over English text and is case-insensitive.

Note

The legacy full-text search version is deprecated. SingleStore recommends using VERSION 2 for new development of applications that use full-text search.

VERSION 2 does not allow the column names of full-text columns to contain a $ character. An error will be generated if a full-text column name contains a $ character.

Operators

Full-text search version 2 supports operators listed on the Java Lucene full-text search string syntax page.

The AGAINST expression consists of a mix of text with zero or more of the following operators.

Operator

Description

(no operator)

When no operator is specified, the word is optional; however, the rows that contain it are rated higher.

+

A leading plus sign indicates that this word must be present in each row returned.

-

A leading minus sign indicates that this word must not be present in any of the rows that are returned. Note: The - operator acts only to exclude rows that are otherwise matched by other search terms.

NOT

The NOT operator behaves the same as the - operator. The symbol ! can be used in place of the word NOT. The NOT operator must be in all caps.

AND

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND. The expression A AND B is equivalent to +A +B. The AND operator must be in all caps.

OR

The OR operator behaves the same as not having any operator between words. The symbol || can be used in place of the word OR. The OR operator must be in all caps.

()

Parentheses group words into subexpressions. Parenthesized groups can be nested.

*, ?

See the Wildcard support section below.

""

A phrase that is enclosed within double quote (") characters matches the words in the quotes as if it is a single word. If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because of a combination of factors: if they do not exist in the text, are stopwords, or are shorter than the minimum length of indexed words.

~

The tilde symbol is used to support fuzzy searches. To do a fuzzy search, use the tilde symbol at the end of a single-word term. For example, to search for a term similar in spelling to roam use the fuzzy search: roam~.

Note

SingleStore supports only constant expressions (search filters) inside the AGAINST clause, for example, a regular expression like (Stock*) or a constant string like "DBC". You may use external application software, or stored procedures with dynamic SQL, to substitute in different AGAINST expressions at runtime.

Stopwords

Certain words are ignored by full-text search due to their commonality resulting in less relevant results. These are called stopwords. SingleStore’s default list of stopwords is as follows:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, 
of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with

Wildcard Support

Single and multiple character wildcard searches within single terms are supported (not within phrase queries).

To perform a single character wildcard search use the ? symbol and to perform a multiple character wildcard search use the * symbol.

The single character wildcard search looks for terms that match that with the single character replaced. For example, to search for text or test you can use the search: te?t

Multiple character wildcard searches looks for zero or more characters. For example, to search for test, tests or tester, you can use the search: test* You can also use the wildcard searches in the middle of a term. te*t

Important

Neither ? or * are supported at the beginning of a term. For example, searching for "?ello" or "*ello" will generate an error.

Special Characters

Depending on the use case, FULLTEXT matching may not always be compatible with match expressions involving special characters. This is because the default tokenizer for SingleStore’s FULLTEXT removes special characters from the search. The current list special characters are:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

Relevancy Score

The relevancy score of an expression in a MATCH statement denotes the ranking of the expression based on the following factors:

  • Number of times an expression appears in a column. More occurrences of an expression in the matched column(s) increases its relevancy score.

  • Rarity of the expression. Rare words have a higher relevancy score than commonly used words.

  • The length of the column containing the expression. A column with a short expression has a higher relevancy score than a column with a long expression.

MATCH and BM25 Scoring

Full-text searches using MATCH calculate scores at the segment level, so MATCH scores are relative to other documents within the same segment. In contrast, full-text searches using BM25 calculate scores at the partition level, so BM25 scores are relative to other documents within a partition. Segments are smaller than, and contained within partitions; thus, searches using MATCH are more efficient than searches using BM25, but also may return lower-quality results.

For example, when using MATCH, two rows that contain identical text may receive different scores in the same query if those rows are stored in different segments. The BM25 search function uses term and collection statistics from all documents in a partition, ensuring that scores are computed consistently and accurately across the entire set of rows in the partition.

Searching with MATCH is more efficient. SingleStore recommends using MATCH when efficiency is important and accuracy of results across segments is less important.

Full-Text Search Version 2 Examples

Create a Table with a Full-Text Version 2 Index

This example creates a FULLTEXT index for both the title column and the body column. Either column can be queried separately using MATCH (TABLE <table_name>) AGAINST (<expression>), and the index on the column will be applied.

CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SORT KEY (id),
FULLTEXT USING VERSION 2 art_ft_index (title, body));
INSERT INTO articles (id, year, title, body) VALUES
(1, 2021, 'Introduction to SQL', 'SQL is a standard language for accessing and manipulating databases.'),
(2, 2022, 'Advanced SQL Techniques', 'Explore advanced techniques and functions in SQL for better data manipulation.'),
(3, 2020, 'Database Optimization', 'Learn about various optimization techniques to improve database performance.'),
(4, 2023, 'SQL in Web Development', 'Discover how SQL is used in web development to interact with databases.'),
(5, 2019, 'Data Security in SQL', 'An overview of best practices for securing data in SQL databases.'),
(6, 2021, 'SQL and Data Analysis', 'Using SQL for effective data analysis and reporting.'),
(7, 2022, 'Introduction to Database Design', 'Fundamentals of designing a robust and scalable database.'),
(8, 2020, 'SQL Performance Tuning', 'Tips and techniques for tuning SQL queries for better performance.'),
(9, 2023, 'Using SQL with Python', 'Integrating SQL with Python for data science and automation tasks.'),
(10, 2019, 'NoSQL vs SQL', 'A comparison of NoSQL and SQL databases and their use cases.'),
(11, 2020, 'Real-time Data Analysis', 'An introduction to real-time analytics.'),
(12, 2021, 'Analysis for Beginners', 'Simple examples of real time analytics.'),
(13, 2023, 'Data-Dictionary Design', 'Create and maintain effective data dictionaries.'),
(14, 2024, 'Scalable Performance', 'Designing for scalability.');
OPTIMIZE TABLE articles FLUSH;

Search for a Term

SELECT *
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('body:database');
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| id | year | title                           | body                                                                        |
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| 7  | 2022 | Introduction to Database Design |	Fundamentals of designing a robust and scalable database.                   |
| 3  | 2020 | Database Optimization	      | Learn about various optimization techniques to improve database performance.|
+----+------+---------------------------------+-----------------------------------------------------------------------------+

Use MATCH Twice in One Query

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:SQL OR body:("Business Intelligence")')
AND MATCH (TABLE articles) AGAINST ('body:web*');
+------------------------+
| title                  |     
+------------------------+
| SQL in Web Development |
+------------------------+

Use Boolean Operators and Wildcards

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:(+Data*) OR title:function?');
+---------------------------------+
| title                           |     
+---------------------------------+
| Introduction to Database Design |
| SQL and Data Analysis           |
| Database Optimization           |
| Data Security in SQL            |
| Data-Dictionary Design          |
| Real-time Data Analysis         |
+---------------------------------+

Use a SQL Predicate

SELECT count(*)
FROM articles
WHERE year = 2021
AND MATCH (TABLE articles) AGAINST ('body:SQL');
+----------+
| count(*) |
+----------+
|        2 |
+----------+

Search for the Word 'database' across Two Columns

SELECT *
FROM articles
WHERE MATCH (TABLE articles)
AGAINST ('title:database OR body:database');
+------+------+---------------------------------+------------------------------------------------------------------------------+
| id   | year | title                           | body                                                                         |
+------+------+---------------------------------+------------------------------------------------------------------------------+
|    3 | 2020 | Database Optimization           | Learn about various optimization techniques to improve database performance. |
|    7 | 2022 | Introduction to Database Design | Fundamentals of designing a robust and scalable database.                    |
+------+------+---------------------------------+------------------------------------------------------------------------------+

Use a Single Boolean Operator

SELECT title
FROM articles
WHERE MATCH (TABLE articles)
AGAINST ('title:Database OR title:"Business Intelligence"');
+---------------------------------+
| title                           |
+---------------------------------+
| Database Optimization           |
| Introduction to Database Design |
+---------------------------------+

Use Multiple Boolean Operators

The following example returns the title of an article that contains either 'Database' or 'Business Intelligence' and the string 'real-time analytics' in the body. As '-' is a special character, the search will include results for both real time and real-time.

SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('(title:Database OR title:"Analysis") AND (body:"real-time analytics")');
+--------------------------+
| title                    |
+--------------------------+
| Analysis for Beginners   |
| Real-time Data Analysis  |
+--------------------------+

Use a Wildcard

This example uses the wildcard ’*’ to return all articles with titles starting with the string 'Intro' such as Introduction.

SELECT id, title
FROM articles
WHERE MATCH(TABLE articles) AGAINST ('title:Intro*');
+------+---------------------------------+
| id   | title                           |
+------+---------------------------------+
|   1  | Introduction to SQL             |
|   7  | Introduction to Database Design |
+------+---------------------------------+

Create a Relevance Score as an Output Column

SELECT id, title, MATCH(TABLE articles) AGAINST ('body:database') AS relevance
FROM articles
WHERE MATCH(TABLE articles) AGAINST('body:database');
+------+---------------------------------+---------------------+
| id   | title                           | relevance           |
+------+---------------------------------+---------------------+
|    3 | Database Optimization           |  0.3346227705478668 |
|    7 | Introduction to Database Design | 0.13076457381248474 |
+------+---------------------------------+---------------------+

Search for Rows with a Relevance Score Greater than a Specific Value

SELECT id, title, MATCH(TABLE articles) AGAINST ('body:database')
FROM articles
WHERE MATCH(TABLE articles) AGAINST ('body:database') > .8;
+------+---------------------------------+-------------------------------------------------+
| id   | title                           | MATCH(TABLE articles) AGAINST ('body:database') |
+------+---------------------------------+-------------------------------------------------+
|    7 | Introduction to Database Design |                              0.8195944428443909 |
+------+---------------------------------+-------------------------------------------------+

Exact values may differ in this small example depending on the database configuration.

UPDATE or DELETE Queries

The MATCH command can be used in UPDATE or DELETE queries. Run OPTIMIZE TABLE <table_name> FLUSH after an UPDATE or DELETE to make the results of the UPDATE or DELETE immediately available.

UPDATE articles_for_update SET title = concat(title,".DATABASE")
WHERE MATCH(TABLE articles_for_update) AGAINST ('body:database');
DELETE FROM articles_for_update
WHERE MATCH(TABLE articles_for_update) AGAINST ('body:database');

Full-Text Index over JSON Column

A full-text index can be created over a JSON column in the same manner it can be created over any other text-type column.

CREATE TABLE ft_records (
id INT UNSIGNED,
title VARCHAR(200),
records JSON,
SORT KEY(id),
FULLTEXT USING VERSION 2 rec_ft_index (title, records));
INSERT INTO ft_records VALUES (
1,
'document',
'{
"k1": "cucumber",
"k2": ["dragonfruit", "eggplant"],
"k3": [
{"k3_1": "fig", "k3_2": "grape"},
{"k3_1": ["huckleberry", "iceberg lettuce"]},
"jicama"
]
}');
OPTIMIZE TABLE ft_records FLUSH;

Querying over the entire JSON column can be performed in the same way as with any other column part of the full-text index.

SELECT (MATCH (TABLE ft_records) AGAINST ('records:/.*cumber/')) AS cumber
FROM ft_records;
+----------+ 
| cumber   |
+----------+ 
| 1        | 
+----------+ 

To query over a specific JSON keypath, use a query similar to the following:

SELECT (MATCH (TABLE ft_records) AGAINST ('records$k3.k3_1:fig')) AS fig
FROM ft_records;
+-------------------------+ 
| fig                     |
+-------------------------+ 
| 0.13076457381248474     | 
+-------------------------+ 

The example above shows how you can search for the string fig at the keypath k3.k3_1 in the records column when using the field grouping syntax.

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:(iceberg lettuce)');
+------+ 
| id   |
+------+ 
| 1    | 
+------+ 
SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:"cucumber dragonfruit"');
Empty set (0.008 sec)
SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:(cucumber dragonfruit~100)');
+------+
| id   | 
+------+ 
| 1    | 
+------+ 

In the above examples, you can see that the phrase query cucumber dragonfruit does not match because they belong to different leaf strings. However, when a slop (the maximum number of positions allowed between matching phrases) of 100 is added the row matches because the default fts2_position_increment_gap=100.

Full-Text Legacy Version 1 Examples

Create a Table with a Full-Text Index

CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SORT KEY (id),
FULLTEXT USING VERSION 1 (title, body));

The USING VERSION 1 syntax is optional.

Use a SQL Predicate

SELECT count(*)
FROM articles
WHERE year = 2017 AND MATCH (body) AGAINST ('memsql');

Search for the Word database across Two Columns

SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database');

Use a Single Operator

SELECT title
FROM articles
WHERE MATCH (title) AGAINST ('Database OR "Business Intelligence"');

Use Multiple Operators

The following example returns the title of an article that contains either database or Business Intelligence and the string real-time analytics in the body. As '-' is a special character, results for both real time and real-time will be included.

SELECT title
FROM articles
WHERE MATCH (title) AGAINST ('Database OR "Business Intelligence"')
AND MATCH(body) AGAINST ("real-time analytics");

Use a Wildcard

This example uses the wildcard ’*’ to return all articles with titles starting with the word Journal such as Journalism, Journalist, Journals, and so on.

SELECT *
FROM articles
WHERE MATCH(title) AGAINST ('Journal*');

Create a Relevance Score as an Output Column

SELECT id, title, MATCH(body) AGAINST ('database') relevance
FROM articles
WHERE MATCH(body) AGAINST ('database');

Search with a Specific Relevance Score

SELECT id, title, MATCH(body) AGAINST ('database')
FROM articles
WHERE MATCH(body) AGAINST ('database') > .12;

UPDATE or DELETE Queries

UPDATE articles set name = concat(name,".DATABASE") where MATCH(body) AGAINST ('database');
DELETE from articles where MATCH(body) AGAINST ('database');

Last modified: October 2, 2024

Was this article helpful?