Working with Full-Text Search

Full-text search allows searching for words or phrases in a large body of text through an inverted index. The search can be exact or fuzzy and performed over the text types CHAR, VARCHAR, TEXT, and LONGTEXT.

Full-text indexes are only supported on columnstore tables. Also, they can only be enabled as part of a CREATE TABLE query using the FULLTEXT index type. This means full-text indexes cannot be dropped or altered after the table is created. If the table is dropped, then the index is deleted automatically.

CREATE TABLE <table_name> (FULLTEXT [<fts_index_name>] (<fts_col>))

Note

If an index name was not designated when creating the table, the full-text index key_name must be used when dropping the index. The full-text index key_name is displayed when the SHOW INDEXES FROM <table_name> command is executed.

Content in columns that are full-text indexed can be searched using the MATCH syntax. Each MATCH clause applies to only one table. To search against multiple tables, specify multiple MATCH clauses.

Note

New inserts and updates into columnstore tables may initially be stored in a hidden rowstore table before being flushed to a segment file. The affected segment is re-indexed when the background flusher runs.

In that case, the full-text index in the columnstore will be updated asynchronously for new inserts and updates. Inserts and updates from this rowstore table can be force-pushed  to the columnstore table by using the OPTIMIZE TABLE <table_name> FLUSH command.

Since an index is created for each segment file, the distribution of words within the segment may affect the score of full-text queries, especially when the segments have very few rows and the columns have very few words.

For more information on columnstore tables, see How the Columnstore Works.

Relevancy Score

The relevancy score of an expression in a MATCH statement denotes the ranking of the expression based on the following factors:

  • Number of times an expression appears in a column. More occurrences of an expression in the matched column(s) increases its relevancy score.

  • Rarity of the expression. Rare words have a higher relevancy score than commonly used words.

  • The length of the column containing the expression. A column with a short expression has a higher relevancy score than a column with a long expression.

Create a Version 2 Full-Text Search Index and Query Using MATCH

CREATE TABLE articles (
id INT UNSIGNED,
year int UNSIGNED,
title VARCHAR(200),
body TEXT,
SORT KEY (id),
FULLTEXT USING VERSION 2 art_ft_index (title, body));
INSERT INTO articles (id, year, title, body) VALUES
(1, 2021, 'Introduction to SQL', 'SQL is a standard language for accessing and manipulating databases.'),
(2, 2022, 'Advanced SQL Techniques', 'Explore advanced techniques and functions in SQL for better data manipulation.'),
(3, 2020, 'Database Optimization', 'Learn about various optimization techniques to improve database performance.'),
(4, 2023, 'SQL in Web Development', 'Discover how SQL is used in web development to interact with databases.'),
(5, 2019, 'Data Security in SQL', 'An overview of best practices for securing data in SQL databases.'),
(6, 2021, 'SQL and Data Analysis', 'Using SQL for effective data analysis and reporting.'),
(7, 2022, 'Introduction to Database Design', 'Fundamentals of designing a robust and scalable database.'),
(8, 2020, 'SQL Performance Tuning', 'Tips and techniques for tuning SQL queries for better performance.'),
(9, 2023, 'Using SQL with Python', 'Integrating SQL with Python for data science and automation tasks.'),
(10, 2019, 'NoSQL vs SQL', 'A comparison of NoSQL and SQL databases and their use cases.');
OPTIMIZE TABLE articles FLUSH;
SELECT *
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('body:database');
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| id | year | title                           | body                                                                        |
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| 7  | 2022 | Introduction to Database Design |	Fundamentals of designing a robust and scalable database.                   |
| 3  | 2020 | Database Optimization	      | Learn about various optimization techniques to improve database performance.|
+----+------+---------------------------------+-----------------------------------------------------------------------------+
SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:Database OR body:("Business Intelligence")');
+---------------------------------+
| title                           |     
+---------------------------------+
| Introduction to Database Design |
| Database Optimization           |
+---------------------------------+
SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:SQL AND title:("Data Security")');
+----------------------+
| title                |     
+----------------------+
| Data Security in SQL |
+----------------------+
SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:SQL OR body:("Business Intelligence")')
AND MATCH (TABLE articles) AGAINST ('body:web+');
+------------------------+
| title                  |     
+------------------------+
| SQL in Web Development |
+------------------------+
SELECT title
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('title:(+Data*) OR title:function?');
+---------------------------------+
| title                           |     
+---------------------------------+
| Introduction to Database Design |
| SQL and Data Analysis           |
| Database Optimization           |
| Data Security in SQL            |
+---------------------------------+

Create a Version 2 Full-Text Index over JSON

A full-text index can be created over a JSON column in the same manner it can be created over any other text-type column.

CREATE TABLE ft_records (
id INT UNSIGNED,
title VARCHAR(200),
records JSON,
SORT KEY(id),
FULLTEXT USING VERSION 2 rec_ft_index (title, records));

The full-text index is created over a JSON column by concatenating all leaf string values in the JSON as a multi-valued field. The engine variable fts2_position_increment_gap defines the logical spacing between concatenated leaf string values to prevent matching across different leaf string values. The default value is 100. In addition to having a field for each column in the full-text index, there will also be additional fields for each unique keypath in the JSON document.

INSERT INTO ft_records VALUES (
1,
'document',
'{
"k1": "cucumber",
"k2": ["dragonfruit", "eggplant"],
"k3": [
{"k3_1": "fig", "k3_2": "grape"},
{"k3_1": ["huckleberry", "iceberg lettuce"]},
"jicama"
]
}');
OPTIMIZE TABLE ft_records FLUSH;
SELECT title, records
FROM ft_records WHERE id = 1;
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+ 
| title	   | records                                                                                                                                    |
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+
| document | {"k1":"cucumber","k2":["dragonfruit","eggplant"],"k3":[{"k3_1":"fig","k3_2":"grape"},{"k3_1":["huckleberry","iceberg lettuce"]},"jicama"]} | 
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+

The JSON document for the records column for the row inserted above is shown below.

{
"title": "document",
"records": ["cucumber", "dragonfruit", "eggplant",
"fig", "grape", "huckleberry",
"iceberg lettuce", "jicama"],
"records$k1": "cucumber",
"records$k2": ["dragonfruit", "eggplant"],
"records$k3": "jicama",
"records$k3.k3_1": ["fig", "huckleberry", "iceberg lettuce"],
"records$k3.k3_2": "grape"
}

Here the records column is a multi-valued field of all leaf string values in JSON. This field can be queried like any other full-text indexed field. An exception is that matching will not occur over separate values unless the maximum number of positions allowed between matching phrases, or the “slop,” exceeds the fts2_position_increment_gap.

Along with records, other fields like records$k1 and records$k3.k3_1 are created allow searching at each unique keypath present in the JSON document. The dollar sign ("$") is used as a delimiter between the SQL column name and the JSON keypath.

Query over JSON columns

Querying over the entire JSON column can be performed in the same way as with any other column part of the full-text index.

SELECT (MATCH (TABLE ft_records) AGAINST ('records:/.*cumber/')) AS cumber
FROM ft_records;
+----------+ 
| cumber   |
+----------+ 
| 1        | 
+----------+

To query over a specific JSON keypath, use a query similar to the following:

SELECT (MATCH (TABLE ft_records) AGAINST ('records$k3.k3_1:fig')) AS fig
FROM ft_records;
+------------------------+ 
| fig                    |
+------------------------+ 
| 0.13076457381248474    | 
+------------------------+

The example above shows how you can search for the string fig at the keypath k3.k3_1 in the records column using the field grouping syntax.

SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:(iceberg lettuce)');
+------+ 
| id   |
+------+ 
| 1    | 
+------+ 
SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:"cucumber dragonfruit"');
Empty set (0.008 sec)
SELECT id
FROM ft_records
WHERE MATCH (TABLE ft_records) AGAINST ('records:(cucumber dragonfruit~100)');
+------+
| id   | 
+------+ 
| 1    | 
+------+ 

In the above examples, you can see that the phrase query cucumber dragonfruit does not match because they belong to different leaf strings. However, when a slop (the maximum number of positions allowed between matching phrases) of 100 is added the row matches because the default fts2_position_increment_gap=100.

Last modified: August 22, 2024

Was this article helpful?