Working with Full-Text Search
On this page
Full-text search allows searching for words or phrases in a large body of text through an inverted index.fuzzy
and performed over the text types CHAR
, VARCHAR
, TEXT
, and LONGTEXT
.
Full-text indexes are only supported on columnstore tables.CREATE TABLE
query using the FULLTEXT index type.
CREATE TABLE <table_name> (FULLTEXT [<fts_index_name>] (<fts_col>))
Note
If an index name was not designated when creating the table, the full-text index key_
must be used when dropping the index.key_
is displayed when the SHOW INDEXES FROM <table_
command is executed.
Content in columns that are full-text indexed can be searched using the MATCH syntax.MATCH
clause applies to only one table.MATCH
clauses.
Note
New inserts and updates into columnstore tables may initially be stored in a hidden rowstore table before being flushed to a segment file.
In that case, the full-text index in the columnstore will be updated asynchronously for new inserts and updates.OPTIMIZE TABLE <table_
command.
Since an index is created for each segment file, the distribution of words within the segment may affect the score of full-text queries, especially when the segments have very few rows and the columns have very few words.
For more information on columnstore tables, see How the Columnstore Works.
Relevancy Score
The relevancy score of an expression in a MATCH
statement denotes the ranking of the expression based on the following factors:
-
Number of times an expression appears in a column.
More occurrences of an expression in the matched column(s) increases its relevancy score. -
Rarity of the expression.
Rare words have a higher relevancy score than commonly used words. -
The length of the column containing the expression.
A column with a short expression has a higher relevancy score than a column with a long expression.
Create a Version 2 Full-Text Search Index and Query Using MATCH
CREATE TABLE articles (id INT UNSIGNED,year int UNSIGNED,title VARCHAR(200),body TEXT,SORT KEY (id),FULLTEXT USING VERSION 2 art_ft_index (title, body));
INSERT INTO articles (id, year, title, body) VALUES(1, 2021, 'Introduction to SQL', 'SQL is a standard language for accessing and manipulating databases.'),(2, 2022, 'Advanced SQL Techniques', 'Explore advanced techniques and functions in SQL for better data manipulation.'),(3, 2020, 'Database Optimization', 'Learn about various optimization techniques to improve database performance.'),(4, 2023, 'SQL in Web Development', 'Discover how SQL is used in web development to interact with databases.'),(5, 2019, 'Data Security in SQL', 'An overview of best practices for securing data in SQL databases.'),(6, 2021, 'SQL and Data Analysis', 'Using SQL for effective data analysis and reporting.'),(7, 2022, 'Introduction to Database Design', 'Fundamentals of designing a robust and scalable database.'),(8, 2020, 'SQL Performance Tuning', 'Tips and techniques for tuning SQL queries for better performance.'),(9, 2023, 'Using SQL with Python', 'Integrating SQL with Python for data science and automation tasks.'),(10, 2019, 'NoSQL vs SQL', 'A comparison of NoSQL and SQL databases and their use cases.');OPTIMIZE TABLE articles FLUSH;
SELECT *FROM articlesWHERE MATCH (TABLE articles) AGAINST ('body:database');
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| id | year | title | body |
+----+------+---------------------------------+-----------------------------------------------------------------------------+
| 7 | 2022 | Introduction to Database Design | Fundamentals of designing a robust and scalable database. |
| 3 | 2020 | Database Optimization | Learn about various optimization techniques to improve database performance.|
+----+------+---------------------------------+-----------------------------------------------------------------------------+
SELECT titleFROM articlesWHERE MATCH (TABLE articles) AGAINST ('title:Database OR body:("Business Intelligence")');
+---------------------------------+
| title |
+---------------------------------+
| Introduction to Database Design |
| Database Optimization |
+---------------------------------+
SELECT titleFROM articlesWHERE MATCH (TABLE articles) AGAINST ('title:SQL AND title:("Data Security")');
+----------------------+
| title |
+----------------------+
| Data Security in SQL |
+----------------------+
SELECT titleFROM articlesWHERE MATCH (TABLE articles) AGAINST ('title:SQL OR body:("Business Intelligence")')AND MATCH (TABLE articles) AGAINST ('body:web+');
+------------------------+
| title |
+------------------------+
| SQL in Web Development |
+------------------------+
SELECT titleFROM articlesWHERE MATCH (TABLE articles) AGAINST ('title:(+Data*) OR title:function?');
+---------------------------------+
| title |
+---------------------------------+
| Introduction to Database Design |
| SQL and Data Analysis |
| Database Optimization |
| Data Security in SQL |
+---------------------------------+
Create a Version 2 Full-Text Index over JSON
A full-text index can be created over a JSON column in the same manner it can be created over any other text-type column.
CREATE TABLE ft_records (id INT UNSIGNED,title VARCHAR(200),records JSON,SORT KEY(id),FULLTEXT USING VERSION 2 rec_ft_index (title, records));
The full-text index is created over a JSON column by concatenating all leaf string values in the JSON as a multi-valued field.fts2_
defines the logical spacing between concatenated leaf string values to prevent matching across different leaf string values.
INSERT INTO ft_records VALUES (1,'document','{"k1": "cucumber","k2": ["dragonfruit", "eggplant"],"k3": [{"k3_1": "fig", "k3_2": "grape"},{"k3_1": ["huckleberry", "iceberg lettuce"]},"jicama"]}');OPTIMIZE TABLE ft_records FLUSH;
SELECT title, recordsFROM ft_records WHERE id = 1;
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+
| title | records |
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+
| document | {"k1":"cucumber","k2":["dragonfruit","eggplant"],"k3":[{"k3_1":"fig","k3_2":"grape"},{"k3_1":["huckleberry","iceberg lettuce"]},"jicama"]} |
+----------+--------------------------------------------------------------------------------------------------------------------------------------------+
The JSON document for the records
column for the row inserted above is shown below.
{"title": "document","records": ["cucumber", "dragonfruit", "eggplant","fig", "grape", "huckleberry","iceberg lettuce", "jicama"],"records$k1": "cucumber","records$k2": ["dragonfruit", "eggplant"],"records$k3": "jicama","records$k3.k3_1": ["fig", "huckleberry", "iceberg lettuce"],"records$k3.k3_2": "grape"}
Here the records
column is a multi-valued field of all leaf string values in JSON.fts2_
.
Along with records
, other fields like records$k1
and records$k3.
are created allow searching at each unique keypath present in the JSON document.
Query over JSON columns
Querying over the entire JSON column can be performed in the same way as with any other column part of the full-text index.
SELECT (MATCH (TABLE ft_records) AGAINST ('records:/.*cumber/')) AS cumberFROM ft_records;
+----------+
| cumber |
+----------+
| 1 |
+----------+
To query over a specific JSON keypath, use a query similar to the following:
SELECT (MATCH (TABLE ft_records) AGAINST ('records$k3.k3_1:fig')) AS figFROM ft_records;
+------------------------+
| fig |
+------------------------+
| 0.13076457381248474 |
+------------------------+
The example above shows how you can search for the string fig
at the keypath k3.
in the records
column using the field grouping syntax.
SELECT idFROM ft_recordsWHERE MATCH (TABLE ft_records) AGAINST ('records:(iceberg lettuce)');
+------+
| id |
+------+
| 1 |
+------+
SELECT idFROM ft_recordsWHERE MATCH (TABLE ft_records) AGAINST ('records:"cucumber dragonfruit"');
Empty set (0.008 sec)
SELECT idFROM ft_recordsWHERE MATCH (TABLE ft_records) AGAINST ('records:(cucumber dragonfruit~100)');
+------+
| id |
+------+
| 1 |
+------+
In the above examples, you can see that the phrase query cucumber dragonfruit
does not match because they belong to different leaf strings.fts2_
.
Related Topics
-
Working with Vector Data- Allows for semantic searching which is searching based on meanings, not keywords.
-
Training: Full-Text Index and Search
Last modified: September 11, 2024