Hybrid Search - Re-ranking and Blending Searches
On this page
Hybrid Search combines multiple search methods in one query.
Example 1 provides a simple example of how hybrid search can combine full-text search, vector search, and a SQL filter.
Example 1 - Hybrid Search of Comments
This example demonstrates a hybrid search over a comments table.
Consider a table of comments that contains:
-
id
– anINT
id -
comment
– aTEXT
field which stores the text of the comment -
comment_
– aembedding VECTOR
, which stores a vector embedding that captures the meaning of the comment as a vector, as described in Working with Vector Data. -
category
– aVARCHAR
which stores a category assigned to the comment
Create the comments
table and insert three rows into that table.
INSERT INTO comments VALUES(1, "The cafeteria in building 35 has a great salad bar",'[0.45, 0.55, 0.495, 0.5]',"Food"),(2, "I love the taco bar in the B16 cafeteria.",'[0.01111, 0.01111, 0.1, 0.999]',"Food"),(3, "The B24 restaurant salad bar is quite good.",'[0.1, 0.8, 0.2, 0.555]',"Food");
Add a full-text index to the table and run OPTIMIZE TABLE
to ensure the values above are indexed.
ALTER TABLE comments ADD FULLTEXT ft_comment(comment);OPTIMIZE TABLE comments FULL;
The SQL query below first creates a query vector (@query_
).
The @query_
variable is cast to a VECTOR
to ensure that @query_
is a valid VECTOR
and to improve performance.
To try out SQL queries, click the Playground icon to the right of the SQL listing.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);SELECT id, SUBSTRING(comment,0,20) AS comment,FORMAT(@query_vec <*> comment_embedding,4) AS vec_score,FORMAT(MATCH(comment) AGAINST ("restaurant"),4) AS ft_score,FORMAT((vec_score + ft_score) / 2,4) AS combined_scoreFROM commentsWHERE category = "Food"ORDER BY combined_score DESCLIMIT 2;
+------+----------------------+-----------+----------+----------------+
| id | comment | vec_score | ft_score | combined_score |
+------+----------------------+-----------+----------+----------------+
| 3 | The B24 restaurant s | 0.8993 | 0.1151 | 0.5072 |
| 1 | The cafeteria in bui | 0.9810 | 0.0000 | 0.4905 |
+------+----------------------+-----------+----------+----------------+
In the results, the row with id 3 has the word "restaurant" in it, so it gets a boost from the full-text score in this query and is the top-ranked result.
Also note that in this query, the comment string and the scores have been truncated for readability of the result.
The availability of vector search, full-text search, and standard SQL operations in SingleStore allows you to combine them as shown above.
Example 2 - Hybrid Search of Wikipedia Articles
SingleStore provides full-text search (Working with Full-Text Search) and Vector Indexing which can be used to speed up a hybrid search.
SingleStore has created a dataset of Wikipedia articles.
The example below uses just the data for video games, which is approximately 41,000 vectors based on 1,800 articles from Wikipedia.
Use the following SQL commands to run this example or try out this example in a SingleStore Notebook: Hybrid Full-text and Vector Search.
First, create a table to hold the vectors and paragraphs.
CREATE TABLE vecs(id BIGINT(20),url TEXT DEFAULT NULL,paragraph TEXT DEFAULT NULL,v VECTOR(1536) NOT NULL,SHARD KEY(id), KEY(id) USING HASH);
Load data into this table using the following pipeline.
-- since the bucket is open, you can leave the credentials clause as-isCREATE OR REPLACE PIPELINE wiki_pipeline ASload data S3's3://wikipedia-video-game-data/video-game-embeddings(1).csv'config '{"region":"us-west-1"}'credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'SKIP DUPLICATE KEY ERRORSINTO TABLE vecsFORMAT csvFIELDS TERMINATED BY ','ENCLOSED BY '"'LINES TERMINATED BY '\r\n';START PIPELINE wiki_pipeline FOREGROUND;
Run full-text and vector indexes on the table.
ALTER TABLE vecs ADD FULLTEXT ft_para(paragraph);ALTER TABLE vecs ADD VECTOR INDEX ivf_v(v)INDEX_OPTIONS '{"index_type":"IVF_FLAT"}';
Run the following command to optimize the index searches and to ensure the values above are indexed.
OPTIMIZE TABLE vecs FULL;
A hybrid search query that uses both full-text and vector indexes can be constructed this way:
-
Create a subquery that uses vector search
-
Create a subquery that uses keyword search for similar content
-
Join the two with a full outer join
-
Produce a final result that combines the vector search score and full-text search score, and re-ranks
The query below first creates a query vector (@v_
) and then follows this outline and does an indexed hybrid vector and full-text search.
/* Get the vector for the first paragraph about the Mario Kart Game.It's a good semantic query vector for Mario Kart. */SET @v_mario_kart = (SELECT v FROM vecsWHERE URL = "https://en.wikipedia.org/wiki/Super_Mario_Kart"ORDER BY id LIMIT 1);WITH fts AS (SELECT id, paragraph,MATCH(paragraph) AGAINST("mario kart") AS SCOREFROM vecsWHERE MATCH(paragraph) AGAINST("mario kart")ORDER BY SCORE descLIMIT 200),vs AS (SELECT id, paragraph, v <*> @v_mario_kart AS SCOREFROM vecsORDER BY score DESCLIMIT 200)SELECT vs.id, SUBSTRING(vs.paragraph,0,25),FORMAT(IFNULL(fts.score, 0) * .3+ IFNULL(vs.score, 0) * .7, 4) AS score,FORMAT(fts.score, 4) AS fts_s,FORMAT(vs.score, 4) AS vs_sFROM fts FULL OUTER JOIN vs ON fts.id = vs.idORDER BY score DESCLIMIT 5;
+------------------+------------------------------+--------+--------+--------+
| id | SUBSTRING(vs.paragraph,0,25) | score | fts_s | vs_s |
+------------------+------------------------------+--------+--------+--------+
| 1125899906848349 | Super Mario Kart is a kar | 0.9943 | 0.9811 | 1.0000 |
| 1125899906848372 | Nintendo re-released Supe | 0.9353 | 1.0000 | 0.9075 |
| 2251799813690085 | Mario Kart DS was the bes | 0.9239 | 1.0000 | 0.8912 |
| 1125899906848375 | Several sequels to Super | 0.9144 | 0.8918 | 0.9240 |
| 1125899906855047 | Mario Kart: Super Circuit | 0.9130 | 0.9521 | 0.8962 |
+------------------+------------------------------+--------+--------+--------+
The final ranked result (by score
) is different from the order for the full-text score (fts_
) or the vector similarity score (vs_
).
SQL is used here in a relatively compact way.
For more details on this example and performance results see our blog on ANN search.
Related Topics
Last modified: August 14, 2024