Hybrid Search - Re-ranking and Blending Searches
On this page
Hybrid Search combines multiple search methods in a single query.
Combining the syntactic (keyword-based) ranking from full-text text search and the semantic ranking from vector search is often important.
Combining full-text and vector search allows you to combine the benefits of fine-grained control provided by full-text search with semantic vector search.
Reciprocal Rank Fusion
A common approach to blending results from two or more different rankings (e.
The RRF formula for N
different ranked lists where r
is the rank of an item in list i
, is shown in the following.

In this formula, k
is a smoothing factor which smooths out the decline in weights.k=0
) RRF would give weight 1.k=60
, the item ranked 1 is weighted 1/61 ≈ 0.
You can further refine RRF by weighting each ranking differently.
Examples
It is possible, as shown in the following examples below, to combine full-text and vector search rankings with RRF in a SQL query.
Example 1 shows an indexed hybrid search over data taken from Wikipedia pages.
In these examples, CTEs, the ROW_
window function and general expressions are used to perform powerful hybrid search rankings.
Example 1 - Hybrid Search of Wikipedia Articles
SingleStore provides full-text search (Working with Full-Text Search) and Vector Indexing which can be used to speed up a hybrid search.
SingleStore has created a dataset of Wikipedia articles.
The following example uses just the data for video games, which is approximately 41,000 vectors based on 1,800 articles from Wikipedia.
Use the following SQL commands to run this example.
Create a table to hold the vectors and paragraphs.
CREATE TABLE vecs(id BIGINT(20),url TEXT DEFAULT NULL,paragraph TEXT DEFAULT NULL,v VECTOR(1536) NOT NULL,SHARD KEY(id), KEY(id) USING HASH);
Load data into this table using the following pipeline.
-- since the bucket is open, you can leave the credentials clause as-isCREATE OR REPLACE PIPELINE wiki_pipeline ASload data S3's3://wikipedia-video-game-data/video-game-embeddings(1).csv'config '{"region":"us-west-1"}'credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'SKIP DUPLICATE KEY ERRORSINTO TABLE vecsFORMAT csvFIELDS TERMINATED BY ','ENCLOSED BY '"'LINES TERMINATED BY '\r\n';START PIPELINE wiki_pipeline FOREGROUND;
Add a version 2 full-text index and a vector index on the table.
ALTER TABLE vecs ADD FULLTEXT USING VERSION 2 ft_para(paragraph);ALTER TABLE vecs ADD VECTOR INDEX ivf_v(v)INDEX_OPTIONS '{"index_type":"IVF_FLAT"}';
Run the following command to optimize the index searches and to ensure the values above are indexed.
OPTIMIZE TABLE vecs FLUSH;
The following query searches for rows with information about Super Mario Kart, by using full-text search to search with paragraphs that match the string super mario kart
and that have embedding vectors (v
) that are similar to the embedding for the introduction to the Wikipedia Super Mario Kart page.
The variables are set prior to defining the query.
-
@ft_
to hold the full-text match condition.q -
@v_
to hold the vector embedding for the Mario Kart Wikipedia page.mario_ kart
At a high-level, the query is structured as follows:
-
Two Common Table Expressions (CTEs) use indexed full-text and vector search to produce ordered lists of rows that best meet the full-text and vector search criteria.
-
Two additional CTEs create ranked full-text and vector results.
-
A combined score is calculated using the RRF formula, and the top 5 results are returned.
SET @ft_q = 'paragraph:"super mario kart"';SET @v_mario_kart = (SELECT v FROM vecsWHERE URL = "https://en.wikipedia.org/wiki/Super_Mario_Kart"ORDER BY id LIMIT 1);WITH fts AS ( /* Find top full-text matches. */SELECTid,paragraph,MATCH(TABLE vecs) AGAINST(@ft_q) AS SCOREFROM vecsWHERE MATCH(TABLE vecs) AGAINST(@ft_q)ORDER BY SCORE descLIMIT 200),vs AS ( /* Find top vector search matches. <*> is dot product. */SELECTid,paragraph,v <*> @v_mario_kart AS SCOREFROM vecsORDER BY score DESCLIMIT 200),fts_ranked as (SELECT*,ROW_NUMBER() OVER (ORDER BY SCORE DESC) AS fts_rankFROM fts),vs_ranked as (SELECT*,ROW_NUMBER() OVER (ORDER BY SCORE DESC) as vs_rankFROM vs)SELECT*,0.7 * (1.0 / (NVL(fts_rank, 1000.0) + 60)) + 0.3 * (1.0 / (NVL(vs_rank, 1000.0) + 60)) as combined_scoreFROM fts_ranked f FULL OUTER JOIN vs_ranked v ON f.id = v.idORDER BY combined_score DESCLIMIT 5;
In more detail, this query uses the following structure:
-
The first two CTEs in the query (
fts
andvs
) create ordered lists of 200 rows that best match the full-text and vector search criteria.-
The full-text search block (
fts
) uses inverted index search based on Java Lucene. -
The vector search block (
vs
) uses approximate nearest neighbor (ANN) indexing.
-
-
The next two CTEs use the
ROW_
window function to create ranked lists (NUMBER fts_
andranked vs_
) of these 200 rows.ranked -
The ranked lists are created in separate CTEs to allow the initial CTEs to use indexed search.
-
If the
ROW_
calculation were pushed into the initial full-text and vector and search CTEs (NUMBER fts
andvs
), that would prevent those CTEs from using index search.
-
-
The final query block uses a
FULL OUTER JOIN
to ensure that rows for items that appear in one ranked list but not the other are included in the results.Standard INNER JOIN
would discard items that are not in both lists. -
The
SELECT
clause calculates thecombined_
using RRF and returns the top 5 results with highestscore combined_
.score
This query uses a smoothing factor of 60 and weights full-text results at 0.NULL
ranks — which can arise from the outer join — are given an artificially high rank of 1000 so they don't contribute meaningfully to the score.
Notice that values with high ranks rise to the top (as with the (2,1) rank pair), but a poor ranking on one sorted input doesn't prevent an item from participating in the results at a high level, e.
Example 2 - Hybrid Search of Comments
This example demonstrates a hybrid search over a comments table.
-
A semantic search using vector search to find comments that relate to a user's request.
-
Full-text search to prioritize comments with a specific keyword.
-
A SQL filter that restricts to the "Food" category.
Consider a table of comments that contains:
-
id
– anINT
id -
comment
– aTEXT
field which stores the text of the comment -
comment_
– aembedding VECTOR
, which stores a vector embedding that captures the meaning of the comment as a vector, as described in Working with Vector Data. -
category
– aVARCHAR
which stores a category assigned to the comment
Create the comments
table and insert three rows into that table.
CREATE TABLE comments(id INT,comment TEXT,comment_embedding VECTOR(4),category VARCHAR(256));
INSERT INTO comments VALUES(1, "The cafeteria in building 35 has a great salad bar",'[0.45, 0.55, 0.495, 0.5]',"Food"),(2, "I love the taco bar in the B16 cafeteria.",'[0.01111, 0.01111, 0.1, 0.999]',"Food"),(3, "The B24 restaurant salad bar is quite good.",'[0.1, 0.8, 0.2, 0.555]',"Food");
Add a full-text version 2 index to the table and run OPTIMIZE TABLE
to ensure the values above are indexed.
ALTER TABLE comments ADD FULLTEXT USING VERSION 2 ft_comment(comment);OPTIMIZE TABLE comments FLUSH;
The following SQL query searches for comments that match the word restaurant and are similar to a query vector ('[0.
) which is intended to represent an embedding of the user's request."Food"
.
The following variables are set prior to defining the query.
-
@query_
to hold the full-text match condition that the comment must include the word restaurant.comment -
@query_
to hold the vector embedding from the user's request.vec
Similar to Example 1, the query is structured as follows:
-
Two Common Table Expressions (CTEs) use indexed full-text and vector search to produce ordered lists of rows that meet the full-text and vector search criteria.
-
Both CTEs include only comments in the category
"Food"
.
-
-
Two additional CTEs create ranked full-text and vector results.
-
A combined score is calculated using the RRF formula, and the top 3 results are returned.
The @query_
variable is cast to a VECTOR
to ensure that @query_
is a valid VECTOR
and to improve performance.
SET @query_comment = 'comment:restaurant';SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);WITH fts AS ( /* Find top full-text matches. */SELECTid,comment,MATCH(TABLE comments) AGAINST (@query_comment) AS ft_scoreFROM commentsWHERE MATCH(TABLE comments) AGAINST(@query_comment)ORDER BY ft_score descLIMIT 200),vs AS ( /* Find top vector search matches. <*> is dot product. */SELECTid,comment,@query_vec <*> comment_embedding AS vec_scoreFROM commentsWHERE category = "Food"ORDER BY vec_score descLIMIT 200),fts_ranked as (SELECT*,ROW_NUMBER() OVER (ORDER BY ft_score DESC) AS fts_rankFROM fts),vs_ranked as (SELECT*,ROW_NUMBER() OVER (ORDER BY vec_score DESC) as vs_rankFROM vs)SELECT*,0.7 * (1.0 / (NVL(fts_rank, 1000.0) + 60)) + 0.3 * (1.0 / (NVL(vs_rank, 1000.0) + 60)) AS combined_scoreFROM fts_ranked f FULL OUTER JOIN vs_ranked v ON f.id = v.idORDER BY combined_score DESCLIMIT 3;
*** 1. row ***
id: 3
comment: The B24 restaurant salad bar is quite good.
ft_score: 0.46706151962280273
fts_rank: 1
id: 3
comment: The B24 restaurant salad bar is quite good.
vec_score: 0.8993000388145447
vs_rank: 2
combined_score: 0.016314
*** 2. row ***
id: NULL
comment: NULL
ft_score: NULL
fts_rank: NULL
id: 1
comment: The cafeteria in building 35 has a great salad bar
vec_score: 0.9810000061988831
vs_rank: 1
combined_score: 0.005578
*** 3. row ***
id: NULL
comment: NULL
ft_score: NULL
fts_rank: NULL
id: 2
comment: I love the taco bar in the B16 cafeteria.
vec_score: 0.6644233465194702
vs_rank: 3
combined_score: 0.005422
In the results, the row with id = 3
has the word "restaurant" in it, so it gets a boost from the full-text score.id = 1
.id = 1
would be ranked higher than the comment with id = 3
.
Related Topics
Last modified: August 22, 2025