Vector Range Search

Vector range search allows users to query for vectors which have a similarity score beyond a specified threshold. Vector range search can be a useful alternative to standard top-K vector search when you are looking for all vectors with a suitably high score, and may not know a good value of K to try. Both vector top-K and range searches can use vector ANN indexes to give results faster.

As an example, consider a table, comments, with the following definition:

CREATE TABLE comments(id INT,   
comment TEXT,   
comment_embedding VECTOR(4) not null,   
category VARCHAR(256));

The following query shows an example of a vector range search on the comments table using a WHERE clause to filter for embeddings with a score > 0.7.

The @query_vec variable is cast to a VECTOR and then to a BLOB to ensure that @query_vec is a valid VECTOR and to improve performance.

SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]':>VECTOR(4):>BLOB);
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE score > 0.7;

Vector range search queries can take advantage of a vector index if a vector index is built on the comment_embedding column of the comments table.

Refer to Vector Indexing, Vector Type, Working with Vector Data, Hybrid Search - Re-ranking and Blending Searches , and Tuning Vector Indexes and Queries, for additional information on vector search in SingleStore.

Index Options, Hints, and Parameters

The index options, hints, and parameters for vector range search are largely the same as those for indexed vector search (Vector Indexing), with the following exceptions:

  • Use RANGE_SEARCH_OPTIONS instead of SEARCH_OPTIONS to specify search options in queries.

  • The search parameter k is not supported.

  • The search parameter nprobe is not supported when using an IVF_PQFS index.

Create Table and Indexes

The following table, index, and data are used in the examples.

The statement below creates a comments table which represents a set of comments to be searched. The table includes the comments as TEXT and embeddings of those comments as a VECTOR with 4 elements. This example demonstrates a vector range search query over this table. Refer to Working with Vector Data for information about vector embeddings.

CREATE TABLE comments(id INT,   
comment TEXT,   
comment_embedding VECTOR(4) not null,   
category VARCHAR(256));

Next populate the table. The vectors in the following insert statement are represented as a JSON array of numbers for readability. Fictional, low-dimension vectors are used here to keep the example easier to read.

INSERT INTO comments VALUES       
(1, "The cafeteria in building 35 has a great salad bar",    
'[0.45, 0.55, 0.495, 0.5]',
    "Food"),
(2, "I love the taco bar in the B16 cafeteria.",     
   '[0.01111, 0.01111, 0.1, 0.999]',
    "Food"),
(3, "The B24 restaurant salad bar is quite good.",    
    '[0.1, 0.8, 0.2, 0.555]',
"Food");

Next, add a vector index of type IVF_PQFS to the comment_embedding column of the comments table.

ALTER TABLE comments ADD VECTOR INDEX ivf_dp(comment_embedding)
INDEX_OPTIONS'{"index_type":"IVF_PQFS"}';

Optionally optimize the table to enhance performance.

OPTIMIZE TABLE comments FULL;

The following example demonstrates the use of vector range search.

Suppose that the user likes salad and so the query is "restaurants with good salad." A vector embedding API can be called to obtain a vector for that phrase. Assume the vector returned is: '[0.44, 0.554, 0.34, 0.62]'.

The following query first creates a query vector (@query_vec) using this vector. Next, the vector range search query uses similarity search to look for vectors in the comments table which are similar to the query vector. Finally, the query returns vectors in the comments table which have a similarity score with the query vector of > 0.7, using DOT_PRODUCT (<*>). The ORDER BY clause is included to ensure that the higher scores appear at the top of the results.

EUCLIDEAN_DISTANCE (<->) is also available for use with vector range search. When using the dot product metric, the query must be written to find scores above a threshold; when using the Euclidean distance metric, the query must be written to find scores below a threshold.

SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]':>VECTOR(4):>BLOB);
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE score > 0.7
ORDER BY score DESC;
+------+----------------------------------------------------+----------+--------------------+
| id   | comment                                            | category | score              |
+------+----------------------------------------------------+----------+--------------------+
|    1 | The cafeteria in building 35 has a great salad bar | Food     | 0.9810000061988831 |
|    3 | The B24 restaurant salad bar is quite good.        | Food     | 0.8993000388145447 |
+------+----------------------------------------------------+----------+--------------------+

If a vector index has been added to the comment_embedding column in the table, as was done in this example, the vector index will be used in this query.

The EXPLAIN or PROFILE plan for the query can be used to tell if the vector index is being used in the query. For example:

SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]':>VECTOR(4):>BLOB);
EXPLAIN
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE score > 0.7
ORDER BY score DESC;
+----------------------------------------------------------------------------+
| EXPLAIN                                                                    |
|                                                                            |
+----------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment             |
| Project [comments.id, comments.comment, comments.category,                 |
|    DOT_PRODUCT(comments.comment_embedding,@query_vec) AS score]            |
|                                                                            |
| ColumnStoreFilter [DOT_PRODUCT(comments.comment_embedding,@query_vec) >    |
|     (.7!:>double NULL) AND INTERNAL_VECTOR_RANGE_SEARCH(0, @query_vec, .7) |
|      index]                                                                |
|                                                                            |
| ColumnStoreScan dbname.comments, SORT KEY __UNORDERED ()                   |
|        table_type:sharded_columnstore                                      |
+----------------------------------------------------------------------------+

The INTERNAL_VECTOR_RANGE_SEARCH notation in the ColumnStoreFilter row indicates that the plan uses a vector index for range search.

Example 3 - Use Index Hints

The syntax [USE {INDEX|KEY} ([<index_name>])] can be used to specify a specific index to use when there are multiple vector indexes, or to disable the use of an index.

The query below specifies the use of the index named ivf_dp. If that index is not available, the query returns an error. See SHOW INDEXES for information on how to list the indexes for a table.

SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]' :>VECTOR(4):>BLOB);
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE score > 0.7 USE INDEX (ivf_dp);
+------+----------------------------------------------------+----------+--------------------+
| id   | comment                                            | category | score              |
+------+----------------------------------------------------+----------+--------------------+
|    1 | The cafeteria in building 35 has a great salad bar | Food     | 0.9810000061988831 |
|    3 | The B24 restaurant salad bar is quite good.        | Food     | 0.8993000388145447 |
+------+----------------------------------------------------+----------+--------------------+

The use of empty parentheses ( ) in an index hint disables the use of vector indexes as is shown in the following query.

SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]' :>VECTOR(4):>BLOB);
SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE score > 0.7 USE INDEX ();
+------+----------------------------------------------------+----------+--------------------+
| id   | comment                                            | category | score              |
+------+----------------------------------------------------+----------+--------------------+
|    1 | The cafeteria in building 35 has a great salad bar | Food     | 0.9810000061988831 |
|    3 | The B24 restaurant salad bar is quite good.        | Food     | 0.8993000388145447 |
+------+----------------------------------------------------+----------+--------------------+

As described previously, EXPLAIN can be used to verify if a query does or does not use an index.

Example 4 - Use Index Building Parameters

The following query creates a second index, named ivf_euclid, and demonstrates use of the index_type, metric_type, and nlist index building parameters.

ALTER TABLE comments
ADD VECTOR INDEX ivf_euclid(comment_embedding)
INDEX_OPTIONS'
{"index_type":"IVF_PQFS",
"metric_type":"EUCLIDEAN_DISTANCE",
"nlist":3}';

Example 5 - Use Index Search Parameters

Add a HNSW_FLAT index to the comments table.

ALTER TABLE comments ADD VECTOR INDEX hnsw(comment_embedding)
INDEX_OPTIONS'{"index_type":"HNSW_FLAT"}';

The query below specifies that the hnsw index with the search parameter ef set to 120 should be used to evaluate the filter score > 0.7. The USE_INDEX clause and the RANGE_SEARCH_OPTIONS clause must appear immediately after the filter (score > 0.7 in this example) and USE_INDEX must appear before RANGE_SEARCH_OPTIONS. The parentheses in the example below are not necessary, but indicate that the USE_INDEX and RANGE_SEARCH_OPTIONS clauses are associated with the filter score > 0.7.

SELECT id, comment, category,
comment_embedding <*> @query_vec AS score
FROM comments
WHERE (score > 0.7 USE INDEX (hnsw)
RANGE_SEARCH_OPTIONS '{"ef" : 120 }');
+------+----------------------------------------------------+----------+--------------------+
| id   | comment                                            | category | score              |
+------+----------------------------------------------------+----------+--------------------+
|    1 | The cafeteria in building 35 has a great salad bar | Food     | 0.9810000061988831 |
|    3 | The B24 restaurant salad bar is quite good.        | Food     | 0.8993000388145447 |
+------+----------------------------------------------------+----------+--------------------+

Refer to Tuning Vector Indexes and Queries for more information on setting index building and search parameters.

Last modified: May 31, 2024

Was this article helpful?