# Vector Indexing

SingleStore supports vector similarity scoring and you can use this to find the exact set of k nearest neighbors of a query vector, as described in [Working with Vector Data](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-vector-data.md). This is sometimes known as kNN search or exact kNN search.

If you have very large data sets and/or high concurrency requirements for a nearest-neighbor search, exact kNN search may not be efficient enough. For cases like this, SingleStore supports Approximate Nearest Neighbor (ANN) (A technique used in computational geometry and machine learning to  find the approximate nearest neighbors in high-dimensional spaces.) search based on a vector index. ANN search can support finding a set of k near neighbors very efficiently, potentially orders of magnitude faster than exact kNN search.

There is an accuracy vs. speed trade-off between exact kNN and ANN search. ANN will retrieve k near neighbors faster than kNN, but they may not be the exact set of k nearest neighbors. Hence the word "Approximate" in "Approximate Nearest Neighbor."

If you have nearest neighbor queries that do not have any selective filters associated with them, and large sets of vectors, use an ANN search. For example, if you need to find the top 10 matches out of one billion vectors with no filters on other columns in the data set and you want interactive response time, use an ANN search.

Some sample cases for using ANN search are:

* [Semantic Search of Text](https://www.singlestore.com/blog/ai-powered-semantic-search-in-singlestoredb-/) based on vectors from large language models (LLMs)
* [Retrieval-Augmented Generation (RAG)](https://research.ibm.com/blog/retrieval-augmented-generation-RAG) to enable focused, high-quality results for text generation based on LLMs within specific knowledge areas
* [Image Matching Based on Similarity of Vector Embeddings](https://www.singlestore.com/blog/image-matching-in-sql-with-singlestoredb/)

This topic explains how to create a vector index to enable ANN search, how to write queries that will use the index, and how to determine if the index is being used.

## Syntax

The syntax for a vector index has to adhere to the following rules:

* Vector indexes can only be built on columnstore tables.
* A vector index must be built on a single column.
* Column type is restricted to [VECTOR](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-types/vector-type.md)`(<N>, [F16|F32])`, where `<N>` is the number of dimensions. Currently, the only supported element types are `F16` and `F32`; `F32` is the default.

## Compare F16 and F32 Element Types

`F16` vectors offer these advantages over `F32` vectors:

* Use approximately half the storage space.
* Provide 35% faster kNN (k-Nearest Neighbor) search.
* Provide similar speed and recall for ANN (Approximate Nearest Neighbor) search.

kNN search, also called exact kNN search, searches an entire vector dataset using metrics such as [DOT\_PRODUCT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/dot-product.md) or [EUCLIDEAN\_DISTANCE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/euclidean-distance.md) to find the top k vectors that are most similar to a query vector. In contrast, ANN search uses a [vector index](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/vector-indexing.md) to very efficiently find a set of k near neighbors (an approximate set of near neighbors).

For nearest neighbor search, SingleStore recommends using:

* `F16` in most cases as it saves both space and execution time over the default `F32`.
* `F32` only when the highest possible [recall](https://en.wikipedia.org/wiki/Precision_and_recall) is critical.

## Similarity Metrics

Two similarity functions are available for creating and searching vector indexes in SingleStore: `DOT_PRODUCT` and `EUCLIDEAN_DISTANCE`. 

`DOT_PRODUCT`: Calculates the cosine similarity metric when used with vectors normalized to length 1. Cosine similarity measures the similarity of vectors without taking into account the length of the vectors. The results of `DOT_PRODUCT` on vectors normalized to length 1 range from -1 to 1; values closer to 1 indicate that the vectors are similar, whereas values closer to -1 indicate that the vectors are less similar.

`EUCLIDEAN_DISTANCE`: Calculates the Euclidean distance between two vectors. Euclidean distance measures the distance between vectors taking into account the length of the vectors. The results of `EUCLIDEAN_DISTANCE` range from 0 to infinity; values closer to 0 indicate that the vectors are similar, values closer to infinity indicate that vectors are less similar.

## Remarks

* Search queries must use the same metric used in the creation of a vector index to use that index. That is, a query that uses `DOT_PRODUCT` can only use a vector index created with `DOT_PRODUCT`.
* Vectors must be normalized to length 1 before using the `DOT_PRODUCT` function to obtain the cosine similarity metric. SingleStore recommends vectors are normalized to length 1 before they are saved to the database. Many models that produce vector embeddings produce vectors normalized to length 1. In this case, it is not necessary to normalize the vectors again.
* Using `EUCLIDEAN_DISTANCE` over vectors which have been normalized to length 1 may result in poor recall because normalization causes the magnitude information present in the original vectors to be lost.

## Creating a Vector Index

A vector index can be created using the `CREATE TABLE` command:

```sql
CREATE TABLE <table_name> (<column_definitions>, VECTOR {INDEX|KEY } [<index_name>] (<column>) [INDEX_OPTIONS '<json>']);
```

Alternatively, a vector index may be added using the `ALTER TABLE` command:

```sql
ALTER TABLE <table_name> ADD VECTOR {INDEX|KEY } [<index_name>] (<column>) [INDEX_OPTIONS '<json>'];
```

Multiple vector indexes can be created on the same table or even on the same column.

A vector index may be dropped with the `ALTER TABLE` command:

```sql
ALTER TABLE <table_name> DROP INDEX <index_name>;

OR

DROP INDEX <index_name> ON <table_name>;
```

## Searching a Vector Index

A vector search query to find k ANNs can be done with standard SQL:

```sql
SELECT <columns>,
   DOT_PRODUCT | EUCLIDEAN_DISTANCE (<table_name>.v, @query_vector) AS distance
FROM <table_name> 
WHERE <pre-filters>
ORDER BY distance [USE {INDEX|KEY} ([<index_name>])]
[SEARCH_OPTIONS [=] '<json>'] [{DESC|ASC}]
LIMIT k;
```

Infix syntax is available.

* <\*> for `DOT_PRODUCT`
* <-> for `EUCLIDEAN_DISTANCE`

```sql
SELECT <columns>,
   <table_name>.v {<*>|<->} @query_vector AS distance
FROM <table_name> 
WHERE <pre-filters>
ORDER BY distance [USE {INDEX|KEY} ([<index_name>])]
[SEARCH_OPTIONS [=] '<json>'] [{DESC|ASC}]
LIMIT k;
```

*Important*:

* For `DOT_PRODUCT` (<\*>), higher values indicate higher vector similarity; use descending (`DESC`) order in the `ORDER BY` clause.
* For `EUCLIDEAN_DISTANCE`(<->), lower values indicate smaller distance between values; use ascending (`ASC`) the `ORDER BY` clause, which is the default.
* Search queries must use the same distance metric (`DOT_PRODUCT` or `EUCLIDEAN_DISTANCE`) as used in the vector index in order to utilize that vector index.
* The query vector (`@query_vector` in the syntax above) must be a runtime constant.

## Index Options

A JSON config string can be used with `INDEX_OPTIONS` to specify the vector index algorithm and its parameters to use. Search options can also be used as a JSON config string to `SEARCH_OPTIONS`.

Below are the generic `INDEX_OPTIONS` and `SEARCH_OPTIONS`. These generic index building parameters (`INDEX_OPTIONS`) and index searching parameters (`SEARCH_OPTIONS`) can be used with all index types.

* **Index building parameters**

  * **index\_type:** index type to use. Must be one of AUTO, FLAT, IVF\_FLAT, IVF\_PQ, IVF\_PQFS, HNSW\_FLAT, HNSW\_PQ. The default is AUTO. SingleStore recommends IVF\_PQFS and HNSW\_FLAT.
  * **metric\_type:** distance metric. Either `EUCLIDEAN_DISTANCE` or `DOT_PRODUCT`. The default is `DOT_PRODUCT`.
* **Search parameters**

  * **k:** number of rows outputted by vector index scan. k must be >= limit, where limit is from `ORDER BY … LIMIT` clause. The default is the limit.

## Index Hints

When there are multiple vector indices available or when you want to disable any available vector index, the index hint `[USE {INDEX|KEY} ([<index_name>])]` can be used to specify a specific index to use or to disable the use of an index.

Specify the exact index you want to use as `<index_name>` or omit the `<index_name>` part, i.e.,` USE {INDEX|KEY} ()` to disable ANN search. See [Index Hints Example](https://docs.singlestore.com/#section-idm4576836844555234184620897389.md) for an example.

## Index Types and Parameters

The following index types and parameters are supported.

All index search parameters can also be specified as index building parameters in `INDEX_OPTIONS`. When search parameters are specified in `INDEX_OPTIONS`, that value will be used as the default value for searches using that index so you do not need to specify those values each time you use that index.

> **📝 Note**: SingleStore recommends using the IVF\_PQFS and HNSW\_FLAT index types. IVF\_PQFS creates a smaller index and has lower index build time, while HNSW\_FLAT has lower search time and higher accuracy.Alternatively, for reduced memory usage with high recall, consider using [Matryoshka Representation Learning (MRL) embeddings with an IVF\_FLAT index and two-stage search](https://www.singlestore.com/blog/vector-search-with-matryoshka-embeddings/).

## AUTO

AUTO automatically selects the vector index algorithm and parameters. Specifying index and search options for AUTO is not allowed.

AUTO currently behaves the same as IVF\_PQFS. AUTO may be enhanced in the future with automatic index type selection, so the behavior of AUTO may change in future releases.

## FLAT

FLAT does not use an index. It keeps all the vectors in RAM and uses a full scan to find the nearest vector matches. In general, it is not needed because SingleStore will do full scan matching without an index. But it may be useful for purposes of comparison and experimentation with recall and performance or guaranteeing all vectors are in RAM.

## IVF\_FLAT

When using basic Inverted File Index (IVF), vectors are clustered into `nlist` clusters and search is done by searching only `nprobe` nearest clusters. The `nlist` parameter controls the number of centroids generated by k-means clustering. `nprobe` cannot be greater than `nlist`.

* **Index building parameters:**

  * **nlist:** number of inverted lists (number of clusters) created during index build. 1 <= nlist <= 65536. Default to 128.
  * **nprobe:** number of probes at query time. 1 <= nprobe <= 65536. Default to 8.
* **Search parameters:**

  * **nprobe**

## IVF\_PQ

Inverted file index with residual vectors PQ (A technique used for vector compression. It is very effective in compressing high-dimensional vectors for nearest neighbor search.)-encoded. Vectors are clustered into `nlist` clusters and search is done by searching only `nprobe` nearest clusters. The `nlist` parameter controls the number of centroids generated by k-means clustering. `nprobe` cannot be greater than `nlist`.

* **Index building parameters:**

  * **nlist:** number of inverted lists (number of clusters) created during index build. 1 <= nlist <= 65536. Default to 128.
  * **m:** number of subquantizers used in product quantization. Dimensions % m must equal 0. Default to 32.
  * **nbits:** number of bits per quantization index. 1 <= nbits <= 16. Default to 8.
  * **nprobe:** number of probes at query time. 1 <= nprobe <= 65536. Default to 8.
* **Search parameters:**

  * **nprobe**

## IVF\_PQFS

IVF\_PQ (A method used for approximate nearest neighbor search in large-scale datasets, particularly in high-dimensional spaces.) with 4-bit PQ fast scan. Vectors are clustered into `nlist` clusters and search is done by searching only `nprobe` nearest clusters. The `nlist` parameter controls the number of centroids generated by k-means clustering. `nprobe` cannot be greater than `nlist`.

* **Index building parameters:**

  * **nlist:** number of inverted lists (number of clusters) created during index build. 1 <= nlist <= 65536. Default to 128.
  * **m:** number of subquantizers used in product quantization. Dimensions % m must equal 0. Default to 32.
  * **nprobe:** number of probes at query time. 1 <= nprobe <= 65536. Default to 8.

* **Search parameters:**

  * **nprobe**

## HNSW\_FLAT

Basic Hierarchical Navigable Small World (HNSW) (An algorithm used for approximate nearest neighbor (ANN) search, particularly in high-dimensional spaces. This algorithm is well-suited for applications requiring fast approximate nearest neighbor queries, such as similarity search in large-scale datasets. HNSW is known for its efficiency in handling high-dimensional data, making it a good choice for machine learning, data mining, information retrieval, etc.) index. Builds a hierarchical proximity graph and search is done in a layered fashion.

* **Index building parameters:**

  * **M:** number of neighbors. 1 <= M <= 2048. Default to 30.
  * **efConstruction:** expansion factor at construction time. 1 <= efConstruction. Default to 40.
  * **ef:** expansion factor at search time. 1<= ef. Default to 16.
* **Search parameters:**

  * **ef**

## HNSW\_PQ

HNSW with vectors PQ-encoded.

* **Index building parameters:**

  * **M:** number of neighbors. 1 <= M <= 2048. Default to 30.
  * **efConstruction:** expansion factor at construction time. 1 <= efConstruction. Default to 40.
  * **m:** number of sub-quantizers. dimensions % m must equal 0. Default to 32.
  * **nbits:** number of bits per quantization index. 1 <= nbits <= 16. Default to 8.
  * **ef:** expansion factor at search time. 1 <= ef. Default to 16.
* **Search parameters:** ef: expansion factor at search time. 1 <= ef. Default to 16.

  * **ef**

## Vector Index Architecture

When a vector index is built, a table-level index is created for the vector column. This table-level index is composed of smaller pieces; one per-segment index per columnstore segment. The per-segment indexes are built in memory and stored on disk and remain on disk until they are used by a query.

Refer to [Managing Columnstore Segments](https://docs.singlestore.com/db/v9.1/create-a-database/columnstore/managing-columnstore-segments.md) for more information about segments.

## Vector Index Cache

When a vector search query is run, the relevant per-segment indexes used by the query are fetched from disk and stored in an in-memory cache. This allows the vector search queries to make efficient use of memory as only the vector index segments required for the query are read into the cache. Index segments are cached when used during a query but are not cached at the time of index creation.

The size of the vector index cache can be tuned with the `max_vector_index_cache_memory_percent` engine variable which controls the percent of total memory used for the vector cache. The engine variable `max_vector_index_cache_memory_mb` is a read-only variable which provides a convenient way to check the memory usage of the vector index cache in MB.

| `max_vector_index_cache_memory_percent` | Specifies the maximum percentage of available memory (`maximum_memory`) to be used by the vector index cache. Must be strictly between`0`and`100`.                                                                                                                                                           |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `max_vector_index_cache_memory_mb`      | The maximum amount of memory that can be used by the vector index cache (in MB). This variable is calculated based on the value of`max_vector_index_cache_memory_percent`. The maximum value of this variable is (`maximum_memory`- 800MB). This variable is read-only and is provided for user convenience. |

> **📝 Note**: For customers who have upgraded to version 9.0 from version 8.9 or prior versions, SingleStore recommends lowering the value of `max_vector_index_cache_memory_percent` to `40%` (the default value in version 9.0) to ensure there is enough memory available for other components of query processing.

Refer to [Sync Variables Lists](https://docs.singlestore.com/db/v9.1/reference/configuration-reference/engine-variables/list-of-engine-variables/#sync-variables-lists.md) for default values.

## Remarks

* Since the vector index cache is of limited size, one or more per-segment indexes that are not in use may be evicted from the cache when another index is loaded into the cache. If there are not enough evictable per-segment indexes, the engine reports an out of memory (OOM) error message.
* If the value of `max_vector_index_cache_memory_percent` is modified in a way that reduces the size of the vector index cache, the following occurs:

  * The resize succeeds if the new percentage is a legal value.
  * The next time the cache is used, for example when a query is run that uses a vector index, the engine evicts per-segment indexes that are not in use to reduce the size of the cache until the cache is smaller than the new limit.
  * If there are not enough evictable per-segment indexes, the engine reports OOM.
* Refer to [Memory Errors](https://docs.singlestore.com/db/v9.1/reference/troubleshooting-reference/memory-errors.md) for more information.

## Vector Index Merger

The vector index merger combines per-segment vector indexes to create a cross-segment vector index. Cross-segment vector indexes improve the performance of vector search queries by reducing the number of indexes that must be examined.

The vector index merger automatically runs in the background to continuously improve search performance. The vector index merger can be run manually by using the following command.

```sql
OPTIMIZE TABLE <table_name> VECTOR_INDEX <index_name>;
```

In addition, the vector index merger is also run when the `OPTIMIZE TABLE ... FULL` command shown below is run.

```sql
OPTIMIZE TABLE <table_name> FULL;
```

## Vector Index Fallback to Full Table Scan

Queries that use Approximate Nearest Neighbor (ANN) indexed vector search require a `LIMIT` clause to specify the number of results to be returned. When such queries contain highly-selective filters in the `WHERE` clause, these filters may reduce the number of results returned because they are applied after the indexed search.

If the engine detects that an indexed vector search followed by the filters yields fewer than `LIMIT` rows, the engine falls back to a full table scan to ensure sufficient rows are produced. This fall back ensures that the specified number of results (`LIMIT`) is returned, provided there are at least `LIMIT` valid results in the dataset.

The fallback mechanism can be disabled by setting the engine variable `vector_index_fallback_non_index_scan` to `false`.

## Tracking Vector Index Memory Use

Use the [VECTOR\_INDEX](https://docs.singlestore.com/db/v9.1/reference/information-schema-reference/query-performance-workload-management-and-statistics/vector-index.md) information schema view and the [SHOW STATUS EXTENDED](https://docs.singlestore.com/db/v9.1/reference/sql-reference/show-commands/show-status-extended.md) command to track vector index memory use.

The `VECTOR_INDEX` information schema view shows the amount of memory and disk used by each vector index. The memory usage information includes memory used by per-segment vector indexes that have been loaded into the vector index cache by [vector search](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-vector-data.md) or [Vector Range Search](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/vector-range-search.md).

Example output from the `VECTOR_INDEX` information schema view is shown below.

```sql
SELECT * FROM INFORMATION_SCHEMA.VECTOR_INDEX;

```

```output

+---------------+------------+---------+------------+-------------------+------------------------+-----------------+----------------------+
| DATABASE_NAME | TABLE_NAME | NODE_ID | INDEX_TYPE | MEMORY_SIZE_BYTES | INDEX_CHUNKS_IN_MEMORY | DISK_SIZE_BYTES | INDEX_CHUNKS_ON_DISK |
+---------------+------------+---------+------------+-------------------+------------------------+-----------------+----------------------+
| test          | ann_test   |       2 | HnswFlat   |         138460664 |                      1 |       276962270 |                    3 |
| test          | ann_test   |       2 | IvfPqfs    |          10488248 |                      3 |         9557524 |                    3 |
+---------------+------------+---------+------------+-------------------+------------------------+-----------------+----------------------+

```

In this output, there are two indexes on the table `ann_test`. A HNSW\_FLAT index and an IVF\_PQFS index. The HNSW\_FLAT index is partially loaded in memory, with 1 of 3 segments in memory, and occupies 138460664 bytes of memory. The IVF\_PQFS index is fully loaded in memory, 3 of 3 segments are loaded in memory and it occupies 10488248 bytes of memory.

The total memory used by all vector indexes can be viewed by using the `SHOW STATUS EXTENDED` command as shown below.

```sql
SHOW STATUS EXTENDED LIKE 'alloc_vector_index' 

```

```output

+--------------------+-------------------+
| Variable_name      | Value             |
+--------------------+-------------------+
| alloc_vector_index | 5.942 (+5.942) MB |
+--------------------+-------------------+
```

The results of `SHOW STATUS EXTENDED LIKE 'alloc_vector_index'` include the memory used by all per-segment vector indexes that are being used by vector search or vector range search.&#x20;

The command `SHOW STATUS EXTENDED LIKE 'alloc_vector_index'` does not output any rows if the memory used by vector indexes is zero.

**Note**: For indexes created in versions prior to 9.0, the fields `INDEX_TYPE` and `DISK_SIZE_BYTES` will display `NULL`. These fields are populated from metadata columns introduced in version 9.0, which are filled during index creation, thus the values for these columns are not known.

## Output Format for Examples

Vectors can be output in JSON or binary format. Use JSON format for examples and for output readability. For production, use the default binary for efficiency.

Use the following command to output vectors in JSON format.

```sql
SET vector_type_project_format = JSON;
```

Use the following command to set the output format back to binary.

```sql
SET vector_type_project_format = BINARY;
```

## Example 1

Create a table with a `VECTOR` attribute and then create an `IVF_PQFS` index on the vector column with index building parameter `nlist` set to 1024 and search parameter `nprobe` set to 20.

```sql
CREATE TABLE vect (k int, v VECTOR(2) NOT NULL);

INSERT INTO vect VALUES
      (1, '[-10, 1]'),
      (2, '[10, 2]'),
      (3, '[20, 3]');

ALTER TABLE vect ADD VECTOR INDEX (v) INDEX_OPTIONS
 '{"index_type":"IVF_PQFS", "nlist":1024, "nprobe":20}';
```

Note that the search parameter `nprobe` is used in this index creation command. All search parameters can be specified in `INDEX_OPTIONS` during index creation. The value provided for the search parameter (`nprobe` in this example) in index creation will be used as the default value for that search parameter for that index.

Optimize the table to ensure all results are indexed.

```sql
OPTIMIZE TABLE vect FLUSH;
```

Searching the nearest neighbor can be done with the commands below.

The `@query_vec` variable is cast to a `VECTOR` to ensure that `@query_vec` is a valid `VECTOR` and to improve performance.

```sql

SET vector_type_project_format = JSON;  /* to make vector output readable */

SET @query_vec = ('[9,0]'):> VECTOR(2);

/* run the SET commands before this query */
SELECT k, v, v <*> @query_vec AS score
FROM vect
ORDER BY score DESC LIMIT 1;


```

```output

+------+--------+------------------+
| k    | v      | score            |
+------+--------+------------------+
|    3 | [20,3] |              180 |
+------+--------+------------------+
```

You can increase `k`, as shown in the example below, to increase the number of rows output by the vector index scan. Increasing `k` will increase the search space and likely increase the recall, but will also increase the execution time of the query.

```sql
SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT k, v, v <*> '[9, 0]' AS score
FROM vect   
ORDER BY score SEARCH_OPTIONS '{"k" : 30 }' DESC   
LIMIT 3;


```

```output

+------+---------+-------------------------------+
| k    | v       | score                         |
+------+---------+-------------------------------+
|    3 | [20,3]  |                           180 |
|    2 | [10,2]  |                            90 |
|    1 | [-10,1] |                           -90 |
+------+---------+-------------------------------+
```

## Example 2

Below is a small, self-contained example. Real vectors for AI use cases such as retrieval-augmented generation (RAG) with LLMs would normally have a much higher dimensionality (e.g., 64 to a few thousand). Two-dimensional vectors are used to keep it simple.

Create and use a new database:

```sql
CREATE DATABASE ann_vectors; 

USE ann_vectors;
```

Set the SQL Mode:

```sql
SET sql_mode = pipes_as_concat;
```

Create a table:

```sql
CREATE TABLE ann_test(id INT, v VECTOR(2) NOT NULL, SHARD KEY(id), KEY (id));
```

Use the following SQL script to generate data:

```sql
DELIMITER //
DO
DECLARE s INT = 1*1024*1024;
DECLARE c BIGINT;
BEGIN
INSERT ann_test VALUES(1,"[0,0]");
LOOP
    INSERT INTO ann_test
    SELECT id+(SELECT MAX(id) FROM ann_test),
               "[" || s*rand() || "," || s*rand() || "]"
    FROM ann_test;
    SELECT COUNT(*) INTO c FROM ann_test;
    IF c >= s then
      EXIT;
    END IF;
END LOOP;
END
//
DELIMITER ;

```

Add a vector index to the table:

```sql
ALTER TABLE ann_test ADD VECTOR INDEX(v)
   INDEX_OPTIONS '{"index_type":"IVF_PQFS", "nlist":1024,
   "metric_type":"EUCLIDEAN_DISTANCE"}';
```

Find a query vector as the vector for row 1000:

```sql
SET @qv = (SELECT v FROM ann_test WHERE id = 1000);
```

Find top five closest matches to the query vector:

```sql
SELECT id, v, v <-> @qv AS score
FROM ann_test
ORDER BY score
LIMIT 5;
```

## Verify a Vector Index is Being Used

It is important to verify that your query is using a vector index. A vector index can be used in the following conditions:

* The metric in the query (dot product vs. euclidean distance) matches the metric used to create the index.
* The order in the` ORDER BY` clause (`DESC` vs. `ASC`) in the search query matches the metric.

To get a detailed view of query plan for vector similarity queries, use `EXPLAIN`. The contents of the `EXPLAIN` plan for the previous example is shown below. The ColumnStoreFilter operator does the indexed ANN search.

```sql
EXPLAIN SELECT id, v, v <-> @qv AS score
FROM ann_test
ORDER BY score
LIMIT 5;


```

```output

+----------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                      |
+----------------------------------------------------------------------------------------------+
| Project [remote_0.id, remote_0.v, remote_0.score]                                            |              
| TopSort limit:5 [remote_0.score]                                                             |
| Gather partitions:all alias:remote_0 parallelism_level:segment                               |
| Project [ann_test.id, ann_test.v, EUCLIDEAN_DISTANCE(ann_test.v,@qv) AS score]               |
| TopSort limit:5 [EUCLIDEAN_DISTANCE(ann_test.v,@qv)]                                         |
| ColumnStoreFilter [INTERNAL_VECTOR_SEARCH(0, @qv, 5, '') index]                              |
| ColumnStoreScan ann_vectors.ann_test, SORT KEY __UNORDERED () table_type:sharded_columnstore |
+----------------------------------------------------------------------------------------------+
```

The ColumnStoreFilter line of the `EXPLAIN` output above shows `INTERNAL_VECTOR_SEARCH(...)` which indicates that the vector index is being used. If `INTERNAL_VECTOR_SEARCH` is not in the `ColumnStoreFilter` line, the vector index is not being used.

## Index Hints Example

The query below will specify the use of the index with key name `v`. If that index is not available, the query will return an error. See [SHOW INDEXES](https://docs.singlestore.com/db/v9.1/reference/sql-reference/show-commands/show-indexes.md) for information on how to list the indexes for a table.

Use the `vect` table from [Example 1](https://docs.singlestore.com/#section-idm4540316264052834089839334376.md) above for this query and the following one.

```sql
SELECT k, v <*> ('[9, 0]' :> VECTOR(2)) AS score
FROM vect
ORDER BY score USE INDEX (v) DESC
LIMIT 2;

```

The query below, which contains a `USE INDEX` clause without an index name, will disable the use of the vector (ANN) indexes.

```sql
SELECT k, v <*> ('[9, 0]' :> VECTOR(2)) AS score
FROM vect
ORDER BY score USE INDEX () DESC
LIMIT 2;
```

The query below specifies that the index `v` should be used and specifies a search option for the parameter `k`. The `USE INDEX` clause and the `SEARCH_OPTIONS` clause must appear immediately after the `ORDER BY` clause and `USE INDEX` must appear before `SEARCH_OPTIONS`. 

```sql
SELECT k, v <*> ('[9, 0]' :> VECTOR(2)) AS score
FROM vect
ORDER BY score 
USE INDEX (v) SEARCH_OPTIONS '{"k":30}' DESC
LIMIT 2;
```

As described above, `EXPLAIN` can be used to verify if the query does or does not use an index.

> **📝 Note**: Recall that `ORDER BY` should be in descending order (`DESC`) when you are using `DOT_PRODUCT` and in ascending order (`ASC`) when you are using `EUCLIDEAN_DISTANCE`.

## When to use an ANN Index to Support Vector Search

In general, you should use ANN search for large sets of vectors when you do not have selective filters on other columns in the query. In the example above, if there are billions of comments, having an index to find the similarity scores for the query vector is very useful and will provide a significant performance improvement.

On the other hand, if you have a smaller set of vectors, say less than a few million, and selective filters in your queries on other non-vector columns, you're often better off not using ANN indexes. This is true for both vector search and KNN search. You can instead rely on SingleStore's other query processing capabilities and you will get good performance. This will also be less complex because you won't need to think about what index to create, or the impact the index may have on recall, load time and disk and memory usage.

## Related Topics

* [Vector Type](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-types/vector-type.md)
* [Working with Vector Data](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-vector-data.md)
* [Tuning Vector Indexes and Queries](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/tuning-vector-indexes-and-queries.md)
* [DOT\_PRODUCT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/dot-product.md)
* [EUCLIDEAN\_DISTANCE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/euclidean-distance.md)

***

Modified at: May 18, 2026

Source: [/db/v9.1/reference/sql-reference/vector-functions/vector-indexing/](https://docs.singlestore.com/db/v9.1/reference/sql-reference/vector-functions/vector-indexing/)

(An index of the documentation is available at /llms.txt)
