Important

The SingleStore 9.1 release candidate (RC) gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. Until then, SingleStore 9.0 is recommended for production workloads; deployments can later be upgraded to SingleStore 9.1.

Configuring Vector Indexes

Proper configuration is essential to achieving maximum vector search performance in SingleStore. This page describes how to configure SingleStore for vector search, focusing first on configuration for high queries per second, followed by adjustments and suggested optimizations for other types of workloads.

This document is intended for readers with an understanding of SingleStore parallel query execution. See the list of resources at the end for more information about these concepts.

Configure for High Throughput / High QPS

For high-throughput workloads where the goal is to maximize Queries Per Second (QPS), the key is to use fewer, larger segments and to keep segment sizes even.

In addition, to get high queries per second (QPS), it helps to change the flexible parallelism settings so that each query uses only one thread per leaf node; with one segment per partition, this results in one thread per partition.

Concept

Vector search performance is closely tied to the number of index segments scanned. To optimize performance, queries should scan as few index segments as possible, which translates to maximizing index segment size.

Conceptually, to minimize the number of segments scanned by each query, the configuration should use one database partition on each leaf node, the minimum number of segments within that partition, and one thread on each leaf node scanning that partition.

To do so, the following configurations are needed:

  • Use large segments: Allow columnstore segments to be as large as possible (by setting internal_columnstore_max_uncompressed_blob_size and columnstore_segment_rows to their maximum possible values).

  • Use few partitions and eliminate subpartitions: Use one partition per leaf node (or two partitions on a small, single-leaf or two-leaf system), and eliminate subpartitions.

  • Evenly distribute data among partitions: Choose a shard key that will evenly distribute data among partitions.

  • Turn down flexible parallelism: Set flexible parallelism so that one thread will scan each partition. This setting reduces per-query overhead and helps performance when the majority of the query execution work is for index searches.

Note: It is important to create indexes and set the configuration variables before loading data.

To obtain the behavior described above and minimize the number of segments scanned, the recommended configuration settings are as follows:

  • Use few partitions and eliminate subpartitions:

    • CREATE DATABASE <dbname> PARTITIONS <number of leaf nodes> SUB_PARTITIONS 0

  • Use large segments:

    • SET GLOBAL internal_columnstore_max_uncompressed_blob_size = 10737418240

      • 10,737,418,240 is the maximum possible value of internal_columnstore_max_uncompressed_blob_size.

      • Caution: It can take up to 3 times the amount of memory specified in this variable to create a blob. If you get out-of-memory (OOM) errors, reduce this value or scale up your RAM (for example, with a 2x or 4x larger cloud instance).

    • SET GLOBAL columnstore_segment_rows = 10000000

      • 10,000,000 is the maximum value of columnstore_segment_rows.

  • Evenly distribute data:

    • Select a shard key that will evenly distribute data across partitions.

  • Turn down flexible parallelism:

    • SET query_parallelism_per_leaf_core = 0.01

      • With this configuration, the engine uses one thread per leaf node.

      • This is a session variable and will need to be set for each connection.
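Putting the settings above together, a full setup might look like the following sketch. The database name (vectordb) and leaf count (8) are illustrative placeholders, not recommendations:

```sql
-- Set segment-size variables BEFORE creating tables and loading data.
SET GLOBAL internal_columnstore_max_uncompressed_blob_size = 10737418240;
SET GLOBAL columnstore_segment_rows = 10000000;

-- One partition per leaf node (assuming an 8-leaf cluster), no subpartitions.
CREATE DATABASE vectordb PARTITIONS 8 SUB_PARTITIONS 0;

-- Session variable: one thread per leaf node per query.
-- Set this in every connection that runs vector searches.
SET query_parallelism_per_leaf_core = 0.01;
```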

Verification and Optimization

  • Run the OPTIMIZE TABLE FLUSH command after the index is built to flush all index segments to disk.

  • After the initial configuration is set, use the query below to verify that there are few or no small segments. If there are small segments, consider changing the shard key or the number of partitions.

    SELECT
        DATABASE_NAME,
        TABLE_NAME,
        ORDINAL AS PARTITION_ID,
        ROWS,
        MEMORY_USE
    FROM INFORMATION_SCHEMA.TABLE_STATISTICS
    WHERE TABLE_NAME = '<table_name>';
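Because TABLE_STATISTICS returns one row per partition, aggregating it gives a quick skew check; with a well-chosen shard key, the minimum and maximum row counts should be close:

```sql
-- Summarize row-count skew across partitions for one table.
SELECT MIN(ROWS) AS MIN_ROWS,
       MAX(ROWS) AS MAX_ROWS,
       AVG(ROWS) AS AVG_ROWS
FROM INFORMATION_SCHEMA.TABLE_STATISTICS
WHERE TABLE_NAME = '<table_name>';
```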

Data Distribution Notes

When tuning for QPS, do not increase the partition count if doing so shrinks the segment size.

However, when the data is very large, that is, when the number of rows is much greater than (number of partitions) * (segment size), there will be many segments in each partition. In this case, if you run the OPTIMIZE TABLE FULL command, the segments will have roughly even sizes except for the tail segment. With that many segments, one small tail segment is insignificant in percentage terms. Avoiding tiny stray segments matters most for small datasets with just one or a few segments per partition.

Avoid compaction during data ingestion by running the OPTIMIZE TABLE FULL command after data has been loaded into the table.

Thus, if the data size is large enough that there is more than one maximum-sized segment per partition, you can increase the partition count without affecting QPS; increasing the partition count in this case does not change the total number of segments.
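The arithmetic behind this section can be sketched as follows; the row and partition counts are illustrative, not recommendations:

```python
import math

def segments_per_partition(total_rows: int, partitions: int,
                           segment_rows: int = 10_000_000) -> int:
    """Approximate segments per partition, assuming an even shard key and
    maximum-sized segments (the partial tail segment rounds up)."""
    rows_per_partition = total_rows / partitions
    return math.ceil(rows_per_partition / segment_rows)

# 1B rows over 16 partitions: 62.5M rows each -> 7 segments per partition,
# so the one small tail segment is a minor fraction of each scan.
print(segments_per_partition(1_000_000_000, 16))  # -> 7
```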

Workload

The configuration above is designed for workloads in which:

  • Multiple queries are run in parallel.

    • When flexible parallelism is turned down to one thread per leaf node, a potential concern is that the extra vCPUs go unused. When multiple queries run in parallel, those extra vCPUs are put to work.

  • Queries do not have highly selective predicates.

    • When queries do not have highly selective predicates, segment elimination is not effective, so a small number of large segments is a reasonable configuration.

Additional Configurations and Workload Considerations

Segment Size Tradeoffs

The index configuration (segment size) needs to be matched to the workload.

In SingleStore, vector indexes are built per (columnstore) segment. Vector indexes span all the sub-partitions in a partition. Large segments typically improve search speed because the cost of searching a large segment is only slightly more than the cost of searching a small segment (search time increases logarithmically with segment size).
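To see why fewer, larger segments help, consider a rough cost model, an assumption for illustration only (the engine's actual cost function differs): every segment must be searched, and searching one segment's vector index costs roughly log2 of its row count.

```python
import math

def relative_search_cost(total_rows: int, segment_rows: int) -> float:
    """Rough model (an assumption, not SingleStore's actual cost function):
    every segment is searched, and searching one segment's vector index
    costs about log2(rows in that segment)."""
    segments = math.ceil(total_rows / segment_rows)
    return segments * math.log2(segment_rows)

one_large = relative_search_cost(10_000_000, 10_000_000)  # 1 segment
many_small = relative_search_cost(10_000_000, 100_000)    # 100 segments
# One 10M-row segment costs ~23 units; 100 segments of 100K rows cost ~1661.
print(round(one_large, 1), round(many_small, 1))  # -> 23.3 1661.0
```

Under this model, splitting the same data into many small segments multiplies the cost far faster than the per-segment log factor shrinks, which is why large segments win for pure vector search.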

While large segment sizes are good for QPS and for search performance, smaller segments may be appropriate for workloads with updates or workloads with highly selective queries. Queries with highly selective WHERE filters may benefit from smaller segments because segment elimination is applied before vector index searches.

When indexes are updated regularly, when latency is important and high parallelism is useful, or when there are highly selective queries, choose the segment size with these tradeoffs in mind.

Segment Size Tradeoffs for Update Performance

There is a tradeoff between vector index search performance and index update rate: large segments improve search performance, while small segments improve the update rate. This tradeoff can be managed by adjusting the segment size.

To increase the index update rate, consider reducing the segment size or increasing the number of partitions.
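For example, an update-heavy workload might step segment size back down from the maximum; the value below is the engine default, shown for illustration rather than as a tuned recommendation:

```sql
-- Smaller segments: faster index updates, at some search-speed cost.
SET GLOBAL columnstore_segment_rows = 1024000;
```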

Segment Size Tradeoffs for Highly Selective Queries

Similarly, there is a tradeoff between vector index search performance and segment elimination. Large segments improve index search performance; small segments improve segment elimination.

Segment elimination is the process whereby the system avoids scanning segments that do not match the filter in the query. With highly selective filters, segment elimination results in much less data being scanned and can significantly improve performance. In this case, smaller segments mean less data is scanned: small segments are more likely to be eliminated entirely, and fewer rows are read from the segments that survive elimination. Thus, smaller segments benefit queries with highly selective filters that are not for vector search, such as a date-range filter in the WHERE clause.

That is, queries with highly selective predicates may benefit from an increased number of partitions and segments. The larger number of smaller partitions (and thus smaller segments) enables more segment elimination.

Refer to Choosing a Columnstore Key for information on segment elimination.
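For instance, combining a vector search with a selective date-range filter is exactly the case where smaller segments and segment elimination pay off. The events table and its columns below are hypothetical:

```sql
-- Hypothetical schema: events(id BIGINT, ts DATETIME, embedding VECTOR(3)).
SET @qv = '[0.12, 0.37, 0.84]' :> VECTOR(3);  -- query vector
SELECT id, ts, embedding <*> @qv AS score
FROM events
WHERE ts >= '2024-01-01' AND ts < '2024-02-01'  -- selective filter; segment
                                                -- elimination applies before
                                                -- the vector index search
ORDER BY score DESC
LIMIT 10;
```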

Adjust the Number of Segments and Partitions

Segment size in SingleStore is managed through the variables:

  • internal_columnstore_max_uncompressed_blob_size (max value 10,737,418,240)

  • columnstore_segment_rows (max value 10,000,000)

As noted above, it's often best to set these to larger-than-default values (e.g. the maximum values) to improve vector and text search performance, especially if you don't have selective standard WHERE clause filters. You can also choose a smaller-than-default number of partitions in your CREATE DATABASE statement as recommended earlier.

Parallel Scan and Flexible Parallelism

SingleStore is a powerful parallel database which supports a spectrum of parallelism; different levels of parallelization are effective for different types of workloads. In general, to optimize for throughput or high queries per second and for workloads with many relatively small queries, flexible parallelism should be turned down. To optimize for response time, flexible parallelism should be turned up.

Flexible Parallelism

Flexible parallelism allows multiple threads to scan a single partition, increasing the parallelism (number of cores) used for a query. It allows all cores to work on a particular query even if there are more cores than partitions, which decreases query latency. More specifically, flexible parallelism lets each core scan a sub-partition, so a single query can use multiple threads per leaf node. This is great for certain workloads, but not for all.

If many queries run concurrently, per-query parallelism may not help: the concurrent queries already keep the cores busy, so there is less need for parallelism within each query. If only a few large queries are running, higher parallelism can improve performance. At a high level: with many concurrent queries, turn parallelism down; with a few large queries, turn it up. Flexible parallelism is managed with the engine variable query_parallelism_per_leaf_core.

Turning down parallelism by decreasing query_parallelism_per_leaf_core helps throughput by reducing the overhead of starting and synchronizing threads. More importantly, it can also reduce total work: it avoids having multiple threads process the same segment, each potentially doing an independent index search, and it avoids running ORDER BY ... LIMIT queries on more threads, which can increase the total number of rows decoded and transmitted. This extra decoding and transmission drives up the total CPU cycles needed to run a query.

Flexible Parallelism Configuration

When doing a parallel (non-indexed) scan:

  • If (flexible) parallelism is set to one (query_parallelism_per_leaf_core = 1), then there will be one thread per CPU (vCPU) core and each thread will scan different sub-partitions in parallel.

  • If (flexible) parallelism is set lower (query_parallelism_per_leaf_core < 1), then fewer threads will be used and each thread will scan multiple sub-partitions. Setting query_parallelism_per_leaf_core = 0.01 gives one thread per leaf node, which is ideal for high-throughput search workloads.

Configure for Large Data

For huge data, you may want more than one thread per leaf node to balance response time against throughput. If throughput remains your top concern, you can keep limiting the threads per leaf node by setting query_parallelism_per_leaf_core = 0.01.

Configure for Response Time

To obtain the best response times for individual queries, use multiple threads per query. Leave query_parallelism_per_leaf_core at its default value of 1.0, or use a relatively high value such as 0.5 or 0.25. Increasing query parallelism adds threads at the cost of per-thread overhead; this tradeoff is managed by adjusting query parallelism.

Benchmarking with Fixed Size Data

If your data is a fixed size, for example when benchmarking, follow this procedure to distribute data evenly among partitions.

  • Calculate an even distribution of data among partitions and set columnstore_segment_rows to enforce that distribution.

    • SET GLOBAL columnstore_segment_rows = ((#rows in data set)/(#partitions)) + buffer

      • A value of approximately 1000 is suggested for buffer.

    • OR, if the dataset is larger, that is, if ((#rows in data set)/(#partitions)) > 10,000,000:

      • SET GLOBAL columnstore_segment_rows = 10000000

      • 10,000,000 is the maximum value of columnstore_segment_rows.

The buffer value will prevent inadvertent creation of a tiny segment containing the remainder of the data if data is not split perfectly evenly. A tiny segment can hurt throughput since it will have to be searched independently by a thread, and that could take noticeable time for index search startup. In addition, index search time is logarithmic, so more segments (and thus more indexes) are virtually always worse for throughput.
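The procedure above can be sketched as a small helper; the dataset sizes are illustrative, and the cap is the maximum value of columnstore_segment_rows:

```python
MAX_SEGMENT_ROWS = 10_000_000  # maximum value of columnstore_segment_rows

def benchmark_segment_rows(total_rows: int, partitions: int,
                           buffer: int = 1000) -> int:
    """Pick a columnstore_segment_rows setting that yields one segment per
    partition for a fixed-size dataset, capped at the engine maximum."""
    target = total_rows // partitions + buffer
    return min(target, MAX_SEGMENT_ROWS)

# 80M rows over 16 partitions -> 5M rows each, plus the suggested buffer.
print(benchmark_segment_rows(80_000_000, 16))  # -> 5001000
```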

Resources


