Important
The SingleStore 9.1 release candidate (RC) gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 9.0 is recommended for production workloads, which can later be upgraded to SingleStore 9.1.
Configuring Vector Indexes
On this page
This page describes advanced configuration for high-performance vector search in SingleStore.
For most use cases, creating a vector index without additional configuration provides strong performance.
This document assumes an understanding of SingleStore parallel query execution.
Configure for High Throughput
For high-throughput workloads where the goal is to maximize Queries Per Second (QPS), it is important to have fewer, bigger segments, and make the segment size even.
In addition, to get high QPS, it helps to change the flexible parallelism settings to use only one thread per leaf node per query, which will use one thread per partition when there is one segment per partition.
Concept
Vector search performance is highly related to the number of index segments scanned.
The vector index merger runs automatically in the background to combine per-segment vector indexes and create cross-segment vector indexes.
In most cases, the vector index merger is sufficient; however, the maximum combined index size created by the vector index merger is limited by the following factors:
-
Combined indexes created by the vector index merger contain data from at most four segments.
-
Segment size is limited by
internal_andcolumnstore_ max_ uncompressed_ blob_ size columnstore_.segment_ rows
If the vector index merger does not produce sufficiently large indexes, to minimize the number of segments scanned by each query, the configuration should use one database partition on each leaf node, the minimum number of segments within that partition, and one thread on each leaf node scanning that partition.
Configurations for High-Performance Search
Note: It is important to create indexes and set the configuration variables before loading data.
To maximize QPS for vector search, use the following configurations in order.
-
Evenly distribute data:
-
Select a shard key that will evenly distribute data across partitions.
-
-
Use few partitions and eliminate subpartitions:
-
CREATE DATABASE <dbname> PARTITIONS <number of leaf nodes> SUB_PARTITIONS 0
-
-
Turn down flexible parallelism:
-
SET query_parallelism_ per_ leaf_ core = 0. 01 -
With this configuration, the engine uses one thread per leaf node.
-
This is a session variable and will need to be set for each connection.
-
-
-
Use larger segments:
-
Increase the value of
internal_.columnstore_ max_ uncompressed_ blob_ size -
The default value is 512 MB (536870912 bytes) and the largest value it can be is 10GB (10737418240 bytes).
-
Caution: It can take up to 3 times the amount of memory specified in this variable to create a blob.
If you get OOM errors, reduce this value or scale up your RAM (say with a 2x or 4x cloud scaled instance).
-
-
Increase the value of
columnstore_.segment_ rows -
The default number of rows in a columnstore segment is 1,024,000; 10,000,000 is the maximum value.
-
-
Note
Run the OPTIMIZE TABLE FLUSH command after the index is built to flush all index segments to disk.
Verification and Optimization
Use the command below to verify that there are no or few small segments.
SELECTDATABASE_NAME,TABLE_NAME,ORDINAL AS PARTITION_ID,ROWS,MEMORY_USEFROM INFORMATION_SCHEMA.TABLE_STATISTICSWHERE TABLE_NAME = '<table_name>';
Data Distribution Notes
When tuning for QPS, do not increase the partition count if doing so shrinks the segment size.
However, when the data is very large, that is the number of rows is much greater than (number of partitions) * (segment size), there will be many segments in each partition.OPTIMIZE TABLE FULL command, the segments will have roughly even sizes except for the tail segment.
Avoid compaction during data ingestion by running the OPTIMIZE TABLE FULL command after data has been loaded into the table.
Thus, if the data size is large enough so that there is more than one maximum-sized segment per partition, you can increase the partition count without effect on QPS.
Workload
The configuration above is designed for workloads in which:
-
Multiple queries are run in parallel.
-
When flexible parallelism is turned down so that there is one thread per leaf node, a concern is that there are extra VCPUs which will not be used.
When multiple queries are running in parallel, these extra VCPUs are used.
-
-
Queries do not have highly selective predicates.
-
When queries do not have highly selective predicates, segment-elimination is not effective, therefore a 'small' number of 'large' segments is a reasonable configuration.
-
Workload Considerations
Segment Size Tradeoffs
The index configuration (segment size) needs to be matched to the workload.
In SingleStore, vector indexes are built per (columnstore) segment.
While large segment sizes are good for QPS and for search performance, smaller segments may be appropriate for workloads with updates or workloads with highly selective queries.WHERE filters may benefit from smaller segments because segment elimination is applied before vector index searches.
When indexes are updated regularly, when latency is important and high parallelism is useful, or when there are highly selective queries, appropriate segment size needs to be considered.
Segment Size Tradeoffs for Update Performance
There is a tradeoff between vector index search performance and index update rate.
To increase the index update rate, consider reducing the segment size or increasing the number of partitions.
Segment Size Tradeoffs for Highly Selective Queries
Similarly, there is a tradeoff between vector index search performance and segment elimination.
Segment elimination is the process whereby the system avoids scanning segments that do not meet the filter in the query.WHERE clause of a query.
That is, queries with highly selective predicates may benefit from an increased number of partitions and segments.
Refer to Choosing a Columnstore Key for information on segment elimination.
Adjust the Number of Segments and Partitions
Segment size in SingleStore is managed through the variables:
-
internal_(max value 10,737,418,240)columnstore_ max_ uncompressed_ blob_ size -
columnstore_(max value 10,000,000)segment_ rows
As noted above, it's often best to set these to larger-than-default values (e.WHERE clause filters.CREATE DATABASE statement as recommended earlier.
Parallel Scan and Flexible Parallelism
SingleStore is a powerful parallel database which supports a spectrum of parallelism; different levels of parallelization are effective for different types of workloads.
Flexible Parallelism
Flexible parallelism allows multiple threads to scan a single partition, increasing the parallelism (number of cores) used for a query.
If you have multiple queries running concurrently, parallelism may not help.query_.
Turning down parallelism by decreasing query_ helps throughput by reducing overhead for starting and synchronizing threads. ORDER BY . queries on more threads which could increase the total number of rows decoded and transmitted.
Flexible Parallelism Configuration
When doing a parallel (non-indexed) scan:
-
If (flexible) parallelism is set to one (
query_), then there will be one thread per CPU (vCPU) core and each thread will scan different sub-partitions in parallel.parallelism_ per_ leaf_ core = 1 -
If (flexible) parallelism is set lower (
query_), then fewer threads will be used and each thread will scan multiple sub-partitions.parallelism_ per_ leaf_ core < 1 Setting query_will give one thread per leaf node, which is ideal for high-throughput search workloads.parallelism_ per_ leaf_ core = . 01
Configure for Large Data
For huge data, you may want more than one thread per leaf node to balance response time versus throughput.query_.
Configure for Response Time
To obtain the best response times for individual queries, use multiple threads per query.query_ at its default value of 1. or some relatively high value such as 0. or 0..
Benchmarking with Fixed Size Data
If your data is fixed size, for example, if you are benchmarking, follow this procedure to get data evenly distributed among partitions.
-
Calculate an even distribution of data among partitions and set
columnstore_to enforce that distribution.segment_ rows -
SET GLOBAL columnstore_segment_ rows = ((#rows in data set)/(#partitions)) + buffer -
Suggest value of ~1000 for buffer.
-
-
OR if the dataset is larger: ((#rows in data set)/(#partitions)) > 10,000,000
-
SET GLOBAL columnstore_segment_ rows = 10000000 -
10,000,000 is the maximum value of
columnstore_segment_ rows
-
-
The buffer value will prevent inadvertent creation of a tiny segment containing the remainder of the data if data is not split perfectly evenly.
Resources
Last modified: