Configuring Full Text and Vector Indexes
On this page
Configuring SingleStore for high-performance vector and full-text search is important for achieving maximum performance from these searches.
This document is intended for readers with an understanding of SingleStore parallel query execution.
Configure for High Throughput / High QPS
For high-throughput workloads where the goal is to maximize Queries Per Second (QPS), the key configuration is to have fewer, bigger segments, and make the segment size even.
In addition, to get high queries per second (QPS), it helps to change the flexible parallelism settings to use only one thread per leaf node per query, which will use one thread per partition when there is one segment per partition.
Concept
Vector and full-text search performance is highly related to the number of index segments scanned.
Conceptually, to minimize the number of segments scanned by each query, the configuration should use one database partition on each leaf node, the minimum number of segments within that partition, and one thread on each leaf node scanning that partition.
To do so, the following configurations are needed:
- 
          Use large segments: Allow columnstore segments to be as large as possible (by setting internal_andcolumnstore_ max_ uncompressed_ blob_ size columnstore_to their maximum possible values).segment_ rows 
- 
          Use few partitions and eliminate subpartitions: Use one partition per leaf node, or two partitions if using a small, single-leaf or two-leaf system and eliminate subpartitions. 
- 
          Evenly distribute data among partitions: Choose a shard key that will evenly distribute data among partitions. 
- 
          Turn down flexible parallelism: Set flexible parallelism so that one thread will scan each partition. This setting reduces per-query overhead and helps performance when the majority of the query execution work is for index searches. 
- 
          Enable the full-text index cache. - 
              SET GLOBAL fts2_cache_ enable = ON 
- 
              Consider increasing the value of fts2_which can improve search performance at the expense of more memory consumption.cache_ blob_ memory_ mb 
- 
              Stop the FTS service using the FULLTEXT SERVICE STOPcommand after changing these settings to ensure these changes take effect.The FTS service will restart automatically when needed again. 
 
- 
              
Recommended Configuration
Note: It is important to create indexes and set the configuration variables before loading data.
To obtain the behavior described above and minimize the number of segments scanned, the recommended configuration settings are as follows:
- 
          Use few partitions and eliminate subpartitions: - 
              CREATE DATABASE <dbname> PARTITIONS <number of leaf nodes> SUB_PARTITIONS 0 
 
- 
              
- 
          Use large segments: - 
              SET GLOBAL internal_columnstore_ max_ uncompressed_ blob_ size = 10737418240 - 
                  10,737,418,240 is the maximum possible value of internal_columnstore_ max_ uncompressed_ blob_ size 
- 
                  Caution: It can take up to 3 times the amount of memory specified in this variable to create a blob. If you get OOM errors, reduce this value or scale up your RAM (say with a 2x or 4x cloud scaled instance). 
 
- 
                  
- 
              SET GLOBAL columnstore_segment_ rows = 10000000 - 
                  10,000,000 is the maximum value of columnstore_.segment_ rows 
 
- 
                  
 
- 
              
- 
          Evenly distribute data: - 
              Select a shard key that will evenly distribute data across partitions. 
 
- 
              
- 
          Turn down flexible parallelism: - 
              SET query_parallelism_ per_ leaf_ core = 0. 01 - 
                  With this configuration, the engine uses one thread per leaf node. 
- 
                  This is a session variable and will need to be set for each connection. 
 
- 
                  
 
- 
              
Verification and Optimization
- 
          Run the OPTIMIZE TABLE FLUSHcommand after the index is built to flush all index segments to disk.
- 
          After the initial configuration is set, verify, using the command below, that there are no or few small segments. If there are small segments, consider a change to the shard key or the number of partitions. SELECTDATABASE_NAME,TABLE_NAME,ORDINAL AS PARTITION_ID,ROWS,MEMORY_USEFROM INFORMATION_SCHEMA.TABLE_STATISTICSWHERE TABLE_NAME = '<table_name>';
Data Distribution Notes
When tuning for QPS, do not increase the partition count if doing so shrinks the segment size.
However, when the data is very large, that is the number of rows is much greater than (number of partitions) * (segment size), there will be many segments in each partition.OPTIMIZE TABLE FULL command, the segments will have roughly even sizes except for the tail segment.
Avoid compaction during data ingestion by running the OPTIMIZE TABLE FULL command after data has been loaded into the table.
Thus, if the data size is large enough so that there is more than one maximum-sized segment per partition, you can increase the partition count without effect on QPS.
Workload
The configuration above is designed for workloads in which:
- 
          Multiple queries are run in parallel. - 
              When flexible parallelism is turned down so that there is one thread per leaf node, a concern is that there are extra VCPUs which will not be used. When multiple queries are running in parallel, these extra VCPUs are used. 
 
- 
              
- 
          Queries do not have highly selective predicates. - 
              When queries do not have highly selective predicates, segment-elimination is not effective, therefore a 'small' number of 'large' segments is a reasonable configuration. 
 
- 
              
Additional Configurations and Workload Considerations
Segment Size Tradeoffs
The index configuration (segment size) needs to be matched to the workload.
In SingleStore, vector and full-text indexes are built per (columnstore) segment.
While large segment sizes are good for QPS and for search performance, smaller segments may be appropriate for workloads with updates or workloads with highly selective queries.WHERE filters may benefit from smaller segments because segment elimination is applied before vector or full-text index searches.
When indexes are updated regularly, when latency is important and high parallelism is useful, or when there are highly selective queries, appropriate segment size needs to be considered.
Segment Size Tradeoffs for Update Performance
There is a tradeoff between full-text and vector index search performance and index update rate.
To increase the index update rate, consider reducing the segment size or increasing the number of partitions.
Segment Size Tradeoffs for Highly Selective Queries
Similarly, there is a tradeoff between full-text and vector index search performance and segment elimination.
Segment elimination is the process whereby the system avoids scanning segments that do not meet the filter in the query.WHERE clause of a query.
That is, queries with highly selective predicates may benefit from an increased number of partitions and segments.
Refer to Choosing a Columnstore Key for information on segment elimination.
Adjust the Number of Segments and Partitions
Segment size in SingleStore is managed through the variables:
- 
            internal_(max value 10,737,418,240)columnstore_ max_ uncompressed_ blob_ size 
- 
            columnstore_(max value 10,000,000)segment_ rows 
As noted above, it's often best to set these to larger-than-default values (e.WHERE clause filters.CREATE DATABASE statement as recommended earlier.
Parallel Scan and Flexible Parallelism
SingleStore is a powerful parallel database which supports a spectrum of parallelism; different levels of parallelization are effective for different types of workloads.
Flexible Parallelism
Flexible parallelism allows multiple threads to scan a single partition, increasing the parallelism (number of cores) used for a query.
If you have multiple queries running concurrently, parallelism may not help.query_.
Turning down parallelism by decreasing query_ helps throughput by reducing overhead for starting and synchronizing threads. ORDER BY . queries on more threads which could increase the total number of rows decoded and transmitted.
Flexible Parallelism Configuration
When doing a parallel (non-indexed) scan:
- 
            If (flexible) parallelism is set to one ( query_), then there will be one thread per CPU (vCPU) core and each thread will scan different sub-partitions in parallel.parallelism_ per_ leaf_ core = 1 
- 
            If (flexible) parallelism is set lower ( query_), then fewer threads will be used and each thread will scan multiple sub-partitions.parallelism_ per_ leaf_ core < 1 Setting query_will give one thread per leaf node, which is ideal for high-throughput search workloads.parallelism_ per_ leaf_ core = . 01 
Configure for Large Data
For huge data, you may want more than one thread per leaf node to balance response time versus throughput.query_.
Configure for Response Time
To obtain the best response times for individual queries, use multiple threads per query.query_ at its default value of 1. or some relatively high value such as 0. or 0..
Benchmarking with Fixed Size Data
If your data is fixed size, for example, if you are benchmarking, follow this procedure to get data evenly distributed among partitions.
- 
          Calculate an even distribution of data among partitions and set columnstore_to enforce that distribution.segment_ rows - 
              SET GLOBAL columnstore_segment_ rows = ((#rows in data set)/(#partitions)) + buffer - 
                  Suggest value of ~1000 for buffer. 
 
- 
                  
- 
              OR if the dataset is larger: ((#rows in data set)/(#partitions)) > 10,000,000 - 
                  SET GLOBAL columnstore_segment_ rows = 10000000 
- 
                  10,000,000 is the maximum value of columnstore_segment_ rows 
 
- 
                  
 
- 
              
The buffer value will prevent inadvertent creation of a tiny segment containing the remainder of the data if data is not split perfectly evenly.
Full-Text Considerations and Optimizations
Below are a series of considerations and optimizations to consider for full-text search performance.
Indexing Limited Full-Text Columns
For full-text indexes, the number of columns that are indexed can affect search performance.
Full-Text ORDER BY … LIMIT Optimization
There is a significant optimization that pushes an ORDER BY…LIMIT clause down into the Java Lucene process if a query is structured in a certain way.
For this optimization to work, the query must follow these rules:
- 
          The WHEREclause must either be empty or the expression in theWHEREclause must be the same full-text expression that is in theORDER BYclause.
- 
          The ORDER BYclause must useDESCsort order.
- 
          There must be only one full-text search function ( MATCH,BM25, orBM25_) in the query.GLOBAL 
You can often "refactor" the FTS part out of a query to meet these rules and employ the optimization as shown in the following example.
Original query:
SELECTucid, company_name, fc.company_description,MATCH (TABLE fc)AGAINST ('company_technologies:("font awesome" OR "open graph")') AS _matchFROM Company cJOIN FTS_Company_new fc USING (ucid)JOIN Person p USING(ucid)ANTI_JOIN GlobalExclude_global ge ON ge.PersonId = p.upidWHERE _matchGROUP BY ucidORDER BY _match DESCLIMIT 25 OFFSET 0;
Rewritten query that benefits from the optimization:
WITH _fts AS (SELECTucid, company_description,MATCH (table fc)AGAINST ('company_technologies:("font awesome" OR "open graph")')AS _matchFROM FTS_Company_new fcWHERE _match ORDER BY _match desc limit 25)SELECT ucid, company_name, fc.company_description, fc._matchFROM Company cJOIN _fts fc USING (ucid)JOIN Person p USING(ucid)ANTI_JOIN GlobalExclude_global ge ON ge.PersonId = p.upidWHERE fc._matchGROUP BY ucidORDER BY fc._match DESCLIMIT 25 OFFSET 0;
Factoring the full-text search away from the joins into a CTE allows the optimization to work.
FTS V2 Architecture and Performance
In FTS v2, the Lucene engine is decoupled from the main database engine and runs in a co-process.
In queries with filters on multiple attributes, consider adding attributes that are used in filters to the full-text index so that the full WHERE clause can be moved into the Lucene expression.
Throttling Full-Text Search
If full-text search is taking too much CPU time, full-text search can be throttled by reducing the value of the fts2_ engine variable.fts2_ is set to 6, that means that approximately 6 CPUs on each leaf will be used.
Warm the Cache for Full-Text Search
The disk cache can be warmed for a full-text search by executing a query such as the following which will cause the full-text indexes to be downloaded into the local disk cache.
SELECT COUNT(*)FROM table_nameWHERE MATCH(TABLE table_name) AGAINST ('valid_column:random_term');
Other Strategies
To get the best performance, get creative with application and even data changes.
Resources
Last modified: July 15, 2025