Configuring Full Text and Vector Indexes
Configuring SingleStore correctly for vector and full-text search is important for achieving maximum performance from these searches.
This document is intended for readers with an understanding of SingleStore parallel query execution.
Configure for High Throughput / High QPS
For high-throughput workloads where the goal is to maximize Queries Per Second (QPS), the key configuration is to use fewer, larger segments and to keep segment sizes even.
Concept
Vector and full-text search performance is highly related to the number of index segments scanned.
Conceptually, to minimize the number of segments scanned by each query, the configuration should use one database partition on each leaf node, the minimum number of segments within that partition, and one thread on each leaf node scanning that partition.
To do so, the following configurations are needed:
- Use large segments: Allow columnstore segments to be as large as possible by setting internal_columnstore_max_uncompressed_blob_size and columnstore_segment_rows to their maximum possible values.
- Use few partitions and eliminate subpartitions: Use one partition per leaf node (or two partitions if using a small, single-leaf or two-leaf system) and eliminate subpartitions.
- Evenly distribute data among partitions: Choose a shard key that will evenly distribute data among partitions.
- Turn down flexible parallelism: Set flexible parallelism so that one thread scans each partition. This setting reduces per-query overhead and helps performance when the majority of the query execution work is index searches.
Recommended Configuration
Note: It is important to create indexes and set the configuration variables before loading data.
To obtain the behavior described above and minimize the number of segments scanned, the recommended configuration settings are as follows:
- Use few partitions and eliminate subpartitions:

  CREATE DATABASE <dbname> PARTITIONS <number of leaf nodes> SUB_PARTITIONS 0

- Use large segments:

  SET GLOBAL internal_columnstore_max_uncompressed_blob_size = 10737418240

  10,737,418,240 is the maximum possible value of internal_columnstore_max_uncompressed_blob_size.

  Caution: It can take up to 3 times the amount of memory specified in this variable to create a blob. If you get OOM errors, reduce this value or scale up your RAM (say with a 2x or 4x larger cloud instance).

  SET GLOBAL columnstore_segment_rows = 10000000

  10,000,000 is the maximum value of columnstore_segment_rows.

- Evenly distribute data: Select a shard key that will evenly distribute data across partitions.

- Turn down flexible parallelism:

  SET query_parallelism_per_leaf_core = 0.01

  With this configuration, the engine uses one thread per partition. This is a session variable and must be set for each connection.
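Taken together, the settings above can be collected into a single setup script. The following is a sketch, assuming an 8-leaf cluster and a placeholder database name:

```sql
-- Run before loading data; the database name and leaf count are placeholders.
CREATE DATABASE vectordb PARTITIONS 8 SUB_PARTITIONS 0;

-- Use the largest possible segments.
SET GLOBAL internal_columnstore_max_uncompressed_blob_size = 10737418240;
SET GLOBAL columnstore_segment_rows = 10000000;

-- Session variable: one scan thread per partition (repeat per connection).
SET query_parallelism_per_leaf_core = 0.01;
```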
Data Distribution Notes
When tuning for QPS, do not increase the partition count if doing so shrinks the segment size.
However, when the data is very large, that is, the number of rows is much greater than (number of partitions) * (segment size), there will be many segments in each partition. After running the OPTIMIZE TABLE FULL command, the segments will have roughly even sizes, except for the tail segment.
Thus, if the data size is large enough that there is more than one maximum-sized segment per partition, you can increase the partition count without affecting QPS.
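The sizing rule above can be sketched as a quick check. This is illustrative arithmetic, not part of SingleStore; the function name and inputs are hypothetical:

```python
# Maximum value of columnstore_segment_rows.
MAX_SEGMENT_ROWS = 10_000_000

def can_increase_partitions(total_rows: int, new_partition_count: int,
                            segment_rows: int = MAX_SEGMENT_ROWS) -> bool:
    """Return True if each partition would still hold more than one
    maximum-sized segment, so increasing the partition count does not
    shrink segments or hurt QPS."""
    return total_rows / new_partition_count > segment_rows

print(can_increase_partitions(400_000_000, 16))  # many full segments per partition
print(can_increase_partitions(40_000_000, 16))   # segments would shrink
```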
Verification and Optimization
- Run the OPTIMIZE TABLE FULL command after the index is built to flush all index segments to disk.
- After the initial configuration is set, verify that there are no or few small segments, using the command below. If there are small segments, consider a change to the shard key or the number of partitions.

  SELECT DATABASE_NAME, TABLE_NAME, ORDINAL AS PARTITION_ID, ROWS, MEMORY_USE
  FROM INFORMATION_SCHEMA.TABLE_STATISTICS
  WHERE TABLE_NAME = '<table_name>';

- Avoid compaction during data ingestion; run the OPTIMIZE TABLE FULL command after data has been loaded into the table.
Workload
The configuration above is designed for workloads in which:
- Multiple queries are run in parallel. When flexible parallelism is turned down so that there is one thread per partition (per leaf), a concern is that extra vCPUs will go unused; when multiple queries run in parallel, those extra vCPUs are used.
- Queries do not have highly selective predicates. When queries do not have highly selective predicates, segment elimination is not effective, so a small number of large segments is a reasonable configuration.
Additional Configurations and Workload Considerations
Segment Size Tradeoffs
The index configuration (segment size) needs to be matched to the workload.
In SingleStore, vector and full-text indexes are built per (columnstore) segment.
While large segment sizes are good for QPS and for search performance, smaller segments may be appropriate for workloads with updates or workloads with highly selective queries. Queries with highly selective WHERE filters may benefit from smaller segments because segment elimination is applied before vector or full-text index searches.
When indexes are updated regularly, when latency is important and high parallelism is useful, or when there are highly selective queries, appropriate segment size needs to be considered.
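As an illustration, a vector query with a selective filter only consults the vector index on segments that survive elimination. The table, column, and variable names below are hypothetical:

```sql
-- Hypothetical schema: if events is sharded/sorted so that event_date is
-- correlated with segments, this filter eliminates most segments before
-- the vector index search runs.
SELECT id, embedding <*> @query_vec AS score
FROM events
WHERE event_date = '2025-01-15'
ORDER BY score DESC
LIMIT 10;
```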
Segment Size Tradeoffs for Update Performance
There is a tradeoff between full-text and vector index search performance and index update rate.
To increase the index update rate, consider reducing the segment size or increasing the number of partitions.
Segment Size Tradeoffs for Highly Selective Queries
Similarly, there is a tradeoff between full-text and vector index search performance and segment elimination.
Segment elimination is the process whereby the system avoids scanning segments that do not match the filter in the WHERE clause of a query.
That is, queries with highly selective predicates may benefit from an increased number of partitions and segments.
Refer to Choosing a Columnstore Key for information on segment elimination.
Adjust the Number of Segments and Partitions
Segment size in SingleStore is managed through the variables:
- internal_columnstore_max_uncompressed_blob_size (maximum value 10,737,418,240)
- columnstore_segment_rows (maximum value 10,000,000)
As noted above, it is often best to set these to larger-than-default values, unless the workload has frequent updates or highly selective WHERE clause filters. The number of partitions is set in the CREATE DATABASE statement as recommended earlier.
Parallel Scan and Flexible Parallelism
SingleStore is a powerful parallel database which supports a spectrum of parallelism; different levels of parallelization are effective for different types of workloads.
Flexible Parallelism
Flexible parallelism allows multiple threads to scan a single partition, increasing the parallelism (number of cores) used for a query.
If you have multiple queries running concurrently, additional per-query parallelism may not help; parallelism is controlled by the query_parallelism_per_leaf_core variable.
Turning down parallelism by decreasing query_parallelism_per_leaf_core helps throughput by reducing the overhead of starting and synchronizing threads. It also avoids running ORDER BY ... LIMIT queries on more threads, which could increase the total number of rows decoded and transmitted.
Flexible Parallelism Configuration
When doing a parallel (non-indexed) scan:
- If (flexible) parallelism is set to one (query_parallelism_per_leaf_core = 1), there will be one thread per CPU (vCPU) core and each thread will scan different sub-partitions in parallel.
- If (flexible) parallelism is set lower (query_parallelism_per_leaf_core < 1), fewer threads will be used and each thread will scan multiple sub-partitions.
Setting query_parallelism_per_leaf_core = 0.01 will give one thread per partition, which is ideal for high-throughput search workloads.
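A rough model of the thread counts described above can be written down directly. This is a simplified assumption about the engine's behavior, not its actual scheduling logic:

```python
import math

def scan_threads_per_leaf(vcpu_cores: int,
                          parallelism_per_leaf_core: float,
                          partitions_per_leaf: int) -> int:
    """Assumed model: the engine targets roughly
    cores * query_parallelism_per_leaf_core scan threads per leaf,
    and never uses fewer than one thread per partition."""
    target = math.ceil(vcpu_cores * parallelism_per_leaf_core)
    return max(target, partitions_per_leaf)

print(scan_threads_per_leaf(32, 1.0, 1))   # default: one thread per core
print(scan_threads_per_leaf(32, 0.01, 1))  # tuned: one thread per partition
```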
Configure for Large Data
For huge data, you may want more than one thread per partition to balance response time versus throughput; to do so, increase query_parallelism_per_leaf_core accordingly.
Configure for Response Time
To obtain the best response times for individual queries, use multiple threads per query: leave query_parallelism_per_leaf_core at its default value of 1, or use some relatively high fractional value.
Benchmarking with Fixed Size Data
If your data is fixed size, for example, if you are benchmarking, follow this procedure to get data evenly distributed among partitions.
- Calculate an even distribution of data among partitions and set columnstore_segment_rows to enforce that distribution:

  SET GLOBAL columnstore_segment_rows = ((#rows in data set)/(#partitions)) + buffer

  A value of ~1000 is suggested for buffer.

- OR, if the dataset is larger, that is, ((#rows in data set)/(#partitions)) > 10,000,000:

  SET GLOBAL columnstore_segment_rows = 10000000

  10,000,000 is the maximum value of columnstore_segment_rows.
The buffer value will prevent inadvertent creation of a tiny segment containing the remainder of the data if data is not split perfectly evenly.
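The formula above can be sketched as a small helper. The function name and example values are illustrative only:

```python
# Maximum value of columnstore_segment_rows.
MAX_SEGMENT_ROWS = 10_000_000

def benchmark_segment_rows(total_rows: int, partitions: int,
                           buffer: int = 1000) -> int:
    """One segment's worth of rows per partition plus a small buffer,
    so a slightly uneven split does not spill a tiny tail segment;
    capped at the engine maximum."""
    return min(total_rows // partitions + buffer, MAX_SEGMENT_ROWS)

print(benchmark_segment_rows(8_000_000, 8))    # 1_001_000
print(benchmark_segment_rows(800_000_000, 8))  # capped at 10_000_000
```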
Full-Text Considerations and Optimizations
Below are considerations and optimizations that can improve full-text search performance.
Indexing Limited Full-Text Columns
For full-text indexes, the number of columns indexed can affect search performance, so index only the columns that need to be searched.
Full-Text ORDER BY … LIMIT Optimization
There is a significant optimization that pushes an ORDER BY ... LIMIT clause down into the Java Lucene process if a query is structured in a certain way.
For this optimization to work, the query must follow these rules:
- The WHERE clause must either be empty, or the expression in the WHERE clause must be the same full-text expression that is in the ORDER BY clause.
- The ORDER BY clause must use DESC sort order.
- There must be only one full-text search function (MATCH, BM25, or BM25_GLOBAL) in the query.
You can often "refactor" the FTS part out of a query to meet these rules and employ the optimization as shown in the following example.
Original query:
SELECT
  ucid, company_name, fc.company_description,
  MATCH (TABLE fc) AGAINST ('company_technologies:("font awesome" OR "open graph")') AS _match
FROM Company c
JOIN FTS_Company_new fc USING (ucid)
JOIN Person p USING (ucid)
ANTI_JOIN GlobalExclude_global ge ON ge.PersonId = p.upid
WHERE _match
GROUP BY ucid
ORDER BY _match DESC
LIMIT 25 OFFSET 0;
Rewritten query that benefits from the optimization:
WITH _fts AS (
  SELECT
    ucid, company_description,
    MATCH (TABLE fc) AGAINST ('company_technologies:("font awesome" OR "open graph")') AS _match
  FROM FTS_Company_new fc
  WHERE _match
  ORDER BY _match DESC
  LIMIT 25
)
SELECT ucid, company_name, fc.company_description, fc._match
FROM Company c
JOIN _fts fc USING (ucid)
JOIN Person p USING (ucid)
ANTI_JOIN GlobalExclude_global ge ON ge.PersonId = p.upid
WHERE fc._match
GROUP BY ucid
ORDER BY fc._match DESC
LIMIT 25 OFFSET 0;
Factoring the full-text search away from the joins into a CTE allows the optimization to work.
FTS V2 Architecture and Performance
In FTS v2, the Lucene engine is decoupled from the main database engine and runs in a co-process.
In queries with filters on multiple attributes, consider adding the attributes used in filters to the full-text index so that the full WHERE clause can be moved into the Lucene expression.
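For example, rather than pairing MATCH with a separate SQL predicate on an attribute, that attribute can be added to the full-text index and folded into the Lucene expression. The table and column names below are hypothetical:

```sql
-- The status filter runs inside the Lucene co-process instead of as a
-- separate SQL predicate (hypothetical table and columns).
SELECT id
FROM articles
WHERE MATCH (TABLE articles) AGAINST ('body:("vector search") AND status:published');
```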
Throttling Full-Text Search
If full-text search is taking too much CPU time, it can be throttled by reducing the value of the corresponding fts2_ engine variable. For example, if that variable is set to 6, approximately 6 CPUs on each leaf will be used.
Other Strategies
To get the best performance, be ready to get creative with changes to the application, and even to the data.
Resources
Last modified: January 28, 2025