Warning

SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.

Spilling-Avoidance Rewrite

Spilling-avoidance rewrite is a new query optimization feature designed to help prevent memory overflow (spilling) during hash joins on large datasets. When enabled, it automatically rewrites qualifying joins into smaller, manageable pieces—reducing memory pressure and the risk of out-of-memory (OOM) errors.

This feature is especially useful when joining large tables on high-cardinality columns, where traditional hash joins are likely to exceed memory limits.

The spilling-avoidance rewrite is built into the query optimizer. Queries with a hash join on a column with a cardinality exceeding the specified threshold are split into several smaller join queries. The results of the smaller join queries are recombined using UNION ALL.

Note

Spilling-avoidance rewrite introduces overhead, which can negatively impact performance. SingleStore recommends using spilling-avoidance rewrite only when necessary, for example, to prevent out-of-memory (OOM) errors.

Configure Spilling-Avoidance Rewrite

The spilling-avoidance rewrite splits large hash joins into smaller joins to avoid memory overflow.

The spilling-avoidance rewrite is configured by specifying:

The cardinality threshold at which the rewrite will be triggered.
The size of the smaller joins (groups) into which the larger hash join should be split.

The two engine variables used to configure these two values are:

cardinality_threshold_for_join_split_rewrite

Specifies the cardinality threshold for spilling-avoidance rewrite. If all relevant join columns have a cardinality above this threshold, the JOIN query will be split into smaller join queries.

A value of UINT_MAX (the default) indicates that the rewrite is disabled.

target_group_size_for_join_split_rewrite

Specifies the maximum size of the input (in rows) to the smaller join queries, also called the group size.

To enable the spilling-avoidance rewrite, set the cardinality_threshold_for_join_split engine variable to a value other than UINT_MAX.

To disable the spilling-avoidance rewrite, set the cardinality_threshold_for_join_split engine variable to UINT_MAX.

Select the Engine Variable Values

`cardinality_threshold_for_join_split_rewrite`

When both join columns have cardinality greater than cardinality_threshold_for_join_split_rewrite, the hash join will be rewritten into smaller joins.
Choose the value of this variable based on the cardinality threshold beyond which the original query is expected to encounter OOM errors.
Proper selection ensures that the rewrite is only triggered when truly needed, preventing unnecessary performance degradation.

Use the following command to set this variable's value.

SQL

SET SESSION cardinality_threshold_for_join_split_rewrite = <value>

`target_group_size_for_join_split_rewrite`

The goal for the value of this variable is to strike a balance between minimizing the number of splits and ensuring that each split can be efficiently processed without memory issues.
SingleStore recommends using the following formula to choose a value for target_group_size_for_join_split_rewrite.
- target_group_size_for_join_split_rewrite = cardinality of the joined column with the lowest cardinality / N
- N is the number of splits
Choose N to be as small as possible (e.g., 2) while ensuring that the target_group_size_for_join_split_rewrite remains manageable within the available memory.
The target_group_size_for_join_split_rewrite must be set to a value below 10,000,000 to avoid unnecessary overhead.

Use the following command to set this variable's value.

SQL

SET SESSION target_group_size_for_join_split_rewrite = <value>

Remarks

The following apply to the spilling-avoidance rewrite:

The tables must be columnstore tables.
The tables must have column statistics available.
A histogram on the high cardinality column must exist.
The join condition on the high cardinality column must be an equality condition on two columns, for example: t1.a = t2.b.
The join condition on the high cardinality column cannot include an inequality. For example, if the join condition is t1.a < t2.b , the join will not be rewritten.
The join condition on the high cardinality column cannot include a constant value or expressions. For example, if the join condition is t1.a = 1, the join will not be rewritten.
The joined columns must be of numeric types (integer or double).

Example

Consider two tables, t1 and t2, both with an ID column of high cardinality.

Assume the cardinalities of the ID columns are as follows:

t1.ID has distinct values (1-20M) thus has cardinality 20,000,000
t2.ID has distinct values (1-40M) thus has cardinality 40,000,000

The following query joins tables t1 and t2 on the high-cardinality ID columns. This join is a candidate for the spilling-avoidance rewrite.

SQL

SELECT * 
FROM t2
JOIN t1
on t2.ID = t1.ID;

To enable the spilling-avoidance rewrite for this query, use the following commands.

SQL

SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;
SET SESSION target_group_size_for_join_split_rewrite = 10000000;

Setting cardinality_threshold_for_join_split_rewrite = 12000000 indicates that if a join column has a cardinality greater than 12,000,000, the original query should be rewritten into smaller joins. Both t1.ID and t2.ID have a cardinality greater than 12,000,000, so the engine will apply the spilling-avoidance rewrite to this query. Note that both columns have cardinality greater than 12,000,000, so the threshold is met.

Setting target_group_size_for_join_split_rewrite = 10000000 enables the spilling-avoidance rewrite and specifies the size of the inputs to the smaller joins (also called the groups).

With these settings, the previous query will be rewritten as follows:

SQL

(SELECT * 
 FROM t2 
 JOIN t1 
 ON t2.ID = t1.ID AND t1.ID<10000000
)

UNION ALL

(SELECT * 
 FROM t2 
 JOIN t1 
 ON t2.ID  = t1.ID AND t1.ID >= 10000001
);

The single join is rewritten into two smaller joins which are combined with a UNION ALL.

The smaller joins are created by adding a filter on the t1.ID column. The t1.ID column has 20,000,000 distinct values (a cardinality of 20,000,000). This column is used to break the query into two groups of size target_group_size_for_join_split_rewrite using the filters t1.ID < 10000000 and t1.ID >= 10000001.

The column with the smaller cardinality, t1.ID in this example, is used to create the groups.