Warning
SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.
Spilling-Avoidance Rewrite
On this page
Spilling-avoidance rewrite is a new query optimization feature designed to help prevent memory overflow (spilling) during hash joins on large datasets.
This feature is especially useful when joining large tables on high-cardinality columns, where traditional hash joins are likely to exceed memory limits.
The spilling-avoidance rewrite is built into the query optimizer.UNION ALL
.
Note
Spilling-avoidance rewrite introduces overhead, which can negatively impact performance.
Configure Spilling-Avoidance Rewrite
The spilling-avoidance rewrite splits large hash joins into smaller joins to avoid memory overflow.
The spilling-avoidance rewrite is configured by specifying:
-
The cardinality threshold at which the rewrite will be triggered.
-
The size of the smaller joins (groups) into which the larger hash join should be split.
The two engine variables used to configure these two values are:
|
Specifies the cardinality threshold for spilling-avoidance rewrite. A value of |
|
Specifies the maximum size of the input (in rows) to the smaller join queries, also called the group size. |
To enable the spilling-avoidance rewrite, set the cardinality_
engine variable to a value other than UINT_
.
To disable the spilling-avoidance rewrite, set the cardinality_
engine variable to UINT_
.
Select the Engine Variable Values
cardinality_ threshold_ for_ join_ split_ rewrite
-
When both join columns have cardinality greater than
cardinality_
, the hash join will be rewritten into smaller joins.threshold_ for_ join_ split_ rewrite -
Choose the value of this variable based on the cardinality threshold beyond which the original query is expected to encounter OOM errors.
-
Proper selection ensures that the rewrite is only triggered when truly needed, preventing unnecessary performance degradation.
Use the following command to set this variable's value.
SET SESSION cardinality_threshold_for_join_split_rewrite = <value>
target_ group_ size_ for_ join_ split_ rewrite
-
The goal for the value of this variable is to strike a balance between minimizing the number of splits and ensuring that each split can be efficiently processed without memory issues.
-
SingleStore recommends using the following formula to choose a value for
target_
.group_ size_ for_ join_ split_ rewrite -
target_
= cardinality of the joined column with the lowest cardinality /group_ size_ for_ join_ split_ rewrite N
-
N
is the number of splits
-
-
Choose
N
to be as small as possible (e.g. , 2) while ensuring that the target_
remains manageable within the available memory.group_ size_ for_ join_ split_ rewrite -
The
target_
must be set to a value below 10,000,000 to avoid unnecessary overhead.group_ size_ for_ join_ split_ rewrite
Use the following command to set this variable's value.
SET SESSION target_group_size_for_join_split_rewrite = <value>
Remarks
The following apply to the spilling-avoidance rewrite:
-
The tables must be columnstore tables.
-
The tables must have column statistics available.
-
A histogram on the high cardinality column must exist.
-
The join condition on the high cardinality column must be an equality condition on two columns, for example:
t1.
.a = t2. b -
The join condition on the high cardinality column cannot include an inequality.
For example, if the join condition is t1.
, the join will not be rewritten.a < t2. b -
The join condition on the high cardinality column cannot include a constant value or expressions.
For example, if the join condition is t1.
, the join will not be rewritten.a = 1 -
The joined columns must be of numeric types (integer or double).
Example
Consider two tables, t1
and t2
, both with an ID
column of high cardinality.
Assume the cardinalities of the ID columns are as follows:
-
t1.
has distinct values (1-20M) thus has cardinality 20,000,000ID -
t2.
has distinct values (1-40M) thus has cardinality 40,000,000ID
The following query joins tables t1
and t2
on the high-cardinality ID
columns.
SELECT *FROM t2JOIN t1on t2.ID = t1.ID;
To enable the spilling-avoidance rewrite for this query, use the following commands.
SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;SET SESSION target_group_size_for_join_split_rewrite = 10000000;
Setting cardinality_
indicates that if a join column has a cardinality greater than 12,000,000
, the original query should be rewritten into smaller joins.t1.
and t2.
have a cardinality greater than 12,000,000
, so the engine will apply the spilling-avoidance rewrite to this query.12,000,000
, so the threshold is met.
Setting target_
enables the spilling-avoidance rewrite and specifies the size of the inputs to the smaller joins (also called the groups).
With these settings, the previous query will be rewritten as follows:
(SELECT *FROM t2JOIN t1ON t2.ID = t1.ID AND t1.ID<10000000)UNION ALL(SELECT *FROM t2JOIN t1ON t2.ID = t1.ID AND t1.ID >= 10000001);
The single join is rewritten into two smaller joins which are combined with a UNION ALL
.
The smaller joins are created by adding a filter on the t1.
column.t1.
column has 20,000,000 distinct values (a cardinality of 20,000,000).target_
using the filters t1.
and t1.
.
The column with the smaller cardinality, t1.
in this example, is used to create the groups.
Last modified: June 13, 2025