Join Memory Reduction Optimization
Important
This feature is an opt-in preview.
Opt-in previews allow you to evaluate and provide feedback on new and upcoming features prior to their general availability.
To access this opt-in preview, create a SingleStore Helios deployment and select "Opt-in to Preview Features and Updates."
The join memory reduction optimization is a new query optimization feature designed to help prevent memory overflow (spilling) during hash joins on large datasets.
This feature is especially useful when joining large tables on high-cardinality columns, where traditional hash joins are likely to exceed memory limits.
The join memory reduction optimization is built into the query optimizer.
Note
The join memory reduction optimization introduces overhead, which can negatively impact performance.
Configure Join Memory Reduction Optimization
The join memory reduction optimization splits large hash joins into smaller joins to avoid memory overflow.
This optimization is configured by specifying:
-
The cardinality threshold at which the optimization will be triggered.
-
The size of the smaller joins (groups) into which the larger hash join should be split.
The two engine variables used to configure these two values are:

| Engine Variable | Description |
|---|---|
| `cardinality_threshold_for_join_split_rewrite` | Specifies the cardinality threshold for the join memory reduction optimization. A value of `UINT_MAX` disables the optimization. |
| `target_group_size_for_join_split_rewrite` | Specifies the maximum size of the input (in rows) to the smaller join queries, also called the group size. |
To enable the join memory reduction optimization, set the `cardinality_threshold_for_join_split_rewrite` engine variable to a value other than `UINT_MAX`.

To disable the join memory reduction optimization, set the `cardinality_threshold_for_join_split_rewrite` engine variable to `UINT_MAX`.
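As an illustrative sketch of toggling the optimization for a session: the enable value is the threshold you choose, and the numeric literal used to disable is an assumption here (`UINT_MAX` is 4294967295 where `unsigned int` is 32 bits).

```sql
-- Enable: any value other than UINT_MAX acts as the cardinality threshold.
SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;

-- Disable: set the threshold back to UINT_MAX
-- (assumed to be 4294967295 on this platform).
SET SESSION cardinality_threshold_for_join_split_rewrite = 4294967295;
```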
Select the Engine Variable Values
cardinality_threshold_for_join_split_rewrite
- When both join columns have a cardinality greater than `cardinality_threshold_for_join_split_rewrite`, the hash join will be rewritten into smaller joins.
- Choose the value of this variable based on the cardinality threshold beyond which the original query is expected to encounter out-of-memory (OOM) errors.
- Proper selection ensures that the optimization is only triggered when truly needed, preventing unnecessary performance degradation.
Use the following command to set this variable's value.

```sql
SET SESSION cardinality_threshold_for_join_split_rewrite = <value>;
```
target_group_size_for_join_split_rewrite
- The goal for the value of this variable is to strike a balance between minimizing the number of splits and ensuring that each split can be efficiently processed without memory issues.
- SingleStore recommends using the following formula to choose a value for `target_group_size_for_join_split_rewrite`:
  - `target_group_size_for_join_split_rewrite` = (cardinality of the joined column with the lowest cardinality) / `N`, where `N` is the number of splits.
- Choose `N` to be as small as possible (e.g., 2) while ensuring that the `target_group_size_for_join_split_rewrite` remains manageable within the available memory.
- The `target_group_size_for_join_split_rewrite` must be set to a value below 10,000,000 to avoid unnecessary overhead.
Use the following command to set this variable's value.

```sql
SET SESSION target_group_size_for_join_split_rewrite = <value>;
```
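As a worked illustration of the formula above (the cardinality and choice of `N` are hypothetical): suppose the joined column with the lowest cardinality has 16,000,000 distinct values and you choose `N` = 2 splits. The formula gives 16,000,000 / 2 = 8,000,000, which stays below the 10,000,000 ceiling.

```sql
-- Hypothetical numbers: lowest-cardinality join column has
-- 16,000,000 distinct values, N = 2 splits.
--   target_group_size = 16,000,000 / 2 = 8,000,000
SET SESSION target_group_size_for_join_split_rewrite = 8000000;
```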
Remarks
The following apply to the join memory reduction optimization:

- The tables must be columnstore tables.
- The tables must have column statistics available.
- A histogram on the high-cardinality column must exist.
- The join condition on the high-cardinality column must be an equality condition on two columns, for example: `t1.a = t2.b`.
- The join condition on the high-cardinality column cannot include an inequality. For example, if the join condition is `t1.a < t2.b`, the join will not be rewritten.
- The join condition on the high-cardinality column cannot include a constant value or expressions. For example, if the join condition is `t1.a = 1`, the join will not be rewritten.
- The joined columns must be of numeric types (integer or double).
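A minimal sketch of tables that satisfy these requirements (table and column names are hypothetical): a table created with a `SORT KEY` is a columnstore table, and `ANALYZE` collects column statistics and, with `COLUMNS ... ENABLE`, histograms on the join column.

```sql
-- Columnstore tables with a numeric (BIGINT) join column.
CREATE TABLE t1 (ID BIGINT NOT NULL, val DOUBLE, SORT KEY (ID));
CREATE TABLE t2 (ID BIGINT NOT NULL, val DOUBLE, SORT KEY (ID));

-- Collect column statistics and enable histograms on the
-- high-cardinality join column.
ANALYZE TABLE t1 COLUMNS ID ENABLE;
ANALYZE TABLE t2 COLUMNS ID ENABLE;
```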
Example
Consider two tables, `t1` and `t2`, both with an `ID` column of high cardinality. Assume the cardinalities of the `ID` columns are as follows:

- `t1.ID` has distinct values (1-20M) and thus has cardinality 20,000,000.
- `t2.ID` has distinct values (1-40M) and thus has cardinality 40,000,000.

The following query joins tables `t1` and `t2` on the high-cardinality `ID` columns.
```sql
SELECT *
FROM t2
JOIN t1 ON t2.ID = t1.ID;
```
To enable join memory reduction optimization for this query, use the following commands.

```sql
SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;
SET SESSION target_group_size_for_join_split_rewrite = 10000000;
```
Setting `cardinality_threshold_for_join_split_rewrite` to 12,000,000 indicates that if a join column has a cardinality greater than 12,000,000, the original query should be rewritten into smaller joins. Both `t1.ID` and `t2.ID` have a cardinality greater than 12,000,000, so the threshold is met and the engine will apply the join memory reduction optimization to this query.

Setting `target_group_size_for_join_split_rewrite` specifies the size of the inputs to the smaller joins (also called the groups).
With these settings, the previous query will be rewritten as follows:

```sql
(SELECT * FROM t2 JOIN t1 ON t2.ID = t1.ID AND t1.ID < 10000000)
UNION ALL
(SELECT * FROM t2 JOIN t1 ON t2.ID = t1.ID AND t1.ID >= 10000000);
```
The single join is rewritten into two smaller joins, which are combined with a `UNION ALL`. The smaller joins are created by adding a filter on the `t1.ID` column. The `t1.ID` column has 20,000,000 distinct values (a cardinality of 20,000,000), so with a `target_group_size_for_join_split_rewrite` of 10,000,000 the join is split into two groups using the filters `t1.ID < 10000000` and `t1.ID >= 10000000`. The column with the smaller cardinality, `t1.ID` in this example, is used to create the groups.
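To check whether the rewrite was applied to a particular query, you can inspect its plan with `EXPLAIN` (a sketch; the exact plan output varies by SingleStore version and is not documented on this page).

```sql
-- Inspect the query plan after setting the engine variables above;
-- a rewritten plan reflects the split into smaller joins.
EXPLAIN SELECT * FROM t2 JOIN t1 ON t2.ID = t1.ID;
```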
Last modified: July 23, 2025