Join Memory Reduction Optimization
Important
This feature is an opt-in preview.
Opt-in previews allow you to evaluate and provide feedback on new and upcoming features prior to their general availability.
To access this opt-in preview, create a SingleStore Helios deployment and select "Opt-in to Preview Features and Updates."
The join memory reduction optimization is a new query optimization feature designed to help prevent memory overflow (spilling) during hash joins on large datasets.
This feature is especially useful when joining large tables on high-cardinality columns, where traditional hash joins are likely to exceed memory limits.
The join memory reduction optimization is built into the query optimizer.
Note
The join memory reduction optimization introduces overhead, which can negatively impact performance.
Configure Join Memory Reduction Optimization
The join memory reduction optimization splits large hash joins into smaller joins to avoid memory overflow.
This optimization is configured by specifying:
-
The cardinality threshold at which the optimization will be triggered.
-
The size of the smaller joins (groups) into which the larger hash join should be split.
The two engine variables used to configure these two values are:

| Engine Variable | Description |
|---|---|
| `cardinality_threshold_for_join_split_rewrite` | Specifies the cardinality threshold for the join memory reduction optimization. A value of `UINT_MAX` disables the optimization. |
| `target_group_size_for_join_split_rewrite` | Specifies the maximum size of the input (in rows) to the smaller join queries, also called the group size. |
To enable the join memory reduction optimization, set the `cardinality_threshold_for_join_split_rewrite` engine variable to a value other than `UINT_MAX`.

To disable the join memory reduction optimization, set the `cardinality_threshold_for_join_split_rewrite` engine variable to `UINT_MAX`.
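As an illustrative sketch of toggling the optimization for a session: the enable value is the threshold you choose, and the numeric literal used to disable is an assumption here (`UINT_MAX` is 4294967295 where `unsigned int` is 32 bits).

```sql
-- Enable: any value other than UINT_MAX acts as the cardinality threshold.
SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;

-- Disable: set the threshold back to UINT_MAX
-- (assumed to be 4294967295 on this platform).
SET SESSION cardinality_threshold_for_join_split_rewrite = 4294967295;
```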
Select the Engine Variable Values
cardinality_threshold_for_join_split_rewrite
- When both join columns have a cardinality greater than `cardinality_threshold_for_join_split_rewrite`, the hash join will be rewritten into smaller joins.
- Choose the value of this variable based on the cardinality threshold beyond which the original query is expected to encounter out-of-memory (OOM) errors.
- Proper selection ensures that the optimization is only triggered when truly needed, preventing unnecessary performance degradation.
Use the following command to set this variable's value.

```sql
SET SESSION cardinality_threshold_for_join_split_rewrite = <value>;
```
target_group_size_for_join_split_rewrite
- The goal for the value of this variable is to strike a balance between minimizing the number of splits and ensuring that each split can be efficiently processed without memory issues.
- SingleStore recommends using the following formula to choose a value for `target_group_size_for_join_split_rewrite`:
  - `target_group_size_for_join_split_rewrite` = (cardinality of the joined column with the lowest cardinality) / `N`, where `N` is the number of splits.
- Choose `N` to be as small as possible (e.g., 2) while ensuring that the `target_group_size_for_join_split_rewrite` remains manageable within the available memory.
- The `target_group_size_for_join_split_rewrite` must be set to a value below 10,000,000 to avoid unnecessary overhead.
Use the following command to set this variable's value.

```sql
SET SESSION target_group_size_for_join_split_rewrite = <value>;
```
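As a worked illustration of the formula above (the cardinality and choice of `N` are hypothetical): suppose the joined column with the lowest cardinality has 16,000,000 distinct values and you choose `N` = 2 splits. The formula gives 16,000,000 / 2 = 8,000,000, which stays below the 10,000,000 ceiling.

```sql
-- Hypothetical numbers: lowest-cardinality join column has
-- 16,000,000 distinct values, N = 2 splits.
--   target_group_size = 16,000,000 / 2 = 8,000,000
SET SESSION target_group_size_for_join_split_rewrite = 8000000;
```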
Remarks
The following apply to the join memory reduction optimization:

- The tables must be columnstore tables.
- The tables must have column statistics available.
- A histogram on the high-cardinality column must exist.
- The join condition on the high-cardinality column must be an equality condition on two columns, for example: `t1.a = t2.b`.
- The join condition on the high-cardinality column cannot include an inequality. For example, if the join condition is `t1.a < t2.b`, the join will not be rewritten.
- The join condition on the high-cardinality column cannot include a constant value or expressions. For example, if the join condition is `t1.a = 1`, the join will not be rewritten.
- The joined columns must be of numeric types (integer or double).
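A minimal sketch of tables that satisfy these requirements (table and column names are hypothetical): a table created with a `SORT KEY` is a columnstore table, and `ANALYZE` collects column statistics and, with `COLUMNS ... ENABLE`, histograms on the join column.

```sql
-- Columnstore tables with a numeric (BIGINT) join column.
CREATE TABLE t1 (ID BIGINT NOT NULL, val DOUBLE, SORT KEY (ID));
CREATE TABLE t2 (ID BIGINT NOT NULL, val DOUBLE, SORT KEY (ID));

-- Collect column statistics and enable histograms on the
-- high-cardinality join column.
ANALYZE TABLE t1 COLUMNS ID ENABLE;
ANALYZE TABLE t2 COLUMNS ID ENABLE;
```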
Example
Consider two tables, `t1` and `t2`, both with an `ID` column of high cardinality. Assume the cardinalities of the `ID` columns are as follows:

- `t1.ID` has distinct values (1-20M) and thus has cardinality 20,000,000.
- `t2.ID` has distinct values (1-40M) and thus has cardinality 40,000,000.

The following query joins tables `t1` and `t2` on the high-cardinality `ID` columns.
```sql
SELECT *
FROM t2
JOIN t1 ON t2.ID = t1.ID;
```
To enable join memory reduction optimization for this query, use the following commands.

```sql
SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;
SET SESSION target_group_size_for_join_split_rewrite = 10000000;
```
Setting `cardinality_threshold_for_join_split_rewrite` to 12,000,000 indicates that if a join column has a cardinality greater than 12,000,000, the original query should be rewritten into smaller joins. Both `t1.ID` and `t2.ID` have a cardinality greater than 12,000,000, so the threshold is met and the engine will apply the join memory reduction optimization to this query.

Setting `target_group_size_for_join_split_rewrite` specifies the size of the inputs to the smaller joins (also called the groups).
With these settings, the previous query will be rewritten as follows:

```sql
(SELECT * FROM t2 JOIN t1 ON t2.ID = t1.ID AND t1.ID < 10000000)
UNION ALL
(SELECT * FROM t2 JOIN t1 ON t2.ID = t1.ID AND t1.ID >= 10000000);
```
The single join is rewritten into two smaller joins, which are combined with a `UNION ALL`. The smaller joins are created by adding a filter on the `t1.ID` column. The `t1.ID` column has 20,000,000 distinct values (a cardinality of 20,000,000), so with a `target_group_size_for_join_split_rewrite` of 10,000,000 the join is split into two groups using the filters `t1.ID < 10000000` and `t1.ID >= 10000000`. The column with the smaller cardinality, `t1.ID` in this example, is used to create the groups.
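To check whether the rewrite was applied to a particular query, you can inspect its plan with `EXPLAIN` (a sketch; the exact plan output varies by SingleStore version and is not documented on this page).

```sql
-- Inspect the query plan after setting the engine variables above;
-- a rewritten plan reflects the split into smaller joins.
EXPLAIN SELECT * FROM t2 JOIN t1 ON t2.ID = t1.ID;
```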
Last modified: July 23, 2025