# Join Memory Reduction Optimization

The join memory reduction optimization is a new query optimization feature designed to help prevent memory overflow (spilling) during hash joins on large datasets. When enabled, it automatically rewrites qualifying joins into smaller, manageable pieces—reducing memory pressure and the risk of out-of-memory (OOM) errors.

This feature is especially useful when joining large tables on high-cardinality columns, where traditional hash joins are likely to exceed memory limits.

The join memory reduction optimization is built into the query optimizer. Queries with a hash join on a column with a cardinality exceeding the specified threshold are split into several smaller join queries. The results of the smaller join queries are recombined using `UNION ALL`.

> **📝 Note**: The join memory reduction optimization introduces overhead, which can negatively impact performance. SingleStore recommends using the join memory reduction optimization only when necessary, for example, to prevent out-of-memory (OOM) errors.

## Configure Join Memory Reduction Optimization

The join memory reduction optimization splits large hash joins into smaller joins to avoid memory overflow.

This optimization is configured by specifying:

* The cardinality threshold at which the optimization will be triggered.
* The size of the smaller joins (groups) into which the larger hash join should be split.

The two engine variables used to configure these two values are:

| `cardinality_threshold_for_join_split_rewrite` | Specifies the cardinality threshold for the join memory reduction optimization. If all relevant join columns have a cardinality above this threshold, the`JOIN`query will be split into smaller join queries.A value of`LLONG_MAX`(the default) indicates that the optimization is disabled. |
| ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `target_group_size_for_join_split_rewrite`     | Specifies the maximum size of the input (in rows) to the smaller join queries, also called the group size.                                                                                                                                                                                   |

To enable the join memory reduction optimization, set the `cardinality_threshold_for_join_split_rewrite` engine variable to a value other than `LLONG_MAX`.

To disable the join memory reduction optimization, set the `cardinality_threshold_for_join_split_rewrite` engine variable to`LLONG_MAX`(`9223372036854775807`).

## Select the Engine Variable Values

## `cardinality_threshold_for_join_split_rewrite`

* When both join columns have cardinality greater than `cardinality_threshold_for_join_split_rewrite`, the hash join will be rewritten into smaller joins.
* Choose the value of this variable based on the cardinality threshold beyond which the original query is expected to encounter OOM errors.
* Proper selection ensures that the optimization is only triggered when truly needed, preventing unnecessary performance degradation.

Use the following command to set this variable's value.

```sql
SET SESSION cardinality_threshold_for_join_split_rewrite = <value>
```

## `target_group_size_for_join_split_rewrite`

* The goal for the value of this variable is to strike a balance between minimizing the number of splits and ensuring that each split can be efficiently processed without memory issues.
* SingleStore recommends using the following formula to choose a value for `target_group_size_for_join_split_rewrite`.

  * `target_group_size_for_join_split_rewrite` = cardinality of the joined column with the lowest cardinality / `N`
  * `N` is the number of splits
* Choose `N` to be as small as possible (e.g., 2) while ensuring that the `target_group_size_for_join_split_rewrite` remains manageable within the available memory.
* The `target_group_size_for_join_split_rewrite` must be set to a value above 10,000,000 to avoid unnecessary overhead.

Use the following command to set this variable's value.

```sql
SET SESSION target_group_size_for_join_split_rewrite = <value>
```

## Remarks

The following apply to the join memory reduction optimization:

* The tables must be columnstore tables.
* The tables must have column statistics available.
* A histogram on the high cardinality column must exist.
* The join condition on the high cardinality column must be an equality condition on two columns, for example: `t1.a = t2.b`.
* The join condition on the high cardinality column cannot include an inequality. For example, if the join condition is `t1.a < t2.b` , the join will not be rewritten.
* The join condition on the high cardinality column cannot include a constant value or expressions. For example, if the join condition is `t1.a = 1`,  the join will not be rewritten.
* The joined columns must be of numeric types (integer or double).

## Example

Consider two tables, `t1` and `t2`, both with an `ID` column of high cardinality.

Assume the cardinalities of the ID columns are as follows:

* `t1.ID` has distinct values (1-20M) thus has cardinality 20,000,000
* `t2.ID` has distinct values (1-40M) thus has cardinality 40,000,000

The following query joins tables `t1` and `t2` on the high-cardinality `ID` columns. This join is a candidate for the join memory reduction optimization.

```sql
SELECT * 
FROM t2
JOIN t1
on t2.ID = t1.ID;
```

To enable join memory reduction optimization for this query, use the following commands.

```sql
SET SESSION cardinality_threshold_for_join_split_rewrite = 12000000;
SET SESSION target_group_size_for_join_split_rewrite = 10000000;
```

Setting `cardinality_threshold_for_join_split_rewrite = 12000000` indicates that if a join column has a cardinality greater than `12,000,000`, the original query should be rewritten into smaller joins. Both `t1.ID` and `t2.ID` have a cardinality greater than `12,000,000`, so the engine will apply the join memory reduction optimization to this query. Note that both columns have cardinality greater than `12,000,000`, so the threshold is met.

Setting `target_group_size_for_join_split_rewrite = 10000000` enables the join memory reduction optimization and specifies the size of the inputs to the smaller joins (also called the groups).

With these settings, the previous query will be rewritten as follows:

```sql
(SELECT * 
 FROM t2 
 JOIN t1 
 ON t2.ID = t1.ID AND t1.ID<10000000
)

UNION ALL

(SELECT * 
 FROM t2 
 JOIN t1 
 ON t2.ID  = t1.ID AND t1.ID >= 1000000
);
```

The single join is rewritten into two smaller joins which are combined with a `UNION ALL`.

The smaller joins are created by adding a filter on the `t1.ID` column. The `t1.ID` column has 20,000,000 distinct values (a cardinality of 20,000,000). This column is used to break the query into two groups of size `target_group_size_for_join_split_rewrite` using the filters `t1.ID < 10000000` and `t1.ID >= 10000001`.

The column with the smaller cardinality, `t1.ID` in this example, is used to create the groups.

***

Modified at: March 19, 2026

Source: [/db/v9.1/query-data/query-tuning/join-memory-reduction-optimization/](https://docs.singlestore.com/db/v9.1/query-data/query-tuning/join-memory-reduction-optimization/)

(An index of the documentation is available at /llms.txt)
