# Disk Spilling

SingleStore supports disk spilling to allow queries with large intermediate results to complete even if there is not enough memory available to hold these results.

## Understanding Disk Spilling

In SingleStore, the local disk space on each leaf node is typically shared among the database logs, snapshots, plan cache, trace log, blob cache, disk spilling etc. Among all these, the blob cache and disk spilling are the two most significant in terms of space usage. While the blob cache in an unlimited storage deployment is configured to use most of the disk space, disk spilling from query execution is the most dynamic and the amount of spillage can vary widely depending on the queries, data volumes and data characteristics, memory configuration and space availability.

Users cannot configure where in memory the spilling occurs. SingleStore's disk manager is built to balance the requirements of both disk spilling and the blob cache for optimal disk space usage and performance. Its default heuristic configuration allows disk spilling to temporarily steal a limited amount of space from the blob cache without any noticeable impact on the blob cache performance.  In case of some special requirements for a workload, it allows users to manually set how much space disk spilling can temporarily steal from the blob cache by setting the global variable `spilling_maximum_disk_percent`.&#x20;

**Benefits**

* Helps prevent query failure due to memory constraints: When the memory load on a node exceeds a certain threshold and in-memory data structures reach a significant size (100 MB in this context), disk spilling enables the continuation of query execution by moving parts of the process to disk.
* Handles heavy operations: It is useful when processing operations such as `GROUP BY`, `SORT/ORDER BY`, window functions, and hash joins, which are prone to use a significant amount of memory.
* Manages query operations that exceed the maximum memory: It is particularly relevant when operations are at risk of exceeding the maximum memory limits set for nodes. By spilling to disk, it alleviates the memory load, although at the cost of potentially slower query execution.

**Operators that may generate large internal structures**

* `GROUP BY`
* `SELECT DISTINCT`
* `DISTINCT` Aggregates
* Hash Joins -  `INNER JOIN`, `LEFT OUTER JOIN`, `RIGHT OUTER JOIN`, `FULL OUTER JOIN`
* Sort
* Window functions

If there is not enough memory for these operations then they will spill, with some exceptions. If spilling is not supported and there is insufficient memory, the query will terminate with an out-of-memory error.

All streaming operations can be done without spilling even in low-memory conditions since they have small memory requirements. These include:  scans, filters, projects, ordered scans, merge join, nested loop join, gather, repartition, broadcast

## Configuring Disk Spilling

**SPILLING\_MAXIMUM\_DISK\_PERCENT**

The global variable `spilling_maximum_disk_percent` controls the percentage of disk space that disk spilling can steal from the blob cache during query execution.&#x20;

Its default value is `-1` which uses a heuristic to address the situation in which query spilling needs to use space from the blob cache. The blob cache is allowed to grow or shrink within the limit specified by `maximum_blob_cache_size_mb`; in certain situations, it makes sense for spilling to use a limited amount of blob cache space. Under the default heuristic, spilling will not use space from the blob cache until the free usable space hits a low threshold. After this threshold is reached, the heuristic allows spilling to use up to 30% of the value of `maximum_blob_cache_size_mb` from the blob cache. Thus, the maximum amount of blob cache space that spilling can potentially use is 30%. The default configuration is designed to balance the requirements of disk space for both the blob cache and disk spilling, ensuring optimal performance and space utilization under typical scenarios.

It can be set to any value within the range 0 and 1 (both inclusive).

Setting the `spilling_maximum_disk_percent` = 0 turns off query spilling support. A query that needs to spill will return an error message: "*Spilling disk usage (%f MB) exceeds the limit (%f MB) set through global variable 'spilling\_maximum\_disk\_percent'(%f). Spilling is aborted. New request size: %lu bytes*".

When `spilling_maximum_disk_percent` is set in the range > 0 and <=1 queries that need to spill will keep spilling as long as the spilling size is within the limit of (total disk size - `minimal_disk_space`) \* `spilling_maximum_disk_percent`. This setting may allow spilling to push the blob cache to a much lower size than the typical 70% of `maximum_blob_cache_size_mb` as long as the disk usage by spilling does not hit any other existing limits like `spilling_minimal_disk_space` or `minimal_disk_space`.

Setting `spilling_maximum_disk_percent` = 1 is not recommended because it allows spilling to use any free disk space until the usable free space hits `spilling_minimal_disk_space`.  After that, any new queries that need to spill or running queries that need to spill more will return an out-of-disk error.

SingleStore recommends to leave the `spilling_maximum_disk_percent` at the default value.

```sql
memsql> SHOW GLOBAL VARIABLES LIKE 'spilling%disk%';


```

```output
*
+-------------------------------------------------------+-----------+
| Variable_name                                         | Value     |
+-------------------------------------------------------+-----------+  
| spilling_maximum_disk_percent                         | -1.000000 |
+-------------------------------------------------------+-----------+
```

**Disk Spilling and Resource Pool Memory Limits**

Both `QUERY_MEMORY_PERCENTAGE` and `MEMORY_PERCENTAGE` from the resource pool influence disk spilling. Disk spilling is triggered when either of the limits set for `QUERY_MEMORY_PERCENTAGE` or `MEMORY_PERCENTAGE` is reached.

An operator is considered for disk spilling when its memory consumption is above `spilling_query_operator_memory_threshold * min(QUERY_MEMORY_PERCENTAGE, MEMORY_PERCENTAGE)`.

When the resource pool is explicitly specified, then the resource pool level spilling thresholds for queries will be calculated by multiplying the `MEMORY_PERCENTAGE` and `QUERY_MEMORY_PERCENTAGE` with the two existing thresholds. For example, if `MEMORY_PERCENTAGE` = 30% and `QUERY_MEMORY_PERCENTAGE `= 20% then,

* The memory threshold for *queries in the pool* to start spilling is when the memory usage of the pool reaches `spilling_node_memory_threshold_ration` \* 30% OR

  the memory usage of *the current query in the pool* reaches `spilling_node_memory_threshold_ratio` \* 20%.
* The memory threshold for an operator to be considered for spilling will be when its memory consumption is above  `spilling_query_operator_memory_threshold`  \* min(20%, 30%).

When the resource pool is not explicitly specified, the default resource pool is  `MEMORY_PERCENTAGE` = 100% and `QUERY_MEMORY_PERCENTAGE` = 100%. It means queries in the pool can use up to the leaf node’s entire memory. In this case, multiplying `spilling_node_memory_threshold_ratio` and `spilling_query_operator_memory_threshold` by 100% is the same as considering the thresholds at the node level.

## Displaying Disk Spilling Related Metrics

Use the command, `SHOW STATUS [EXTENDED]` to display information about disk spilling metrics that exist at the time when this command is run. For example, it shows the current spilling size with '`Disk_spilling_usage_mb`'  and how much blob cache space (in MB) is stolen by disk spilling with '`Disk_spilling_evict_blob_cache_mb_average`'.

```sql
memsql> SHOW STATUS EXTENDED LIKE 'Disk%'; 


```

```output
*
+-------------------------------------------+--------+
| Variable_name                             | Value  |
+-------------------------------------------+--------+
| Disk_spilling_usage_limit_mb              | 86608  |
| Disk_spilling_usage_mb                    | 0      |
| Disk_spilling_evict_blob_cache_mb_average | 0      |
| Disk_spilling_usage_mb_high_water_mark    | 0      |
| Disk_total_size_mb                        | 189008 |
| Disk_available_size_mb                    | 49708  |
| Disk_uncategorized_usage_mb               | 139200 |
+-------------------------------------------+--------+
```

```sql
memsql> SHOW STATUS LIKE '%blob%';


```

```output

+-----------------------------------------------------------+-------+
| Variable_name                                             | Value |
+-----------------------------------------------------------+-------+
| Blob_cache_max_size_mb_adjusted_for_low_disk              | 86608 |
| Blob_cache_max_size_mb_adjusted_for_low_disk_and_spilling | 86608 |
+-----------------------------------------------------------+-------+
```

As the name indicates, `Blob_cache_max_size_mb_adjusted_for_low_disk` is the blob cache maximum size in MB after the configured `maximum_blob_cache_size_mb` is adjusted on low disk condition. &#x20;

`Blob_cache_max_size_mb_adjusted_for_low_disk_and_spilling` is the effective maximum blob cache size after taking both the low disk adjustment and spilling impact into account.

For an explanation of each metric refer to `SHOW STATUS EXTENDED`

## Disk Spilling Limitations and Workarounds

Some operations that are currently not supported by disk spilling are described below, along with specific possible solutions where applicable.

## Join/Group by/Sort on top of hash join

In a query with join or group by or sort on top of a hash join, the hash join at the bottom can not be spilled.

For example:

```sql
SELECT i.date, sum(i.amount), count(i.amount) 
    FROM  invoices i JOIN invoicepayments ip 
        ON i.invoiceid = ip.invoiceid 
            WHERE c.customernumber = '100'GROUP BY i.date;
```

The join is not able to spill. We can spill the hash group-by operation in this case.

## Window Functions without Partition Clause or Large Partitions

Window function spilling works at the partition level wherein a few partitions are kept in memory while spilling other partitions to disk. However, if there is a single partition that does not fit in memory, window function spilling will not happen. This includes the case where a window function is run on an entire row set without partitioning (with no `PARTITION BY` clause).

## Runtime Type Limitations for Spilling

Spilling is not supported for full-text search (FTS) pseudo-columns.

`MIN`, `MAX`, `SUM` and `COUNT` aggregates are supported for all data types except Geographic data type.

`SUM` and `AVG` are supported on all the numeric data types. Other aggregate functions are not supported.

Spilling is not supported for Geographic data types, nor any aggregate functions on Geographic data.

## Disk Spilling Errors

Some typical error conditions that may arise are:

* The disk available size is below the value of `spilling_minimal_disk_space`.
* The total spilling size exceeds the `disk_spilling_usage_limit_mb` value.
* The columnstore blob cache is low during database replication.

To know about the error conditions and messages refer to [Disk Spilling Errors](https://docs.singlestore.com/db/v9.1/reference/troubleshooting-reference/disk-spilling-errors.md).

***

Modified at: September 9, 2025

Source: [/db/v9.1/user-and-cluster-administration/maintain-your-cluster/managing-memory/disk-spilling/](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/maintain-your-cluster/managing-memory/disk-spilling/)

(An index of the documentation is available at /llms.txt)
