Disk Spilling

SingleStore supports disk spilling to allow queries with large intermediate results to complete even if there is not enough memory available to hold these results.

Understanding Disk Spilling

In SingleStore, the local disk space on each leaf node is typically shared among the database logs, snapshots, plan cache, trace log, blob cache, disk spilling etc. Among all these, the blob cache and disk spilling are the two most significant in terms of space usage. While the blob cache in an unlimited storage deployment is configured to use most of the disk space, disk spilling from query execution is the most dynamic and the amount of spillage can vary widely depending on the queries, data volumes and data characteristics, memory configuration and space availability.

Users cannot configure where in memory the spilling occurs. SingleStore's disk manager is built to balance the requirements of both disk spilling and the blob cache for optimal disk space usage and performance. Its default heuristic configuration allows disk spilling to temporarily steal a limited amount of space from the blob cache without any noticeable impact on the blob cache performance.  In case of some special requirements for a workload, it allows users to manually set how much space disk spilling can temporarily steal from the blob cache by setting the global variable spilling_maximum_disk_percent.

Benefits

  • Helps prevent query failure due to memory constraints: When the memory load on a node exceeds a certain threshold and in-memory data structures reach a significant size (100 MB in this context), disk spilling enables the continuation of query execution by moving parts of the process to disk.

  • Handles heavy operations: It is useful when processing operations such as GROUP BY, SORT/ORDER BY, window functions, and hash joins, which are prone to use a significant amount of memory.

  • Manages query operations that exceed the maximum memory: It is particularly relevant when operations are at risk of exceeding the maximum memory limits set for nodes. By spilling to disk, it alleviates the memory load, although at the cost of potentially slower query execution.

Operators that may generate large internal structures

  • GROUP BY

  • SELECT DISTINCT

  • DISTINCT Aggregates

  • Hash Joins - INNER JOIN and LEFT OUTER JOIN

  • Sort

  • Window functions

If there is not enough memory for these operations then they will spill, with some exceptions. If spilling is not supported and there is insufficient memory, the query will terminate with an out-of-memory error.

All streaming operations can be done without spilling even in low-memory conditions since they have small memory requirements. These include: scans, filters, projects, ordered scans, merge join, nested loop join, gather, repartition, broadcast

Configuring Disk Spilling

SPILLING_MAXIMUM_DISK_PERCENT

The global variable spilling_maximum_disk_percent controls the percentage of disk space that disk spilling can steal from the blob cache during query execution.

Its default value is -1 which uses a heuristic setting when a query spilling needs to steal space from the blob cache.  It allows spilling to steal from the blob cache as long as the blob cache size stays within 70% of the maximum_blob_cache_size_mb variable value. This default configuration is designed to balance the requirements of disk space for both the blob cache and disk spilling, ensuring optimal performance and space utilization under typical scenarios.

It can be set to any value within the range 0 and 1 (both inclusive).

Setting the spilling_maximum_disk_percent = 0 turns off query spilling support. A query that needs to spill will return an error message: "Spilling disk usage (%f MB) exceeds the limit (%f MB) set through global variable 'spilling_maximum_disk_percent'(%f). Spilling is aborted. New request size: %lu bytes".

When spilling_maximum_disk_percent is set in the range > 0 and <=1 queries that need to spill will keep spilling as long as the spilling size is within the limit of (total disk size - minimal_disk_space) * spilling_maximum_disk_percent. This setting may allow spilling to push the blob cache to a much lower size than the typical 70% of maximum_blob_cache_size_mb as long as the disk usage by spilling does not hit any other existing limits like spilling_minimal_disk_space or minimal_disk_space.

Setting spilling_maximum_disk_percent = 1 is not recommended because it allows spilling to use any free disk space until the usable free space hits spilling_minimal_disk_space.  After that, any new queries that need to spill or running queries that need to spill more will return an out-of-disk error.

SingleStore recommends to leave the spilling_maximum_disk_percent at the default value.

memsql> SHOW GLOBAL VARIABLES LIKE 'spilling%disk%';
*
+-------------------------------------------------------+-----------+
| Variable_name                                         | Value     |
+-------------------------------------------------------+-----------+  
| spilling_maximum_disk_percent                         | -1.000000 |
+-------------------------------------------------------+-----------+

Disk Spilling and Resource Pool Memory Limits

Both QUERY_MEMORY_PERCENTAGE and MEMORY_PERCENTAGE from the resource pool influence disk spilling. Disk spilling is triggered when either of the limits set for QUERY_MEMORY_PERCENTAGE or MEMORY_PERCENTAGE is reached.

An operator is considered for disk spilling when its memory consumption is above spilling_query_operator_memory_threshold * min(QUERY_MEMORY_PERCENTAGE, MEMORY_PERCENTAGE).

When the resource pool is explicitly specified, then the resource pool level spilling thresholds for queries will be calculated by multiplying the MEMORY_PERCENTAGE and QUERY_MEMORY_PERCENTAGE with the two existing thresholds. For example, if MEMORY_PERCENTAGE = 30% and QUERY_MEMORY_PERCENTAGE = 20% then,

  • the memory threshold for queries in the pool to start spilling is when the memory usage of the pool reaches spilling_node_memory_threshold_ration * 30% OR

    the memory usage of the current query in the pool reaches spilling_node_memory_threshold_ratio * 20%.

  • the memory threshold for an operator to be considered for spilling will be when its memory consumption is above  spilling_query_operator_memory_threshold * min(20%, 30%).

When the resource pool is not explicitly specified, the default resource pool is  MEMORY_PERCENTAGE = 100% and QUERY_MEMORY_PERCENTAGE = 100%. It means queries in the pool can use up to the leaf node’s entire memory. In this case, multiplying spilling_node_memory_threshold_ratio and spilling_query_operator_memory_threshold by 100% is the same as considering the thresholds at the node level.

Use the command, SHOW STATUS [EXTENDED] to display information about disk spilling metrics that exist at the time when this command is run. For example, it shows the current spilling size with 'Disk_spilling_usage_mb'  and how much blob cache space (in MB) is stolen by disk spilling with 'Disk_spilling_evict_blob_cache_mb_average'.

memsql> SHOW STATUS EXTENDED LIKE 'Disk%';
*
+-------------------------------------------+--------+
| Variable_name                             | Value  |
+-------------------------------------------+--------+
| Disk_spilling_usage_limit_mb              | 86608  |
| Disk_spilling_usage_mb                    | 0      |
| Disk_spilling_evict_blob_cache_mb_average | 0      |
| Disk_spilling_usage_mb_high_water_mark    | 0      |
| Disk_total_size_mb                        | 189008 |
| Disk_available_size_mb                    | 49708  |
| Disk_uncategorized_usage_mb               | 139200 |
+-------------------------------------------+--------+
memsql> SHOW STATUS LIKE '%blob%';
+-----------------------------------------------------------+-------+
| Variable_name                                             | Value |
+-----------------------------------------------------------+-------+
| Blob_cache_max_size_mb_adjusted_for_low_disk              | 86608 |
| Blob_cache_max_size_mb_adjusted_for_low_disk_and_spilling | 86608 |
+-----------------------------------------------------------+-------+

As the name indicates, Blob_cache_max_size_mb_adjusted_for_low_disk is the blob cache maximum size in MB after the configured maximum_blob_cache_size_mb is adjusted on low disk condition. 

Blob_cache_max_size_mb_adjusted_for_low_disk_and_spilling is the effective maximum blob cache size after taking both the low disk adjustment and spilling impact into account.

For an explanation of each metric refer to SHOW STATUS EXTENDED

Disk Spilling Limitations and Workarounds

Some operations that are currently not supported by disk spilling are described below, along with specific possible solutions where applicable.

Right Outer Join

All queries that use the RIGHT OUTER JOIN syntax are internally converted to left outer join operations early during query optimization. However, sometimes the query optimizer may convert a left outer join to a right outer join internally.

To work around the right join spilling limit, you can disable the internal rewrite that turns a left join into a right join by setting the optimizer_disable_right_join variable to ON.

SET OPTIMIZER_DISABLE_RIGHT_JOIN = ON;
<query> 
SET OPTIMIZER_DISABLE_RIGHT_JOIN = DEFAULT;

Full Outer Join

For full outer join, the workaround involves "divide and conquer"

  A FULL OUTER JOIN B
can be
A where join key not in B – padded with nulls on B side
UNION ALL
A INNER JOIN B
UNION ALL
B where join key not in A – padded with nulls on B side

Join/Group by/Sort on top of hash join

In a query with join or group by or sort on top of a hash join, the hash join at the bottom can not be spilled.

For example:

SELECT i.date, sum(i.amount), count(i.amount) 
FROM  invoices i JOIN invoicepayments ip
ON i.invoiceid = ip.invoiceid 
WHERE c.customernumber = '100'GROUP BY i.date;

The join is not able to spill. We can spill the hash group-by operation in this case.

Window Functions without Partition Clause or Large Partitions

Window function spilling works at the partition level wherein a few partitions are kept in memory while spilling other partitions to disk. However, if there is a single partition that does not fit in memory, window function spilling will not happen. This includes the case where a window function is run on an entire row set without partitioning (with no PARTITION BY clause).

Runtime Types Limitations for Spilling

Spilling is not supported for full-text search (FTS) pseudo-columns.

MIN, MAX, SUM and COUNT aggregates are supported for all data types except Geographic data type.

SUM and AVG are supported on all the numeric data types. Other aggregate functions are not supported.

Spilling is not supported for Geographic data types, nor any aggregate functions on Geographic data.

Disk Spilling Errors

Some typical error conditions that may arise are:

  • The disk available size is below the value of spilling_minimal_disk_space.

  • The total spilling size exceeds the disk_spilling_usage_limit_mb value.

  • The columnstore blob cache is low during database replication.

To know about the error conditions and messages refer to Disk Spilling Errors.

Last modified: May 15, 2024

Was this article helpful?