Disk Spilling
On this page
SingleStore supports disk spilling to allow queries with large intermediate results to complete even if there is not enough memory available to hold these results.
Understanding Disk Spilling
In SingleStore, the local disk space on each leaf node is typically shared among the database logs, snapshots, plan cache, trace log, blob cache, disk spilling etc.
Users cannot configure where in memory the spilling occurs.spilling_
.
Benefits
-
Helps prevent query failure due to memory constraints: When the memory load on a node exceeds a certain threshold and in-memory data structures reach a significant size (100 MB in this context), disk spilling enables the continuation of query execution by moving parts of the process to disk.
-
Handles heavy operations: It is useful when processing operations such as
GROUP BY
,SORT/ORDER BY
, window functions, and hash joins, which are prone to use a significant amount of memory. -
Manages query operations that exceed the maximum memory: It is particularly relevant when operations are at risk of exceeding the maximum memory limits set for nodes.
By spilling to disk, it alleviates the memory load, although at the cost of potentially slower query execution.
Operators that may generate large internal structures
-
GROUP BY
-
SELECT DISTINCT
-
DISTINCT
Aggregates -
Hash Joins -
INNER JOIN
andLEFT OUTER JOIN
-
Sort
-
Window functions
If there is not enough memory for these operations then they will spill, with some exceptions.
All streaming operations can be done without spilling even in low-memory conditions since they have small memory requirements.
Configuring Disk Spilling
SPILLING_
The global variable spilling_
controls the percentage of disk space that disk spilling can steal from the blob cache during query execution.
Its default value is -1 which uses a heuristic to address the situation in which query spilling needs to use space from the blob cache.
It can be set to any value within the range 0 and 1 (both inclusive).
Setting the spilling_
= 0 turns off query spilling support.
When spilling_
is set in the range > 0 and <=1 queries that need to spill will keep spilling as long as the spilling size is within the limit of (total disk size - minimal_
) * spilling_
.maximum_
as long as the disk usage by spilling does not hit any other existing limits like spilling_
or minimal_
.
Setting spilling_
= 1 is not recommended because it allows spilling to use any free disk space until the usable free space hits spilling_
.
SingleStore recommends to leave the spilling_
at the default value.
memsql> SHOW GLOBAL VARIABLES LIKE 'spilling%disk%';
*
+-------------------------------------------------------+-----------+
| Variable_name | Value |
+-------------------------------------------------------+-----------+
| spilling_maximum_disk_percent | -1.000000 |
+-------------------------------------------------------+-----------+
Disk Spilling and Resource Pool Memory Limits
Both QUERY_
and MEMORY_
from the resource pool influence disk spilling.QUERY_
or MEMORY_
is reached.
An operator is considered for disk spilling when its memory consumption is above spilling_
.
When the resource pool is explicitly specified, then the resource pool level spilling thresholds for queries will be calculated by multiplying the MEMORY_
and QUERY_
with the two existing thresholds.MEMORY_
= 30% and QUERY_
= 20% then,
-
the memory threshold for queries in the pool to start spilling is when the memory usage of the pool reaches
spilling_
* 30% ORnode_ memory_ threshold_ ration the memory usage of the current query in the pool reaches
spilling_
* 20%.node_ memory_ threshold_ ratio -
the memory threshold for an operator to be considered for spilling will be when its memory consumption is above
spilling_
* min(20%, 30%).query_ operator_ memory_ threshold
When the resource pool is not explicitly specified, the default resource pool is MEMORY_
= 100% and QUERY_
= 100%.spilling_
and spilling_
by 100% is the same as considering the thresholds at the node level.
Displaying Disk Spilling Related Metrics
Use the command, SHOW STATUS [EXTENDED]
to display information about disk spilling metrics that exist at the time when this command is run.Disk_
' and how much blob cache space (in MB) is stolen by disk spilling with 'Disk_
'.
memsql> SHOW STATUS EXTENDED LIKE 'Disk%';
*
+-------------------------------------------+--------+
| Variable_name | Value |
+-------------------------------------------+--------+
| Disk_spilling_usage_limit_mb | 86608 |
| Disk_spilling_usage_mb | 0 |
| Disk_spilling_evict_blob_cache_mb_average | 0 |
| Disk_spilling_usage_mb_high_water_mark | 0 |
| Disk_total_size_mb | 189008 |
| Disk_available_size_mb | 49708 |
| Disk_uncategorized_usage_mb | 139200 |
+-------------------------------------------+--------+
memsql> SHOW STATUS LIKE '%blob%';
+-----------------------------------------------------------+-------+
| Variable_name | Value |
+-----------------------------------------------------------+-------+
| Blob_cache_max_size_mb_adjusted_for_low_disk | 86608 |
| Blob_cache_max_size_mb_adjusted_for_low_disk_and_spilling | 86608 |
+-----------------------------------------------------------+-------+
As the name indicates, Blob_
is the blob cache maximum size in MB after the configured maximum_
is adjusted on low disk condition.
Blob_
is the effective maximum blob cache size after taking both the low disk adjustment and spilling impact into account.
For an explanation of each metric refer to SHOW STATUS EXTENDED
Disk Spilling Limitations and Workarounds
Some operations that are currently not supported by disk spilling are described below, along with specific possible solutions where applicable.
Right Outer Join
All queries that use the RIGHT OUTER JOIN
syntax are internally converted to left outer join operations early during query optimization.
To work around the right join spilling limit, you can disable the internal rewrite that turns a left join into a right join by setting the optimizer_
variable to ON.
The following example disables this internal rewrite for a single query, <query>
.
SET OPTIMIZER_DISABLE_RIGHT_JOIN = ON;<query>SET OPTIMIZER_DISABLE_RIGHT_JOIN = DEFAULT;
Full Outer Join
For full outer join, the workaround involves "divide and conquer".A FULL OUTER JOIN B
can be done as follows:
A where join key not in B – padded with nulls on B sideUNION ALLA INNER JOIN BUNION ALLB where join key not in A – padded with nulls on B side
Join/Group by/Sort on top of hash join
In a query with join or group by or sort on top of a hash join, the hash join at the bottom can not be spilled.
For example:
SELECT i.date, sum(i.amount), count(i.amount)FROM invoices i JOIN invoicepayments ipON i.invoiceid = ip.invoiceidWHERE c.customernumber = '100'GROUP BY i.date;
The join is not able to spill.
Window Functions without Partition Clause or Large Partitions
Window function spilling works at the partition level wherein a few partitions are kept in memory while spilling other partitions to disk.PARTITION BY
clause).
Runtime Type Limitations for Spilling
Spilling is not supported for full-text search (FTS) pseudo-columns.
MIN
, MAX
, SUM
and COUNT
aggregates are supported for all data types except Geographic data type.
SUM
and AVG
are supported on all the numeric data types.
Spilling is not supported for Geographic data types, nor any aggregate functions on Geographic data.
Disk Spilling Errors
Some typical error conditions that may arise are:
-
The disk available size is below the value of
spilling_
.minimal_ disk_ space -
The total spilling size exceeds the
disk_
value.spilling_ usage_ limit_ mb -
The columnstore blob cache is low during database replication.
To know about the error conditions and messages refer to Disk Spilling Errors.
Last modified: November 8, 2024