# Use the Workload Manager

SingleStore’s Workload Manager (WM) automatically manages cluster workloads by queuing and executing distributed queries based on available leaf connections, threads, and memory, so that queries only run when these resources are sufficiently available. It improves query execution efficiency and prevents workload surges from overwhelming the system. WM does not enforce CPU limits; CPU usage is controlled separately via resource pools (Resource Governance).

WM is concerned with:

* number of threads on leaves specified by the `workload_management_max_threads_per_leaf` variable.
* number of connections on each leaf specified by the `workload_management_max_connections_per_leaf` variable.
* leaf memory.

**Definition of the Variables Used by WM**

` 
      workload_management_max_threads_per_leaf
     `

The maximum number of threads to use per leaf. This number correlates with the `max_connection_threads` engine variable, and they should generally be set to the same value. This variable can sync to all aggregators.

` 
      workload_management_max_connections_per_leaf
     `

The maximum number of connections to use per leaf node in the cluster. This variable can sync to all aggregators.

` 
      max_connection_threads
     `

The maximum number of kernel threads for processing queries. For more, see [In-Depth Variable Definitions](https://docs.singlestore.com/db/v9.1/reference/configuration-reference/engine-variables/list-of-engine-variables/#in-depth-variable-definitions.md).

WM divides leaf resources among multiple aggregators. Assume a system with one leaf node and two aggregators. The  `workload_management_max_threads_per_leaf` variable is set to 1000, implying the total number of threads on the leaf node at any point of time is limited to 1K. Queries running on both aggregators will try to use the threads on the leaf and the WM will attempt to split this limit of 1K threads between the two aggregators based on the requirement of the queries on those two aggregators. If Aggregator1 is heavily loaded, the WM may decide to give 900 threads quota to Aggregator1 and 100 threads quota to Aggregator2.

To see these quotas, which change with time, on the specific node run:

```
SHOW WORKLOAD MANAGEMENT STATUS;
```

**`workload_management_min_sample_based_memory_estimation_runs`**

Minimum number of runs for workload management to use sample based rather than static row size based memory estimation. When disabled (set to 0), the first time query run will not be managed by workload management based on its memory.

In addition to splitting leaf resources among aggregators, WM also throttles the number of queries that can run on any aggregator. This is to ensure the current running queries do not exceed the quota on the aggregator. WM does this throttling by queuing queries. WM queues queries after Resource Governance.

**Differences between Workload Manager and Resource Governance**

* Related to workload management is the concept of resource governance. Unlike the Workload Manager which is built-in and enabled by default, you have to enable the resource governance. For more information, see [Set Resource Limits](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/use-the-workload-manager-and-set-resource-limits/set-resource-limits.md).
* In SingleStore’s query pipeline, if resource governance is enabled then resource governance given priority. WM queues queries after resource governance.
* Resource governance and workload management are two mutually exclusive features. The Workload Manager does not have access to the resource pool configurations and the memory limits imposed by them.
* The Workload Manager classifies a query based on estimated usage (connections, threads, memory) and determines if it can run or should be queued.
* The Resource Pool checks `MAX_CONCURRENCY`, `MAX_QUEUE_DEPTH` to decide whether a query can run or should be queued.
* Once a query starts executing, the Resource Pool enforces limits such as `MEMORY_PERCENTAGE`, `CPU_PERCENTAGE`, `QUERY_MEMORY_PERCENTAGE`, and `QUERY_TIMEOUT` to control actual resource usage and prevent overload.

## Query Lifecycle Flowchart

This is a flowchart of how a query lifecycle is spent when it interacts with the aggregators, passes through the resource pool, and then comes under WM.

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blt5b8bef3d68a53259/6a2c4211f29ff52253b64984/Workload_query_lifecycle_flowchart-Jq0BZQ.png)

## Memory Management for Workload Manager

The following three approaches are used to calculate the memory cost for a query and leverage the memory cost for workload management.

* **Static row size based estimation**: For queries that do not have previous run stats, the memory cost is based on the estimated row count of each executor of the optimised query plan. This is multiplied by each executor’s projected row size and then summed up as the expected memory usage. The estimation can be found in the optimizer notes of the query’s corresponding plancache entry, for example: `"join_memory_costs_estimates":{"rows_in_memory": 1", bytes_in_memory": '8388616, "total_tables_saved_in_memory": 2}`.
* **Sample based estimation**: For non-first run of non-profile queries, task stats are periodically collected from the leaf nodes plancache and these samples are used to calculate an expected memory cost. However, in this approach, if the leaf nodes or aggregator’s plancache entry get evicted, then the calculation will not be accurate.
* **Profile based estimation**: For non-first run of profiled queries, task memory cost is collected during the profile stats collection phase and gathered by the aggregator after the query run finishes. Then the aggregated memory cost is used as the expected memory cost. Unlike the sample based estimation, this approach does not have the problem stated above, since the memory cost of only a specific run is used.

## Configure Workload Management

The workload manager can be configured using engine variables, and the default settings are sufficient for most users regardless of cluster size. However, these engine variables can be tuned to fully utilize system resources for high concurrency workloads. These variables are discussed in the sections that follow.

Before changing any of these values, ensure that you understand each of these variables and have evaluated their impact on your workload.

All of the workload management engine variables are sync variables. To configure these variables, first read about [Sync Variables](https://docs.singlestore.com/db/v9.1/reference/configuration-reference/engine-variables/sync-variables.md) to learn how to set them. Then, refer to [Workload Management Sync Variables](https://docs.singlestore.com/db/v9.1/reference/configuration-reference/engine-variables/list-of-engine-variables/#sync-variables-lists.md) for information on specific variables.

## Workload Management Queue Enhancements

Incoming queries are filtered into queues based on their size (large, medium, or small). If a large number of queries are all filtered into the same queue, that queue could become oversaturated and this could lead to slower processing. SingleStore has two engine variables `workload_management_queue_size_allow_upgrade` (specifies how many large queries can be waiting for execution in the LARGE queue when we want to move to that queue a MEDIUM query from the MEDIUM queue) and `workload_management_dynamic_resource_allocation` (enables/disables the dynamic reuse of WM queues feature) for dynamic query handling. These variables work together to dynamically move a query from a saturated queue to another queue if there are available resources. For example, if the medium queue is saturated and the large queue is not, the medium query will be moved into the large queue. To utilize dynamic query handling, set `workload_management_dynamic_resource_allocation variable` to ON.

Detailed information about these engine variables may be obtained in the [Workload Management Sync Variables](https://docs.singlestore.com/db/v9.1/reference/configuration-reference/engine-variables/list-of-engine-variables/#sync-variables-lists-2.md) section of the engine variable list.

## Query Classification

The workload manager classifies queries by their size: small, medium, or large. Queries are classified to determine their [Queueing Policy](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/use-the-workload-manager-and-set-resource-limits/use-the-workload-manager.md) described below.

## Small Queries

Small queries do not exceed the `workload_management_memory_queue_threshold`, which is a small amount of memory relative the total amount of leaf memory available to queries.

Small queries also do not use distributed joins. For this reason, they do not require threads or connections on the leaves.

## Medium Queries

Medium queries are queries whose estimated resource usage per leaf (threads, connections, and memory) falls between a lower and an upper bound that Workload Management (WM) maintains internally:

* Lower bound (small vs. medium)

  * Threads per leaf ≥ 1
  * Connections per leaf ≥ 1
  * Memory per leaf ≥ `total_leaf_query_memory` \* `workload_management_memory_queue_threshold`
* Queries below lower bound limits are treated as small and are never queued by WM.
* Upper bound (medium vs. large)

  * Threads per leaf ≤ 0.5 \* `workload_management_max_threads_per_leaf `/ `workload_management_expected_aggregators`
  * Connections per leaf ≤ 0.5 \* `workload_management_max_connections_per_leaf` / `workload_management_expected_aggregators`
  * Memory per leaf ≤ `total_leaf_query_memory` / `workload_management_expected_aggregators`

Queries whose estimated usage is between these lower and upper bounds are classified as medium. Queries above the upper bound are classified as large and use the global queue.

Thresholds are internal estimates per leaf, not numbers a user manually plugs in.

Workload Management computes “threads used per leaf” and “connections used per leaf” internally from the query plan and runtime settings (for example `workload_management_max_threads_per_leaf`, flexible parallelism, join shape, etc.). It then uses them internally to decide whether a query is classified as small, medium, or large and if it must be queued.

Hence in practice:

* You should treat the thresholds conceptually (small / medium / large) rather than trying to hand-calculate the exact threads-per-leaf for every statement.
* To understand why queries are being queued, you use:

  * `SHOW WORKLOAD MANAGEMENT STATUS` command or  `MV_WORKLOAD_MANAGEMENT_STATUS` view to see aggregate running and queued queries and global thresholds. This gives an idea which queue is hot and which resource is tight.
  * `WORKLOAD_MANAGEMENT_QUEUE` information\_schema view to see which queries are queued and whether they are medium or large (`QUERY_TYPE`) and which resource (threads, connections, memory) is limiting them.

## Large queries

Queries that exceed the upper threshold for medium queries are classified as large queries.

## Example: Checking Why Queries are Queued

1. Check overall WM status
   ```sql
   SHOW WORKLOAD MANAGEMENT STATUS;
   ```
   Key details to observe:

   * **Running Threads Per Leaf** (from local queue) – total threads per leaf currently used by medium queries.
   * **Running Connections Per Leaf** (from local queue) – total connections per leaf for medium queries.
   * **Memory Threshold (MB) to Queue Locally** – memory level above which medium queries start queuing locally.
   * **Memory / Connections / Threads Threshold to Queue Globally** – limits above which queries are considered large and use the global queue.
   * **Queued Queries (from local queue) vs Queued Queries (from global queue)** – how many medium vs large queries are waiting.

   This helps find out whether you are hitting medium limits (local queue) or large limits (global queue).

2. Find out which specific queries are queued and why:
   ```sql
   SELECT QUERY, QUERY_TYPE, THREADS_PER_LEAF, THREADS_PER_LEAF_LIMIT, CONNECTIONS_PER_LEAF, CONNECTIONS_PER_LEAF_LIMIT, LEAF_MEMORY, LEAF_MEMORY_LIMIT
     FROM information_schema.WORKLOAD_MANAGEMENT_QUEUE;
   ```
   Key details to observe:

   * `QUERY_TYPE` is Medium or Large as per WM’s classification.
   * For any queued row, at least one `*_USED` (internal) is close to or above its corresponding `*_LIMIT`:

     * If `THREADS_PER_LEAF` is close to `THREADS_PER_LEAF_LIMIT`, threads are the bottleneck.
     * If `CONNECTIONS_PER_LEAF` is close to `CONNECTIONS_PER_LEAF_LIMIT`, connections are limiting.
     * If `LEAF_MEMORY` is close to `LEAF_MEMORY_LIMIT`, memory is limiting.

   You can then correlate this with the PROCESSLIST view, if needed:
   ```
   SELECT ID, USER, HOST, DB, INFO
     FROM information_schema.PROCESSLIST;
   ```
   and match the text of `INFO` to the `QUERY` column from `WORKLOAD_MANAGEMENT_QUEUE`.

## Queueing Policy

Queries are queued based on their size, as determined by the thresholds in the previous section, [Query Classification](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/use-the-workload-manager-and-set-resource-limits/use-the-workload-manager.md).

## Small Queries

Small queries are never queued. They use a negligible amount of leaf resources, which are not tracked.

## Medium Queries

At regular intervals, the master aggregator collects statistics from the child aggregators. These statistics include the maximum resources used (threads per leaf, connections per leaf, and memory per leaf) during the interval and a rolling average of the maximum resources required during the previous intervals.

At the end of each interval, the master aggregator uses the statistics gathered from the child aggregators to reallocate resources to each aggregator, if necessary.

If the resource requirements for a new medium query exceed the total amount of resources allocated to all aggregators for all medium queries, the new query is queued in a local queue that is maintained by an individual aggregator. Each aggregator decides when to dequeue the queries in its local queue.

Fifty percent of the total leaf resources are reserved for medium queries.

## Large Queries

Large queries are ones whose estimated resource needs (threads, connections, or memory per leaf) exceed the upper thresholds for medium queries.Up to four large queries in the cluster will run concurrently. Any additional queries are queued in the global queue.

Fifty percent of the total leaf resources are reserved for large queries.

## Local Queue vs Global Queue

Workload Management uses two queues to control distributed queries.

The local queue  holds medium queries.

The global queue is for large queries. Global queues ensure that a few heavy, fully distributed queries cannot monopolize leaf resources and starve the rest of the workload.&#x20;

## Dynamic Reuse of WM Queues

There are two queues created by Workload Manager: one for Medium queries and one for Large queries. WM dynamically moves queries from the Medium to the Large Queue when the Medium Queue is saturated and the Large Queue is not.

There are two global variables to control this behavior: `workload_management_dynamic_resource_allocation` and `workload_management_queue_size_allow_upgrade`

## Errors and Warnings

Queries return configurable errors if too many queries are queued or a query has been queued for too long. Queries also return configurable warnings when they were queued for a significant time relative to their actual execution time. This allows users to identify when their cluster resources are insufficient to meet the load of incoming queries.

These errors and warnings are:

`ER_TOO_MANY_QUEUED_QUERIES`: When the `workload_management_max_queue_depth` value is reached, SingleStore will return this error instead of adding a new query to the queue. If you encounter this error, the maximum queue depth may need to be increased to accommodate the load. This error may also indicate that your cluster’s resources need to be increased or the load of incoming queries needs to be reduced.

`ER_QUERY_QUEUE_TIMEOUT`: When a query has been queued longer than the `workload_management_queue_timeout` value, it will return this error and will be removed from the queue. This error indicates that your cluster’s resources or the load of incoming queries need to be adjusted to successfully process incoming queries.

`ER_QUERY_QUEUED_WARNING`: When the ratio of time spent by a query in the queue versus the actual execution time of the query exceeds the `workload_management_queue_time_warning_ratio` value, the query will return this warning. The warning indicates the amount of time spent queued and the amount of time spent executing, to help you understand the sources of query latency. Encountering this warning is normal in many workloads, but if query latency is too high, your cluster’s resources may not be sufficient to process incoming queries in a timely manner.

## Observe the Status of Workload Management

To see the current state of the workload management system, including the number of running and queued queries, and resource usage and thresholds, you can run the `SHOW WORKLOAD MANAGEMENT STATUS` command. You can also run the query `SELECT * FROM information_schema.MV_WORKLOAD_MANAGEMENT_STATUS;`, which will return the same information as the previous command, but per aggregator.

The following example shows sample output from a small, lightly-loaded system. The units for the memory thresholds are in megabytes.

```sql
SHOW WORKLOAD MANAGEMENT STATUS;

```

```output

+-------------------------------------------------------+-------+
| Stat                                                  | Value |
+-------------------------------------------------------+-------+
| Queued Queries (from global queue)                    |     0 |
| Queued Queries (from local queue)                     |     0 |
| Total Queued Queries Since Startup                    |     0 |
| Running Queries (from global queue)                   |     0 |
| Running Queries (from local queue)                    |     0 |
| Running Memory (MB) On Leaves (from global queue)     |     0 |
| Running Memory (MB) On Leaves (from local queue)      |     0 |
| Allocated Memory (MB) On Leaves (for local queue)     |  6673 |
| Required Memory (MB) On Leaves (from local queue)     |     0 |
| Avg Required Memory (MB) On Leaves (from local queue) |     0 |
| Running Threads Per Leaf (from local queue)           |     0 |
| Allocated Threads Per Leaf (for local queue)          |  2048 |
| Required Threads Per Leaf (from local queue)          |     0 |
| Avg Required Threads Per Leaf (from local queue)      |     0 |
| Running Connections Per Leaf (from local queue)       |     0 |
| Allocated Connections Per Leaf (for local queue)      |  2500 |
| Required Connections Per Leaf (from local queue)      |     0 |
| Avg Required Connections Per Leaf (from local queue)  |     0 |
| Memory Threshold (MB) to Queue Locally                |    66 |
| Memory Threshold (MB) to Queue Globally               |  3337 |
| Connections Threshold to Queue Globally               |  2500 |
| Threads Threshold to Queue Globally                   |  2048 |
+-------------------------------------------------------+-------+

```

The following table provides a general description of some of the status fields. For information on the local queue and the global queue, see the [Query Classification](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/use-the-workload-manager-and-set-resource-limits/use-the-workload-manager.md) section.

| Status                                  | Description                                                                                                                                                                           |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Queued Queries                          | Total number of queries queued due to workload management on this aggregator                                                                                                          |
| Running Queries                         | Total number of queries in a queue, based on the number and resource intensity of other queries on this aggregator                                                                    |
| Running Memory On Leaves                | Total estimated memory used per leaf from queries that are above/below“Memory Threshold to Queue Locally”/“Memory Threshold to Queue Globally”, which are run through this aggregator |
| Running Threads Per Leaf                | Total number of threads used per leaf from queries run through this aggregator                                                                                                        |
| Running Connections Per Leaf            | Total number of connections established per leaf from queries, which are run through this aggregator                                                                                  |
| Memory Threshold to Queue Locally       | Estimated memory that a query needs to use before it is queued                                                                                                                        |
| Memory Threshold to Queue Globally      | Estimated memory that a query needs to use before it is queued and wait for multi-aggregator coordination to run                                                                      |
| Connections Threshold to Queue Globally | Total number of connections that a query takes before it is queued and wait for multi-aggregator coordination to run                                                                  |
| Threads Threshold to Queue Globally     | Total number of threads that a query takes before it is queued and wait for multi-aggregator coordination to run                                                                      |

In addition, the `SHOW PROCESSLIST` command can be executed on any SingleStore node to obtain details about currently running queries. Also, `information_schema.processlist` or `information_schema.mv_processlist` can be queried to provide the same information but can be filtered for more focused results.

The processlist provides information such as the user, host, database, command type, execution time, state, and additional information about each process. The information provided can help identify long-running or hanging queries, monitor query execution, and analyze query behavior.

If you haven't received any of the errors listed above and you still have available memory or CPU, see the [Troubleshooting Poorly Performing Queries](https://docs.singlestore.com/db/v9.1/query-data/query-tuning/troubleshooting-poorly-performing-queries.md) page for potential steps to take to resolve the issue.

***

Modified at: April 16, 2026

Source: [/db/v9.1/user-and-cluster-administration/use-the-workload-manager-and-set-resource-limits/use-the-workload-manager/](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/use-the-workload-manager-and-set-resource-limits/use-the-workload-manager/)

(An index of the documentation is available at /llms.txt)