Use the Workload Manager
SingleStore’s Workload Manager (WM) automatically manages cluster workloads by queuing and executing distributed queries based on available leaf connections, threads, and memory, so that queries only run when these resources are sufficiently available.
WM is concerned with:
- the number of threads on leaves, specified by the workload_management_max_threads_per_leaf engine variable
- the number of connections on each leaf, specified by the workload_management_max_connections_per_leaf engine variable
- leaf memory
Definition of the Variables Used by WM
workload_management_max_threads_per_leaf

The maximum number of threads to use per leaf. This variable is related to the max_connection_threads engine variable, and they should generally be set to the same value.

workload_management_max_connections_per_leaf

The maximum number of connections to use per leaf node in the cluster.

max_connection_threads

The maximum number of kernel threads for processing queries.

WM divides leaf resources among multiple aggregators. For example, if the workload_management_max_threads_per_leaf variable is set to 1000, the total number of threads used on a leaf node at any point in time is limited to 1000, and that quota is divided among the aggregators.
To see these quotas, which change over time, run the following on a specific node:

SHOW WORKLOAD MANAGEMENT STATUS;

In addition to splitting leaf resources among aggregators, WM also throttles the number of queries that can run on any aggregator.
Differences between Workload Manager and Resource Governance
- Related to workload management is the concept of resource governance. Unlike the Workload Manager, which is built in and enabled by default, resource governance must be enabled explicitly. For more information, see Set Resource Limits.
- In SingleStore’s query pipeline, if resource governance is enabled, it is given priority: WM queues queries after resource governance.
- Resource governance and workload management are two mutually exclusive features. The Workload Manager does not have access to the resource pool configurations or the memory limits imposed by them.
- The Workload Manager classifies a query based on estimated usage (connections, threads, memory) and determines if it can run or should be queued.
- The Resource Pool checks MAX_CONCURRENCY and MAX_QUEUE_DEPTH to decide whether a query can run or should be queued.
- Once a query starts executing, the Resource Pool enforces limits such as MEMORY_PERCENTAGE, CPU_PERCENTAGE, QUERY_MEMORY_PERCENTAGE, and QUERY_TIMEOUT to control actual resource usage and prevent overload.
Query Lifecycle Flowchart
The following flowchart shows the lifecycle of a query as it interacts with the aggregators, passes through the resource pool, and then comes under WM.
Configure Workload Management
The workload manager can be configured using engine variables, and the default settings are sufficient for most users regardless of cluster size.
Before changing any of these values, ensure that you understand each of these variables and have evaluated their impact on your workload.
All of the workload management engine variables are sync variables.
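As a minimal sketch of how a sync variable might be inspected and changed, the statements below use SHOW VARIABLES and SET GLOBAL; the value 8192 is illustrative only, not a recommendation:

```sql
-- List the current workload management settings.
SHOW VARIABLES LIKE 'workload_management%';

-- Raise the per-leaf thread budget; 8192 is an example value only.
-- Sync variables propagate cluster-wide.
SET GLOBAL workload_management_max_threads_per_leaf = 8192;
```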
Query Classification
The workload manager classifies queries by their size: small, medium, or large.
Small Queries
Small queries do not exceed the workload_management_memory_queue_threshold, which is a small amount of memory relative to the total amount of leaf memory available to queries.
Small queries also do not use distributed joins.
Medium Queries
Medium queries are queries whose estimated resource usage per leaf (threads, connections, and memory) falls between a lower and an upper bound that Workload Management (WM) maintains internally:

- Lower bound (small vs. medium)
  - Threads per leaf ≥ 1
  - Connections per leaf ≥ 1
  - Memory per leaf ≥ total_leaf_query_memory * workload_management_memory_queue_threshold
  - Queries below these lower-bound limits are treated as small and are never queued by WM.
- Upper bound (medium vs. large)
  - Threads per leaf ≤ 0.5 * workload_management_max_threads_per_leaf / workload_management_expected_aggregators
  - Connections per leaf ≤ 0.5 * workload_management_max_connections_per_leaf / workload_management_expected_aggregators
  - Memory per leaf ≤ total_leaf_query_memory / workload_management_expected_aggregators

Queries whose estimated usage is between these lower and upper bounds are classified as medium.
Thresholds are internal estimates per leaf, not numbers a user manually plugs in.
Workload Management computes “threads used per leaf” and “connections used per leaf” internally from the query plan and runtime settings (for example, the workload management engine variables, flexible parallelism, join shape, and so on).
Hence in practice:
- Treat the thresholds conceptually (small / medium / large) rather than trying to hand-calculate the exact threads-per-leaf for every statement.
- To understand why queries are being queued, use:
  - The SHOW WORKLOAD MANAGEMENT STATUS command, or the MV_WORKLOAD_MANAGEMENT_STATUS view, to see aggregate running and queued queries and global thresholds. This gives an idea of which queue is hot and which resource is tight.
  - The WORKLOAD_MANAGEMENT_QUEUE information_schema view to see which queries are queued, whether they are medium or large (QUERY_TYPE), and which resource (threads, connections, memory) is limiting them.
Large Queries
Queries that exceed the upper threshold for medium queries are classified as large queries.
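Assuming the bounds described in this section, the medium-vs-large thread and connection thresholds for a cluster can be approximated directly from the engine variables. This is a sketch of the formulas above, not an official diagnostic (the memory bound depends on total_leaf_query_memory, which is internal, so it is omitted here):

```sql
-- Approximate the upper (medium vs. large) per-leaf thresholds,
-- using the formulas from the Query Classification section.
SELECT
  0.5 * @@workload_management_max_threads_per_leaf
      / @@workload_management_expected_aggregators AS max_medium_threads_per_leaf,
  0.5 * @@workload_management_max_connections_per_leaf
      / @@workload_management_expected_aggregators AS max_medium_connections_per_leaf;
```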
Example: Checking Why Queries are Queued
- Check overall WM status:

SHOW WORKLOAD MANAGEMENT STATUS;

Key details to observe:
- Running Threads Per Leaf (from local queue) – total threads per leaf currently used by medium queries.
- Running Connections Per Leaf (from local queue) – total connections per leaf for medium queries.
- Memory Threshold (MB) to Queue Locally – memory level above which medium queries start queuing locally.
- Memory / Connections / Threads Threshold to Queue Globally – limits above which queries are considered large and use the global queue.
- Queued Queries (from local queue) vs Queued Queries (from global queue) – how many medium vs large queries are waiting.

This helps determine whether you are hitting medium limits (local queue) or large limits (global queue).
- Find out which specific queries are queued and why:

SELECT QUERY, QUERY_TYPE, THREADS_PER_LEAF, THREADS_PER_LEAF_LIMIT,
       CONNECTIONS_PER_LEAF, CONNECTIONS_PER_LEAF_LIMIT,
       LEAF_MEMORY, LEAF_MEMORY_LIMIT
FROM information_schema.WORKLOAD_MANAGEMENT_QUEUE;

Key details to observe:
- QUERY_TYPE is Medium or Large as per WM’s classification.
- For any queued row, at least one *_USED value (internal) is close to or above its corresponding *_LIMIT:
  - If THREADS_PER_LEAF is close to THREADS_PER_LEAF_LIMIT, threads are the bottleneck.
  - If CONNECTIONS_PER_LEAF is close to CONNECTIONS_PER_LEAF_LIMIT, connections are limiting.
  - If LEAF_MEMORY is close to LEAF_MEMORY_LIMIT, memory is limiting.
You can then correlate this with the PROCESSLIST view, if needed:

SELECT ID, USER, HOST, DB, INFO FROM information_schema.PROCESSLIST;

and match the text of INFO to the QUERY column from WORKLOAD_MANAGEMENT_QUEUE.
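If helpful, the two views can be paired in a single step by joining on the statement text. Matching on exact text is an assumption of this sketch; entries whose INFO text is truncated or rewritten will not match:

```sql
-- Pair each queued statement with its process entry by matching text.
-- Exact-text matching is an assumption; truncated INFO values won't match.
SELECT p.ID, p.USER, p.HOST, q.QUERY_TYPE, q.QUERY
FROM information_schema.WORKLOAD_MANAGEMENT_QUEUE AS q
JOIN information_schema.PROCESSLIST AS p
  ON p.INFO = q.QUERY;
```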
Queueing Policy
Queries are queued based on their size, as determined by the thresholds in the previous section, Query Classification.
Small Queries
Small queries are never queued.
Medium Queries
At regular intervals, the master aggregator collects statistics from the child aggregators.
At the end of each interval, the master aggregator uses the statistics gathered from the child aggregators to reallocate resources to each aggregator, if necessary.
If the resource requirements for a new medium query exceed the total amount of resources allocated to all aggregators for all medium queries, the new query is queued in a local queue that is maintained by an individual aggregator.
Fifty percent of the total leaf resources are reserved for medium queries.
Large Queries
Large queries are ones whose estimated resource needs (threads, connections, or memory per leaf) exceed the upper thresholds for medium queries.
Fifty percent of the total leaf resources are reserved for large queries.
Local Queue vs Global Queue
Workload Management uses two queues to control distributed queries.
The local queue holds medium queries.
The global queue is for large queries.
Errors and Warnings
Queries return configurable errors if too many queries are queued or a query has been queued for too long.
These errors and warnings are:
ER_TOO_MANY_QUEUED_QUERIES: When the workload_management_max_queue_depth value is reached, SingleStore returns this error instead of adding a new query to the queue.
ER_QUERY_QUEUE_TIMEOUT: When a query has been queued longer than the workload_management_queue_timeout value, it returns this error and is removed from the queue.
A warning: When the ratio of time spent by a query in the queue to the actual execution time of the query exceeds the workload_management_queue_time_warning_ratio value, the query returns this warning.
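As a sketch, the queue-depth and timeout variables can be tuned to make queueing fail fast or wait longer; the values below are illustrative only, not recommendations:

```sql
-- Fail new queries once 100 are already queued (illustrative value).
SET GLOBAL workload_management_max_queue_depth = 100;

-- Time out a query after 600 seconds in the queue (illustrative value).
SET GLOBAL workload_management_queue_timeout = 600;
```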
Observe the Status of Workload Management
To see the current state of the workload management system, including the number of running and queued queries, resource usage, and thresholds, run the SHOW WORKLOAD MANAGEMENT STATUS command. Alternatively, run SELECT * FROM information_schema.MV_WORKLOAD_MANAGEMENT_STATUS, which returns the same information as the previous command, but per aggregator.
The following example shows sample output from a small, lightly-loaded system.
SHOW WORKLOAD MANAGEMENT STATUS;
+-------------------------------------------------------+-------+
| Stat | Value |
+-------------------------------------------------------+-------+
| Queued Queries (from global queue) | 0 |
| Queued Queries (from local queue) | 0 |
| Total Queued Queries Since Startup | 0 |
| Running Queries (from global queue) | 0 |
| Running Queries (from local queue) | 0 |
| Running Memory (MB) On Leaves (from global queue) | 0 |
| Running Memory (MB) On Leaves (from local queue) | 0 |
| Allocated Memory (MB) On Leaves (for local queue) | 6673 |
| Required Memory (MB) On Leaves (from local queue) | 0 |
| Avg Required Memory (MB) On Leaves (from local queue) | 0 |
| Running Threads Per Leaf (from local queue) | 0 |
| Allocated Threads Per Leaf (for local queue) | 2048 |
| Required Threads Per Leaf (from local queue) | 0 |
| Avg Required Threads Per Leaf (from local queue) | 0 |
| Running Connections Per Leaf (from local queue) | 0 |
| Allocated Connections Per Leaf (for local queue) | 2500 |
| Required Connections Per Leaf (from local queue) | 0 |
| Avg Required Connections Per Leaf (from local queue) | 0 |
| Memory Threshold (MB) to Queue Locally | 66 |
| Memory Threshold (MB) to Queue Globally | 3337 |
| Connections Threshold to Queue Globally | 2500 |
| Threads Threshold to Queue Globally | 2048 |
+-------------------------------------------------------+-------+

The following table provides a general description of some of the status fields.
| Status | Description |
|---|---|
| Queued Queries | Total number of queries queued due to workload management on this aggregator |
| Running Queries | Total number of queries currently running through this aggregator |
| Running Memory On Leaves | Total estimated memory used per leaf by queries from the global/local queue |
| Running Threads Per Leaf | Total number of threads used per leaf by queries run through this aggregator |
| Running Connections Per Leaf | Total number of connections established per leaf by queries run through this aggregator |
| Memory Threshold to Queue Locally | Estimated memory that a query needs to use before it is queued |
| Memory Threshold to Queue Globally | Estimated memory that a query needs to use before it is queued and waits for multi-aggregator coordination to run |
| Connections Threshold to Queue Globally | Total number of connections that a query takes before it is queued and waits for multi-aggregator coordination to run |
| Threads Threshold to Queue Globally | Total number of threads that a query takes before it is queued and waits for multi-aggregator coordination to run |
In addition, the SHOW PROCESSLIST command can be executed on any SingleStore node to obtain details about currently running queries. The information_schema.PROCESSLIST or information_schema.MV_PROCESSLIST views can be queried for the same information, and can be filtered for more focused results.
The processlist provides information such as the user, host, database, command type, execution time, state, and additional information about each process.
If you haven't received any of the errors listed above and you still have available memory or CPU, see the Troubleshooting Poorly Performing Queries page for potential steps to take to resolve the issue.