Historical Monitoring

Cluster View

Chart Name	What it shows	When to use it
CPU Utilization	The percentage of the host’s CPU that is being used. Max single-core load: the maximum CPU load across all of the available CPU cores. Avg core load: the average CPU load across all of the available CPU cores. Min single-core load: the minimum CPU load across all of the available CPU cores.	To understand CPU usage and host resource usage in general, or for a given workload.
Memory Utilization	The percent of the host’s memory that is being used	To understand host memory usage for a given workload over time.
Local Disk Utilization	The local disk utilization for the workspace. Total storage can be managed by dropping obsolete tables and/or purging older data in large tables.	To identify the amount of warm data so preventive actions can be taken to better handle the load, thereby ensuring stability and optimum performance.
Read/Write Queries per Second	The number of reads/writes per second of the queries running on the system	To understand typical (“normal”) cluster activity to benchmark workloads and their query rate and identify anomalies in the read/write workload. If the number of rows read or written is very high or uneven, it could indicate that some queries or operations are taking longer to process than others. This can be due to poor indexing, inefficient queries, or database design issues.
Failed Read/Write Queries per Second	The number of reads/writes failed per second of the queries running on the system
Rows Read or Written	The number of rows read/written
Execution Time per Read/Write Query	The elapsed time of read/write query	To identify changes in the pattern of execution time per read/write query from the historical norm. This may indicate an issue with an application or changes in your workload.
Threads - Connected	The number of open connections (`threads_connected`) to the database relative to the maximum limit (`max_connections`)	To identify if the database is approaching the maximum allowed connections, which is indicated by a utilization near 100%. This can potentially lead to performance issues, as queries may need to wait in a queue until threads become available to process them.
Threads - Running	The number of threads that are actively running queries (`threads_running`) relative to the maximum limit (`max_connection_threads`)	To identify if the system is approaching its capacity with regard to the number of queries that can be executed in parallel, which is indicated by a utilization near 100%. This can potentially lead to resource pressure, system unresponsiveness, latency spikes, and eventual failures.
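
The two thread charts above express a current value relative to a configured limit. The following is a minimal sketch of that ratio, assuming the two numbers have already been read from the database. The function names and the 90% alert threshold are hypothetical and are not part of SingleStore.

```python
def utilization_percent(current: int, limit: int) -> float:
    """Return current/limit as a percentage (for example,
    threads_connected relative to max_connections)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit


def near_capacity(current: int, limit: int, threshold_percent: float = 90.0) -> bool:
    """Flag utilization at or above an assumed alert threshold."""
    return utilization_percent(current, limit) >= threshold_percent


# Example values are illustrative only.
print(utilization_percent(50, 200))   # 25.0
print(near_capacity(95, 100))         # True
```

A threshold below 100% leaves headroom to act before queries start waiting for a free thread.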

Historical Workload Monitoring

Chart Name	What it shows	When to use it
Elapsed Time per Execution by Database	The elapsed time per query, grouped by database	To identify which databases incur the most long-running queries and observe changes in the pattern of execution time per query from the historical norm. This may indicate an issue with an application or changes in your workload.
Execution Count	The number of queries executed in a given time	To perform capacity planning for workloads and identify if workloads in general, or workload spikes in particular, are putting the workspace at risk of running out of memory.
CPU Time per Execution by Database	The CPU time spent per query activity, grouped by database	To identify which databases incur the most CPU usage. Note: A blank database indicates system activity that is not related to a user database.
Memory Usage per Execution by Database	The memory bytes spent per query activity, grouped by database	To identify which databases incur the most memory usage. Note: A blank database indicates system activity that is not related to a user database.
Disk Bytes per Execution by Database	The disk bytes spent per query activity, grouped by database	To identify which databases incur the most disk bytes. Note: A blank database indicates system activity that is not related to a user database.
Network Bytes per Execution by Database	The network bytes spent per query activity, grouped by database	To identify which databases incur the most network bytes. Note: A blank database indicates system activity that is not related to a user database.
Metrics by Query Plan	The queries executed and their relative resource consumption	To identify which queries are expensive, including how long queries take to complete, their CPU times, and their failure rates.

Memory Monitoring

Chart Name	What it shows	When to use it
Memory Utilization	The percentage of a host’s memory that is being used	To understand host memory usage for a given workload over time.
Memory Usage Breakdown	The memory in use categorized by Data, Query, Reserved & Other Internal Memory allocators compared to the total memory available	To perform capacity planning for memory and identify if the cluster is not performing optimally due to workloads in general, write workloads, or workload spikes in particular, and to discover where memory is allocated (table, query, etc.).
Memory Used - Data	The data memory in use	To perform capacity planning for data memory and identify if given write workloads are putting the cluster at risk of running out of memory.
Memory Used - Query	The query memory in use	To perform capacity planning for workloads and identify if workloads in general, or workload spikes in particular, are putting the cluster at risk of running out of memory.
Memory Used - Other	The memory used by SingleStore’s memory allocators	To identify why memory allocations have increased, or are anomalously large, when there are no other indicators of increased memory use, such as workload or data, and to discover where memory is allocated (table, query, etc.).

Cache Monitoring

Chart Name	What it shows	When to use it
Persistent Cache Utilization	Persistent cache utilization for the workspace. Total storage can be managed by dropping obsolete tables and/or purging older data in large tables.	To identify the quantity of warm data so preventive action can be taken to better handle the load, thereby ensuring stability and optimum performance.
Distribution of Components Using Cache	Distribution of cache utilization by data, plancache, auditlogs, and tracelogs	To understand how the cache is being utilized. Analyzing cache usage can reveal if certain artifacts (such as data, plancache, audit logs, or trace logs) are consuming an excessive amount of space. This can either cause performance issues, or require additional resources to maintain optimal operation. Monitoring cache usage and activity can also help identify performance bottlenecks, which may require cache policies to be adjusted, additional resources to be allocated, and/or your workload to be optimized.
Breakdown of Cache Utilization by Data	Cache consumption breakdown by "Data" category. Adding utilization across blobs, transaction logs, snapshots, temp blobs, etc. will be equal to the total cache utilized by "Data."
Distribution of Databases Using Cache	Distribution of cache utilization by databases. Adding utilization across databases will be equal to the total cache utilized by "Data."
Blob Cache Downloaded per Second (by Database)	Rate at which the blob cache is downloading files from remote storage.	To understand how SingleStore Helios blob cache is performing. By understanding and monitoring the rate at which the blob cache is downloading files from remote storage, potential performance bottlenecks and/or issues related to blob cache activity can be identified. For example, if high download rates are observed relative to the size of your database and scale of your hardware, you may consider increasing the local cache size. Regularly reviewing this metric can help you make well-informed decisions for optimizing the performance of SingleStore Helios.
Blob Cache Evicted per Second (by Database)	Rate at which the blob cache is evicting files.	To understand how SingleStore Helios blob cache is performing. By understanding and monitoring the rate at which the blob cache is evicting files, system resource utilization can be optimized based on your data management needs. A high eviction rate may indicate that the cache size is insufficient, or that your workload is imposing a high cache turnover. To improve overall cluster performance, reviewing data access patterns and adjusting the cache size is recommended. Regularly reviewing this metric can help you identify potential performance bottlenecks and make well-informed decisions for optimizing the performance of SingleStore Helios.

Pipeline Dashboards

Pipeline Summary

Chart Name	What it shows	When to use it
State Distribution	A high-level overview of all pipelines, including the number of pipelines in running, stopped, and error states, and the percentage of each	To identify potential issues by comparing the number of running pipelines to those that have either stopped or produced an error.
Historical Pipeline State	The state of all pipelines over a period of time	To identify potential issues by examining how a pipeline behaves over time.
Summary	The current state of all pipelines	To identify which pipelines are currently running, stopped, or in an errored state along with their associated database.

Pipeline Performance

Chart Name	What it shows	When to use it
Execution Count	The total number of executions that have run in a pipeline (specifically, the queries that are run in the engine that ingest the data into tables)	To observe the workload from pipelines.
Avg CPU Time per Execution	The average CPU time for each execution in a pipeline	To identify which pipelines are consuming excessive CPU cycles.
Avg Elapsed Time per Execution	The average elapsed time for each execution in a pipeline	To identify which pipelines are experiencing degraded performance over time.
Avg I/O per Execution	The average disk I/O (the number of bytes that SingleStore read from or wrote to the filesystem or the in-memory transaction log) per execution in a pipeline. Note that this is the average value of `disk_b` over the total of `run_count`, `success_count`, and `failure_count` (from the `MV_ACTIVITIES_CUMULATIVE` table). This reflects the load on the server more than the volume of data being ingested.	To identify if a pipeline is experiencing I/O-related performance issues (typically when this value is consistently high).
Avg Memory Use per Execution	The average memory usage per execution in a pipeline	To identify which pipelines are exhibiting excessive memory use.
Avg Network Bytes per Execution	The average network bytes per execution in a pipeline	To identify which pipelines are experiencing degraded performance due to network constraints.
Pipeline Errors	Which pipelines have produced an error, including the pipeline name, error ID, error code, error message, and the time the error occurred	To identify and troubleshoot pipelines that have produced an error.
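
The note on Avg I/O per Execution can be read as dividing the accumulated `disk_b` by the combined number of executions. The sketch below shows that arithmetic under this reading. The function name is hypothetical, and the exact formula the dashboard uses is an assumption, not something this page confirms.

```python
def avg_disk_bytes_per_execution(
    disk_b: int, run_count: int, success_count: int, failure_count: int
) -> float:
    """Average disk bytes per execution, assuming the denominator is
    run_count + success_count + failure_count."""
    executions = run_count + success_count + failure_count
    if executions == 0:
        # No executions recorded yet, so there is nothing to average.
        return 0.0
    return disk_b / executions


# Illustrative values only: 1000 bytes over 10 executions.
print(avg_disk_bytes_per_execution(1000, 1, 8, 1))  # 100.0
```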

Pipeline Insights

Chart Name	What it shows	When to use it
Rows Streamed	The number of rows streamed per pipeline over time, letting users track ingest throughput and data movement trends.	To monitor pipeline health, troubleshoot ingest issues, tune performance, audit or plan for capacity, and compare pipeline workloads.
Data Streamed (in MBs)	The amount of data, in megabytes, streamed per pipeline over time, which allows users to monitor ingest volume and usage trends.	To assess pipeline ingest volume, detect anomalies or spikes, tune performance for large data loads, audit or plan capacity, and compare data movement across pipelines.
# of Batches	The number of pipeline batches processed over time, which allows users to track ingest job frequency and execution patterns.	To monitor and troubleshoot pipeline activity, analyze batch processing rates, audit job execution, and compare workload patterns across pipelines.
Time Spent per Batch	The average time taken to process each pipeline batch, which lets users track ingest speed and identify slow-running jobs.	To troubleshoot pipeline performance issues, spot latency bottlenecks, tune execution efficiency, and compare batch processing times across pipelines.
Extractor Wait Time per Batch	The average time each pipeline batch spends waiting for extractor resources, which allows users to monitor resource contention and ingest delays.	To identify bottlenecks caused by extractor unavailability, troubleshoot pipeline latency, optimize resource allocation, and compare wait times across pipelines.
Kafka Offsets Pending Ingest	The number of unread Kafka offsets waiting to be ingested per pipeline, which indicates the real-time backlog in streaming data pipelines.	To monitor ingest lag, detect pipeline bottlenecks, troubleshoot delays between source and target, and compare processing efficiency across Kafka-connected pipelines.
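
As a rough illustration of what a backlog of pending Kafka offsets means, the sketch below sums, per partition, the gap between the latest offset at the source and the offset already ingested. The function name, the dictionary inputs, and the offset convention (the ingested value is the next offset to read) are assumptions made for this example. This is not how the dashboard itself is documented to compute the metric.

```python
def pending_offsets(latest: dict[int, int], ingested: dict[int, int]) -> int:
    """Sum the per-partition backlog between source and pipeline.

    latest:   partition -> latest offset available at the source
    ingested: partition -> next offset the pipeline will read
    Partitions missing from `ingested` are treated as unread from offset 0.
    """
    return sum(
        max(latest_offset - ingested.get(partition, 0), 0)
        for partition, latest_offset in latest.items()
    )


# Illustrative values only: partition 0 lags by 10, partition 1 is caught up.
print(pending_offsets({0: 100, 1: 50}, {0: 90, 1: 50}))  # 10
```

A backlog that keeps growing over time suggests the pipeline is ingesting more slowly than the source is producing.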

Refer to Troubleshoot Pipeline Performance and Memory Usage for more information on tracking pipeline resource usage through SQL queries using the pipeline's `activity_tracking_id`.

Query History

Chart Name	What it shows	When to use it
Query History	Query runtimes, along with which queries have succeeded and which have failed.	To view query runtimes over time, identify and resolve slow-running and failed queries, and view and optimize workloads in real time. Refer to Query History for additional information and examples.

Resource Pool Monitoring

Note

This is a Preview feature.

Please contact SingleStore Support to enable this feature.

Chart Name	What it shows	When to use it
Finished Queries	The number of queries finished on a given resource pool	To perform capacity planning for workloads by resource pool and identify if workloads in general, or workload spikes in particular, are causing queries to queue under the current resource pool configuration.
Killed Queries	The number of queries killed on a given resource pool	To understand how many queries are killed on a given resource pool.
Queueing Queries	The number of queries queued for a given resource pool	To understand how many queries are queued over time and perform capacity planning to increase resource limits for a given pool as needed.
Queue Time per Queued Query	The average queue time per query for a given resource pool	To understand how long queries are queued and perform capacity planning to increase resource limits for a given pool as needed.
