check

Description

Checks a report generated by sdb-report collect for issues.

Available checkers

+-----------------------------------+----------+----------------------------------------------------------------------------------+
| ID | EXCLUDED | DESCRIPTION |
+-----------------------------------+----------+----------------------------------------------------------------------------------+
| attachRebalanceDelay | | This variable should be set to 120 (default). If it is set to another value, the |
| | | cluster may experience delays in self-healing operations |
| autoAttach | | This variable should be set to "ON" (default). Setting it to "OFF" prevents |
| | | nodes from reattaching after a restart |
| blockedQueries | | Blocked queries may lead to additional failed operations. SingleStore recommends that you |
| | | reduce your workload or kill running queries |
| cgroupDisabled | | Linux memory subsystems use a number of bytes of memory per physical page on |
| | | x86_64 systems. These resources are consumed even when memory is not used in |
| | | any hierarchy. As SingleStore doesn't use the memory subsystem, SingleStore recommends |
| | | disabling this as it will reduce the resource consumption of the kernel |
| cgroupMemoryUsage | | Node processes may be run within a cgroup with memory limits; exceeding those |
| | | limits may lead to decreased performance and/or node failure |
| chronydDisabled | | SingleStore recommends that chronyd is disabled so that ntpd can be used for time |
| | | synchronization. Contact your administrator to disable chronyd |
| clusterMemoryUsage | | As SingleStore is allocated the value specified in maximum_memory, query |
| | | failures may result if memory usage approaches this limit. To alleviate this |
| | | condition in the short term, increase maximum_memory or delete data that is |
| | | stored in memory to allow more headroom |
| collectionErrors | | Collection errors in the report typically indicate that not all parts of the |
| | | report could be gathered. This could mean that some information is missing |
| | | and a thorough check could not be performed, or that Toolbox cannot access the |
| | | required information |
| columnstoreSegmentRows | | Inconsistent columnstore segment rows can lead to non-optimal query performance |
| | | or other issues. Columnstore segment rows refers to the number of rows |
| | | SingleStore holds in each segment. The default value is 1024000. Refer |
| | | to https://docs.singlestore.com/docs/managing-columnstore-segments/ for more |
| | | information |
| consistentMaxMemory | | Inconsistent maximum memory settings will lead to some nodes having more or less |
| | | memory available for operations and can cause performance inconsistencies across |
| | | the cluster. SingleStore recommends that all max_memory settings are consistent. Refer to |
| | | https://docs.singlestore.com/docs/configure-memory-limits/ for more information |
| cpuFeatures | | SingleStore can make use of AVX2 instructions for optimal performance. Refer |
| | | to https://docs.singlestore.com/docs/instruction-set-verification/ for more |
| | | information |
| cpuFreqPolicy | EXCLUDED | Disabling power saving and Turbo Mode settings on all hosts will lead to more |
| | | consistent performance across the cluster |
| cpuHyperThreading | | A CPU with hyperthreading will ensure optimal performance. Hyperthreading allows |
| | | a CPU to split a physical core into two virtual cores, or "threads." This allows |
| | | each core to do two things simultaneously |
| cpuIdle | | In general, SingleStore recommends utilizing all of the cores available on |
| | | a host. However, if a CPU is frequently less than 5% idle, this typically |
| | | indicates that your workload will not have room to grow, and more cores are |
| | | likely required |
| cpuMemoryBandwidth | | Low CPU-memory bandwidth can highlight potential performance issues on your |
| | | hosts |
| cpuModel | | Differing CPU models may lead to inconsistent performance |
| defaultVariables | | SingleStore recommends keeping the default values for these variables for optimal cluster |
| | | operation |
| defaultWorkloadManagement | | SingleStore recommends keeping the default values for the workload management settings for |
| | | optimal cluster operation |
| defunctProcesses | | Defunct processes may be using system resources and preventing their use by |
| | | SingleStore. SingleStore recommends that you kill these processes if possible |
| delayedThreadLaunches | | Delayed thread launches may indicate that a workload is too intensive for the |
| | | available threads. SingleStore recommends decreasing the cluster's workload |
| detectCrashStackTraces | | The presence of dmp.stack files indicates that a SingleStore node has |
| | | crashed, which should be investigated |
| disconnectedReplicationSlaves | | A disconnected replication secondary may mean that you don't have full redundancy |
| | | in your system |
| diskBandwidth | | Disk bandwidth, an indicator of disk performance, is computed by examining the |
| | | total bytes transferred between the first request for service and the completion |
| | | of the transfer |
| diskInodesUsage | | Exhausting the inode capacity can lead to the inability to store and/or retrieve |
| | | data. To alleviate this potential issue, either increase the inode capacity, or |
| | | reduce the inode usage |
| diskLatencyRead | | Disk latency is an important performance indicator when reading data. |
| | | SingleStore recommends investigating potential disk performance issues when the |
| | | disk's "read" latency is greater than 10 ms |
| diskLatencyWrite | | Disk latency is an important performance indicator when writing data. |
| | | SingleStore recommends investigating potential disk performance issues when the |
| | | disk's "write" latency is greater than 10 ms |
| diskUsage | | Checks free disk space and identifies if you are approaching your disk capacity |
| | | limits |
| duplicatePartitionDatabase | | Duplicate partitions may cause extra memory or disk usage in your system |
| explainRebalancePartitionsChecker | | If the cluster isn't properly rebalanced (where EXPLAIN REBALANCE PARTITIONS is |
| | | not null), partitions are not distributed evenly across the cluster. An uneven |
| | | partition distribution can lead to nodes containing more data and/or performing |
| | | more work (leading to "hotspots"). To remedy, run REBALANCE PARTITIONS. Refer to |
| | | https://docs.singlestore.com/docs/rebalance-partitions/ for more information |
| failedBackgroundThreadAllocations | | Failed background thread allocations can lead to further cascading cluster |
| | | issues. SingleStore recommends scaling back your workload when you see these |
| | | failures |
| failedCodegen | | Code generation errors indicate that your SQL was not properly compiled. We |
| | | recommend that you review and correct the query that caused the code generation |
| | | error |
| failureDetectionOn | | SingleStore nodes will not properly fail over if failure detection is set to |
| | | OFF. To ensure that SingleStore nodes will properly fail over, set failure |
| | | detection to ON |
| filesystemType | | Unsupported file systems may cause unpredictable results. Please |
| | | ensure your cluster is deployed on a supported filesystem. Refer to |
| | | https://docs.singlestore.com/docs/system-requirements/columnstore-performance/ |
| | | for more information |
| highAvailability | | High availability mode distributes leaf nodes among availability groups such |
| | | that paired leaves do not share the same host |
| installedPermissions | | Specific file ownership permissions are required to run SingleStore. This |
| | | check ensures that the permissions are set properly so that SingleStore can |
| | | operate without issue |
| interpreterMode | | SingleStore recommends setting the interpreter mode to interpret_first. When |
| | | set, SingleStore interprets and compiles a query shape in parallel |
| | | as the query is encountered rather than compiling it first. Refer to |
| | | https://docs.singlestore.com/docs/code-generation/interpreter-modes/ for more |
| | | information |
| kernelVersions | | Inconsistent kernel versions are not recommended |
| leafAverageRoundtripLatency | | If leaf round-trip latency is high, SingleStore recommends checking your network |
| | | connectivity between hosts |
| leavesNotOnline | | Offline leaf nodes may indicate a cluster issue. If high availability is not |
| | | enabled, the databases will be inaccessible |
| longRunningQueries | | Long-running queries may indicate that the cluster's workload is too high. We |
| | | recommend checking the cluster's workload for long-running queries and killing |
| | | them |
| majorPageFaults | | Memory pressure is an indicator that a host's memory is unable to efficiently |
| | | service processing needs. Frequent page faults on a host are a sign of memory |
| | | pressure |
| mallocActiveMemory | | Shows the memory allocated directly from the operating system and managed by |
| | | the C runtime allocators (not SingleStore’s built-in memory allocators that |
| | | use the Buffer Manager). This memory use should be approximately 1 to 2 GB |
| | | for most workloads. If it is larger, SingleStore recommends investigating the |
| | | system's memory use |
| maxMapCount | | Incorrectly setting this can lead to memory errors. Refer to |
| | | https://docs.singlestore.com/memsql-report-redir/configure-linux-vm-settings for |
| | | more information |
| maxMemorySettings | | SingleStore recommends setting the maximum memory to a percentage of the host's total |
| | | memory, with a ceiling of 90% |
| maxOpenFiles | | A setting lower than the recommended setting can significantly |
| | | degrade performance and introduce connection limit errors. Refer to |
| | | https://docs.singlestore.com/memsql-report-redir/configure-linux-vm-settings for |
| | | more information |
| memoryCommitted | | Virtual memory can potentially be overallocated and exceed a host's physical |
| | | memory. This can lead to workload failures due to memory pressure |
| memsqlVersions | | SingleStore recommends that the deployed version of SingleStore is consistent across |
| | | all hosts and nodes |
| minFreeKbytes | | Setting these to the recommended values will minimize |
| | | the likelihood of memory errors on your hosts. Refer to |
| | | https://docs.singlestore.com/memsql-report-redir/configure-linux-vm-settings for |
| | | more information |
| missingClusterDb | | The cluster database holds all the metadata for your cluster. A missing cluster |
| | | database requires immediate intervention and potentially a refresh of your |
| | | cluster via backup/restore |
| networkBuffersMax | | wmem_max and rmem_max are network settings that control the send and receive |
| | | socket buffer sizes, respectively. If these parameters are set too low, you may |
| | | experience latency. SingleStore recommends setting each of these values to a |
| | | minimum of 8 MB |
| numaConfiguration | | When running SingleStore on hosts that support Non-Uniform |
| | | Memory Access (NUMA) sockets, SingleStore recommends configuring SingleStore |
| | | DB for NUMA via numactl for optimal performance. Refer to |
| | | https://docs.singlestore.com/studio-redir/memsql-deploy-configure-numa/ for more |
| | | information |
| offlineAggregators | | Offline aggregators must be addressed as less work will be load-balanced across |
| | | the cluster |
| orchestratorProcesses | | Orchestrator processes may cause undesired actions to be taken on SingleStore |
| | | hosts which may negatively impact the cluster |
| orphanDatabases | | Orphan databases, while unused, still consume memory. Orphan databases |
| | | can and should be cleared using CLEAR ORPHAN DATABASES. Refer to |
| | | https://docs.singlestore.com/docs/clear-orphan-databases/ for more information |
| orphanTables | | Orphan tables, while unused, still consume memory. Orphan tables |
| | | can and should be cleared using CLEAR ORPHAN DATABASES. Refer to |
| | | https://docs.singlestore.com/docs/clear-orphan-databases/ for more information |
| outOfMemory | | Out-of-memory errors may indicate memory pressure on the cluster. |
| | | SingleStore recommends identifying and reducing memory usage. Refer to |
| | | https://docs.singlestore.com/docs/identifying-reducing-memory-usage/ for more |
| | | information |
| partitionsConsistency | | SingleStore recommends that SSD partitions start at a minimum of 4096 byte-sectors. Disk |
| | | performance issues may result if this value is inconsistent across hosts, or if |
| | | the partition starts at < 4096 byte-sectors |
| pendingDatabases | | Pending databases are available for read and write queries. Databases that |
| | | remain in a "pending" state for an extended period should be investigated |
| queuedQueries | | A large number of queued queries may indicate a high cluster workload. We |
| | | recommend reducing the workload and/or killing long-running queries |
| readyQueueSaturated | | Ready Queue saturation indicates there aren't enough connection threads |
| | | available to handle the workload. SingleStore recommends reducing the workload and/or |
| | | killing long-running queries |
| replicationLag | | Checks if the replication on the secondary cluster is out of sync with the |
| | | primary cluster |
| replicationPausedDatabases | | Identifies if PAUSE REPLICATION has been run and provides a status |
| runningAlterOrTruncate | | A running ALTER or TRUNCATE command may explain why the cluster is experiencing |
| | | issues when attempting to run queries |
| runningBackup | | This informational check can help troubleshoot issues caused by running a backup |
| secondaryDatabases | | This informational check can help determine if the cluster is the primary |
| | | cluster, or a secondary/replicated one |
| securityLimits | | Checks that the nproc and NOFILE limits in /etc/security/limits.conf are at |
| | | least 128000 and 1024000, respectively |
| swapEnabled | | This check determines if there is adequate swap space on a host, where 10% or |
| | | more of physical memory is typically allocated for swap. Swap space will be |
| | | utilized when the host is under memory pressure |
| swapUsage | | Your host may be under memory pressure if the swap space that is actively being |
| | | used is greater than 5% |
| syncCnfVariables | | If sync variables are not set in the engine, there will be discrepancies between |
| | | what the cnf file contains and what the associated values actually are |
| tracelogHardShutdown | | Searches for nodes that have sustained a hard shutdown (where a node's host has |
| | | crashed or lost power) |
| tracelogOOD | | Out of disk space |
| tracelogOOM | | Out of memory |
| transparentHugepage | | Disable transparent huge pages (THP) for optimal SingleStore performance. |
| | | Refer to https://docs.singlestore.com/memsql-report-redir/transparent-hugepage/ |
| | | for more information |
| unkillableQueries | | Indicates that there are queries running on your cluster that can't be killed. |
| | | This may be due to long-running processes that have rendered other processes |
| | | unkillable. SingleStore recommends identifying long-running processes using SHOW |
| | | PROCESSLIST and killing them |
| unmappedMasterPartitions | | Use ATTACH PARTITIONS to reattach disconnected partitions to the cluster. Refer |
| | | to https://docs.singlestore.com/docs/attach-partition/ for more information |
| unrecoverableDatabases | | An unrecoverable database is no longer readable or writeable |
| userDatabaseRedundancy | | The absence of redundancy indicates that not all partitions |
| | | have replicas that they can failover to. SingleStore recommends running |
| | | EXPLAIN RESTORE REDUNDANCY and restoring if possible. Refer to |
| | | https://docs.singlestore.com/docs/restore-redundancy/ for more information |
| validLicense | | A valid and properly applied license is required to comply with SingleStore |
| | | terms and conditions |
| validateSsd | | SingleStore must be deployed and run on SSDs |
| versionHashes | | Confirms that a SingleStore version is a General Availability (GA) release |
| vmOvercommit | | By design, Linux kills processes that are consuming large amounts of memory when |
| | | the amount of free memory is deemed to be too low. Overcommit settings that are |
| | | set too low may cause frequent and unnecessary failures |
| vmSwappiness | | The vm.swappiness value affects system performance as it controls when swapping |
| | | is activated, and how swap space is used. When set to lower values, the kernel |
| | | will use less swap space. When set to higher values, the kernel will use |
| | | more swap space. While the range of acceptable values is from 0 to 100, the |
| | | recommended value is from 1 to 10, and should never be set to 0. |
| whitespacesInObjectName | | It is bad practice to create a database object with leading or trailing |
| | | whitespace in its name, because it is handled as a different object from an |
| | | identically named one without whitespace |
+-----------------------------------+----------+----------------------------------------------------------------------------------+
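
Several descriptions in the table state concrete thresholds. The sketch below restates a few of them as small Python predicates so the numbers are easy to see side by side. It is not how sdb-report implements its checkers: the function names are hypothetical, and reading "8MB" as 8 * 1024 * 1024 bytes is an assumption.

```python
"""Illustrative threshold checks based on the checker descriptions.

These are NOT the checkers that sdb-report runs. All function names are
hypothetical, and "8MB" is assumed to mean 8 * 1024 * 1024 bytes.
"""

MIN_NET_BUFFER_BYTES = 8 * 1024 * 1024  # networkBuffersMax (assumed MiB)
MIN_NPROC = 128000                      # securityLimits: nproc
MIN_NOFILE = 1024000                    # securityLimits: NOFILE
MAX_DISK_LATENCY_MS = 10.0              # diskLatencyRead / diskLatencyWrite


def swappiness_ok(value: int) -> bool:
    """vmSwappiness: the recommended range is 1 to 10, and never 0."""
    return 1 <= value <= 10


def network_buffers_ok(wmem_max: int, rmem_max: int) -> bool:
    """networkBuffersMax: both buffers should be at least 8 MB."""
    return wmem_max >= MIN_NET_BUFFER_BYTES and rmem_max >= MIN_NET_BUFFER_BYTES


def security_limits_ok(nproc: int, nofile: int) -> bool:
    """securityLimits: nproc >= 128000 and NOFILE >= 1024000."""
    return nproc >= MIN_NPROC and nofile >= MIN_NOFILE


def disk_latency_ok(latency_ms: float) -> bool:
    """Latency above 10 ms is worth investigating."""
    return latency_ms <= MAX_DISK_LATENCY_MS


def max_memory_ok(maximum_memory_mb: int, host_total_mb: int) -> bool:
    """maxMemorySettings: at most 90% of the host's total memory.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return maximum_memory_mb * 10 <= host_total_mb * 9
```

For example, `swappiness_ok(60)` returns `False`, because 60 is outside the recommended range of 1 to 10.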

Examples

Run a single checker.

sdb-report check --only orchestratorProcesses

Run only the pre-installation environment checks. Use this command with a report generated by sdb-report collect --validate-env.

sdb-report check --validate-env

Exclude specific checkers.

sdb-report check --exclude minFreeKbytes --exclude maxOpenFiles

Usage

Usage:
sdb-report check [flags]
For flags that can accept multiple values (indicated by VALUES after the name of the flag),
separate each value with a comma.
Flags:
--exclude VALUES Exclude the specified checkers
--exclude-global Exclude global collectors from the report file checker
-h, --help Help for check
--include VALUES Include the specified checkers
--include-performance Include checkers that create load on cluster (not recommended for active clusters)
--mask-ip Mask usernames, hostnames, IP and MAC addresses in the report file checker
--only VALUES Only run the specified checkers
-i, --report-path ABSOLUTE_PATH Read the report from the specified tarball or directory. If you do not already have a report, run 'sdb-report collect' to generate one
--show-skips Display more information about skipped checks
--validate-env Run checkers that do not require SingleStore installation (performance checkers included)
Global Flags:
--backup-cache FILE_PATH File path for the backup cache
--cache-file FILE_PATH File path for the Toolbox node cache
-c, --config FILE_PATH File path for the Toolbox configuration
--disable-colors Disable color output in console, which some terminal sessions/environments may have difficulty with
--disable-spinner Disable the progress spinner, which some terminal sessions/environments may have issues with
-j, --json Enable JSON output
--parallelism POSITIVE_INTEGER Maximum number of operations to run in parallel
--runtime-dir DIRECTORY_PATH Where to store Toolbox runtime data
--ssh-control-persist SECONDS Enable SSH ControlPersist and set it to the specified duration in seconds
--ssh-max-sessions POSITIVE_INTEGER Maximum number of SSH sessions to open per host, must be at least 3
--ssh-strict-host-key-checking Enable strict host key checking for SSH connections
--ssh-user-known-hosts-file FILE_PATH Path to the user known_hosts file for SSH connections. If not set, /dev/null will be used
--state-file FILE_PATH Toolbox state file path
-v, --verbosity count Increase logging verbosity: valid values are 1, 2, 3. Usage -v=count or --verbosity=count
-y, --yes Enable non-interactive mode and assume the user would like to move forward with the proposed actions by default
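
When the command is invoked from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only flags listed in the usage text above and joins multi-value flags with commas, as that text describes. The function name and its parameters are hypothetical, and the sketch does not validate checker names or verify that the report path exists.

```python
from typing import List, Optional, Sequence


def build_check_command(
    report_path: Optional[str] = None,
    only: Sequence[str] = (),
    exclude: Sequence[str] = (),
    validate_env: bool = False,
    json_output: bool = False,
) -> List[str]:
    """Build an argument list for `sdb-report check` (hypothetical helper).

    Only flags shown in the usage text are emitted. Values for --only and
    --exclude are joined with commas.
    """
    cmd = ["sdb-report", "check"]
    if only:
        cmd += ["--only", ",".join(only)]
    if exclude:
        cmd += ["--exclude", ",".join(exclude)]
    if validate_env:
        cmd.append("--validate-env")
    if report_path is not None:
        cmd += ["--report-path", report_path]
    if json_output:
        # --json is one of the two flags that turn off interactive prompts.
        cmd.append("--json")
    return cmd
```

The resulting list can be passed to `subprocess.run` on a host where sdb-report is installed.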

Remarks

This command is interactive unless you pass either the --yes or the --json flag, both of which override the interactive behavior.

Categorization of Checkers

The checkers used in reporting gather cluster information related to alerting, performance, and pre-installation environment validation. The following table maps checkers with the category of the information reported by them.

+-----------------------------------------+------------------------------------------------------------------+
| Category | Checkers |
+-----------------------------------------+------------------------------------------------------------------+
| Alerting | clusterMemoryUsage |
| | cpuIdle (also reports performance-related information) |
| | diskInodesUsage |
| | diskLatencyRead (also reports performance-related information) |
| | diskLatencyWrite (also reports performance-related information) |
| | diskUsage (also reports environment validation details) |
| | explainRebalancePartitionsChecker |
| | leavesNotOnline |
| | majorPageFaults (also reports performance-related information) |
| | memoryCommitted (also reports performance-related information) |
| | offlineAggregators |
| | orphanDatabases |
| | pendingDatabases |
| | secondaryDatabases |
| | swapUsage (also reports performance-related information) |
| | unrecoverableDatabases |
| | userDatabaseRedundancy |
| Performance | cpuIdle (also reports alerting-related information) |
| | cpuMemoryBandwidth (also reports environment validation details) |
| | diskBandwidth (also reports environment validation details) |
| | diskLatencyRead (also reports alerting-related information) |
| | diskLatencyWrite (also reports alerting-related information) |
| | majorPageFaults (also reports alerting-related information) |
| | memoryCommitted (also reports alerting-related information) |
| | swapUsage (also reports alerting-related information) |
| Pre-installation Environment Validation | cgroupDisabled |
| | collectionErrors |
| | cpuFeatures |
| | cpuFreqPolicy |
| | cpuHyperThreading |
| | cpuMemoryBandwidth (also reports performance-related information) |
| | cpuModel |
| | defunctProcesses |
| | diskBandwidth (also reports performance-related information) |
| | diskUsage (also reports alerting-related information) |
| | kernelVersions |
| | maxMapCount |
| | minFreeKbytes |
| | networkBuffersMax |
| | orchestratorProcesses |
| | swapEnabled |
| | transparentHugepage |
| | validateSsd |
| | vmOvercommit |
+-----------------------------------------+------------------------------------------------------------------+

Note that some checkers do not fall under any of the three categories and are used to collect more general information about the cluster.
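
Because a checker can belong to more than one category, a lookup structure can be handy when deciding which checkers to pass to --only or --exclude. The sketch below transcribes the categorization above into Python sets. The names `CATEGORIES` and `categories_of` are hypothetical and are not part of sdb-report.

```python
from typing import Dict, FrozenSet, List

# Category membership transcribed from the categorization above.
CATEGORIES: Dict[str, FrozenSet[str]] = {
    "Alerting": frozenset({
        "clusterMemoryUsage", "cpuIdle", "diskInodesUsage", "diskLatencyRead",
        "diskLatencyWrite", "diskUsage", "explainRebalancePartitionsChecker",
        "leavesNotOnline", "majorPageFaults", "memoryCommitted",
        "offlineAggregators", "orphanDatabases", "pendingDatabases",
        "secondaryDatabases", "swapUsage", "unrecoverableDatabases",
        "userDatabaseRedundancy",
    }),
    "Performance": frozenset({
        "cpuIdle", "cpuMemoryBandwidth", "diskBandwidth", "diskLatencyRead",
        "diskLatencyWrite", "majorPageFaults", "memoryCommitted", "swapUsage",
    }),
    "Pre-installation Environment Validation": frozenset({
        "cgroupDisabled", "collectionErrors", "cpuFeatures", "cpuFreqPolicy",
        "cpuHyperThreading", "cpuMemoryBandwidth", "cpuModel",
        "defunctProcesses", "diskBandwidth", "diskUsage", "kernelVersions",
        "maxMapCount", "minFreeKbytes", "networkBuffersMax",
        "orchestratorProcesses", "swapEnabled", "transparentHugepage",
        "validateSsd", "vmOvercommit",
    }),
}


def categories_of(checker: str) -> List[str]:
    """Return the sorted category names that list the given checker.

    An empty list means the checker falls under none of the three
    categories and reports more general cluster information.
    """
    return sorted(name for name, members in CATEGORIES.items() if checker in members)
```

For instance, `categories_of("cpuIdle")` returns both the alerting and performance categories, while a checker such as `autoAttach` returns an empty list.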

Pre-Installation Environment Validation

Before installing SingleStore on hosts, validate that the deployment environment has sufficient resources and an optimal configuration to ensure the best possible database performance. The sdb-report tool provides a series of collectors and pre-installation environment checks that can help you tune your hardware for compatibility with SingleStore. The pre-installation checks run only against the components that apply to hosts without SingleStore installed on them.

If you are deploying SingleStore using SingleStore Tools, the sdb-report tool will automatically check the system and flag potential configuration changes prior to the SingleStore deployment. The pre-installation checks can also be run manually on new hosts that are added to a cluster as part of database scaling or on existing hosts in a cluster.

Perform the following steps to manually check if your environment is ready for SingleStore deployment.

  1. Install SingleStore Tools and register the hosts on which you plan to deploy SingleStore.

  2. Run the following command at the command line.

    sdb-report collect --validate-env

    This command collects a report on the pre-installation checks from all the registered hosts. You can collect a report from specific hosts by using the --host flag.

  3. After the report has been collected, run the following command at the command line. Make sure to provide the path to the report collected in the previous step.

    sdb-report check --validate-env --report-path <report-name/including/path>

    This command returns a list of pre-installation checks as pass/fail/warn metrics, and alerts on any potential configuration changes that you need to make before proceeding with the SingleStore deployment.
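
For runbooks or automation, steps 2 and 3 above can be rendered as shell-ready command strings. This is a minimal sketch: the helper name is hypothetical, the caller must already know the path of the collected report, and the --host flag is left out because its value format is not described here.

```python
import shlex
from typing import List


def validate_env_steps(report_path: str) -> List[str]:
    """Return the collect and check commands as shell-quoted strings.

    report_path is the report produced by the collect step; this helper
    does not discover it, so the caller has to supply it.
    """
    collect = ["sdb-report", "collect", "--validate-env"]
    check = ["sdb-report", "check", "--validate-env", "--report-path", report_path]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return [shlex.join(collect), shlex.join(check)]
```

Quoting matters when the report path contains spaces, which `shlex.join` handles by wrapping the argument in single quotes.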

Below is a sample output of the pre-installation checks.

sdb-report check --validate-env --report-path report-2021-05-04T075311.tar.gz
✓ diskUsage ..................................... [PASS]
✓ collectionErrors .............................. [PASS]
✓ defunctProcesses .............................. [PASS]
✓ diskLatencyRead ............................... [PASS]
✓ swapUsage ..................................... [PASS]
✓ diskLatencyWrite .............................. [PASS]
✓ cpuFeatures ................................... [PASS]
✓ cpuIdle ....................................... [PASS]
✘ transparentHugepage ........................... [FAIL]
FAIL /sys/kernel/mm/transparent_hugepage/defrag is [madvise] on 172.0.0.1
NOTE https://docs.memsql.com/memsql-report-redir/transparent-hugepage
✓ validateSsd ................................... [PASS]
✓ memoryCommitted ............................... [PASS]
✓ networkBuffersMax ............................. [PASS]
✓ orchestratorProcesses ......................... [PASS]
✓ majorPageFaults ............................... [PASS]
✓ cpuFreqPolicy ................................. [PASS]
NOTE cpu freq info collector on 172.0.0.1 had non-empty stderr output, can be found at cpuFreqInfo/cpuFreqInfo_stderr
NOTE wasn't able to get powersave state on 172.0.0.1 due to cpufreq driver disabled for your kernel
✓ kernelVersions ................................ [PASS]
NOTE 3.16 on all
✘ diskBandwidth ................................. [WARN]
WARN disk bandwidth collection error for host 172.0.0.1: Cannot collect disk bandwidth info because stress-ng is unavailable
✓ cpuModel ...................................... [PASS]
NOTE Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz on all
✓ minFreeKbytes ................................. [PASS]
✓ vmOvercommit .................................. [PASS]
✓ swapEnabled ................................... [PASS]
✓ cpuHyperThreading ............................. [PASS]
✘ maxMapCount ................................... [FAIL]
Fail vm.max_map_count = 65530 too low on 172.0.0.1
✘ cpuMemoryBandwidth ............................ [WARN]
WARN cpu-memory bandwidth collector on 172.0.0.1 encountered error: Impossible to collect cpu-memory bandwidth info due to mlc unavailable
✓ cgroupDisabled ................................ [PASS]
some checks failed: 21 PASS, 2 WARN, 2 FAIL

The diagnostics from the sample report recommend the following actions based on the best practices for using SingleStore.

  • Increase the virtual memory (vm) setting, vm.max_map_count, to the specified value, which will decrease the risk of memory errors.

  • Disable Transparent Huge Pages to ensure that the system has consistent query performance times.
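
To confirm these two settings on a host after changing them, the values can be read from /proc/sys/vm/max_map_count and from the files under /sys/kernel/mm/transparent_hugepage/. On Linux, the transparent huge page files list every option and mark the active one with square brackets, as in the `[madvise]` value reported above. The sketch below parses that format. The function names are hypothetical, the required vm.max_map_count value is left as a parameter because it is not stated here, and treating both `enabled` and `defrag` as needing `never` is an assumption.

```python
def active_thp_setting(contents: str) -> str:
    """Return the bracketed (active) option from a THP sysfs file.

    Example file contents: "always [madvise] never".
    """
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active option found in {contents!r}")


def thp_disabled(enabled_contents: str, defrag_contents: str) -> bool:
    """Assumption: THP counts as disabled when both files select 'never'."""
    return (
        active_thp_setting(enabled_contents) == "never"
        and active_thp_setting(defrag_contents) == "never"
    )


def max_map_count_ok(current: int, required: int) -> bool:
    """Compare vm.max_map_count with a caller-supplied required minimum."""
    return current >= required
```

On a real host, the string arguments would come from reading the corresponding files.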

Similarly, you can gather the necessary configuration changes from the report and then tune your environment for a SingleStore deployment.
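
When a report covers many hosts, pulling the non-passing checks out of the console output can make it easier to see what needs tuning. The sketch below parses result lines in the format of the sample output above. The function names are hypothetical, and the line format is inferred from that one sample, so it may differ between Toolbox versions. The --json flag may be a sturdier basis for automation, but its output schema is not described here.

```python
import re
from collections import Counter
from typing import Dict, List

# Matches lines such as "✘ maxMapCount ......... [FAIL]" from the sample output.
RESULT_LINE = re.compile(
    r"^\s*[✓✘]\s+(?P<name>\w+)\s+\.+\s+\[(?P<status>PASS|WARN|FAIL)\]\s*$"
)


def parse_check_output(text: str) -> Dict[str, str]:
    """Map each checker name to its status; detail lines are ignored."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group("name")] = match.group("status")
    return results


def non_passing(results: Dict[str, str]) -> List[str]:
    """Return the sorted names of checkers that did not report PASS."""
    return sorted(name for name, status in results.items() if status != "PASS")


def tally(results: Dict[str, str]) -> Counter:
    """Count the checkers per status."""
    return Counter(results.values())
```

Applied to the full sample output, `non_passing` would list the checkers that reported WARN or FAIL.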

Last modified: March 8, 2024
