Configure SingleStoreDB Alerts
The sdb-report tool runs a variety of recommended health checks on your cluster. As of Toolbox 1.9.0, sdb-report can also create email alerts and send them through an SMTP server, so that you are notified of potential cluster health issues as soon as they are identified.
These instructions guide you through setting up and scheduling alerts, using a local Postfix SMTP server as the example. You can integrate the alert mechanism with any SMTP server and mail client.
Refer to sdb-report send-alert for more information on the available command-line arguments.
Requirements
A self-managed cluster running MemSQL v6.7 or later / SingleStoreDB v7.1 or later
Toolbox version 1.9.0 or later
sysstat must be installed on each host in the cluster, because alerting relies on its iostat and sar utilities
An SMTP server
A text-based or GUI email client that is ready to receive emails
A job scheduling mechanism such as cron to schedule alerts
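Before continuing, you can confirm on each host that the sysstat utilities are available. The following is a minimal sketch, assuming a bash shell; check_alert_tools is a hypothetical helper name, and running it on every host (for example, over SSH) is left to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any required command that is not on the PATH.
# Returns 0 if every command is found, 1 otherwise.
check_alert_tools() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# iostat and sar are both provided by the sysstat package.
if check_alert_tools iostat sar; then
  echo "sysstat tools found"
else
  echo "install sysstat, e.g. sudo yum install sysstat or sudo apt-get install sysstat"
fi
```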
Configure the SMTP Server
Note
You may skip this step if you already have an SMTP server. You must ensure that your SMTP server is configured to send outbound messages to your desired recipients.
This guide uses Postfix as the SMTP server, but you can use any SMTP server.
Red Hat Distribution
Install Postfix.
sudo yum install postfix
Start Postfix.
sudo systemctl start postfix
Enable the Postfix service so that it starts automatically when the host reboots.
sudo systemctl enable postfix
After installation, run the postconf command to view the Postfix configuration. Confirm that the entry 127.0.0.1 example.com exists in the /etc/hosts file, and add it if it does not.
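The /etc/hosts check described above (it applies to the Debian steps below as well) can be scripted. This is a minimal sketch assuming a bash shell; ensure_hosts_entry is a hypothetical helper that appends the entry only when no line already maps that IP address to that name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP NAME" to a hosts file unless a line
# already maps that IP (first field) to that name (any later field).
ensure_hosts_entry() {
  local file="$1" ip="$2" name="$3"
  # awk exits 0 if a matching mapping exists, 1 otherwise.
  # Commented lines ("#127.0.0.1 ...") never match because $1 differs.
  if awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$file"; then
    echo "$ip $name already present in $file"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
    echo "added $ip $name to $file"
  fi
}
```

Because /etc/hosts is owned by root, run it from a root shell (for example, after sudo -i): ensure_hosts_entry /etc/hosts 127.0.0.1 example.com. Running it twice does not add a duplicate line.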
Debian Distribution
Install Postfix.
sudo apt-get install postfix
During installation, you’ll be prompted to select an SMTP server type. Choose the “Internet Site” option.
For testing purposes or a local-only configuration, accept the suggested name (example.com) when prompted for a fully qualified domain name (FQDN), and select Ok.
After installation, run the postconf command to view the Postfix configuration. Confirm that an entry for the domain you specified during installation (for example, 127.0.0.1 example.com) exists in the /etc/hosts file, and add it if it does not.
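If you provision many hosts and want to skip the two installation prompts described above, Debian's debconf can be given the answers in advance. This is a sketch under assumptions: postfix_debconf_answers is a hypothetical helper, and the question names (postfix/main_mailer_type, postfix/mailname) are those commonly used with the Debian postfix package; verify them on your release, for example with debconf-get-selections from the debconf-utils package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print debconf answers for the two Postfix prompts
# (mail configuration type and mail name). Defaults to example.com.
postfix_debconf_answers() {
  local mailname="${1:-example.com}"
  printf '%s\n' \
    "postfix postfix/main_mailer_type select Internet Site" \
    "postfix postfix/mailname string ${mailname}"
}

# Usage (as root): preload the answers, then install without prompts.
#   postfix_debconf_answers example.com | debconf-set-selections
#   DEBIAN_FRONTEND=noninteractive apt-get install -y postfix
```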
Create the Alert Configuration File
You must create an alert configuration file that specifies where your SMTP server resides. This file also holds any changes you make to the default alert thresholds. The following is a template for this file, config.yaml, which you pass to sdb-report with the --config-file argument demonstrated in the next step. This template does not change any of the default thresholds:
location:
  email:
    receivers:
      - receiver@example.com
    sender: sender@example.com
    server:
      host: smtp.example.com
      port: 25
      username:
      password:
Modify this file as follows:
Update the host field to the host of your SMTP server.
Update the port field to the port that your SMTP server is listening on. This port must be open and available.
Update the sender field to the address the alerts are sent from, and the receivers list to the addresses they are sent to.
Update the username and password fields if your SMTP server requires them.
Optionally, change the default alert thresholds.
The following example file configures an SMTP server running on an EC2 instance to send alerts to the Gmail user singlestore.user. You can add multiple recipients to the receivers list. This example also sets custom thresholds for the leavesNotOnline, orphanDatabases, and memoryCommitted checkers. Refer to the tables in Next Steps for the list of configurable thresholds. The output_level setting used here for orphanDatabases can also be applied to other checkers whose result is based on any output.
location:
  email:
    receivers:
      - singlestore.user@gmail.com
    sender: ec2-user@ip-172-31-77-34.ec2.internal
    server:
      host: ip-172-30-77-34.ec2.internal
      port: 25
      username:
      password:
thresholds:
  leavesNotOnline:
    fail: 2
  memoryCommitted:
    warn: 80
  orphanDatabases:
    output_level: fail
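Because the configuration file can contain an SMTP password, you may want to create it so that only your user can read it. This is a minimal sketch assuming a bash shell; write_alert_config is a hypothetical helper that writes the template from the start of this section and overwrites any existing file at the given path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the alert configuration template to PATH
# with owner-only permissions, since the file may hold an SMTP password.
write_alert_config() {
  local path="$1"
  (
    # umask 077 makes a newly created file private (mode 600).
    umask 077
    cat > "$path" <<'EOF'
location:
  email:
    receivers:
      - receiver@example.com
    sender: sender@example.com
    server:
      host: smtp.example.com
      port: 25
      username:
      password:
EOF
  )
  # chmod covers the case where the file already existed.
  chmod 600 "$path"
}
```

For example, write_alert_config "$HOME/config.yaml" creates the file, after which you edit the fields described above.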
Sending Alerts
The following is the simplest form of the sdb-report send-alert command. It uses the default thresholds for every checker that your configuration file does not override; see the threshold reference in Next Steps.
The command runs the checks once and sends any resulting alerts, through the SMTP server configured in the previous steps, to the recipients specified in the YAML file. Running it by hand is a convenient way to test your configuration before you schedule alerts.
sdb-report send-alert --config-file config.yaml
Scheduling Alerts
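This example collects and sends alerts every 5 minutes through a cron job; other job-scheduling mechanisms work as well. Add the following line to your crontab (for example, with crontab -e). Because cron runs jobs with a minimal environment, use the absolute path to your configuration file, and to sdb-report if it is not on cron's default PATH.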
*/5 * * * * sdb-report send-alert --config-file config.yaml
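To install the entry from a script without creating duplicates on repeated runs, you can merge it into the existing crontab. This is a sketch assuming a bash shell; merge_cron_entry is a hypothetical helper, and the absolute paths in ALERT_JOB are assumptions (check yours with command -v sdb-report).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a crontab on stdin and print it with ENTRY
# appended, unless that exact line is already present.
merge_cron_entry() {
  local entry="$1" current
  current="$(cat)"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  # -x: whole-line match, -F: fixed string (so "*" is not a pattern).
  if ! printf '%s\n' "$current" | grep -qxF -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}

# Assumed absolute paths; adjust both to match your hosts.
ALERT_JOB='*/5 * * * * /usr/bin/sdb-report send-alert --config-file /home/ec2-user/config.yaml'

# Usage: install the entry for the current user (safe to run repeatedly).
#   (crontab -l 2>/dev/null || true) | merge_cron_entry "$ALERT_JOB" | crontab -
```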
Troubleshooting
If you receive a timeout error, confirm that the port used by your SMTP server (in this case, Postfix) is open and listening. The default Postfix port is 25. If that port is blocked on your host, you can change the port that Postfix listens on by editing the smtp service entry in the /etc/postfix/master.cf file.
Find the line that begins with smtp and has the service type inet, and replace smtp with an available port number. For example, this is the default entry:
smtp inet n - n - - smtpd
Changing smtp to 1025 changes the SMTP port to 1025:
1025 inet n - n - - smtpd
If you change this port, also update the port field in your alert configuration file to the same value.
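The master.cf edit can also be made with sed. This is a minimal sketch assuming GNU or BSD sed with -E support; set_postfix_smtp_port is a hypothetical helper that only touches the line starting with smtp followed by the inet service type, so the outbound "smtp unix" client entry and commented-out lines are left unchanged. It keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the service name "smtp" with PORT on the
# smtpd listener line (service type "inet"), keeping FILE.bak as a backup.
set_postfix_smtp_port() {
  local file="$1" port="$2"
  # "^smtp" + whitespace + "inet" matches only the listener entry;
  # "smtps", "#smtp", and "smtp unix" lines do not match.
  sed -E -i.bak "s/^smtp([[:space:]]+inet[[:space:]])/${port}\1/" "$file"
}

# Equivalent one-off command for the example above (run as root):
#   sed -E -i.bak 's/^smtp([[:space:]]+inet[[:space:]])/1025\1/' /etc/postfix/master.cf
```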
After making these changes to the master.cf file, reload Postfix:
sudo postfix reload
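After reloading, you can confirm that the SMTP port accepts TCP connections before rerunning sdb-report send-alert. This is a sketch assuming bash (for its /dev/tcp redirection) and the coreutils timeout command; smtp_port_open is a hypothetical helper, and 127.0.0.1 and 25 stand in for the host and port from your alert configuration file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens
# within 3 seconds. The inner bash receives HOST as $0 and PORT as $1.
smtp_port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: check the host and port from your alert configuration file.
if smtp_port_open 127.0.0.1 25; then
  echo "SMTP port is open"
else
  echo "cannot connect; check that Postfix is running and the port is not blocked"
fi
```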
If the sdb-report send-alert command displays an error or does not complete successfully, rerun the command with the --vvv flag to produce verbose output.
Receive and Review Alerts
Each run of the alert command sends up to two emails: one for failures and one for warnings. If all checkers pass, no email is sent.
Each email is sent from the sender address, through the SMTP server, to the receivers specified in the location section of the alert configuration file.
In the following example email, a failure was generated for a node being offline, but no other cluster issues were identified.
Date: Tue, 27 Oct 2020 21:39:16 +0000
From: ec2-user@ip-172-31-77-34.ec2.internal
To: singlestore.user@gmail.com
Subject: SingleStoreDB Cluster Alert: Cluster check failures

SingleStoreDB cluster cluster_name has generated the following failures:

leavesNotOnline ........................ [FAIL]
FAIL leaf node on host 127.0.0.1 and port 3308 is offline
Next Steps
Command-Line Parameters
Refer to sdb-report send-alert for more information on the available command-line arguments.
Alerts Reference and Default Thresholds
| Check + Description | Warn default | Fail default | Configurable? |
|---|---|---|---|
| Offline leaf nodes | - | 1 or more leaf nodes offline | Yes |
| Offline aggregator nodes | - | 1 or more aggregators offline | Yes |
| Identifies if partitions are not balanced across the cluster | - | Any output | Yes; can be switched to Warning |
| Identifies if any orphan databases are found. Orphan databases should be examined and dropped. | Any output | - | Yes; can be switched to Failure |
| Identifies databases that are in a pending state. Pending databases are not available for read/write queries. | - | Any output | Yes; can be switched to Warning |
| Identifies databases that are unrecoverable | - | Any output | Yes; can be switched to Warning |
| Determines if a database is redundant | - | Any output | Yes; can be switched to Warning |
| Checks free memory against total available memory | Less than 15% of memory available | Less than 10% of memory available | Yes |
| | High availability not enabled (not configurable) | Master partition missing its replica partition | Yes |
| Checks for the presence of secondary replicating databases | - | Any output | Yes; can be switched to Warning |
System Checks Thresholds
Check + Description | Warn default | Fail default | Configurable? |
---|---|---|---|
Checks the percentage of CPU idle time | 25.0% | 5.0% | Yes |
Determines the average time taken by the device to complete read requests | 10 ms | 25 ms | Yes |
Determines the average time taken by the device to complete write requests | 10 ms | 25 ms | Yes |
Checks free disk space and identifies if the disk is approaching its capacity limits | 70% | 80% | Yes |
Checks free disk inodes | 70% | 85% | Yes |
Checks the number of major page faults generated by the system per second | 10 majflt/s | 20 majflt/s | Yes |
Checks the percentage of swap space used | 5% | 10% | Yes |
Determines the percentage of memory required for a given workload | 70% | 90% | Yes |
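Note that the direction of the thresholds differs between checks: for CPU idle time, lower values are worse (warn at 25.0%, fail at 5.0%), while for disk usage, higher values are worse (warn at 70%, fail at 80%). The following hypothetical illustration shows how such a warn/fail pair splits values into PASS, WARN, and FAIL. It is not sdb-report's implementation; in particular, whether a value exactly at a threshold triggers it (>= versus >) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical illustration, not sdb-report's code.
# usage: classify VALUE WARN FAIL
# If FAIL > WARN, higher values are worse (e.g. disk usage 70 / 80).
# If FAIL < WARN, lower values are worse (e.g. CPU idle 25.0 / 5.0).
# Assumption: reaching a threshold exactly counts as crossing it.
classify() {
  awk -v v="$1" -v w="$2" -v f="$3" 'BEGIN {
    v += 0; w += 0; f += 0   # force numeric comparison
    if (f > w) status = (v >= f) ? "FAIL" : ((v >= w) ? "WARN" : "PASS")
    else       status = (v <= f) ? "FAIL" : ((v <= w) ? "WARN" : "PASS")
    print status
  }'
}
```

For example, classify 75 70 80 prints WARN, and classify 3.5 25.0 5.0 prints FAIL.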