Configure SingleStoreDB Alerts

The sdb-report tool allows you to run a variety of recommended health checks on your cluster. As of Toolbox 1.9.0, sdb-report allows you to create and send email alerts through an SMTP server to immediately inform you of potential cluster health issues as they are identified.

These instructions guide you through setting up and scheduling the alert mechanism using a local SMTP server (Postfix). This guide assumes you have an email client that you can use to receive emails from the SMTP server. You can integrate the alert mechanism with any SMTP server and mail client.

Refer to sdb-report send-alert for more information on the available command-line arguments.

Requirements

  • A self-managed cluster running MemSQL v6.7 or later / SingleStoreDB v7.1 or later

  • Toolbox version 1.9.0 or later

  • As alerting relies on iostat and sar, sysstat must be installed on each host in the cluster

  • An SMTP server

  • A text or UI-based email client that is ready to receive emails

  • A job scheduling mechanism such as cron to schedule alerts
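
The sysstat requirement above can be checked with a few lines of shell before you set up alerting. This is a minimal sketch: the function name missing_tools is hypothetical and not part of Toolbox, and it only tests whether the iostat and sar commands are on the PATH of the current host.

```shell
# Print the name of every listed command that is not on PATH.
# Returns 0 when all commands are present, 1 otherwise.
# (missing_tools is a hypothetical helper, not a Toolbox command.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Usage on each host in the cluster (iostat and sar are provided by sysstat):
#   missing_tools iostat sar || echo "install sysstat on this host"
```

Run the check on every host in the cluster, since the requirement applies to each host and not only to the one where sdb-report runs.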

Configure the SMTP Server

Note

You may skip this step if you already have an SMTP server. You must ensure that your SMTP server is configured to send outbound messages to your desired recipients.

We use the SMTP server Postfix for the purpose of this guide, but you can configure this with any SMTP server.

Red Hat Distribution

  1. Install Postfix.

    sudo yum install postfix
    
  2. Start Postfix.

    sudo systemctl start postfix
    
  3. Enable the Postfix service to ensure that it will restart when the host is rebooted.

    sudo systemctl enable postfix
    
  4. After installation, run the postconf command to see the Postfix configuration. Confirm that the entry 127.0.0.1 example.com exists in the /etc/hosts file, and add it if it is missing.

Debian Distribution

  1. Install Postfix.

    sudo apt-get install postfix
    
  2. During installation, you’ll be prompted to select an SMTP server type. Choose the Internet Site option.

  3. For testing purposes and/or a local-only configuration, use the suggested name (example.com) when prompted for a fully qualified domain name (FQDN) and click Ok.

  4. After installation, run the postconf command to see the Postfix configuration. Confirm that an entry for the domain you specified in Step 3 (e.g., 127.0.0.1 example.com) exists in the /etc/hosts file, and add it if it is missing.
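
On either distribution, the final step (confirming the /etc/hosts entry) can be scripted. The following is a sketch only: hosts_has_entry is a hypothetical helper name, and example.com is the placeholder domain used in this guide.

```shell
# Succeeds when the hosts file maps the given IP address to the given host name.
# Arguments: hosts file path, IP address, host name.
# (hosts_has_entry is a hypothetical helper for illustration.)
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        $1 ~ /^#/ { next }                  # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage:
#   hosts_has_entry /etc/hosts 127.0.0.1 example.com || echo "entry missing"
```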

Create the Alert Configuration File

Create an alert configuration file that specifies where your SMTP server resides. This file also holds any overrides of the default alert thresholds. The following template for the config.yaml file leaves all thresholds at their defaults. Pass the file to sdb-report with the --config-file argument, as demonstrated in the Sending Alerts section:

location:
  email:
    receivers:
      - receiver@example.com
    sender: sender@example.com
    server:
      host: smtp.example.com
      port: 25
      username:
      password:

Modify this file by:

  1. Updating the host field to reflect the host of the SMTP server.

  2. Updating the port to reflect the port that your SMTP server is listening on. This port must be open and available.

  3. Updating the sender and receivers fields to reflect the address the email is sent from and the addresses it is delivered to.

  4. Updating the username and password if needed.

  5. Updating any default configurations for the alert thresholds (optional).
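
If you prefer to generate the file instead of editing it by hand, the first four steps can be sketched as a small shell function. The function name write_alert_config is hypothetical; it writes only the fields that appear in the template and leaves username and password empty.

```shell
# Write an alert configuration file with the fields from the template.
# Arguments: output path, SMTP host, SMTP port, sender address, receiver address.
# (write_alert_config is a hypothetical helper, not a Toolbox command.)
write_alert_config() {
    cat > "$1" <<EOF
location:
  email:
    receivers:
      - $5
    sender: $4
    server:
      host: $2
      port: $3
      username:
      password:
EOF
}

# Usage with the placeholder values from the template:
#   write_alert_config config.yaml smtp.example.com 25 sender@example.com receiver@example.com
```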

The following example file configures an SMTP server running on an EC2 instance to send email to a singlestore.user Gmail user. You may add multiple recipients to the receivers list. The example also sets custom thresholds for the leavesNotOnline, orphanDatabases, and memoryCommitted checkers. Refer to the tables in the Next Steps section for the list of configurable thresholds. The output_level setting shown for orphanDatabases can also be applied to the other checkers that trigger on any output.

location:
  email:
    receivers:
      - singlestore.user@gmail.com
    sender: ec2-user@ip-172-31-77-34.ec2.internal
    server:
      host: ip-172-30-77-34.ec2.internal
      port: 25
      username: 
      password:
thresholds:
  leavesNotOnline:
    fail: 2
  memoryCommitted:
    warn: 80
  orphanDatabases:
    output_level: fail

Sending Alerts

The following is the simplest invocation of the sdb-report send-alert command. It uses the default thresholds unless you configure your own thresholds in the alert configuration file, using the configuration reference in Next Steps.

The command connects to the SMTP server (installed and configured in the previous steps), then generates and sends an alert exactly once to the location specified in the YAML file. This one-liner is a convenient way to test your configuration before scheduling your alerts.

sdb-report send-alert --config-file config.yaml

Refer to sdb-report send-alert for more information on the available command-line arguments.
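
When testing, it can help to fail early if the configuration file path is wrong. The wrapper below is a sketch under stated assumptions: send_alert_once and the SDB_REPORT variable are hypothetical names introduced here, and the only sdb-report usage is the send-alert --config-file form shown above.

```shell
# Binary to invoke; override SDB_REPORT to point at a different path.
# (SDB_REPORT and send_alert_once are hypothetical names for illustration.)
SDB_REPORT="${SDB_REPORT:-sdb-report}"

# Send one alert using the given configuration file.
# Fails early with a clear message if the file does not exist.
send_alert_once() {
    if [ ! -f "$1" ]; then
        echo "config file not found: $1" >&2
        return 1
    fi
    "$SDB_REPORT" send-alert --config-file "$1"
}

# Usage:
#   send_alert_once config.yaml
```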

Scheduling Alerts

This example will collect and send alerts every 5 minutes via a cron job. Note that other job-scheduling mechanisms are also supported.

*/5 * * * * sdb-report send-alert --config-file config.yaml
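
Cron typically runs jobs with a minimal environment, so a bare sdb-report or a relative config.yaml path may not resolve the way it does in your interactive shell. The sketch below builds the same crontab entry with absolute paths. The function name alert_cron_line and the paths in the usage comment are hypothetical; substitute the locations on your host.

```shell
# Print a crontab entry that runs send-alert every N minutes.
# Arguments: interval in minutes, absolute path to sdb-report,
#            absolute path to the alert configuration file.
# (alert_cron_line is a hypothetical helper for illustration.)
alert_cron_line() {
    for path in "$2" "$3"; do
        case "$path" in
            /*) ;;
            *) echo "not an absolute path: $path" >&2; return 1 ;;
        esac
    done
    printf '*/%s * * * * %s send-alert --config-file %s\n' "$1" "$2" "$3"
}

# Usage (both paths are placeholders):
#   alert_cron_line 5 /usr/bin/sdb-report /home/user/config.yaml
# Append the printed line to your crontab with: crontab -e
```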

Troubleshooting

If you receive a timeout error, confirm that the port your SMTP server (in this case, Postfix) uses is open and listening. The default Postfix port is 25. If it is blocked on your host, you can change the port that Postfix uses by editing the smtp service entry in the /etc/postfix/master.cf file.

Find the first service entry that begins with smtp in this file and replace smtp with a port number that is available. For example:

smtp     inet  n       -       n       -       -       smtpd

Changing smtp to 1025 makes Postfix listen for SMTP connections on port 1025.

1025     inet  n       -       n       -       -       smtpd

If you change this port, you will also need to update the port in your alert configuration file to use the same port.

After making these changes to the master.cf file, reload Postfix:

sudo postfix reload
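
The master.cf edit described above can also be previewed without touching the live file. This is a minimal sketch: set_smtp_port is a hypothetical helper that prints a modified copy to standard output, changing only the first entry whose service name is smtp and whose type is inet.

```shell
# Print master.cf with the first "smtp inet" service entry moved to a new port.
# Arguments: path to master.cf, new port number. The input file is not modified.
# (set_smtp_port is a hypothetical helper for illustration.)
set_smtp_port() {
    awk -v port="$2" '
        !changed && $1 == "smtp" && $2 == "inet" {
            sub(/smtp/, port)   # replaces only the leading service name
            changed = 1
        }
        { print }
    ' "$1"
}

# Usage: review the output, then apply it and reload Postfix.
#   set_smtp_port /etc/postfix/master.cf 1025 > /tmp/master.cf.new
#   diff /etc/postfix/master.cf /tmp/master.cf.new
```

Writing to a separate file first lets you compare the two versions before replacing master.cf and running the reload command.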

If the sdb-report send-alert command displays an error or does not complete successfully, rerun the command with the --vvv flag to produce verbose output.

Receive and Review Alerts

Each run of the alert command sends up to two distinct emails: one for failures and one for warnings. If all checkers pass, no email is sent.

Each email is sent from the address specified, through the server specified, in the location section of the alert configuration file.

In the following example email, a failure was generated for a node being offline, but no other cluster issues were identified.

Date: Tue, 27 Oct 2020 21:39:16 +0000
From: ec2-user@ip-172-31-77-34.ec2.internal
To: singlestore.user@gmail.com
Subject: SingleStoreDB Cluster Alert: Cluster check failures

SingleStoreDB cluster cluster_name has generated the following failures:
leavesNotOnline ........................ [FAIL]
FAIL leaf node on host 127.0.0.1 and port 3308 is offline
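
If you forward alert emails into a script, the failing checker names can be pulled out of the body. The sketch below assumes the body format shown in the example email, where each checker summary line ends in [FAIL]; failed_checks is a hypothetical helper, and other Toolbox versions may format the body differently.

```shell
# Read an alert email body on standard input and print the name of each
# checker whose summary line ends in [FAIL].
# (failed_checks is a hypothetical helper; the format is assumed from the example.)
failed_checks() {
    awk '$NF == "[FAIL]" { print $1 }'
}

# Usage:
#   failed_checks < alert_body.txt
```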

Next Steps

Command-Line Parameters

Refer to sdb-report send-alert for more information on the available command-line arguments.

Alerts Reference and Default Thresholds

| Check | Description | Warn default | Fail default | Configurable? |
|---|---|---|---|---|
| leavesNotOnline | Offline leaf nodes | | Fail if >= 1 leaf offline | Yes |
| offlineAggregators | Offline aggregator nodes | | Fail if >= 1 aggregator offline | Yes |
| explainRebalancePartitionsChecker | Identifies if partitions are not balanced across the cluster | | Any output | Yes. Configure to switch this to Warning |
| orphanDatabases | Identifies if any orphan databases are found. Orphan databases should be examined and dropped | Any output | | Yes. Configure to switch this to Failure |
| pendingDatabases | Identifies databases that are in a pending state. Pending databases are not available for read/write queries. | | Any output | Yes. Configure to switch this to Warning |
| unrecoverableDatabases | Identifies databases that are unrecoverable. | | Any output | Yes. Configure to switch to Warning |
| userDatabaseRedundancy | Determines if a database is redundant | | Any output | Yes. Configure to switch to Warning |
| clusterMemoryUsage | Checks free memory against total available | Less than 15% of the memory available | Less than 10% memory available | Yes |
| userDatabaseRedundancy | | High availability not enabled (not configurable) | Master partition missing its replica partition | Yes |
| secondaryDatabases | Checks for the presence of secondary replicating databases | | Any output | Yes. Configure to switch to Warning |

System Checks Thresholds

| Check | Description | Warn default | Fail default | Configurable? |
|---|---|---|---|---|
| cpuIdle | Checks the percentage of CPU idle time | 25.0% | 5.0% | Yes |
| diskLatencyRead | Determines the average time taken by the device to complete read requests | 10 ms | 25 ms | Yes |
| diskLatencyWrite | Determines the average time taken by the device to complete write requests | 10 ms | 25 ms | Yes |
| diskUsage | Checks free disk space and identifies if the disk is approaching its capacity limits | 70% | 80% | Yes |
| diskInodesUsage | Checks free disk inodes | 70% | 85% | Yes |
| majorPageFaults | Checks the number of major page faults generated by the system per second | 10 majflt/s | 20 majflt/s | Yes |
| swapUsage | Checks the percentage of swap space used | 5% | 10% | Yes |
| memoryCommitted | Determines the percentage of memory required for a given workload | 70% | 90% | Yes |
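
To reason about how a warn and fail pair interacts, the sketch below classifies a measured value for a check where higher values are worse (for example memoryCommitted, with defaults of 70 and 90). This is an illustration only: classify_high is a hypothetical function, and the assumption that a value at or above a threshold triggers that level is mine, not a statement of how sdb-report compares values. Checks where lower is worse, such as cpuIdle, would need the comparison reversed.

```shell
# Classify an integer value against warn and fail thresholds for a check
# where higher values are worse.
# Arguments: measured value, warn threshold, fail threshold.
# ASSUMPTION: "at or above" triggers the level; sdb-report may differ.
classify_high() {
    if [ "$1" -ge "$3" ]; then
        echo FAIL
    elif [ "$1" -ge "$2" ]; then
        echo WARN
    else
        echo PASS
    fi
}

# Usage with the memoryCommitted defaults (warn 70, fail 90):
#   classify_high 85 70 90
```

Under this assumption, raising the warn threshold to 80 (as in the example configuration file) would move a reading of 75 from a warning to a pass.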