SingleStore DB

Enabling High Availability

The leaf_failover_fanout global variable defines the placement of replica partitions among leaves in a cluster. It has two modes: paired (the default) and load_balanced. In paired mode, the auto-rebalance operation only affects the newly attached leaves and their pairs. In load_balanced mode, the auto-rebalance operation runs a full rebalance of all the databases on all the leaves in the cluster.

The following sections discuss how to enable high availability for each of these modes.

When a leaf fails, the partitions on its paired leaf are promoted to master partitions. As a result, this leaf has all master partitions and no replica partitions, which increases its overall workload and affects its performance. For more information on leaf pairing, see Availability Groups.
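The paired-mode promotion described above can be sketched as a simplified Python model (illustrative only; partition names and the data structures are assumptions, not SingleStore internals):

```python
# Simplified model of paired-mode failover (illustrative, not SingleStore code).
# A leaf's replicas live entirely on its paired leaf, so when the leaf fails,
# every one of its master partitions is promoted on that single pair.

def paired_failover(failed_masters, pair_masters):
    """Return the surviving pair's master set after promotion.

    failed_masters -- master partitions hosted on the failed leaf
    pair_masters   -- master partitions already hosted on its pair
    """
    # The pair held a replica of each of the failed leaf's masters;
    # all of them are promoted at once, so the pair's workload doubles.
    return pair_masters | failed_masters

# Two paired leaves, each owning two master partitions (hypothetical names).
failed = {"db_0", "db_2"}
pair = {"db_1", "db_3"}
print(sorted(paired_failover(failed, pair)))
# ['db_0', 'db_1', 'db_2', 'db_3'] -- the pair now serves all four masters
```

This is the workload concentration the paragraph above warns about: one surviving leaf absorbs all of its pair's masters.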

Enabling HA on existing leaf nodes

The SingleStore DB management tools support enabling high availability through the sdb-admin enable-high-availability command.

sdb-admin enable-high-availability [--password <secure-password>]

During this process, half of the leaves in the cluster will move from availability group 1 into availability group 2. This is an offline operation. Because all partitions of all databases will be duplicated, each leaf node needs at least 50% free memory and disk space. In addition, the cluster must have an even number of leaves.
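Since every partition is duplicated, it can be worth verifying headroom on each leaf host before running the command. A minimal pre-flight sketch (the path and the 50% threshold are assumptions mirroring the requirement above; this is not part of the SingleStore tooling):

```python
# Rough pre-flight check for the 50% free disk requirement (illustrative sketch).
import shutil

def has_headroom(path, required_free_fraction=0.5):
    """Return True if at least `required_free_fraction` of the filesystem
    holding `path` is currently free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= required_free_fraction

# Run this against each leaf's data directory before enabling HA.
# "/var/lib/memsql" is a typical data root (an assumption; check your install).
if not has_headroom("/var/lib/memsql" if False else "/"):
    print("warning: less than 50% disk free; enable-high-availability may fail")
```

Memory headroom would need a separate check (e.g. against the node's maximum_memory setting); this sketch covers disk only.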

If you need to add more nodes to the cluster, we recommend doing it before enabling High Availability.

You can verify that high availability has been enabled by inspecting your leaf nodes with sdb-admin list-nodes.

sdb-admin list-nodes
****
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID  |    Role    | Host  | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
| D24FA96083 | Master     | node1 | 3306 | Running       | True         | 6.7.7   | Online         |                    |
| F3E5FDB741 | Aggregator | node2 | 3306 | Running       | True         | 6.7.7   | Online         |                    |
| 5B536C630D | Leaf       | node3 | 3306 | Running       | True         | 6.7.7   | Online         | 1                  |
| 9156C4C91F | Leaf       | node4 | 3306 | Running       | True         | 6.7.7   | Online         | 2                  |
| 92CC911954 | Leaf       | node4 | 3307 | Running       | True         | 6.7.7   | Online         | 2                  |
| DD82C2B35E | Leaf       | node3 | 3307 | Running       | True         | 6.7.7   | Online         | 1                  |
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
Enabling HA for a cluster with new leaf nodes

When you add new nodes to a cluster that does not already have high availability enabled, you must perform the following steps to change the cluster configuration and configure the new leaf nodes appropriately.

Step 1. Update the redundancy_level value

On the Master Aggregator run:

SET @@GLOBAL.redundancy_level = 2;

This updates the current configuration and sets the cluster to run in redundancy-2 operation mode.

In addition, update the SingleStore DB configuration file on the Master Aggregator so that the change is not lost when the node is restarted.

Unlike in previous versions of SingleStore DB, redundancy_level can only be set on the Master Aggregator.

sdb-admin list-nodes --role master -q | xargs -I % sdb-admin update-config --memsql-id % --key redundancy_level --value 2 --set-global -y
Step 2. Deploy new nodes

The cluster is now running in redundancy-2 operation mode; however, all existing leaves belong to availability group 1.

At this point, start as many new nodes as there are existing leaf nodes. For clusters managed by the SingleStore DB management tool sdb-admin, create and assign the new nodes as leaf nodes. If you are managing your cluster with MemSQL Ops, deploy MemSQL Ops agents on all of the new nodes.

The following shows how to add two leaf nodes using sdb-admin. The current cluster has two leaf nodes.

sdb-admin list-nodes
****
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID  |    Role    | Host  | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
| 43F1B836D3 | Master     | node1 | 3306 | Running       | True         | 6.7.7   | Online         |                    |
| E4921A995C | Aggregator | node1 | 3307 | Running       | True         | 6.7.7   | Online         |                    |
| 74BBE83C45 | Leaf       | node1 | 3308 | Running       | True         | 6.7.7   | Online         | 1                  |
| A6D82670D8 | Leaf       | node1 | 3309 | Running       | True         | 6.7.7   | Online         | 1                  |
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+

For demonstration purposes, the following shows how to add two new local leaf nodes.

sdb-admin create-node --port 3310 -y
sdb-admin create-node --port 3311 -y

Because new nodes default to the role of unknown, you can select them and assign them as leaf nodes by piping through xargs.

sdb-admin list-nodes -q -r unknown | xargs -I % sdb-admin add-leaf --memsql-id %  -y

Now verify that the new nodes have been added, and that they are in availability group 2.

sdb-admin list-nodes
****
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID  |    Role    | Host  | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
| 43F1B836D3 | Master     | node1 | 3306 | Running       | True         | 6.7.7   | Online         |                    |
| E4921A995C | Aggregator | node1 | 3307 | Running       | True         | 6.7.7   | Online         |                    |
| 4DAE7D1F54 | Leaf       | node1 | 3310 | Running       | True         | 6.7.7   | Online         | 2                  |
| 52CA34CD0C | Leaf       | node1 | 3311 | Running       | True         | 6.7.7   | Online         | 2                  |
| 74BBE83C45 | Leaf       | node1 | 3308 | Running       | True         | 6.7.7   | Online         | 1                  |
| A6D82670D8 | Leaf       | node1 | 3309 | Running       | True         | 6.7.7   | Online         | 1                  |
+------------+------------+-------+------+---------------+--------------+---------+----------------+--------------------+
Step 3. Rebalance the cluster

Finally, rebalance the cluster by running REBALANCE ALL DATABASES.

Note that, before running REBALANCE ALL DATABASES, each database partition exists only as a master, with no replica, which means high availability is not yet provided. For instance:

SHOW PARTITIONS ON memsql_demo;
****
+---------+-------+------+--------+--------+
| Ordinal | Host  | Port | Role   | Locked |
+---------+-------+------+--------+--------+
|       0 | node1 | 3308 | Master |      0 |
|       1 | node1 | 3309 | Master |      0 |
|       2 | node1 | 3308 | Master |      0 |
|       3 | node1 | 3309 | Master |      0 |
|       0 | node1 | 3310 | Master |      0 |
|       1 | node1 | 3311 | Master |      0 |
|       2 | node1 | 3310 | Master |      0 |
|       3 | node1 | 3311 | Master |      0 |
+---------+-------+------+--------+--------+
8 rows in set (0.01 sec)
REBALANCE ALL DATABASES;

After running REBALANCE ALL DATABASES, notice that both master and replica partitions exist within the cluster.

SHOW PARTITIONS ON memsql_demo;
****
+---------+-------+------+--------+--------+
| Ordinal | Host  | Port | Role   | Locked |
+---------+-------+------+--------+--------+
|       0 | node1 | 3308 | Slave  |      0 |
|       1 | node1 | 3309 | Slave  |      0 |
|       2 | node1 | 3308 | Master |      0 |
|       3 | node1 | 3309 | Master |      0 |
|       0 | node1 | 3310 | Master |      0 |
|       1 | node1 | 3311 | Master |      0 |
|       2 | node1 | 3310 | Slave  |      0 |
|       3 | node1 | 3311 | Slave  |      0 |
+---------+-------+------+--------+--------+
8 rows in set (0.01 sec)

load_balanced mode places a leaf’s replica partitions evenly across the healthy leaves in the cluster. For example, a cluster having eight leaves split into two availability groups has four leaves in each availability group. Each of the eight leaves will contain replica partitions of masters from two leaves in the other availability group. Each master partition has only a single replica partition.

If a leaf L fails, L’s replica partitions on the healthy leaves in the opposite availability group are promoted to master partitions. Because these master partitions are spread evenly across the healthy leaves, no single healthy leaf has to resume all of L’s work.
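The even spread of promoted masters can be sketched as a simplified Python model (illustrative only; leaf and partition names are assumptions, not SingleStore internals):

```python
# Simplified model of load_balanced failover (illustrative, not SingleStore code).
# The failed leaf's replicas live on several leaves in the opposite availability
# group, so promotion spreads the extra masters instead of doubling one leaf.

def load_balanced_failover(replica_placement):
    """replica_placement maps each replica partition of the failed leaf
    to the healthy leaf hosting it. Returns the extra masters each healthy
    leaf picks up after promotion."""
    promoted = {}
    for partition, leaf in replica_placement.items():
        promoted.setdefault(leaf, []).append(partition)
    return promoted

# Four partitions of a failed leaf, replicated across two healthy leaves.
placement = {"db_0": "leafA", "db_1": "leafB", "db_2": "leafA", "db_3": "leafB"}
print(load_balanced_failover(placement))
# {'leafA': ['db_0', 'db_2'], 'leafB': ['db_1', 'db_3']}
```

Contrast this with paired mode, where a single pair leaf would absorb all four promoted masters.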

Notice

If the leaf_failover_fanout variable is set to the load_balanced mode, your cluster is not restricted to an even number of leaves. For details on how to add leaf nodes to your cluster, see the Enabling HA for a cluster with new leaf nodes section below.

Enabling HA on existing leaf nodes

You can enable high availability in a SingleStore DB cluster using the sdb-admin enable-high-availability command.

sdb-admin enable-high-availability [--password <secure-password>]

If you need to add more nodes to the cluster, we recommend doing it before enabling High Availability.

Creating a new database

After setting up the cluster, set leaf_failover_fanout to load_balanced before creating the database.

SET GLOBAL leaf_failover_fanout = 'load_balanced';
CREATE DATABASE db;
Using an existing database in load_balanced mode

To use existing databases in load_balanced mode, you’ll need to rebalance the database after changing the leaf_failover_fanout mode.

SET GLOBAL leaf_failover_fanout = 'load_balanced';
REBALANCE PARTITIONS ON db;

You need to rebalance all the databases in the cluster. To see the actions involved in rebalancing, run EXPLAIN REBALANCE PARTITIONS ON db.
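Rebalancing every database can be scripted by generating one REBALANCE statement per user database. A sketch of that generation step (the database names and the list of system databases to skip are assumptions; on a real cluster the names would come from SHOW DATABASES):

```python
# Generate a REBALANCE PARTITIONS statement per user database (sketch).
# System database names below are assumptions; verify them on your cluster.
SYSTEM_DATABASES = {"information_schema", "memsql", "cluster"}

def rebalance_statements(databases):
    """Return REBALANCE statements for every non-system database."""
    return [f"REBALANCE PARTITIONS ON {db};"
            for db in databases if db not in SYSTEM_DATABASES]

dbs = ["information_schema", "memsql_demo", "db"]
for stmt in rebalance_statements(dbs):
    print(stmt)
# REBALANCE PARTITIONS ON memsql_demo;
# REBALANCE PARTITIONS ON db;
```

The generated statements would then be executed against the Master Aggregator, one at a time.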

For large databases, the rebalance operation can be time consuming, as partitions are moved between different hosts. Additionally, the rebalance operation requires free memory and disk space, because the cluster temporarily maintains two replicas of each partition being moved. It is recommended that leaves have at least 30-50% free memory and disk space to successfully rebalance the cluster.

Enabling HA for a cluster with new leaf nodes

When you add new nodes to a cluster that does not already have high availability enabled, you must perform the following steps to change the cluster configuration and configure the new leaf nodes appropriately.