SingleStore DB

High Availability Commands
Summary of High Availability Commands
High-level Commands

Use the following commands to manage high availability in SingleStore DB:

For more information, see Administering a Cluster.

Working with Leaves

Leaf Nodes are responsible for storing slices of data in the distributed system. Each leaf is just a SingleStore DB single-box server consisting of several partitions– each partition is just a database on that server. If you have a database named test and run SHOW DATABASES on a leaf, you will see names resembling test_5 (this would be partition 5 for database test).

Use the following commands to work with leaves:

  • ADD LEAF adds previously unknown nodes into the cluster.

  • REMOVE LEAF first rebalances away a leaf node’s partitions and then removes it from the list of - leaves in SHOW LEAVES.

  • ATTACH LEAF transitions a detached node back into the online state (see Leaf States). It can also be used to introduce a new, previously unknown node, to the cluster. In that case, unlike ADD LEAF, this command will analyze the data present on the leaf node and try to reintroduce it back into the system if possible. In redundancy-2, ATTACH LEA will automatically rebalance partitions between a node and its pair to equalize the number of master and replica partitions across the pair. This is an example of an HA command converging the state of the cluster towards balance.

  • DETACH LEAF transitions the leaf to the detached state instead of removing it.

You should run REMOVE LEAF on leaves that you no longer wish to track as part of the distributed system. If you plan to run maintenance on a machine and want to temporarily relieve it from serving data, you should detach it first, perform the necessary maintenance, restart it, and then use ATTACH LEAF to reintroduce it. ATTACH LEAF will automatically recover what data it can from the machine and rebalance partitions with its pair to restore balance.

During or after running ATTACH LEAF , REMOVE LEAF , or DETACH LEAF , you can run SHOW REBALANCE STATUS on any database to see what low-level partition operations the aggregator ran as part of the operation.

Rebalancing the Cluster

HA operations in SingleStore DB generate a rebalance plan: a series of low-level commands that converge the cluster towards a balanced state. A redundancy-1 cluster is balanced if every online leaf has an equal number of partitions. A redundancy-2 cluster is balanced if every pair of leaves has an equal number of partitions, and, within a pair, each leaf has an equal number of master and replica partitions.

Since partitions are per-database, each rebalance plan is also per-database. REMOVE LEAF , REMOVE LEAF, DETACH LEAF, and REBALANCE PARTITIONS all work this way.

Note

SingleStore DB does auto-rebalance after auto-attaches. You do not need to run REBALANCE PARTITIONS manually with the exception of a few situations. The most common situation is expanding the cluster (rebalancing of data is not done automatically for ADD LEAF, mainly to allow adding multiple leaves and doing a single rebalance finally) or potentially if the auto-rebalance failed for some reason.

The REBALANCE PARTITIONS command examines the state of partitions and leaves for a particular database, generates a plan to rebalance the partitions across the online leaves, and executes that plan. You can use the EXPLAIN REBALANCE PARTITIONS to view the plan that REBALANCE PARTITIONS would execute. If this command returns an empty result, then the database is balanced. The following is an example output from EXPLAIN REBALANCE PARTITIONS run on a small sample cluster:

EXPLAIN REBALANCE PARTITIONS ON test;
****
+-----------------------------+---------+-------------+-------------+-------+-----------------------------+
| Action                      | Ordinal | Target_Host | Target_Port | Phase | Estimated_Time_Remaining_Ms |
+-----------------------------+---------+-------------+-------------+-------+-----------------------------+
| COPY PARTITION              | 13      | 127.0.0.1   | 10002       | 3     | 7821                        |
| COPY PARTITION              | 2       | 127.0.0.1   | 10003       | 4     | 7821                        |
| COPY PARTITION              | 2       | 127.0.0.1   | 10004       | 4     | 7821                        |
| PROMOTE PARTITION WITH DROP | 2       | 127.0.0.1   | 10003       | 5     | 1351                        |
| DROP PARTITION              | 2       | 127.0.0.1   | 10002       | 6     | 1002                        |
| DROP PARTITION              | 13      | 127.0.0.1   | 10004       | 6     | 1002                        |
+-----------------------------+---------+-------------+-------------+-------+-----------------------------+

The Ordinal is the target partition’s ordinal, and the Target_Host and Target_Port correspond to the destination leaf. Any two operations in the same Phase can be run in parallel and phases are run in order. For example, the aggregator can run COPY PARTITION on partitions test:1 and test:3 in parallel. The exact meaning of what each phase means is arbitrary and depends on the particular rebalance operation. The Estimated_Time_Remaining_Ms is the estimated total time (in milliseconds) that each low-level command will take to complete.

To execute this plan (assuming the state of the cluster does not change before you proceed), you can run the REBALANCE PARTITIONS command:

REBALANCE PARTITIONS ON test;
Query OK, 1 row affected (9.80 sec)

The main use-case for the REBALANCE PARTITIONS command is to rebalance data onto new leaf nodes in the cluster. This situation can arise from adding new leaf nodes to the cluster to expand capacity, or if leaf nodes terminate irrecoverably and you wish to replace them with new nodes. For common leaf-failure scenarios, ATTACH LEAF will automatically perform the necessary rebalance operations on the affected leaf nodes.

Although rebalance operations can take some time to complete, you can continue to read and write data to the database while it runs (the operation is online). While REBALANCE PARTITIONS runs, use the SHOW REBALANCE STATUS command to examine its running state:

SHOW REBALANCE STATUS ON test;
****
+-----------------------------------+---------+-------------+-------------+-------+-----------+--------------+-----------------------------+
| Action                            | Ordinal | Target_Host | Target_Port | Phase | Status    | Running_Time | Estimated_Time_Remaining_Ms |
+-----------------------------------+---------+-------------+-------------+-------+-----------+--------------+-----------------------------+
| COPY PARTITION                    | 13      | 127.0.0.1   | 10002       | 3     | success   | 4009         | 0                           |
| COPY PARTITION                    | 2       | 127.0.0.1   | 10003       | 4     | success   | 5109         | 0                           |
| COPY PARTITION                    | 2       | 127.0.0.1   | 10004       | 4     | success   | 5112         | 0                           |
| PROMOTE PARTITION WITH DROP FORCE | 2       | 127.0.0.1   | 10003       | 5     | running   | 405          | 1130                        |
| DROP PARTITION                    | 2       | 127.0.0.1   | 10002       | 6     | scheduled | NULL         | 1002                        |
| DROP PARTITION                    | 13      | 127.0.0.1   | 10004       | 6     | scheduled | NULL         | 1002                        |
+-----------------------------------+---------+-------------+-------------+-------+-----------+--------------+-----------------------------+

The output is the same as EXPLAIN REBALANCE PARTITIONS with the addition of three new columns. Status is one of scheduled, running, or success. For running or success commands, the Running_Time indicates the time in milliseconds that has been spent running the particular low-level command. The Estimated_Time_Remaining_Ms indicates the estimated remaining time (in milliseconds) for each low-level command to complete.

After REBALANCE PARTITIONS completes, SHOW REBALANCE STATUS displays a summary of the last rebalance operation until the next one is run.

SHOW REBALANCE STATUS ON test;
****
+-------------------+---------+-------------+-------------+-------+---------+--------------+-----------------------------+
| Action            | Ordinal | Target_Host | Target_Port | Phase | Status  | Running_Time | Estimated_Time_Remaining_Ms |
+-------------------+---------+-------------+-------------+-------+---------+--------------+-----------------------------+
| COPY PARTITION    |       1 | leaf-6      |        3306 |     2 | success |         2870 | 0                           |
| COPY PARTITION    |       3 | leaf-8      |        3306 |     2 | success |         2903 | 0                           |
| COPY PARTITION    |       5 | leaf-6      |        3306 |     2 | success |         2903 | 0                           |
| COPY PARTITION    |       7 | leaf-8      |        3306 |     2 | success |         2903 | 0                           |
| PROMOTE PARTITION |       1 | leaf-6      |        3306 |     6 | success |         4131 | 0                           |
| PROMOTE PARTITION |       3 | leaf-8      |        3306 |     6 | success |         4165 | 0                           |
| COPY PARTITION    |       1 | leaf-2      |        3306 |     7 | success |         2606 | 0                           |
| COPY PARTITION    |       3 | leaf-4      |        3306 |     7 | success |         2627 | 0                           |
+-------------------+---------+-------------+-------------+-------+---------+--------------+-----------------------------+

REMOVE LEAF, and DETACH LEAF also generate and execute plans for every database relevant to the target leaf node.

Attaching Leaves - Examples

In this section, we will walk through a full example of running ATTACH LEAF on redundancy-1 and redundancy-2 clusters. Redundancy-1 vs. redundancy-2 clusters are explained earlier in High Availability Architecture.

ATTACH LEAF In Redundancy-1

Let’s walk through a full example in a redundancy-1 cluster. This cluster has four leaves leaf-1 through leaf-4 and a database named test. In this example, leaf-4 will fail, and we will recover and reintroduce it into the system with ATTACH LEAF.

Here is the initial state of the cluster:

-- SHOW LEAVES returns all the leaves in the cluster and their current states.
SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | NULL      |      NULL | online |                  2 |                     0.397 |
| leaf-2 |  3306 |                  1 | NULL      |      NULL | online |                  2 |                     0.397 |
| leaf-3 |  3306 |                  1 | NULL      |      NULL | online |                  2 |                     0.349 |
| leaf-4 |  3306 |                  1 | NULL      |      NULL | online |                  2 |                     0.363 |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
4 rows in set (0.00 sec)
-- SHOW PARTITIONS returns all the partitions on a given database.
SHOW PARTITIONS ON test;
****
+---------+--------+-------+--------+--------+
| Ordinal | Host   | Port  | Role   | Locked |
+---------+--------+-------+--------+--------+
|       0 | leaf-1 |  3306 | Master | 0      |
|       1 | leaf-2 |  3306 | Master | 0      |
|       2 | leaf-3 |  3306 | Master | 0      |
|       3 | leaf-4 |  3306 | Master | 0      |
|       4 | leaf-1 |  3306 | Master | 0      |
|       5 | leaf-2 |  3306 | Master | 0      |
|       6 | leaf-3 |  3306 | Master | 0      |
|       7 | leaf-4 |  3306 | Master | 0      |
+---------+--------+-------+--------+--------+
8 rows in set (0.00 sec)

– You can query in INFORMATION_SCHEMA.DISTRIBUTED_PARTITIONS to slice – and dice the partitions map.

SELECT * FROM INFORMATION_SCHEMA.DISTRIBUTED_PARTITIONS
    ->    WHERE DATABASE_NAME='test' AND Host='leaf-4';
****
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| DATABASE_NAME | ORDINAL | HOST   | PORT | ROLE   | LOCKED | NODE_ID | IS_OFFLINE | SYNC_DESIRED | STATE | TERM | INSTANCE_ID |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| test          |       3 | leaf-4 | 3306 | Master | 0      | 3       | 0          | 0            | async | 5    | 4           |
| test          |       7 | leaf-4 | 3306 | Master | 0      | 3       | 0          | 0            | async | 9    | 8           |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
2 rows in set (0.01 sec)
-- test.x has 100 consecutive values from 1 to 100.
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| min(id) | max(id) | count(*) |
+---------+---------+----------+
|       1 |     100 |      100 |
+---------+---------+----------+
1 row in set (0.00 sec)

If we kill the SingleStore DB instance on leaf-4, then leaf-4 will transition to offline in SHOW LEAVES, and partitions 3 and 7 will be unmapped on test.

SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+---------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State   | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+---------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | NULL      |      NULL | online  |                  2 |                     0.414 |
| leaf-2 |  3306 |                  1 | NULL      |      NULL | online  |                  2 |                     0.341 |
| leaf-3 |  3306 |                  1 | NULL      |      NULL | online  |                  2 |                     0.358 |
| leaf-4 |  3306 |                  1 | NULL      |      NULL | offline |                  0 |                      NULL |
+--------+-------+--------------------+-----------+-----------+---------+--------------------+---------------------------+
4 rows in set (0.00 sec)
SHOW PARTITIONS ON test;
****
+---------+--------+-------+--------+--------+
| Ordinal | Host   | Port  | Role   | Locked |
+---------+--------+-------+--------+--------+
|       0 | leaf-1 |  3306 | Master | 0      |
|       1 | leaf-2 |  3306 | Master | 0      |
|       2 | leaf-3 |  3306 | Master | 0      |
|       3 | NULL   |  NULL | NULL   | 0      |
|       4 | leaf-1 |  3306 | Master | 0      |
|       5 | leaf-2 |  3306 | Master | 0      |
|       6 | leaf-3 |  3306 | Master | 0      |
|       7 | NULL   |  NULL | NULL   | 0      |
+---------+--------+-------+--------+--------+
8 rows in set (0.00 sec)

The database is now offline for reads and writes:

SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
ERROR 1777 (HY000): Partition test:3 has no master instance.

After restarting SingleStore DB on leaf-4, master aggregator will notice the leaf is reachable again and attach it back to the cluster. Leaf will automatically transition to the online state, and all the partitions on the leaf will be automatically imported.

SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | NULL      |      NULL | online |                  4 |                     0.437 |
| leaf-2 |  3306 |                  1 | NULL      |      NULL | online |                  4 |                     0.411 |
| leaf-3 |  3306 |                  1 | NULL      |      NULL | online |                  4 |                     0.366 |
| leaf-4 |  3306 |                  1 | NULL      |      NULL | online |                  1 |                     0.418 |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
4 rows in set (0.00 sec)
-- The data that was on leaf-4 is back!
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| min(id) | max(id) | count(*) |
+---------+---------+----------+
|       1 |     100 |      100 |
+---------+---------+----------+
1 row in set (0.00 sec)

You can disable the master aggregator from automatically moving leaves that become visible to the online state by running:

set global auto_attach = Off;
Query OK, 0 rows affected (0.00 sec)

In this case, after restarting SingleStore DB on leaf-4 and waiting for it to recover, instead of node being in offline state we will see it transitioning to a detached state:

-- leaf-4 is in the detached state.
SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State    | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | NULL      |      NULL | online   |                  1 |                     0.467 |
| leaf-2 |  3306 |                  1 | NULL      |      NULL | online   |                  1 |                     0.359 |
| leaf-3 |  3306 |                  1 | NULL      |      NULL | online   |                  1 |                     0.405 |
| leaf-4 |  3306 |                  1 | NULL      |      NULL | detached |                  1 |                     0.360 |
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
4 rows in set (0.00 sec)

After that you can manually attach the leaf using ATTACH LEAF:

-- This will reintroduce leaf-4 into the system.
attach leaf 'leaf-4': 3306;
Query OK, 1 row affected (0.52 sec)
-- leaf-4 is now online.
SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | NULL      |      NULL | online |                  4 |                     0.437 |
| leaf-2 |  3306 |                  1 | NULL      |      NULL | online |                  4 |                     0.411 |
| leaf-3 |  3306 |                  1 | NULL      |      NULL | online |                  4 |                     0.366 |
| leaf-4 |  3306 |                  1 | NULL      |      NULL | online |                  1 |                     0.418 |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
4 rows in set (0.00 sec)
-- ATTACH LEAF imported partitions 3 and 7 from leaf-4.
SHOW REBALANCE STATUS ON test;
****
+------------------+---------+-------------+-------------+-------+---------+--------------+-----------------------------+
| Action           | Ordinal | Target_Host | Target_Port | Phase | Status  | Running_Time | Estimated_Time_Remaining_Ms |
+------------------+---------+-------------+-------------+-------+---------+--------------+-----------------------------+
| ATTACH PARTITION | 3       | leaf-4      | 3306        | 1     | success | 1            | 0                           |
| ATTACH PARTITION | 7       | leaf-4      | 3306        | 1     | success | 3            | 0                           |
+------------------+---------+-------------+-------------+-------+---------+--------------+-----------------------------+
-- The data that was on leaf-4 is back!
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| min(id) | max(id) | count(*) |
+---------+---------+----------+
|       1 |     100 |      100 |
+---------+---------+----------+
1 row in set (0.00 sec)
ATTACH LEAF In Redundancy-2

Notice

This section is only applicable for clusters with leaf_failover_fanout mode set to paired.

Let’s work through a full example of attaching a leaf back into a redundancy-2 cluster. This cluster has eight leaves: leaf-1 through leaf-8. You should understand leaf pairings before following along with this example: see Availability Groups for details.

SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State  | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | leaf-5    |      3306 | online |                  1 |                     0.424 |
| leaf-2 |  3306 |                  1 | leaf-6    |      3306 | online |                  1 |                     0.391 |
| leaf-3 |  3306 |                  1 | leaf-7    |      3306 | online |                  1 |                     0.370 |
| leaf-4 |  3306 |                  1 | leaf-8    |      3306 | online |                  1 |                     0.408 |
| leaf-5 |  3306 |                  2 | leaf-1    |      3306 | online |                  1 |                     0.405 |
| leaf-6 |  3306 |                  2 | leaf-2    |      3306 | online |                  1 |                     0.395 |
| leaf-7 |  3306 |                  2 | leaf-3    |      3306 | online |                  1 |                     0.371 |
| leaf-8 |  3306 |                  2 | leaf-4    |      3306 | online |                  1 |                     0.403 |
+--------+-------+--------------------+-----------+-----------+--------+--------------------+---------------------------+
8 rows in set (0.00 sec)
-- test.x has 100 consecutive values from 1 to 100.
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| min(id) | max(id) | count(*) |
+---------+---------+----------+
|       1 |     100 |      100 |
+---------+---------+----------+
1 row in set (0.00 sec)

This is what a partition map might look like on the master aggregator before leaf-8 dies:

-- leaf-8 owns the master partition for test:3.
SELECT * FROM INFORMATION_SCHEMA.DISTRIBUTED_PARTITIONS
    ->    WHERE DATABASE_NAME='test' AND ORDINAL=3;
****
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| DATABASE_NAME | ORDINAL | HOST   | PORT | ROLE   | LOCKED | NODE_ID | IS_OFFLINE | SYNC_DESIRED | STATE | TERM | INSTANCE_ID |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| test          |       3 | leaf-8 | 3306 | Master | 0      | 8       | 0          | 0            | async | 5    | 4           |
| test          |       3 | leaf-4 | 3306 | Slave  | 0      | 9       | 0          | 0            | async | 0    | 20          |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
2 rows in set (0.00 sec)

After leaf-8 dies,

-- leaf-8 has entered the offline state.
SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+---------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State   | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+---------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | leaf-5    |      3306 | online  |                  4 |                     0.404 |
| leaf-2 |  3306 |                  1 | leaf-6    |      3306 | online  |                  4 |                     0.323 |
| leaf-3 |  3306 |                  1 | leaf-7    |      3306 | online  |                  4 |                     0.327 |
| leaf-4 |  3306 |                  1 | leaf-8    |      3306 | online  |                  4 |                     0.295 |
| leaf-5 |  3306 |                  2 | leaf-1    |      3306 | online  |                  4 |                     0.311 |
| leaf-6 |  3306 |                  2 | leaf-2    |      3306 | online  |                  4 |                     0.327 |
| leaf-7 |  3306 |                  2 | leaf-3    |      3306 | online  |                  4 |                     0.323 |
| leaf-8 |  3306 |                  2 | leaf-4    |      3306 | offline |                  1 |                      NULL |
+--------+-------+--------------------+-----------+-----------+---------+--------------------+---------------------------+
8 rows in set (0.00 sec)
-- leaf-4 now owns the master partition for test:3.
SELECT * FROM INFORMATION_SCHEMA.DISTRIBUTED_PARTITIONS
    ->    WHERE DATABASE_NAME='test' AND ORDINAL=3;
****
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| DATABASE_NAME | ORDINAL | HOST   | PORT | ROLE   | LOCKED | NODE_ID | IS_OFFLINE | SYNC_DESIRED | STATE | TERM | INSTANCE_ID |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| test          |       3 | leaf-8 | 3306 | Slave  | 0      | 8       | 1          | 0            | async | 0    | 4           |
| test          |       3 | leaf-4 | 3306 | Master | 0      | 9       | 0          | 0            | async | 18   | 20          |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
2 rows in set (0.00 sec)
-- The data is still available because of the promotion on leaf-4.
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| min(id) | max(id) | count(*) |
+---------+---------+----------+
|       1 |     100 |      100 |
+---------+---------+----------+
1 row in set (0.00 sec)
-- We can also write to the database (the ... is an abbreviation, not valid SQL).
INSERT INTO test.x VALUES (101), (102), ... (200);

-- The new count is 200.
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| MIN(id) | MAX(id) | COUNT(*) |
+---------+---------+----------+
|       1 |     200 |      200 |
+---------+---------+----------+
1 row in set (0.00 sec)

If SingleStore DB is configured to automatically attach leaves that become reachable (by default it is), then after leaf-8 is restarted and recovered, it will automatically become online:

SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State    | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | leaf-5    |      3306 | online   |                  4 |                     0.396 |
| leaf-2 |  3306 |                  1 | leaf-6    |      3306 | online   |                  4 |                     0.323 |
| leaf-3 |  3306 |                  1 | leaf-7    |      3306 | online   |                  4 |                     0.321 |
| leaf-4 |  3306 |                  1 | leaf-8    |      3306 | online   |                  4 |                     0.320 |
| leaf-5 |  3306 |                  2 | leaf-1    |      3306 | online   |                  4 |                     0.319 |
| leaf-6 |  3306 |                  2 | leaf-2    |      3306 | online   |                  4 |                     0.329 |
| leaf-7 |  3306 |                  2 | leaf-3    |      3306 | online   |                  4 |                     0.327 |
| leaf-8 |  3306 |                  2 | leaf-4    |      3306 | online   |                  1 |                     0.318 |
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
8 rows in set (0.00 sec)

And the database will continue to be queryable.

Notice

After leaf-8 becomes visible, for some time it will be in the attaching state. During this period it is replicating new data for the partitions that are stored on leaf-4. As soon as it transitions to the online state, the redundancy is fully restored, however all the master partitions are still on leaf-4, meaning that it does all the work for the data stored on these partitions. Shortly after leaf-8 transitions to online state, half of the replica partitions on it will be promoted to masters, and the load will be evenly distributed again.

You can disable master aggregator from automatically moving leaves that become visible to the online state by running:

set global auto_attach = Off;
Query OK, 0 rows affected (0.00 sec)

In this case if the leaf becomes visible, it transitions to the detached state instead, and one needs to run ATTACH LEAF manually to move it to the online state. ATTACH LEAF examines every partition database on a leaf and tries to reintroduce it into the system.

After leaf-8 is recovered, it still has a database for test:3. This database is in pending state until the leaf is attached. We can examine this by running SHOW DATABASES EXTENDED directly on the leaf:

-- This query is directly against the SingleStore DB instance on leaf-8.
-- Some columns were suppressed in the sample output for clarity.
SHOW DATABASES EXTENDED;
****
+--------------------+---------+-------------+-------------+----------+---------+-------------+------------+
| Database           | Commits | Role        | State       | Position | Details | AsyncSlaves | SyncSlaves |
+--------------------+---------+-------------+-------------+----------+---------+-------------+------------+
| cluster            | 26      | async slave | replicating | 0:53     |         | 0           | 0          |
| information_schema | 66      | master      | online      | 0:181512 |         | 0           | 0          |
| memsql             | 14      | master      | online      | 0:890    |         | 0           | 0          |
| test               | 3       | sync slave  | replicating | 0:433    |         | 0           | 0          |
| test_11            | 5       | master      | pending     | 0:445    |         | 0           | 0          |
| test_15            | 13      | master      | pending     | 0:541    |         | 0           | 0          |
| test_3             | 7       | master      | pending     | 0:469    |         | 0           | 0          |
| test_7             | 8       | master      | pending     | 0:481    |         | 0           | 0          |
+--------------------+---------+-------------+-------------+----------+---------+-------------+------------+
8 rows in set (0.00 sec)

Back on the master aggregator,

-- leaf-8 is alive and is now in the detached state.
SHOW LEAVES;
****
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
| Host   | Port  | Availability_Group | Pair_Host | Pair_Port | State    | Opened_Connections | Average_Roundtrip_Latency |
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
| leaf-1 |  3306 |                  1 | leaf-5    |      3306 | online   |                  4 |                     0.396 |
| leaf-2 |  3306 |                  1 | leaf-6    |      3306 | online   |                  4 |                     0.323 |
| leaf-3 |  3306 |                  1 | leaf-7    |      3306 | online   |                  4 |                     0.321 |
| leaf-4 |  3306 |                  1 | leaf-8    |      3306 | online   |                  4 |                     0.320 |
| leaf-5 |  3306 |                  2 | leaf-1    |      3306 | online   |                  4 |                     0.319 |
| leaf-6 |  3306 |                  2 | leaf-2    |      3306 | online   |                  4 |                     0.329 |
| leaf-7 |  3306 |                  2 | leaf-3    |      3306 | online   |                  4 |                     0.327 |
| leaf-8 |  3306 |                  2 | leaf-4    |      3306 | detached |                  1 |                     0.318 |
+--------+-------+--------------------+-----------+-----------+----------+--------------------+---------------------------+
8 rows in set (0.00 sec)

– ATTACH LEAF will make find test_3 database on the leaf and make it a replica of the existing test_3 database on leaf-4. – A rebalance will automatically run after that and promote test_3 to master on leaf-8, while demoting test_3 to replica on leaf-4.

attach leaf 'leaf-8':3306;
Query OK, 1 row affected (12.92 sec)
-- The master partition for test:3 is once again on leaf-8.
SELECT * FROM INFORMATION_SCHEMA.DISTRIBUTED_PARTITIONS
    ->    WHERE DATABASE_NAME='test' AND ORDINAL=3;
****
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| DATABASE_NAME | ORDINAL | HOST   | PORT | ROLE   | LOCKED | NODE_ID | IS_OFFLINE | SYNC_DESIRED | STATE | TERM | INSTANCE_ID |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
| test          |       3 | leaf-8 | 3306 | Master | 0      | 8       | 0          | 0            | async | 20   | 4           |
| test          |       3 | leaf-4 | 3306 | Slave  | 0      | 9       | 0          | 0            | async | 0    | 20          |
+---------------+---------+--------+------+--------+--------+---------+------------+--------------+-------+------+-------------+
2 rows in set (0.00 sec)
-- After attaching leaf-8, the count remains at 200.
SELECT MIN(id), MAX(id), COUNT(*) FROM test.x;
****
+---------+---------+----------+
| MIN(id) | MAX(id) | COUNT(*) |
+---------+---------+----------+
|       1 |     200 |      200 |
+---------+---------+----------+
1 row in set (5.65 sec)

A REBALANCE PARTITIONS walkthrough is left as an exercise for the reader: try adding two new leaves after the redundancy-2 summary and then running EXPLAIN REBALANCE PARTITIONS and REBALANCE PARTITIONS to distribute data onto the new nodes. Use SHOW PARTITIONS and SHOW LEAVES along the way to see how the data distribution changes after you run REBALANCE PARTITIONS.

Shutting Down and Restarting the Cluster

To avoid triggering automatic failure detection while cleanly shutting down the server, shut off and restart SingleStore DB instances in the correct order.

Shutting Down the Cluster

To avoid triggering failover detection:

  1. Shut down the master aggregator first.

  2. Shut down the remaining aggregators and leaves (in any order).

Restarting the Cluster

To bring the cluster back up:

  1. Restart all of the leaves.

  2. Verify that all the leaves are reachable (using a SELECT 1 query).

  3. Turn on the master aggregator.

  4. Turn on the remaining aggregators.

Synchronous vs. Asynchronous High Availability

With high availability redundancy-2 replication, you can specify whether replication from master to replica partitions is Using Synchronous Replication and Synchronous Durability Together. Synchronous replication from the master partitions will complete on all replicas before the commit of the transaction is acknowledged to the client application.

Auto-healing

SingleStore DB handles most leaf failure scenarios automatically so that your workload never has to stop or get interrupted when running with high availability. Some scenarios that are covered by SingleStore DB include:

  • When a leaf fails or disconnects from the cluster, the replica partitions on its pair node automatically get promoted as master partitions and take over the workload

  • When a leaf comes back online, it is automatically attached back to the cluster and the partitions are rebalanced such that the workload is as uniform as possible across all machines

  • When a replica partition has fallen behind or has diverged from its master partition, it is automatically reprovisioned

  • SingleStore DB handles near-OOM and near-out-of-disk situations gracefully through efficient resource management and client-side error reporting

When leaf node recovers in redundancy level 1, it is immediately attached to the cluster. In contrast, attaching a leaf in redundancy level 2 (high availability) can take up to 2 minutes. This is because SingleStore DB optimizes restarting leaf nodes by waiting for a batch of leaves to recover before restoring redundancy. This speeds up the process as rebalancing partitions is an expensive operation. As a result, SHOW PARTITIONS might not immediately show the leaves as online after a restart.

In rare circumstances, it might be necessary to troubleshoot a cluster using low-level clustering commands. For more information, consult Cluster Management Commands.