Tools: Leaf Node Failures

Replace a Failed Leaf Node in a Redundancy-1 Cluster

Warning

Review the size of your SingleStoreDB data directory and confirm that at least twice the amount of disk space used by the leaf node’s data directory is available. This space is required to hold a copy of the leaf node’s data directory. Another drive, including an external drive, can also be used to hold a copy of the leaf node’s data.
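
For example, you can compare the size of the node’s data directory with the free space on the filesystem that will hold the copy using standard Linux tools (the paths below are placeholders; adjust them to your environment):

    sudo du -sh /var/lib/memsql/<node-directory>/data
    df -h /tmp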

A leaf node’s data resides in /var/lib/memsql/<node-directory>/data by default. Note that in SingleStoreDB v7.3 the <node-directory> is a hash, so you will need to look within each directory to determine whether it belongs to the appropriate node. You can check that the node’s ID matches the contents of the memsql_id file within the node’s data directory (e.g. cat ./data/memsql_id).
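
A quick way to map node directories to their MemSQL IDs is to print the memsql_id file in each one. The following is a minimal sketch that assumes the default /var/lib/memsql layout described above:

    # Print the MemSQL ID stored in each node directory.
    for dir in /var/lib/memsql/*/; do
        echo "${dir}: $(sudo cat "${dir}data/memsql_id" 2>/dev/null)"
    done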

This section covers the replacement of a failed leaf node within a redundancy-1 cluster. This example assumes that the data on the failed leaf node is recoverable and will be restored to a replacement node.

  1. Determine if the leaf node in question is still running within the cluster. If it is, first stop it, and then remove the leaf role from the node.

    sdb-admin stop-node --memsql-id <node-ID>
    sdb-admin remove-leaf --force --memsql-id <node-ID>
  2. Assuming that the data is recoverable from the failed leaf node, preserve it by changing to the node directory (on the corresponding host) and compressing it to another location, such as the /tmp directory.

    cd /path/to/memsql/<node-directory>
    sudo tar -zcvf /tmp/data.tgz ./data/
  3. As the leaf node’s data has been preserved, delete the leaf node from the cluster.

    sdb-admin delete-node --memsql-id <node-ID> --skip-remove
  4. Create a new node to store this data in. After the node has been created, stop it so that you can modify its data directory.

    sdb-admin create-node --host <host-IP or hostname> --port <port> --password <secure-password>
    sdb-admin stop-node --memsql-id <node-ID>
  5. Navigate to the newly created node’s directory and remove the data directory within. Then, extract the data from the /tmp directory to this node’s data directory.

    cd /var/lib/memsql/<new-node-directory>
    sudo rm -r ./data
    sudo tar -zxvf /tmp/data.tgz
  6. Once the extraction is finished, update the ownership and permissions on the directory’s files.

    sudo chown memsql:memsql -R ./data
  7. Start the new node (the command below starts all of the cluster’s nodes). Because the restored data directory contains the original memsql_id file, the node’s MemSQL ID will now be that of the former node, which you can confirm by running sdb-admin list-nodes.

    sdb-admin start-node --all
  8. Run the following SQL command on the Master Aggregator host to add this leaf node to the cluster.

    ADD LEAF <user>:'<password>'@'<node's-host-IP or hostname>':<port>;
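
You can then verify that the leaf node has rejoined the cluster and is online by running the following SQL command on the Master Aggregator host.

    SHOW LEAVES;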

Replace a Failed Leaf Node in a Redundancy-2 Cluster

Warning

Review the size of your SingleStoreDB data directory and confirm that you have enough disk space remaining before you proceed with the recovery process.

This section details how to recover a failed leaf node in a high availability cluster.

Clusters using async or sync replication can present different errors, so knowing which replication option is being used is key to recovering a failed cluster.

By default, all MemSQL v7.0/SingleStoreDB 7.1 or later clusters will run with high availability using sync replication. Sync replication ensures that the shared data housed within the primary and secondary leaf nodes is always in sync.

Async replication treats data differently: its first priority is to allow the cluster to continue to run. While data will eventually be synced between the paired leaf nodes, it is not required to happen at the time of the transaction. Refer to Replication and Durability for more information.

Sometimes data can become out of sync when one or more sizable queries are run. If failover occurs during one of these queries, SingleStoreDB will recognize that it’s failing over from a leaf node that contains newer data. As the asynchronous replication of data on the secondary leaf node was not up to date when the failover occurred, there will be an unavoidable amount of data loss.

As a result, the cluster will stop performing transactions and will throw an Asynchronous replication with a FailoverLeaf: Couldn’t failover from node. error. This is intentional, as it allows the cluster administrator to take manual control of the cluster and assess what has occurred.

To acknowledge the potential data loss and manually bring the out-of-sync secondary leaf node back into service, use a SQL editor to run the REBALANCE PARTITIONS command on each database. This resolves the out-of-sync state so that the cluster can resume processing transactions.

Alert: If REBALANCE PARTITIONS is run before attempting to recover or repair the leaf node and its partitions, the data will be lost and will no longer be recoverable.

REBALANCE PARTITIONS ON <database-name>;

You may also use the bash script in the One Leaf Node Fails section below to rebalance the partitions on all nodes and restore redundancy on all databases.
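
As a minimal sketch of such a script, the loop below runs REBALANCE PARTITIONS on every user database. It assumes it is run on the Master Aggregator host with a MySQL-compatible client (shown here as mysql; substitute the client your deployment provides), and the connection details are placeholders:

    #!/usr/bin/env bash
    # Sketch: rebalance partitions on every user database.
    # Host, port, user, and password are placeholders; adjust to your Master Aggregator.
    MA_HOST="127.0.0.1"
    MA_PORT="3306"
    MA_USER="root"
    MA_PASS="<password>"

    for db in $(mysql -h "$MA_HOST" -P "$MA_PORT" -u "$MA_USER" -p"$MA_PASS" \
                --batch --skip-column-names -e "SHOW DATABASES"); do
        # Skip databases assumed here to be system databases.
        case "$db" in
            information_schema|memsql|cluster) continue ;;
        esac
        echo "Rebalancing partitions on $db"
        mysql -h "$MA_HOST" -P "$MA_PORT" -u "$MA_USER" -p"$MA_PASS" \
            -e "REBALANCE PARTITIONS ON \`$db\`"
    done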

One Leaf Node Fails

Reintroduce the Leaf Node

This section details how to reintroduce a failed leaf node in a Redundancy-2 cluster. Reintroducing the leaf node is the simplest solution as SingleStoreDB will automatically reattach the leaf node once it’s back online.

If the failed leaf node was the master for any partitions, the replica copies of those partitions on its pair will be promoted. You may then reintroduce the failed leaf node to the cluster, or add a new leaf node to replace it.
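
If the host and the node’s data directory are intact, bringing the node back online with Toolbox is typically all that is required. This is a minimal sketch, assuming the node is still registered with Toolbox and that start-node accepts the same --memsql-id selector used with stop-node earlier in this article:

    sdb-admin start-node --memsql-id <node-ID>

Once the node is back online, it should appear as attached again in the output of SHOW LEAVES; on the Master Aggregator.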

Replace the Leaf Node

This section details how to replace a failed leaf node in a Redundancy-2 cluster with a replacement leaf node from a different host.

If the failed leaf node was the master for any partitions, the replica copies of those partitions on its pair will be promoted. The failed leaf node can then be replaced with a new leaf node by following the steps below.

  1. If the host of the failed leaf node is still available, note either its availability group, or the availability group of its pair.

  2. If the host that the failed leaf node resides on has also failed, the leaf node must be removed from the cluster. Determine if the failed leaf node is still listed in Toolbox.

    sdb-admin list-nodes

    If so, remove it from the cluster.

    sdb-admin remove-leaf --memsql-id <node-ID>

    If not, skip removing it from the cluster and simply delete the node and unregister its failed host.

    sdb-admin delete-node --memsql-id <node-ID> --skip-remove
    sdb-toolbox-config unregister-host --host <host-IP or hostname>
  3. The leaf node may still be visible to the cluster, but Toolbox will no longer recognize it. To confirm, run the following SQL command on the Master Aggregator host and look for the IP and port of the failed leaf node.

    SHOW LEAVES;
  4. If the failed leaf node still appears in the output, remove it from the cluster by running the following on the Master Aggregator.

    REMOVE LEAF '<leaf-host-IP or hostname>':<port>;
  5. Using Toolbox, add a new replacement host to the cluster.

    sdb-toolbox-config register-host --host <host-IP or hostname> -i <SSH-identity-file>
  6. Deploy SingleStoreDB to this host.

    sdb-deploy install --host <host-IP or hostname>
  7. Create a replacement node, assign it a leaf role, and add it to the availability group you noted earlier.

    sdb-admin create-node --host <host-IP or hostname> --password <secure-password> --port <port>
    sdb-admin add-leaf --memsql-id <ID-of-new-node> --availability-group <1 or 2>
  8. Rebalance the partitions on all nodes and restore redundancy on all databases by running the following SQL command on the Master Aggregator host.

    REBALANCE ALL DATABASES;

A Pair of Leaf Nodes Fail

When a pair of leaf nodes fail, their partitions will no longer have any remaining instances, which effectively takes these partitions offline for both reads and writes.
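
To see which of a database’s partitions no longer have a live instance, you can inspect its partition map from the Master Aggregator; for example (the database name is a placeholder):

    SHOW PARTITIONS ON <database-name>;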

If either of the leaf nodes’ hosts, or a leaf node’s data on a host, is recoverable, a failed leaf node can be reintroduced and reattached to its partitions by following the steps in the Replace a Failed Leaf Node in a Redundancy-1 Cluster section. After following those steps, the partitions will be back online for both reads and writes.

If neither leaf node’s host is recoverable, then data loss has occurred. You must now add replacement leaf nodes and run REBALANCE PARTITIONS ... FORCE to create new (empty) replacement partitions. This can be done by following the steps in the Replace a Failed Leaf Node in a Redundancy-2 Cluster section.
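
For example, to create new, empty partitions for an affected database (the database name is a placeholder):

    REBALANCE PARTITIONS ON <database-name> FORCE;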

Many Unpaired Leaf Nodes Fail

So long as two paired leaf nodes have not failed, all partitions are still available for reads and writes.

In certain circumstances, all of the leaf nodes in one availability group can fail, but no data loss will be incurred so long as redundancy is restored before another leaf node fails in the remaining availability group.

Many Leaf Nodes Fail, Some of them Paired

When both leaf nodes in a pair fail, every partition that is hosted by these two leaf nodes will be offline to reads and writes.

When one leaf node of a pair fails, the partitions of its pair will remain online for reads and writes.

Offline partitions should be handled using the method detailed in the A Pair of Leaf Nodes Fail section. However, if both leaf nodes in a pair are unrecoverable, RESTORE REDUNDANCY or the Cluster Downsizing Steps should only be run after all partitions have either been recovered or abandoned as lost data.
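
For reference, once every partition has been recovered or abandoned, redundancy can be restored per database from the Master Aggregator; for example (the database name is a placeholder):

    RESTORE REDUNDANCY ON <database-name>;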
