Enabling High Availability

The leaf_failover_fanout global variable defines the placement of replica partitions among leaves in a cluster. It has two modes: paired and load_balanced. By default, leaf_failover_fanout is set to paired mode. In paired mode, the auto-rebalance operation only affects the newly attached leaves and their pairs. In load_balanced mode, the auto-rebalance operation runs a full rebalance of all the databases on all the leaves in a cluster.

For small clusters, such as a cluster with four leaf nodes, paired mode is preferable to load_balanced mode. In load_balanced mode on a very small cluster, losing two nodes in different availability groups (AGs) will always take the cluster offline. The reason is that there are only two nodes per AG, so, to balance the failover load, each node in AG1 spreads its secondary partitions across both nodes in AG2, and vice versa.

Consider the following example:

AG1 has nodes A and B.

Node A has primary partitions 1,2,3,4.

Node B has primary partitions 5,6,7,8.

AG2 has nodes C and D.

Node C has primary partitions 9,10,11,12.

Node D has primary partitions 13,14,15,16.

In the load_balanced mode, the secondary/backup partitions are balanced as follows:

Node A has backups of 9,10 (from Node C) and 13,14 (from Node D).

Node B has backups of 11,12 (from Node C) and 15,16 (from Node D).

Node C has backups of 1,2 (from Node A) and 5,6 (from Node B).

Node D has backups of 3,4 (from Node A) and 7,8 (from Node B).

If node A goes down, the cluster now requires both nodes C and D to continue serving node A's partitions, so losing either of those nodes will take the cluster offline.
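The dependency described above can be checked with a short sketch. This is only an illustrative model of the four-node example, not SingleStore code: the dictionaries copy the placement listed above, and the helper name `nodes_required_after` is hypothetical.

```python
# Primary partitions per node, copied from the example above.
PRIMARIES = {
    "A": {1, 2, 3, 4},
    "B": {5, 6, 7, 8},
    "C": {9, 10, 11, 12},
    "D": {13, 14, 15, 16},
}

# Backup partitions per node in load_balanced mode, copied from the example above.
LOAD_BALANCED_BACKUPS = {
    "A": {9, 10, 13, 14},
    "B": {11, 12, 15, 16},
    "C": {1, 2, 5, 6},
    "D": {3, 4, 7, 8},
}


def nodes_required_after(failed_node):
    """Return the surviving nodes that hold a backup of the failed node's primaries."""
    lost = PRIMARIES[failed_node]
    return sorted(
        node
        for node, backups in LOAD_BALANCED_BACKUPS.items()
        if node != failed_node and backups & lost
    )


print(nodes_required_after("A"))  # ['C', 'D']
```

Because node A's primaries are split between C and D, both of them must stay up once A is gone.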

In paired mode, you would instead have something like the following:

Node A has backups of all of Node C's partitions.

Node B has backups of all of Node D's partitions.

Node C has backups of all of Node A's partitions.

Node D has backups of all of Node B's partitions.

In this case, if node A goes down, you can also lose node D (but not node C) in the other availability group without causing an outage. In paired mode, you should also be able to lose both A and B, since C and D together hold a copy of every partition. The only two-node failures that cause a cluster outage in this case are losing A and C together, or losing B and D together. The trade-off is that the failover load is not as well balanced when one node goes down, since all of its partitions are now served by a single paired leaf in the other AG.
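To compare the two modes side by side, the following sketch enumerates every two-node failure in the four-node example and reports which ones leave some partition without a surviving copy. It is a simplified model under stated assumptions: the cluster is treated as online whenever every partition still has a primary or backup on a surviving node, and the helper names `is_online` and `fatal_pairs` are hypothetical. It does not model failover timing or any other engine behavior.

```python
from itertools import combinations

# Primary partitions per node, copied from the example above.
PRIMARIES = {
    "A": {1, 2, 3, 4},
    "B": {5, 6, 7, 8},
    "C": {9, 10, 11, 12},
    "D": {13, 14, 15, 16},
}

# Backup partitions per node in each mode, copied from the example above.
BACKUPS = {
    "load_balanced": {
        "A": {9, 10, 13, 14},
        "B": {11, 12, 15, 16},
        "C": {1, 2, 5, 6},
        "D": {3, 4, 7, 8},
    },
    "paired": {
        "A": PRIMARIES["C"],
        "B": PRIMARIES["D"],
        "C": PRIMARIES["A"],
        "D": PRIMARIES["B"],
    },
}

ALL_PARTITIONS = set().union(*PRIMARIES.values())


def is_online(mode, failed):
    """Assumed model: online if every partition has a copy on a surviving node."""
    available = set()
    for node in PRIMARIES:
        if node not in failed:
            available |= PRIMARIES[node] | BACKUPS[mode][node]
    return available == ALL_PARTITIONS


def fatal_pairs(mode):
    """List the two-node failures that leave some partition with no copy."""
    return [
        pair
        for pair in combinations(sorted(PRIMARIES), 2)
        if not is_online(mode, pair)
    ]


for mode in ("load_balanced", "paired"):
    print(mode, fatal_pairs(mode))
# load_balanced [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
# paired [('A', 'C'), ('B', 'D')]
```

Under this model, every cross-AG pair is fatal in load_balanced mode, while in paired mode only the two paired combinations are.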

This is less of a concern on larger clusters, where the backup partitions are still load balanced across a different AG but do not depend on every single node in it, as they do in a small cluster of just four leaf nodes. This illustrates the general trade-off between the load_balanced and paired modes: the former balances the backup partition load more evenly, but also makes the cluster more fragile in the event of an outage.

The following sections discuss how to enable high availability for each of these modes.

Last modified: June 13, 2024

