Enabling High Availability

The leaf_failover_fanout global variable defines the placement of replica partitions among leaves in a cluster. It has two modes: paired and load_balanced. By default, leaf_failover_fanout is set to paired mode. In paired mode, the auto-rebalance operation only affects the newly attached leaves and their pairs. In load_balanced mode, the auto-rebalance operation runs a full rebalance of all the databases on all the leaves in a cluster.

For small clusters, such as a cluster with four leaf nodes, paired mode is preferable to load_balanced mode. In load_balanced mode on a very small cluster, losing two nodes in different availability groups (AGs) always takes the cluster offline. The reason is that there are only two nodes per AG, so to balance the failover load, each node in AG1 spreads its secondary partitions across both nodes in AG2, and vice versa.

Consider the following example:

AG1 has nodes A and B.

Node A has primary partitions 1,2,3,4.

Node B has primary partitions 5,6,7,8.

AG2 has nodes C and D.

Node C has primary partitions 9,10,11,12.

Node D has primary partitions 13,14,15,16.

In the load_balanced mode, the secondary/backup partitions are balanced as follows:

Node A has backups of 9,10 (from Node C) and 13,14 (from Node D).

Node B has backups of 11,12 (from Node C) and 15,16 (from Node D).

Node C has backups of 1,2 (from Node A) and 5,6 (from Node B).

Node D has backups of 3,4 (from Node A) and 7,8 (from Node B).

If node A goes down, the cluster requires both nodes C and D to stay online, because each of them holds the only remaining copy of some of node A's partitions. Losing either of those nodes therefore takes the cluster offline.
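The reasoning above can be checked with a small model of the example layout. The following Python sketch is illustrative only: the dictionaries mirror the partition assignments listed above, and the critical_nodes helper is a hypothetical name, not part of any product API. It reports which surviving nodes hold the last remaining copy of at least one partition.

```python
# Partition layout from the load_balanced example (illustrative model only).
PRIMARIES = {
    "A": {1, 2, 3, 4},
    "B": {5, 6, 7, 8},
    "C": {9, 10, 11, 12},
    "D": {13, 14, 15, 16},
}
BACKUPS = {
    "A": {9, 10, 13, 14},
    "B": {11, 12, 15, 16},
    "C": {1, 2, 5, 6},
    "D": {3, 4, 7, 8},
}


def critical_nodes(primaries, backups, failed):
    """Return surviving nodes that hold the only remaining copy of a partition."""
    survivors = [n for n in primaries if n not in failed]
    critical = set()
    for node in survivors:
        for partition in primaries[node] | backups[node]:
            # Every surviving node that still has a copy of this partition.
            holders = [
                n for n in survivors
                if partition in primaries[n] or partition in backups[n]
            ]
            if holders == [node]:
                critical.add(node)
    return critical


# With node A down, both C and D become single points of failure.
print(sorted(critical_nodes(PRIMARIES, BACKUPS, {"A"})))  # ['C', 'D']
```

With no failed nodes the function returns an empty set, since every partition has two copies.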

In paired mode, the backup partitions are instead placed as follows:

Node A has backups of all of Node C's partitions.

Node B has backups of all of Node D's partitions.

Node C has backups of all of Node A's partitions.

Node D has backups of all of Node B's partitions.

In this case, if node A goes down, you can also lose node D (but not node C) in the other availability group without causing an outage. In paired mode, you can also lose both A and B and stay online. The only two-node failures that cause a cluster outage in this case are losing A and C together, or losing B and D together. The trade-off is that the failover load is not as well balanced when one node goes down, since all of its partitions are now served by a single paired leaf in the other AG.
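To compare the two modes on this four-node example, the sketch below enumerates every two-node failure and lists the pairs that leave some partition without a surviving copy. It is a simplified model built from the layouts described above; the fatal_pairs helper and the dictionary names are hypothetical and exist only for illustration.

```python
from itertools import combinations

# Primary partitions per node, as in the example (illustrative model only).
PRIMARIES = {
    "A": {1, 2, 3, 4},
    "B": {5, 6, 7, 8},
    "C": {9, 10, 11, 12},
    "D": {13, 14, 15, 16},
}

# load_balanced: each node's backups are split across both nodes of the other AG.
LOAD_BALANCED_BACKUPS = {
    "A": {9, 10, 13, 14},
    "B": {11, 12, 15, 16},
    "C": {1, 2, 5, 6},
    "D": {3, 4, 7, 8},
}

# paired: A pairs with C, and B pairs with D.
PAIRED_BACKUPS = {
    "A": PRIMARIES["C"],
    "B": PRIMARIES["D"],
    "C": PRIMARIES["A"],
    "D": PRIMARIES["B"],
}


def fatal_pairs(primaries, backups):
    """List two-node failures that leave at least one partition with no copy."""
    all_partitions = set().union(*primaries.values())
    fatal = []
    for pair in combinations(sorted(primaries), 2):
        surviving = set()
        for node in primaries:
            if node not in pair:
                surviving |= primaries[node] | backups[node]
        if surviving != all_partitions:
            fatal.append(pair)
    return fatal


print(fatal_pairs(PRIMARIES, LOAD_BALANCED_BACKUPS))
# [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
print(fatal_pairs(PRIMARIES, PAIRED_BACKUPS))
# [('A', 'C'), ('B', 'D')]
```

In this model, load_balanced mode goes offline for all four cross-AG pairs, while paired mode goes offline only when a node and its pair fail together.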

This is less of a concern on larger clusters, where the backup partitions are still load balanced across a different AG but do not require every node in that AG to stay online, as they do in a small cluster of just four leaf nodes. This illustrates the general trade-off between load_balanced and paired modes: load_balanced mode spreads the backup partition load more evenly, but also makes the cluster more fragile in the event of an outage.

The following sections discuss how to enable high availability for each of these modes.

Last modified: March 8, 2024
