Master Aggregator High Availability After Availability Zone Failure
Note
Currently this is a private production preview feature, available by invitation only.
This feature is available only in the Dedicated edition.
When an availability zone (AZ) containing a Master Aggregator (MA) fails, the MA is automatically replaced by a node in a different AZ.
SingleStore Dedicated is deployed with high availability across cloud AZs.
When an MA fails, an election automatically takes place in which a designated subset of the aggregators, called voting members, determine among themselves which node becomes the new MA.
An Example to Illustrate Auto MA Failover
In the above illustration:
- MA failure in AZ1 is detected.
- Election takes place between the voting members in AZ2 and AZ3 (quorum).
- The node in AZ3 is elected as the new MA.
- Automatic failover to the new MA in AZ3 takes place.
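The quorum condition in this example can be sketched in a few lines of Python. This is only an illustration, not the product's election algorithm: it assumes a quorum means a strict majority of all voting members (including the failed MA, which is itself a voting member), and the `Voter` type and node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    name: str        # hypothetical node label
    az: str          # availability zone the node runs in
    reachable: bool  # False if the node's AZ has failed


def has_quorum(voters):
    """Return True if a strict majority of all voting members is reachable.

    Assumption: "quorum" means more than half of the voting members.
    """
    alive = sum(1 for v in voters if v.reachable)
    return alive > len(voters) // 2


# The scenario from the list above: the MA in AZ1 is down,
# the voting members in AZ2 and AZ3 are still reachable.
voters = [
    Voter("node-az1", "AZ1", reachable=False),  # failed MA
    Voter("node-az2", "AZ2", reachable=True),
    Voter("node-az3", "AZ3", reachable=True),
]
print(has_quorum(voters))
```

With one voting member per AZ, losing a single AZ leaves two of three members, which is still a majority under this assumption; losing two AZs would not be.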
An election can interrupt any ongoing DDL and reference table DML until the election completes.
Disruptions are typically resolved within 1-2 minutes.
If an application loses its connection because the MA was down, it should reconnect to the same address, following the same procedure used when the MA is down during an upgrade.
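A client-side reconnect loop for this situation might look like the following minimal sketch. It is not an official procedure: the `connect` callable stands in for whatever driver call the application already uses, and the 180-second timeout and 5-second delay are assumed values chosen to cover the typical 1-2 minute disruption.

```python
import time


def reconnect(connect, address, timeout_s=180.0, delay_s=5.0,
              sleep=time.sleep, clock=time.monotonic):
    """Retry connect(address) against the same address until it succeeds
    or timeout_s elapses.

    `connect` is a hypothetical callable supplied by the application
    (for example, a thin wrapper around its database driver) that raises
    ConnectionError while the MA is unavailable.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return connect(address)
        except ConnectionError:
            if clock() >= deadline:
                raise  # give up and surface the last error
            sleep(delay_s)  # fixed delay between attempts
```

The address is never changed between attempts, which matches the guidance to reconnect to the same address rather than to a specific node.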
To help users monitor MA, HA and failover activity, the following roles are available in the information_ and SHOW AGGREGATORS EXTENDED output:
- Master Aggregator is a special voting member and the only node in the cluster that can write to the reference databases and the cluster database. This node is responsible for managing cluster metadata, executing cluster operations, and detecting failures.
- Voting Member is a node in the SingleStore cluster that participates in the election process in the event of MA failure.
- Demoted Voting Member is a voting member that experienced communication failure with the MA and has stopped participating in replication and elections. Once this node is reachable by the MA again and has synchronized its copies of the reference databases, it is automatically transitioned back to voting member.
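As a rough illustration of how these roles could be used for monitoring, the sketch below tallies roles from rows that an application has already fetched (for example, from SHOW AGGREGATORS EXTENDED). The column name `Role` and the exact role strings are assumptions made for this example and should be checked against the actual output.

```python
from collections import Counter

ROLE_KEY = "Role"  # assumed column name; verify against the real output


def summarize_roles(rows):
    """Count aggregators per role from a list of row dictionaries."""
    return Counter(row[ROLE_KEY] for row in rows)


def needs_attention(rows):
    """Flag a cluster with no single MA or with any demoted voting member.

    The role strings below mirror the names used in the list above and
    are assumed to match the output verbatim.
    """
    counts = summarize_roles(rows)
    return (counts["Master Aggregator"] != 1
            or counts["Demoted Voting Member"] > 0)


# Hypothetical rows as they might look shortly after a failover.
rows = [
    {"Host": "node-az3", ROLE_KEY: "Master Aggregator"},
    {"Host": "node-az2", ROLE_KEY: "Voting Member"},
    {"Host": "node-az1", ROLE_KEY: "Demoted Voting Member"},
]
print(summarize_roles(rows))
print(needs_attention(rows))
```

A demoted voting member is expected to return to voting member on its own once it resynchronizes, so a check like this is better suited to alerting on a state that persists than on a single observation.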
Last modified: March 8, 2024