Availability Zone Failure and the Master Aggregator

Notice

Currently this is a private production preview feature, available by invitation only.

SingleStoreDB Premium is deployed with high availability across cloud availability zones (AZs). There is a single Master Aggregator (MA), and the remaining aggregators are called Child Aggregators (CAs). The aggregators are deployed across three different cloud AZs to survive the failure of a single AZ, and the system maintains availability of the MA automatically. Each availability group is located in a separate cloud AZ, ensuring that data is resilient to both cloud instance failure and the failure of an entire cloud AZ. If the AZ containing the MA fails, the MA is automatically replaced by a node in a different AZ.

When an MA fails, an election automatically takes place in which a designated subset of the aggregators, called voting members, determine among themselves which node will be the new MA. A majority of the voting members is guaranteed to have a full copy of all relevant cluster metadata. Elections follow the Raft protocol to ensure that agreement is reached on which node will be the new MA, even in the event of additional hardware, software, or communication failures.

An Example to Illustrate Auto MA Failover

[Figure: Auto-MA-Failover.jpeg]

In the above illustration:

  • MA failure in AZ1 is detected.

  • Election takes place between the voting members in AZ2 and AZ3 (quorum).

  • The node in AZ3 is elected as the new MA.

  • Automatic failover to the new MA in AZ3 takes place.
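The quorum requirement behind this sequence can be sketched in a few lines. Raft elections need a strict majority of the voting members, so the sketch below checks whether the members outside the failed AZs still form one. The function name and the layout of one voting member per AZ are assumptions made for illustration; they are not taken from the SingleStoreDB implementation.

```python
from typing import Dict, Set


def quorum_survives(voters_by_az: Dict[str, int], failed_azs: Set[str]) -> bool:
    """Return True if the voting members outside the failed AZs form a strict majority.

    voters_by_az maps an AZ name to the number of voting members placed in it.
    This placement is a hypothetical input, not a documented configuration.
    """
    total = sum(voters_by_az.values())
    alive = sum(count for az, count in voters_by_az.items() if az not in failed_azs)
    # A Raft election needs strictly more than half of all voting members.
    return alive > total // 2


# Assumed layout: one voting member in each of three AZs.
layout = {"AZ1": 1, "AZ2": 1, "AZ3": 1}
one_az_lost = quorum_survives(layout, {"AZ1"})          # AZ2 + AZ3 can still elect an MA
two_azs_lost = quorum_survives(layout, {"AZ1", "AZ2"})  # AZ3 alone cannot
```

Under this assumed layout, losing AZ1 leaves two of three voting members, which is enough to elect a new MA, while losing two AZs leaves no majority.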

Until an election completes, it can interrupt any ongoing DDL and reference table DML. As a result, backups, ALTER TABLE, CREATE TABLE, CREATE DATABASE, DROP DATABASE, ATTACH/DETACH DATABASE, point-in-time recovery (PITR), and reference table writes are affected. Reference table reads continue to work. The impact on the workload is the same as during upgrades.

Disruptions are typically resolved within 1-2 minutes. If only one AZ fails, no data is lost and availability is minimally affected. If more than one AZ is lost simultaneously (a very unlikely event), data loss is possible, and manual intervention may be required both to restore data to the most current available state and, if necessary, to establish a new MA.

If an application loses its connection because the MA is down, it should reconnect to the same address, following the same procedure as when the MA is down during an upgrade. Applications should have retry logic built in.
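One possible shape for such retry logic is a wrapper that re-runs an operation with exponential backoff until a deadline passes. This is a minimal sketch, not a SingleStore-provided API: the function name, the default timings (chosen to comfortably cover a disruption of 1-2 minutes), and the exception types are all assumptions. A real database driver usually raises its own error classes, which would need to be passed in through `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    *,
    max_wait_seconds: float = 180.0,  # assumed budget, longer than a typical failover
    initial_delay: float = 1.0,
    max_delay: float = 15.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation() until it succeeds or the time budget is used up.

    operation should open a fresh connection to the same address on each call,
    so that a retry lands on the newly elected MA.
    """
    deadline = time.monotonic() + max_wait_seconds
    delay = initial_delay
    while True:
        try:
            return operation()
        except retry_on:
            # Give up and re-raise if the next wait would exceed the budget.
            if time.monotonic() + delay > deadline:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```

A caller would wrap the whole unit of work (connect, run the statement, close) in `operation`, so that each attempt starts from a new connection rather than reusing one that was dropped during the election.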

To help users monitor MA, HA, and failover activity, the following roles are reported in information_schema.AGGREGATORS and in the SHOW AGGREGATORS EXTENDED output:

  • Master Aggregator is a special voting member and the only node in the cluster that can write to the reference databases and the cluster database. This node is responsible for managing cluster metadata, executing cluster operations, and failure detection.

  • Voting Member is a node in the SingleStoreDB cluster that participates in the election process in the event of MA failure.

  • Demoted Voting Member is a voting member that experienced a communication failure with the MA and has stopped participating in replication and elections. Once this node is reachable by the MA again and has synchronized its copies of the reference databases, it automatically transitions back to a voting member.
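To check these roles from an application or a monitoring script, the view can be read through any Python DB-API 2.0 (PEP 249) cursor. The sketch below is an illustration under assumptions: the helper name is hypothetical, the cursor is expected to come from whichever driver the application already uses, and the column names are taken from the cursor at run time rather than assumed, since the exact columns of the view are not listed here.

```python
from typing import Any, Dict, List

# The view name comes from the text above; SELECT * avoids guessing column names.
AGGREGATORS_QUERY = "SELECT * FROM information_schema.AGGREGATORS"


def fetch_aggregators(cursor: Any) -> List[Dict[str, Any]]:
    """Return one dict per aggregator, keyed by the column names the server reports.

    cursor is any PEP 249 cursor that is already connected to the cluster.
    """
    cursor.execute(AGGREGATORS_QUERY)
    # cursor.description holds one sequence per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Polling this helper during a failover and comparing successive results is one way to see when a new Master Aggregator appears or when a demoted voting member rejoins.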