High Availability

SingleStore is highly available by default. It ensures high availability by storing data redundantly across sets of nodes called availability groups. SingleStore supports two availability groups. Each availability group contains a copy of every partition in the system—some as masters and some as replicas. As a result, SingleStore keeps two copies of your data in the system to protect against single-node failure.

The master partitions are distributed evenly on nodes across the workspace. The master partitions on every node in an availability group have their replicas spread evenly among a set of nodes in the opposite availability group. The even distribution of replicas ensures that a failover distributes the additional load from the node failure uniformly across the workspace. As read queries are sent to the master partitions, balanced load distribution prevents the overloading of a single node with the newly promoted master partitions.

[Diagram: load-balanced partition placement]

In the event of a node failure, SingleStore automatically promotes the appropriate replica partitions on the node's pair to master partitions, so that the databases remain online. The additional workload is spread evenly among the multiple nodes that hold replica copies of the failed node's master partitions. However, if all machines fail, data will be unavailable until enough machines are recovered or the workspace is recreated from scratch.

The following diagrams illustrate the partition distribution before and after a workspace failover. In the first diagram, the master partitions are distributed evenly across nodes. Replica copies of the master partitions in an availability group are placed evenly across the nodes in the opposite availability group. For example, db_0 has a replica on Node 2, while db_1 has a replica on Node 4.

[Diagram: load-balanced partition placement before failover]

If Node 1 fails in this setup, SingleStore promotes the replica of db_0 on Node 2 to master and the replica of db_1 on Node 4 to master.

[Diagram: partition placement after Node 1 fails]
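The promotion step can be sketched as a pure function over a partition map. This is a simplified model for illustration only; the `fail_over` name and the dictionary layout are assumptions, not SingleStore internals.

```python
def fail_over(placement, failed_node):
    """Promote the replica of every partition whose master lived on
    failed_node; partitions mastered elsewhere keep their master."""
    survived = {}
    for part, (master, replica) in placement.items():
        if master == failed_node:
            # The replica is promoted to master; the partition is left
            # temporarily unreplicated until the failed node returns.
            survived[part] = (replica, None)
        elif replica == failed_node:
            # The master stays online but loses its replica copy.
            survived[part] = (master, None)
        else:
            survived[part] = (master, replica)
    return survived

# Node 1 masters db_0 and db_1, with replicas on Node 2 and Node 4:
before = {
    "db_0": ("Node 1", "Node 2"),
    "db_1": ("Node 1", "Node 4"),
    "db_2": ("Node 3", "Node 2"),
    "db_3": ("Node 3", "Node 4"),
}
after = fail_over(before, "Node 1")
# db_0 is now mastered on Node 2 and db_1 on Node 4, so the extra load
# is split between two nodes instead of piling onto one.
```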

When a node comes back online, it is automatically reintroduced to the workspace, and its partitions are either caught up or rebuilt from scratch.

In SingleStoreDB Cloud, one load balancer is set up for the Master Aggregator (MA) and a second for the Child Aggregators (CAs). The CA load balancer distributes traffic equally across the CAs.

Note: In AWS, a Network Load Balancer (NLB) works better than a Classic Load Balancer (the older Elastic Load Balancing offering). You can use a single NLB with different ports for the MA and CA targets.
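As a rough sketch of the single-NLB approach using the AWS CLI: all names, ports, subnet/VPC IDs, and ARNs below are placeholders for illustration, and the listener ports are examples, not SingleStore requirements.

```shell
# One Network Load Balancer for the whole workspace (placeholder subnet ID).
aws elbv2 create-load-balancer \
  --name singlestore-nlb --type network --subnets subnet-0123456789abcdef0

# Separate target groups for the MA and the CAs (placeholder VPC ID).
aws elbv2 create-target-group \
  --name ma-targets --protocol TCP --port 3306 \
  --vpc-id vpc-0123456789abcdef0 --target-type instance
aws elbv2 create-target-group \
  --name ca-targets --protocol TCP --port 3307 \
  --vpc-id vpc-0123456789abcdef0 --target-type instance

# Listeners on distinct NLB ports forward to the MA and CA target groups.
aws elbv2 create-listener \
  --load-balancer-arn <nlb-arn> --protocol TCP --port 3306 \
  --default-actions Type=forward,TargetGroupArn=<ma-target-group-arn>
aws elbv2 create-listener \
  --load-balancer-arn <nlb-arn> --protocol TCP --port 3307 \
  --default-actions Type=forward,TargetGroupArn=<ca-target-group-arn>
```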

Availability Zone (AZ)

An availability zone (AZ) is a data center with its own power, cooling, and network connections. An AZ is physically separated from other AZs, so local failures, like a fire that destroys a whole AZ, will not affect other AZs. AWS, Azure, and Google Cloud Platform (GCP) all provide multiple AZs in each region where they operate.

Single- and Multi-AZ High Availability

High availability is offered in 1-AZ and 2-AZ configurations, depending on the edition of SingleStore purchased. 2-AZ failover is supported only in SingleStore's cloud offering.

SingleStore Standard: 1-AZ

SingleStore Standard is deployed with high availability within a single cloud availability zone. This ensures data redundancy within the workspace, so it can automatically recover from the failure of cloud instances within the availability zone.

SingleStore Premium: 2-AZ

SingleStore Premium is deployed with high availability across two cloud availability zones. Each availability group is located in a separate cloud availability zone, ensuring that data is resilient to both cloud instance failure and the failure of an entire cloud AZ.