High Availability

SingleStore is highly available by default. It provides high availability (HA) by storing data redundantly across two sets of nodes, called availability groups. Each availability group contains a copy of every partition in the system, some as primaries and some as replicas. As a result, SingleStore keeps two copies of your data, which protects the data against the failure of a single node.

The primary partitions are distributed evenly across the nodes in the workspace. The primary partitions on every node in an availability group have their replicas spread evenly among a set of nodes in the opposite availability group. This even distribution of replicas ensures that a failover spreads the additional load from the node failure uniformly across the workspace. Because read queries are sent to the primary partitions, balanced load distribution prevents any single node from being overloaded with newly promoted primary partitions.

In the event of a node failure, SingleStore automatically promotes the replica partitions that correspond to the failed node’s primary partitions, so that the databases remain online. The additional workload from the node failure is spread evenly among the other nodes that hold those replica copies. However, if all of the machines fail, then data will be unavailable until enough machines are recovered or until the workspace is recreated from scratch.

The following diagrams illustrate the partition distribution before and after a workspace failover. In the first diagram, the primary partitions are distributed evenly across nodes. Replica copies of the primary partitions in an availability group are placed evenly across the nodes in the opposite availability group. For example, db_0 has a replica on Node 2, while db_1 has a replica on Node 4.

If Node 1 fails in this setup, SingleStore promotes the replica of db_0 on Node 2 to primary and the replica of db_1 on Node 4 to primary.
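The promotion described above can be sketched as a small simulation. This is an illustrative model only, not SingleStore's implementation: the placement of db_0 and db_1 follows the example, while the remaining partitions (db_2 through db_7), the grouping of Nodes 1 and 3 versus Nodes 2 and 4, and the names `LAYOUT`, `fail_node`, and `primaries_per_node` are assumptions made for the sketch.

```python
from collections import defaultdict

# partition -> node holding the primary and node holding the replica.
# db_0 and db_1 follow the example in the text; the rest is an assumed layout
# with Nodes 1 and 3 in one availability group and Nodes 2 and 4 in the other.
LAYOUT = {
    "db_0": {"primary": 1, "replica": 2},
    "db_1": {"primary": 1, "replica": 4},
    "db_2": {"primary": 2, "replica": 1},
    "db_3": {"primary": 2, "replica": 3},
    "db_4": {"primary": 3, "replica": 2},
    "db_5": {"primary": 3, "replica": 4},
    "db_6": {"primary": 4, "replica": 1},
    "db_7": {"primary": 4, "replica": 3},
}


def fail_node(layout, failed):
    """Return a new layout after `failed` goes offline.

    Replicas of the failed node's primaries are promoted; partitions whose
    replica lived on the failed node are left without a replica.
    """
    after = {}
    for part, copies in layout.items():
        primary, replica = copies["primary"], copies["replica"]
        if primary == failed:
            primary, replica = replica, None  # promote the surviving copy
        elif replica == failed:
            replica = None  # primary unaffected, redundancy lost for now
        after[part] = {"primary": primary, "replica": replica}
    return after


def primaries_per_node(layout):
    """Count primary partitions per node, sorted by node number."""
    counts = defaultdict(int)
    for copies in layout.values():
        counts[copies["primary"]] += 1
    return dict(sorted(counts.items()))
```

In this model, `fail_node(LAYOUT, 1)` moves the primary of db_0 to Node 2 and the primary of db_1 to Node 4, so the extra load lands on two different nodes rather than one.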

When a node comes back online, it is automatically reintroduced to the workspace. The partitions on the node are either caught up or rebuilt from scratch.

In SingleStore Helios, one load balancer is set up for the Master Aggregator (MA) and a second load balancer is set up for the child aggregators (CAs). The load balancer for the CAs distributes traffic equally among the CAs.

Note: In AWS, the Network Load Balancer (NLB) works better than the classic Elastic Load Balancing (ELB). You can use a single NLB with different ports for the MA and CA targets.
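As a rough picture of the single-balancer arrangement in the note, the toy router below selects targets by listener port and rotates through the CAs. Everything specific here is an assumption for illustration: the port numbers, the host names, the `make_router` helper, and the use of round robin. The text says only that CA traffic is distributed equally, not which algorithm a real load balancer applies.

```python
from itertools import cycle

# Hypothetical listener ports and host names, for illustration only.
MA_PORT = 3306
CA_PORT = 3307
TARGETS = {
    MA_PORT: ["ma-0"],
    CA_PORT: ["ca-0", "ca-1", "ca-2"],
}


def make_router(targets):
    """Build a route(port) function that rotates through each port's targets."""
    rotations = {port: cycle(hosts) for port, hosts in targets.items()}

    def route(port):
        # Each call returns the next host registered behind this listener port.
        return next(rotations[port])

    return route
```

With `route = make_router(TARGETS)`, repeated calls to `route(CA_PORT)` visit each CA in turn, while `route(MA_PORT)` always returns the single MA target.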

Availability Zone (AZ)

An availability zone (AZ) is a data center with its own power, cooling, and network connections. An AZ is physically separated from other AZs, so local failures, like a fire that destroys a whole AZ, will not affect other AZs. AWS, Azure, and Google Cloud Platform (GCP) all provide multiple AZs in each region where they operate.

Single and Multi-AZ High Availability

High availability is supported in Single-AZ and Multi-AZ configurations, which determine whether SingleStore is deployed in one or multiple cloud availability zones. Multi-AZ failover is supported only in SingleStore Helios.

Single-AZ: By default, SingleStore is deployed with high availability in a single cloud availability zone, providing data redundancy and automatic recovery from cloud instance failures within that zone.
Multi-AZ: When enabled, Multi-AZ deploys SingleStore across multiple cloud availability zones, with each availability group placed in a separate zone. This makes data resilient to both cloud instance failures and the loss of an entire availability zone.
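The difference between the two configurations can be checked with a small model: a zone loss is survivable only if every partition keeps at least one copy outside the lost zone. The partition placement, zone names, and the `survives_zone_loss` helper are assumptions made for this sketch, not part of SingleStore.

```python
# partition -> the two nodes holding its copies, one per availability group.
# Assumed layout: Nodes 1 and 3 form one group, Nodes 2 and 4 the other.
COPIES = {
    "db_0": (1, 2),
    "db_1": (1, 4),
    "db_2": (3, 2),
    "db_3": (3, 4),
}

# Single-AZ: every node sits in the same (hypothetical) zone.
SINGLE_AZ = {1: "az-a", 2: "az-a", 3: "az-a", 4: "az-a"}

# Multi-AZ: each availability group is placed in its own zone.
MULTI_AZ = {1: "az-a", 3: "az-a", 2: "az-b", 4: "az-b"}


def survives_zone_loss(copies, node_zone, lost_zone):
    """True if every partition still has a copy on a node outside lost_zone."""
    return all(
        any(node_zone[node] != lost_zone for node in nodes)
        for nodes in copies.values()
    )
```

Under these assumptions, `survives_zone_loss(COPIES, SINGLE_AZ, "az-a")` is `False`, while the Multi-AZ mapping survives the loss of either zone because each partition's second copy lives in the other availability group.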
