Recommended Configurations to Tolerate Failure of a Cloud AZ or Nearby Data Center
If you are self-hosting SingleStore and want your databases to tolerate the failure of an entire cloud availability zone (AZ) within a region, SingleStore recommends creating separate clusters in two different AZs in that region and replicating data from one to the other.
SingleStore does not recommend that self-hosted customers run individual clusters that span multiple AZs, a configuration referred to as a cross-AZ or multi-AZ deployment.
Technically, it is possible to set up your cluster this way, but it is discouraged for the following reasons:
-
The SingleStore architecture expects low, consistent latency between nodes, i.e., it expects them to be on the same local area network. Cross-AZ latencies are usually 1 to 2 milliseconds (ms), but can occasionally, at random, be higher (up to around 100 ms), leading to dropped heartbeat messages or other failures in the SingleStore software. In addition, based on experiments, cross-AZ query processing can incur up to about 100 ms of extra latency, at random and with low probability. Single-AZ deployments, on the other hand, can deliver consistent single-digit-millisecond latencies. The extra latency in multi-AZ clusters can cause missed SLAs for some applications.
-
Setting up SingleStore across AZs is easy to misconfigure in ways that would negate the cluster's ability to recover from an AZ failure.
-
If an AZ failure occurs for a cross-AZ cluster, the cluster will be in a degraded state with reduced redundancy and reduced compute capacity, potentially for an extended period of time. While the cluster is in this degraded state, this may lead to (1) a higher probability of an additional failure that can take the cluster offline and (2) poor response times.
-
Traffic across AZs can occur naturally during query processing (e.g., to process a shuffle operation), and cross-AZ traffic is potentially expensive. It costs $0.01 per GB on AWS in each direction. A heavy load that shuffles an average of 100 MB/sec (0.1 GB/sec) of cross-AZ traffic would cost the following amount (note that shuffling requires sending and receiving the same amount of data, so the $0.01 is multiplied by two):
2 * 0.01 * 0.1 * 60 * 60 * 24 = $172.80 per day
172.80 * 365 = $63,072 per year
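The estimate in the last bullet can be reproduced with a short Python sketch. The function name cross_az_shuffle_cost and the constants are illustrative and not part of any SingleStore or AWS tooling; the sketch assumes the $0.01 per GB per direction price quoted above, decimal units (1 GB = 1,000 MB), and a constant average traffic rate. Under those assumptions, 100 MB/sec works out to roughly $172.80 per day, and a sustained 1 GB/sec to roughly $1,728 per day.

```python
# Sketch of the cross-AZ shuffle cost estimate.
# Assumptions: $0.01 per GB in each direction (from the text above),
# decimal units (1 GB = 1,000 MB), and a constant average traffic rate.

PRICE_PER_GB_EACH_DIRECTION = 0.01  # USD
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365


def cross_az_shuffle_cost(rate_mb_per_sec: float) -> tuple[float, float]:
    """Return (cost per day, cost per year) in USD for a sustained shuffle rate."""
    rate_gb_per_sec = rate_mb_per_sec / 1000
    # Shuffling sends and receives the same volume, so the price is doubled.
    per_day = 2 * PRICE_PER_GB_EACH_DIRECTION * rate_gb_per_sec * SECONDS_PER_DAY
    return per_day, per_day * DAYS_PER_YEAR


if __name__ == "__main__":
    for rate in (100, 1000):
        per_day, per_year = cross_az_shuffle_cost(rate)
        print(f"{rate} MB/sec: ${per_day:,.2f} per day, ${per_year:,.2f} per year")
```

Because the cost scales linearly with the traffic rate, the same function can be used to estimate other workloads by changing the rate argument.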
With unlimited storage databases and self-hosted SingleStore, you can also recover from the loss of an AZ in a region with at most about 2 minutes of data loss by attaching the S3 storage for a database to a new cluster on a second AZ in the same region.
If using REPLICATE DATABASE to get cross-AZ redundancy, use of local storage is required.
The recommendations given above also apply to SingleStore customers running their workloads in multiple nearby private data centers.
SingleStore Helios does utilize cross-AZ clusters for Premium Edition.
Last modified: February 1, 2023