Recommended Configurations to Tolerate Failure of a Cloud AZ or Nearby Data Center
If you self-host SingleStore and want your databases to tolerate the failure of an entire cloud availability zone (AZ) within a region, SingleStore recommends creating separate clusters in two different AZs in the region and replicating data from one to the other.
Important
SingleStore does not support spanning a self-hosted cluster across multiple AZs, typically referred to as a "cross-AZ" or "multi-AZ" deployment.
While it is technically possible to span a cluster across multiple AZs, it is highly discouraged for the following reasons:
- The SingleStore architecture expects low, consistent latency between nodes; that is, it expects them to be on the same local area network. Cross-AZ latencies are usually 1 to 2 milliseconds (ms), but can occasionally, and at random, be higher (up to around 100 ms), leading to dropped heartbeat messages or other failures in the SingleStore software. In addition, based on experiments, cross-AZ query processing can incur up to roughly 100 ms of extra latency, at random and with low probability. By contrast, single-AZ deployments can deliver consistent single-digit-millisecond latencies. The extra latency in multi-AZ clusters can cause missed SLAs for some applications.
- Setting up SingleStore across AZs is easy to misconfigure in ways that would negate the potential ability to recover from an AZ failure.
- If an AZ failure occurs in a cross-AZ cluster, the cluster will be in a degraded state with reduced redundancy and reduced compute capacity, potentially for an extended period of time. While the cluster is in this degraded state, this may lead to (1) a higher probability of an additional failure that takes the cluster offline and (2) poor response times.
- Traffic across AZs can occur naturally during query processing (e.g., to process a shuffle operation), and cross-AZ traffic is potentially expensive. It costs $0.01 per GB on AWS in each direction. A heavy load that shuffles an average of 100 MB/sec (0.1 GB/sec) of cross-AZ traffic would cost the following amount (shuffling requires sending and receiving the same amount of data, so the $0.01 is multiplied by two):
  2 * 0.01 * 0.1 * 60 * 60 * 24 = $172.80 per day
  172.80 * 365 = $63,072 per year
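The cost arithmetic in the last bullet can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, the default price is the $0.01 per GB per direction quoted above, and decimal units (1 GB = 1000 MB) are assumed.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def cross_az_cost_per_day(gb_per_sec, usd_per_gb_each_direction=0.01):
    """Estimate the daily cross-AZ transfer cost (USD) for a sustained shuffle rate.

    gb_per_sec is the average cross-AZ shuffle traffic in GB per second
    (assumed decimal units, so 100 MB/sec is 0.1 GB/sec).
    """
    # Shuffled data is both sent and received, so the per-GB price applies twice.
    return 2 * usd_per_gb_each_direction * gb_per_sec * SECONDS_PER_DAY


def cross_az_cost_per_year(gb_per_sec, usd_per_gb_each_direction=0.01):
    """Estimate the yearly cost (USD), assuming the same rate for 365 days."""
    return cross_az_cost_per_day(gb_per_sec, usd_per_gb_each_direction) * 365


if __name__ == "__main__":
    for rate in (0.1, 1.0):
        print(
            f"{rate} GB/sec: "
            f"${cross_az_cost_per_day(rate):,.2f} per day, "
            f"${cross_az_cost_per_year(rate):,.2f} per year"
        )
```

Passing 0.1 (100 MB/sec) or 1.0 (1 GB/sec) shows that the estimate scales linearly with the sustained shuffle rate. Actual charges depend on the provider's current pricing.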
With unlimited storage databases and self-hosted SingleStore, you can also recover from the loss of an AZ in a region with at most about 2 minutes of data loss by attaching the S3 storage for a database to a new cluster on a second AZ in the same region.
If you use REPLICATE DATABASE to get cross-AZ redundancy, the databases must use local storage.
The recommendations given above also apply to SingleStore customers running their workloads in multiple nearby private data centers.
SingleStore Helios does utilize cross-AZ clusters for Premium Edition.
Last modified: October 29, 2024