Recommended Configurations to Tolerate Failure of a Nearby Datacenter

If you self-host SingleStore and want your databases to tolerate the failure of a datacenter within a region, SingleStore recommends creating separate clusters in two different datacenters in that region and replicating data from one cluster to the other. You can do this replication with REPLICATE DATABASE or with application logic.

Important

SingleStore does not support spanning a self-hosted cluster across multiple availability zones (AZs), a setup typically referred to as a "cross-AZ" or "multi-AZ" deployment.

Deploying a single cluster across datacenters is discouraged for the following reasons:

  • The SingleStore architecture expects consistent, low latency between nodes; that is, it expects them to be on the same local area network. Cross-datacenter latencies are usually 1–2 milliseconds (ms), but occasionally, and at random, they can be much higher (up to around 100 ms), leading to dropped heartbeat messages or other failures in the SingleStore software. In addition, experiments show that cross-datacenter query processing can incur up to roughly 100 ms of extra latency; this happens at random and with low probability. By contrast, single-datacenter deployments can deliver consistent single-digit-millisecond latencies. The extra latency in multi-datacenter clusters can cause some applications to miss their SLAs.

  • A SingleStore deployment that spans datacenters is easy to misconfigure in ways that undermine its ability to recover from failure.

  • If one datacenter of a cross-datacenter cluster fails, the cluster runs in a degraded state with reduced redundancy and reduced compute capacity, potentially for an extended period. While the cluster is degraded, there is (1) a higher probability that an additional failure takes the cluster offline and (2) a risk of poor response times.

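  • Query processing naturally generates traffic between nodes (for example, to process a shuffle operation), and in a cross-datacenter cluster some of that traffic crosses datacenters, which can be expensive because network traffic between datacenters may be charged for.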
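The latency point above notes that cross-datacenter spikes are random and individually unlikely. A distributed query, however, usually waits on messages from many nodes, so a rare per-message spike becomes far more likely per query. The minimal sketch below illustrates this, assuming messages are independent; the spike probability and message counts are hypothetical inputs for illustration, not SingleStore measurements.

```python
def prob_any_slow(p_slow: float, n_messages: int) -> float:
    """Probability that at least one of n independent messages hits a latency spike.

    p_slow and n_messages are illustrative assumptions, not measured SingleStore values.
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability in [0, 1]")
    if n_messages < 0:
        raise ValueError("n_messages must be non-negative")
    # Complement of "no message is slow": 1 - (1 - p)^n.
    return 1.0 - (1.0 - p_slow) ** n_messages


if __name__ == "__main__":
    # Hypothetical per-message spike probability of 0.1%.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} messages: {prob_any_slow(0.001, n):.3f}")
```

With a hypothetical 0.1% per-message spike rate, a query that depends on 100 cross-datacenter messages would see at least one spike roughly 9.5% of the time, which is why rare spikes can still translate into missed SLAs.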

To use the REPLICATE DATABASE command for cross-datacenter redundancy, your databases must use local storage; unlimited storage databases do not support REPLICATE DATABASE.

The recommendations above also apply to SingleStore customers running their workloads in multiple nearby private datacenters. Nearby datacenters that have fast network connections between them, are in different buildings with independent power and cooling, and are less than 60 miles apart are analogous to the cloud availability zones discussed above. For private datacenters, charges for cross-datacenter network traffic may not apply.
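For the REPLICATE DATABASE approach described above, a setup script can check the local-storage requirement before issuing the command on the secondary cluster. The sketch below is hypothetical: `DatabaseInfo` and `build_replicate_statement` are illustrative names, the host names are placeholders, and the statement assumes the `REPLICATE DATABASE db FROM user@host:port/db` form with the password clause omitted. Verify the exact syntax against the REPLICATE DATABASE reference before relying on it.

```python
import re
from dataclasses import dataclass


@dataclass
class DatabaseInfo:
    """Hypothetical description of a database on the primary cluster."""
    name: str
    unlimited_storage: bool  # True if the database uses unlimited storage


# Conservative identifier check (stricter than what SingleStore actually allows).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_replicate_statement(db: DatabaseInfo, primary_user: str,
                              primary_host: str, primary_port: int = 3306) -> str:
    """Build a REPLICATE DATABASE statement to run on the secondary cluster.

    Assumes the FROM user@host:port/db form; the password clause is omitted.
    """
    if db.unlimited_storage:
        # Unlimited storage databases do not support REPLICATE DATABASE.
        raise ValueError(
            f"{db.name} uses unlimited storage; REPLICATE DATABASE requires local storage"
        )
    if not _IDENT.match(db.name):
        raise ValueError(f"unsupported database name: {db.name!r}")
    return (f"REPLICATE DATABASE {db.name} "
            f"FROM {primary_user}@{primary_host}:{primary_port}/{db.name};")


if __name__ == "__main__":
    orders = DatabaseInfo(name="orders", unlimited_storage=False)
    print(build_replicate_statement(orders, "repl_user", "dc1-ma.example.com"))
```

Executing the statement through a MySQL-compatible client is left out so the sketch stays self-contained; failing fast on unlimited storage databases surfaces the unsupported case before any replication is attempted.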

SingleStore Helios does support cross-AZ clusters for the Enterprise Edition. Helios handles timeout errors and other issues caused by occasional, random spikes in cross-AZ message latency through dedicated monitoring and planned recovery procedures, which are not available for self-hosted SingleStore deployments.

Last modified: December 2, 2024
