Replication and Durability Concepts

Replication ensures redundancy in a cluster. There are two types of replication:

  • High Availability - replicating partitions between leaf nodes

  • Cluster Replication - replicating partitions between clusters

High availability pairs leaves and copies all partitions between them. Half of the partitions on each leaf are master partitions, which ingest data and respond to queries; the other half are replica partitions, which replicate the master partitions of the paired leaf. When the master aggregator detects that a leaf has failed, it promotes the replica partitions on the paired leaf, so that all partitions on that leaf are in the master state, ingesting data and responding to queries. Detecting the failure and promoting the partitions can take from a few seconds to a few minutes, which is still much faster than detecting and fixing a problem with the node by hand. The cluster therefore remains online even if a few leaves fail. However, if both leaves in a pair fail, there will be downtime. Plan the memory and disk allocation for each leaf carefully, because its partition count and data size double when high availability is enabled.
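The promotion step can be illustrated with a small in-memory model. This is only a sketch of the behavior described above, not the product's implementation; the `Leaf` class, the `promote_on_failure` function, and the partition numbers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical model of one leaf in a high availability pair."""
    name: str
    masters: set = field(default_factory=set)   # partitions that ingest data and answer queries
    replicas: set = field(default_factory=set)  # copies of the paired leaf's master partitions
    online: bool = True


def promote_on_failure(failed: Leaf, partner: Leaf) -> None:
    """Mimic the master aggregator promoting replicas after a leaf fails."""
    failed.online = False
    if not partner.online:
        # Both leaves in the pair are down, so their partitions are unavailable.
        raise RuntimeError("both leaves in the pair have failed")
    # Every replica partition on the surviving leaf becomes a master partition.
    partner.masters |= partner.replicas
    partner.replicas.clear()


leaf_a = Leaf("leaf-a", masters={0, 1}, replicas={2, 3})
leaf_b = Leaf("leaf-b", masters={2, 3}, replicas={0, 1})

promote_on_failure(leaf_a, leaf_b)
print(sorted(leaf_b.masters))  # [0, 1, 2, 3]
```

Note that each leaf in the model holds four partitions even though it only masters two, which is why the memory and disk allocation per leaf has to account for double the data.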

Cluster replication replicates a database from one cluster to another. The primary cluster behaves normally. The secondary cluster replicates data from the primary cluster and is read-only, which makes it useful for running expensive analytical queries without affecting the workload on the primary cluster. If there is a problem with the primary cluster (for example, an AWS region fails), you can stop replication on the secondary cluster and direct your application to it instead. Stopping replication promotes the secondary cluster to primary, so you can read from and write to it as you would a regular cluster. After stopping replication, you cannot start it again without first dropping the database that was originally replicated. You can, however, start replication in the opposite direction at that point. The clusters need not have the same number of nodes, but both store the same amount of data, so ensure that each cluster has sufficient host resources.
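The lifecycle of a replicated database on the secondary cluster can be summarized as a small state machine. The sketch below is illustrative only: the `ReplicatedDatabase` class and its state names are assumptions made for this example and do not correspond to actual commands or APIs.

```python
class ReplicatedDatabase:
    """Hypothetical model of one replicated database on the secondary cluster."""

    def __init__(self, name: str):
        self.name = name
        self.state = "replicating"  # read-only copy of the primary

    @property
    def writable(self) -> bool:
        # Only a promoted database accepts writes.
        return self.state == "promoted"

    def stop_replication(self) -> None:
        # Stopping replication promotes the copy to a regular read/write database.
        self.state = "promoted"

    def drop(self) -> None:
        self.state = "dropped"

    def start_replication(self) -> None:
        # Replication cannot resume onto a promoted copy; it must be dropped first.
        if self.state == "promoted":
            raise RuntimeError(f"drop {self.name} before replicating it again")
        self.state = "replicating"


db = ReplicatedDatabase("db_a")
db.stop_replication()   # failover: applications can now write to this copy
db.drop()               # required before replication can be started again
db.start_replication()
```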

An example of Cluster Replication usage is to pair clusters the way leaves are paired in High Availability, as described below. This ensures the workload is split and you only have to stop replication on one database if one cluster fails.

  • Cluster A has database db_a

  • Cluster A replicates database db_b from Cluster B

  • Cluster B has database db_b

  • Cluster B replicates database db_a from Cluster A
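To make the failover consequence of this pairing concrete, the layout can be written down as a lookup table. This is a minimal sketch; the `topology` dictionary and the `databases_to_promote` helper are hypothetical and exist only to show which database needs replication stopped when a cluster fails.

```python
from typing import List

# Which cluster owns each database, and which cluster holds its replica.
topology = {
    "db_a": {"primary": "Cluster A", "secondary": "Cluster B"},
    "db_b": {"primary": "Cluster B", "secondary": "Cluster A"},
}


def databases_to_promote(failed_cluster: str) -> List[str]:
    """Return the databases whose replication must be stopped on the surviving cluster."""
    return sorted(
        db for db, roles in topology.items() if roles["primary"] == failed_cluster
    )


print(databases_to_promote("Cluster A"))  # ['db_a']
```

If Cluster A fails, only db_a needs to be promoted on Cluster B; db_b was already primary there and keeps serving its workload.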