Cluster Failover

If your primary cluster fails and you want to fail over to a database on the secondary cluster, run the STOP REPLICATING db_name command on the secondary cluster's master aggregator. This command promotes the database on the secondary cluster to the primary database, which then becomes available for reads and writes (Data Definition Language (DDL) and Data Manipulation Language (DML) operations). Re-point your application at the master aggregator in the cluster where the promoted database resides. If the previous primary database had running pipelines, these pipelines start automatically in the promoted database and automatically assume the state of the pipelines in the previous primary database.
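As a minimal sketch, if the replicated database on the secondary cluster is named db_name (a placeholder), the failover is a single command in a SQL client connected to the secondary cluster's master aggregator:

```sql
-- Run on the secondary cluster's master aggregator.
-- Promotes db_name to a primary database that accepts reads and writes.
STOP REPLICATING db_name;
```

After this command returns, the promoted database accepts DDL and DML, and replication from the previous primary cluster cannot be resumed.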

Note

If your primary cluster fails and you want to fail over all databases on the secondary cluster, run STOP REPLICATING once for each database.

After running STOP REPLICATING, you cannot resume replicating from the primary cluster.

Failing Back to Your Primary Cluster

Note

The steps below are only for failing over between two clusters. To pause and later resume replication without failing over, use PAUSE REPLICATING instead of STOP REPLICATING, and then run CONTINUE REPLICATING.

The following scenario explains the recommended approach for failing back to your primary cluster; use this approach if your primary cluster fails and you wish to recover it in minimal time.

You are running cluster A, which contains two databases, ExampleDB1 and ExampleDB2. Database ExampleDB1 has a running pipeline ExamplePipeline1. Database ExampleDB2 has a running pipeline ExamplePipeline2. The hostname and port of the master aggregator on cluster A is ClusterA-ma:3306. You’ve set up Cluster B to replicate the two databases from cluster A using the following commands:

REPLICATE DATABASE ExampleDB1 FROM root@ClusterA-ma:3306;
REPLICATE DATABASE ExampleDB2 FROM root@ClusterA-ma:3306;

The hostname and port of the master aggregator on cluster B is ClusterB-ma:3306.

Your application App1 uses ExampleDB1 and ExampleDB2 on Cluster A.

Then, cluster A fails and you take it offline.

To restore cluster A as the primary cluster, you follow the steps below in order. In these steps, writes refer to write operations initiated using Data Definition Language (DDL) and Data Manipulation Language (DML) commands.

Step 1 (cluster B):

You run the commands STOP REPLICATING ExampleDB1; and STOP REPLICATING ExampleDB2;. After you run these commands, ExampleDB1 and ExampleDB2 are promoted to primary databases and are available for reads and writes. After the two databases are promoted, the pipelines ExamplePipeline1 and ExamplePipeline2 start automatically and assume the state of the pipelines in the previous primary databases on cluster A.

Step 2 (cluster B):

You point App1 to cluster B and App1 writes transactions to ExampleDB1 and ExampleDB2.

Step 3 (cluster A):

You resolve the issue that caused cluster A to go offline. You bring cluster A back online.

Step 4 (cluster A):

You run the command REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterB-ma:3306;. This command replicates, to cluster A, only the contents of ExampleDB1 on cluster B that are not already in ExampleDB1 on cluster A. You run the same REPLICATE DATABASE command using ExampleDB2. See the REPLICATE DATABASE topic for more information.
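Spelled out for both databases, step 4 consists of the following commands on cluster A's master aggregator (root and ClusterB-ma:3306 are the user and endpoint from the setup described earlier):

```sql
-- On cluster A's master aggregator: pull only the contents of each
-- database on cluster B that are not already present on cluster A.
REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterB-ma:3306;
REPLICATE DATABASE ExampleDB2 WITH FORCE DIFFERENTIAL FROM root@ClusterB-ma:3306;
```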

Step 5 (cluster B):

App1 writes transactions to ExampleDB1 and ExampleDB2.

Step 6 (cluster A):

The REPLICATE DATABASE command you ran on the two databases in step 4 returns, indicating that cluster A is up-to-date with cluster B as of right before step 4. Following this, the transactions written during step 5 begin replicating to cluster A.

Step 7 (cluster B):

You stop the pipelines ExamplePipeline1 and ExamplePipeline2.

You run FLUSH TABLES WITH READ ONLY; this command completes any in-progress write transactions that are running on the cluster and fails any new writes that are initiated before step 10. As an alternative to running FLUSH TABLES WITH READ ONLY, you could pause all writes from App1 and ensure that any in-progress write transactions have completed.
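Assuming the standard STOP PIPELINE statement for stopping the pipelines, step 7 might look like the following on cluster B:

```sql
-- On cluster B: stop the running pipelines first.
STOP PIPELINE ExamplePipeline1;
STOP PIPELINE ExamplePipeline2;

-- Complete in-progress write transactions and fail any new writes
-- until UNLOCK TABLES is run in step 10.
FLUSH TABLES WITH READ ONLY;
```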

Step 8 (cluster A):

You determine when the writes made during step 5 have completed syncing to cluster A. Then you run STOP REPLICATING ExampleDB1; and STOP REPLICATING ExampleDB2;. ExampleDB1 and ExampleDB2 become primary databases on cluster A.
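As a sketch, step 8 on cluster A's master aggregator:

```sql
-- On cluster A's master aggregator, once the step 5 writes have
-- finished syncing: promote both databases to primary on cluster A.
STOP REPLICATING ExampleDB1;
STOP REPLICATING ExampleDB2;
```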

Step 9 (cluster A):

You start the pipelines ExamplePipeline1 and ExamplePipeline2. These pipelines automatically assume the state of the pipelines in the previous primary databases on cluster B.

You repoint App1 to cluster A and App1 writes transactions to ExampleDB1 and ExampleDB2.
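Assuming the standard START PIPELINE statement, restarting the pipelines in step 9 looks like:

```sql
-- On cluster A: restart both pipelines; they resume from the state
-- inherited from the pipelines on cluster B.
START PIPELINE ExamplePipeline1;
START PIPELINE ExamplePipeline2;
```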

Step 10 (cluster B):

You run UNLOCK TABLES to enable ExampleDB1 and ExampleDB2 for write operations. Note that any writes made on cluster B after FLUSH TABLES WITH READ ONLY ran in step 7 and before UNLOCK TABLES will fail.

Step 11 (cluster B):

You run the command REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterA-ma:3306;. This command initiates replication of ExampleDB1 from cluster A to cluster B. Only writes that were initiated after running STOP REPLICATING ExampleDB1 in step 8 are replicated to cluster B, since ExampleDB1 on both clusters contains the same contents prior to that time. You run the same REPLICATE DATABASE command using ExampleDB2. After REPLICATE DATABASE returns, replication continues; any new writes made to ExampleDB1 and ExampleDB2 on cluster A are replicated to cluster B.
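Spelled out for both databases, step 11 on cluster B's master aggregator might look like:

```sql
-- On cluster B's master aggregator: resume replicating from cluster A;
-- only writes made after step 8 are transferred.
REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterA-ma:3306;
REPLICATE DATABASE ExampleDB2 WITH FORCE DIFFERENTIAL FROM root@ClusterA-ma:3306;
```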

Last modified: December 11, 2024
