Cluster Failover

If your primary cluster fails and you want to fail over to a database on the secondary cluster, run the STOP REPLICATING db_name command on the master aggregator of the secondary cluster. This command promotes the database on the secondary cluster to the primary database, and the promoted database becomes available for reads and writes (Data Definition Language (DDL) and Data Manipulation Language (DML) operations). Re-point your application at the master aggregator of the cluster where the promoted database resides. If the previous primary database had running pipelines, these pipelines are started automatically in the promoted database and assume the state of the pipelines in the previous primary database.

Note

If your primary cluster fails and you want to fail over to all databases in the secondary cluster, run STOP REPLICATING once for each database.

After running STOP REPLICATING, you cannot resume replicating from the primary cluster.
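As a minimal sketch, failing over two databases means running one statement per database on the master aggregator of the secondary cluster. The database names are the ones used in the scenario later on this page and are illustrative only.

```sql
-- Run on the master aggregator of the secondary cluster.
-- Each statement promotes one database to a primary database.
-- After this, replication from the previous primary cluster cannot be resumed.
STOP REPLICATING ExampleDB1;
STOP REPLICATING ExampleDB2;
```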

Failing Back to Your Primary Cluster

Note

The steps below apply only to failing over between two clusters. To resume replication without failing over, use PAUSE REPLICATING instead of STOP REPLICATING, and then use CONTINUE REPLICATING.
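For comparison, a temporary interruption without failover might look like the sketch below. It assumes that PAUSE REPLICATING and CONTINUE REPLICATING take a database name in the same way as STOP REPLICATING db_name; check the reference topics for those commands before relying on this form.

```sql
-- Run on the master aggregator of the secondary cluster.
-- Assumption: both commands accept a database name, like STOP REPLICATING.
PAUSE REPLICATING ExampleDB1;

-- ... perform maintenance; the database is not promoted ...

CONTINUE REPLICATING ExampleDB1;
```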

The following scenario explains the recommended approach for failing back to your primary cluster; use this approach if your primary cluster fails and you wish to recover it in minimal time.

You are running cluster A, which contains two databases, ExampleDB1 and ExampleDB2. Database ExampleDB1 has a running pipeline, ExamplePipeline1. Database ExampleDB2 has a running pipeline, ExamplePipeline2. The hostname and port of the master aggregator on cluster A are ClusterA-ma:3306. You’ve set up cluster B to replicate the two databases from cluster A using the following commands:

REPLICATE DATABASE ExampleDB1 FROM root@ClusterA-ma:3306;
REPLICATE DATABASE ExampleDB2 FROM root@ClusterA-ma:3306;

The hostname and port of the master aggregator on cluster B are ClusterB-ma:3306.

Your application App1 uses ExampleDB1 and ExampleDB2 on Cluster A.

Then, cluster A fails and you take it offline.

To restore cluster A as the primary cluster, follow the steps below in order. In these steps, writes are write operations initiated using Data Definition Language (DDL) and Data Manipulation Language (DML) commands.

Step	Cluster A	Cluster B
1		You run the commands `STOP REPLICATING ExampleDB1;` and `STOP REPLICATING ExampleDB2;`. After you run these commands, `ExampleDB1` and `ExampleDB2` are promoted to primary databases and are available for reads and writes. After the two databases are promoted, the pipelines `ExamplePipeline1` and `ExamplePipeline2` are started automatically. These pipelines automatically assume the state of the pipelines in the previous primary databases on cluster A.
2		You point `App1` to cluster B and `App1` writes transactions to `ExampleDB1` and `ExampleDB2`.
3	You resolve the issue that caused cluster A to go offline. You bring cluster A back online.
4	You run the command `REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterB-ma:3306;`. This command replicates, to cluster A, only the contents of `ExampleDB1` on cluster B that are not already in `ExampleDB1` on cluster A. You run the same `REPLICATE DATABASE` command using `ExampleDB2`. See the REPLICATE DATABASE topic for more information.
5		`App1` writes transactions to `ExampleDB1` and `ExampleDB2`.
6	The `REPLICATE DATABASE` command you ran on the two databases in step 4 returns, indicating that cluster A is up-to-date with cluster B as of right before step 4. Following this, the transactions written during step 5 begin replicating to cluster A.
7		You run `FLUSH TABLES WITH READ ONLY`. This command completes any in-progress write transactions that are running on the cluster and fails any new writes that are initiated before step 10. As an alternative to running `FLUSH TABLES WITH READ ONLY`, you could pause all writes from `App1` and ensure that any in-progress write transactions have completed. You stop the pipelines `ExamplePipeline1` and `ExamplePipeline2`.
8	You determine when the writes made during step 5 have completed syncing to cluster A. Then you run `STOP REPLICATING ExampleDB1;` and `STOP REPLICATING ExampleDB2;`. `ExampleDB1` and `ExampleDB2` become primary databases on cluster A.
9	You start the pipelines `ExamplePipeline1` and `ExamplePipeline2`. These pipelines automatically assume the state of the pipelines in the previous primary databases on cluster B. You repoint `App1` to cluster A and `App1` writes transactions to `ExampleDB1` and `ExampleDB2`.
10		You run `UNLOCK TABLES` to enable `ExampleDB1` and `ExampleDB2` for write operations. Note that any writes made on cluster B after you run `FLUSH TABLES WITH READ ONLY` in step 7 and before you run `UNLOCK TABLES` fail.
11		You run the command `REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterA-ma:3306;`. This command initiates replication of `ExampleDB1` from cluster A to cluster B. Only writes that were initiated after running `STOP REPLICATING ExampleDB1` in step 8 are replicated to cluster B, since `ExampleDB1` contains the same contents on both clusters prior to that time. You run the same `REPLICATE DATABASE` command using `ExampleDB2`. After `REPLICATE DATABASE` returns, replication continues; any new writes made to `ExampleDB1` and `ExampleDB2` on cluster A are replicated to cluster B.
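
The commands from the table can be collected into one sequence, grouped by the cluster on which each is run. This is a sketch of the scenario above, not a script to run unattended: the waits between steps are manual, and the `USE`, `STOP PIPELINE`, and `START PIPELINE` statements are assumptions about how you would stop and start the pipelines, since the table does not name the commands for that.

```sql
-- Step 1, cluster B: promote the replicated databases.
STOP REPLICATING ExampleDB1;
STOP REPLICATING ExampleDB2;

-- Steps 2-3: repoint App1 to cluster B, then repair cluster A and bring it online.

-- Step 4, cluster A: replicate only what cluster A is missing.
REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterB-ma:3306;
REPLICATE DATABASE ExampleDB2 WITH FORCE DIFFERENTIAL FROM root@ClusterB-ma:3306;

-- Steps 5-6: App1 keeps writing to cluster B; wait for both commands above to return.

-- Step 7, cluster B: block new writes, then stop the pipelines.
FLUSH TABLES WITH READ ONLY;
USE ExampleDB1;                   -- assumption: pipelines are addressed per database
STOP PIPELINE ExamplePipeline1;   -- assumption: command used to stop a pipeline
USE ExampleDB2;
STOP PIPELINE ExamplePipeline2;

-- Step 8, cluster A: once the step 5 writes have finished syncing, promote.
STOP REPLICATING ExampleDB1;
STOP REPLICATING ExampleDB2;

-- Step 9, cluster A: start the pipelines, then repoint App1 to cluster A.
USE ExampleDB1;
START PIPELINE ExamplePipeline1;  -- assumption: command used to start a pipeline
USE ExampleDB2;
START PIPELINE ExamplePipeline2;

-- Step 10, cluster B: release the read-only lock taken in step 7.
UNLOCK TABLES;

-- Step 11, cluster B: resume replication in the original direction.
REPLICATE DATABASE ExampleDB1 WITH FORCE DIFFERENTIAL FROM root@ClusterA-ma:3306;
REPLICATE DATABASE ExampleDB2 WITH FORCE DIFFERENTIAL FROM root@ClusterA-ma:3306;
```

The order matters most around steps 7 and 8: writes on cluster B are blocked before the databases on cluster A are promoted, so no transaction is accepted on cluster B that cluster A never receives.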
