Cluster Failover
If your primary cluster fails and you want to fail over to a database on the secondary cluster, run the STOP REPLICATING db_name command on the secondary cluster's master aggregator. This command promotes the database on the secondary cluster to the primary database, and the promoted database becomes available for reads and writes (Data Definition Language (DDL) and Data Manipulation Language (DML)). Re-point your application at the master aggregator in the cluster where the promoted database resides. If the previous primary database had running pipelines, they are started automatically in the promoted database and assume the state of the pipelines in the previous primary database.
Note
If your primary cluster fails and you want to fail over to all databases in the secondary cluster, run STOP REPLICATING once for each database.
After running STOP REPLICATING, you cannot resume replicating from the primary cluster.
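As a minimal sketch of the note above, the following shows a failover of every replicated database, assuming the secondary cluster replicates exactly two databases. The names ExampleDB1 and ExampleDB2 are illustrative placeholders; substitute your own database names.

```sql
-- Run on the master aggregator of the secondary cluster.
-- One STOP REPLICATING statement is needed per database.
STOP REPLICATING ExampleDB1;
STOP REPLICATING ExampleDB2;
```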
Failing Back to Your Primary Cluster
The following scenario explains the recommended approach for failing back to your primary cluster; use this approach if your primary cluster fails and you wish to recover it in minimal time.
You are running cluster A, which contains two databases, ExampleDB1 and ExampleDB2. Database ExampleDB1 has a running pipeline ExamplePipeline1. Database ExampleDB2 has a running pipeline ExamplePipeline2. The hostname and port of the master aggregator on cluster A is ClusterA-ma:3306. You’ve set up Cluster B to replicate the two databases from cluster A using the following commands:
REPLICATE DATABASE ExampleDB1 FROM root@ClusterA-ma:3306;
REPLICATE DATABASE ExampleDB2 FROM root@ClusterA-ma:3306;
The hostname and port of the master aggregator on cluster B is ClusterB-ma:3306.
Your application App1 uses ExampleDB1 and ExampleDB2 on Cluster A.
Then, cluster A fails and you take it offline.
To restore Cluster A as the primary cluster, you follow the steps below in order. In these steps, “writes” refers to write operations initiated using Data Definition Language (DDL) and Data Manipulation Language (DML) commands.
Step | Cluster A | Cluster B |
---|---|---|
1 | You run the commands | |
2 | You point | |
3 | You resolve the issue that caused cluster A to go offline. You bring cluster A back online. | |
4 | You run the command | |
5 | | |
6 | The | |
7 | You stop the pipelines You run FLUSH TABLES WITH READ ONLY; this command completes any in-progress write transactions that are running on the cluster and fails any new writes that are initiated before step 10. As an alternative to running | |
8 | You determine when the writes made during step 5 have completed syncing to cluster A. Then you run | |
9 | You start the pipelines You repoint | |
10 | You run UNLOCK TABLES to enable | |
11 | You run the command |
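
The write-blocking part of the procedure (steps 7 and 10) could look roughly like the following sketch. It assumes the pipelines being stopped are ExamplePipeline1 and ExamplePipeline2 from the scenario, and that the statements are run on the master aggregator of whichever cluster is serving writes at that point; the table above remains the authority on which cluster each step applies to. STOP PIPELINE is the standard pipeline command, while FLUSH TABLES WITH READ ONLY and UNLOCK TABLES are the commands named in steps 7 and 10.

```sql
-- Step 7 (sketch): stop the running pipelines, then block new writes.
USE ExampleDB1;
STOP PIPELINE ExamplePipeline1;
USE ExampleDB2;
STOP PIPELINE ExamplePipeline2;

-- Lets in-progress write transactions complete; new writes fail until UNLOCK TABLES.
FLUSH TABLES WITH READ ONLY;

-- ... steps 8 and 9 are carried out here ...

-- Step 10 (sketch): allow writes again.
UNLOCK TABLES;
```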