Recover from a Master Aggregator Failure
Prerequisites
Before proceeding, determine if your cluster is running with Master Aggregator High Availability (MA HA) enabled, as the recovery steps differ significantly.
Change Master Aggregator Disk with MA HA Enabled
If your cluster is running with MA HA enabled, the recovery process is simpler. To move the Master Aggregator to a disk with a different storage class:
- Update the cluster custom resource with the new storage class in masterAggregatorSpec..storageClass.
- Update the PersistentVolumeClaim (PVC) template of the Master Aggregator StatefulSet with the desired storage class.
- Delete the PVC for the master-0 pod.
- Delete the master-0 pod itself.
The StatefulSet controller recreates the disk with the new storage class. During this operation, the promoted child aggregator continues to serve as the active Master Aggregator.
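The four steps above can be sketched as a dry-run plan of kubectl commands. This is a minimal sketch, not part of the Operator: the manifest file name, namespace, claim template name, and pod name are hypothetical placeholders, and the PVC name relies on the standard Kubernetes naming for StatefulSet claims (<claimTemplate>-<podName>). The StatefulSet template step is left out because Kubernetes rejects in-place edits to volumeClaimTemplates on an existing StatefulSet, so how you change it depends on your environment.

```python
import shlex


def pvc_name_for(claim_template: str, pod_name: str) -> str:
    # Kubernetes names StatefulSet claims <claimTemplate>-<podName>,
    # where podName is <statefulSetName>-<ordinal>.
    return f"{claim_template}-{pod_name}"


def storage_class_change_plan(cluster_manifest: str, namespace: str,
                              claim_template: str,
                              pod_name: str) -> list[list[str]]:
    """Return the kubectl commands, in order, for the MA HA disk change.

    Hypothetical helper. The StatefulSet volumeClaimTemplates update is
    intentionally omitted: Kubernetes rejects in-place edits to that field.
    """
    ns = ["--namespace", namespace]
    return [
        # 1. Apply the edited cluster custom resource (new storage class).
        ["kubectl", "apply", "-f", cluster_manifest, *ns],
        # 2. Delete the master-0 PVC without blocking: PVC protection keeps
        #    an in-use claim in Terminating until its pod is gone.
        ["kubectl", "delete", "pvc", pvc_name_for(claim_template, pod_name),
         "--wait=false", *ns],
        # 3. Delete the pod; the StatefulSet controller recreates the pod
        #    and a new PVC that uses the updated storage class.
        ["kubectl", "delete", "pod", pod_name, *ns],
    ]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    plan = storage_class_change_plan("sdb-cluster.yaml", "default",
                                     "pv-storage", "node-example-master-0")
    for cmd in plan:
        # Dry run: print each command; pass cmd to
        # subprocess.run(cmd, check=True) to execute it instead.
        print(shlex.join(cmd))
```

Printing the plan first lets you review the commands before running them. The PVC is deleted with --wait=false because Kubernetes keeps an in-use claim in Terminating until the pod that mounts it is deleted; without that flag, kubectl would block before reaching the pod deletion.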
Note
SingleStore recommends enabling MA HA to support automatic storage class changes with minimal manual intervention.
Recover from MA Failure without MA HA
If MA HA is not enabled, perform the following steps to recover from a Master Aggregator failure.
- To keep the Operator from taking disruptive actions while you fix the cluster, turn it off: set the replicas field to 0 in the sdb-operator.yaml file and apply the change to the cluster.
- Determine which child aggregator is furthest ahead in data replication by running the following SQL command:
SHOW DATABASES EXTENDED;
From the output, review the position column for the cluster database on every node, and select the child aggregator where:
  - The position is highest.
  - No reference database is in an unrecoverable state.
  - No database has a position of 0:0.
- Set the identified child aggregator as the Master Aggregator by running the following SQL command on that node:
AGGREGATOR SET AS MASTER;
- Ensure that the node on master-0 is emptied out. Depending on what occurred when the disk became corrupted, this may require re-emptying the master-0 volume.
  - If you attempted to start master-0 with an empty volume while the Operator was on, the Operator bootstrapped it as a Master Aggregator. This resulted in a new single-node cluster that must be emptied out.
  - If you did not start master-0 while the Operator was on, only its storage must be emptied out. Because the Operator is not running, master-0 is not bootstrapped as the Master Aggregator when it starts as a new empty node.
- Remove the former Master Aggregator and clear its metadata by running the following SQL command on the temporary Master Aggregator:
REMOVE AGGREGATOR '...-master-0';
- Optional: If you need to change the Master Aggregator storage class after promoting a child aggregator to master, refer to Change Master Aggregator Disk with MA HA Enabled.
- Re-add master-0 to the cluster as a child aggregator by running the following SQL command on the temporary Master Aggregator. This adds the empty node to the cluster as a child aggregator.
ADD AGGREGATOR '...-master-0';
- Promote this child aggregator so that the node in the master StatefulSet becomes the new Master Aggregator:
PROMOTE AGGREGATOR '...-master-0' TO MASTER;
- Turn the Operator back on by setting the replicas field to 1 in the sdb-operator.yaml file and applying the change to the cluster.
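The file edits, the child-aggregator selection rules, and the SQL ordering in the procedure above can be modeled as a few small Python helpers. This is a minimal sketch under stated assumptions, not a SingleStore tool: set_operator_replicas, pick_new_master, and recovery_statements are hypothetical helpers; the Operator manifest is assumed to contain a single replicas field; a position is assumed to be two integers separated by a colon and compared left to right; the state string "unrecoverable" is assumed; and each input row (node, database, state, position, is_reference) is assumed to be collected by you from every child aggregator during the replication-position check. The SQL text mirrors the abbreviated statements on this page; the code does not connect to or change the cluster.

```python
import re
from collections import defaultdict


def set_operator_replicas(manifest: str, replicas: int) -> str:
    """Set the first `replicas:` value in the Operator manifest.

    Hypothetical helper: 0 turns the Operator off, 1 turns it back on.
    Assumes the manifest has exactly one replicas field.
    """
    updated, count = re.subn(
        r"^([ \t]*replicas:[ \t]*)\d+",
        lambda m: f"{m.group(1)}{replicas}",
        manifest,
        count=1,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("no replicas field found in manifest")
    return updated


def parse_position(position: str) -> tuple[int, int]:
    # Assumption: positions look like "<left>:<right>" and compare pairwise.
    left, right = position.split(":")
    return int(left), int(right)


def pick_new_master(rows):
    """Return the child aggregator to promote, or None if none qualifies.

    Each row is a dict with hypothetical keys: node, database, state,
    position, is_reference (one row per database per child aggregator).
    """
    by_node = defaultdict(list)
    for row in rows:
        by_node[row["node"]].append(row)

    best_node, best_position = None, None
    for node, node_rows in by_node.items():
        # Rule: no database on the node may be at position 0:0.
        if any(parse_position(r["position"]) == (0, 0) for r in node_rows):
            continue
        # Rule: no reference database may be unrecoverable (assumed state name).
        if any(r["is_reference"] and r["state"].lower() == "unrecoverable"
               for r in node_rows):
            continue
        cluster = [r for r in node_rows if r["database"] == "cluster"]
        if not cluster:
            continue
        position = parse_position(cluster[0]["position"])
        # Rule: keep the node whose cluster database position is highest.
        if best_position is None or position > best_position:
            best_node, best_position = node, position
    return best_node


def recovery_statements(old_master: str) -> list[tuple[str, str]]:
    """Pair each SQL statement from the procedure with the node it runs on.

    Emptying master-0 is a manual step between the first and second
    statements and is not modeled here.
    """
    return [
        ("chosen child aggregator", "AGGREGATOR SET AS MASTER;"),
        ("temporary Master Aggregator", f"REMOVE AGGREGATOR '{old_master}';"),
        ("temporary Master Aggregator", f"ADD AGGREGATOR '{old_master}';"),
        ("temporary Master Aggregator",
         f"PROMOTE AGGREGATOR '{old_master}' TO MASTER;"),
    ]
```

Filtering out disqualified nodes before comparing positions keeps the selection rules separate from the "furthest ahead" comparison, so a node with the highest position but an unrecoverable reference database is never chosen.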