# Recover from a Master Aggregator Failure

## Prerequisites

Before proceeding, determine whether your cluster is running with Master Aggregator High Availability (MA HA) enabled, because the recovery steps differ significantly between the two cases. MA HA provides enhanced reliability and automatic failover capabilities for mission-critical workloads. Refer to [Multi-Datacenter Failover](https://docs.singlestore.com/db/v9.1/user-and-cluster-administration/high-availability-and-disaster-recovery/managing-high-availability/multi-datacenter-failover.md) for more information about MA HA.

## Change Master Aggregator Disk with MA HA Enabled

If your cluster is running with MA HA enabled, the recovery process is simplified. You only need to change the storage class:

1. Update the cluster custom resource with the new storage class in `masterAggregatorSpec.storageClass`.

2. Update the `PersistentVolumeClaim` (PVC) template of the Master Aggregator StatefulSet with the new storage class.

3. Delete the PVC for the `master-0` pod.

4. Delete the `master-0` pod itself.

5. The StatefulSet controller recreates the disk with the new storage class. The promoted child aggregator continues to serve as the active Master Aggregator during the operation.

> **📝 Note**: SingleStore recommends enabling MA HA to support automatic storage class changes with minimal manual intervention.
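
As a rough illustration of steps 3 and 4 above, the following Python sketch builds the two `kubectl` deletions in order. The namespace, PVC name, and pod name are hypothetical placeholders, not names taken from this page; look up the real ones with `kubectl get pvc,pod` in the namespace of your cluster. Passing `--wait=false` to the PVC deletion is an assumption about how you may want to sequence the calls: Kubernetes keeps a PVC in `Terminating` while a pod still uses it, so the claim is only released once the pod is deleted.

```python
import subprocess
from typing import List


def disk_swap_commands(namespace: str, pvc_name: str, pod_name: str) -> List[List[str]]:
    """Return the kubectl argument lists for steps 3 and 4, in order.

    All three arguments are placeholders supplied by the caller.
    """
    return [
        # Step 3: request deletion of the PVC without blocking. The claim
        # stays in Terminating until the pod that mounts it is gone.
        ["kubectl", "delete", "pvc", pvc_name, "-n", namespace, "--wait=false"],
        # Step 4: delete the pod so the PVC can be released and the
        # StatefulSet controller can recreate both.
        ["kubectl", "delete", "pod", pod_name, "-n", namespace],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence if kubectl reports an error.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    commands = disk_swap_commands("my-namespace", "my-master-0-pvc", "my-master-0-pod")
    run_commands(commands, dry_run=True)
```

Running the script as written only prints the commands. Review them against your cluster before calling `run_commands` with `dry_run=False`.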

## Recover from MA Failure without MA HA

If MA HA is not enabled, perform the following steps to recover from a Master Aggregator failure.

1. To prevent the Operator from taking disruptive actions while you fix the cluster, turn it off by setting the `replicas` field to `0` in the `sdb-operator.yaml` file and applying the change to the cluster.

2. Determine which child aggregator is furthest ahead in data replication using the following SQL command.
   ```sql
   SHOW DATABASES EXTENDED;
   ```
   From the output, review the `position` column for the `cluster` database on every node and select the child aggregator where:

   * The `position` is highest
   * No reference database is in an unrecoverable state
   * No database has a `position` of `0:0`

3. Set the identified child aggregator as the Master Aggregator using the following SQL command.
   ```sql
   AGGREGATOR SET AS MASTER;
   ```

4. Ensure that the node on `master-0` is emptied out. Depending on what occurred when the disk became corrupted, this may require re-emptying the `master-0` volume.

   * If you attempted to start `master-0` with an empty volume when the Operator was on, the Operator bootstraps it as a Master Aggregator. This would have resulted in a new single-node cluster that must be emptied out.
   * If you did not start `master-0` when the Operator was on, then only its storage must be emptied out. When `master-0` is started as a new empty node, it is not bootstrapped as the Master Aggregator because the Operator is not running.

5. Remove the former Master Aggregator and clear its metadata by running the following SQL command on the temporary Master Aggregator (the child aggregator promoted in step 3).
   ```sql
   REMOVE AGGREGATOR '...-master-0';
   ```

6. Optional: If you need to change the Master Aggregator storage class after promoting a child aggregator to master, follow the steps in the Change Master Aggregator Disk with MA HA Enabled section of this page.

7. Re-add the emptied `master-0` node to the cluster. The following SQL command adds the empty node as a child aggregator.
   ```sql
   ADD AGGREGATOR '...-master-0';
   ```

8. Promote the re-added child aggregator so that this node in the master StatefulSet becomes the new Master Aggregator.
   ```sql
   PROMOTE AGGREGATOR '...-master-0' TO MASTER;
   ```

9. Turn on the Operator by setting the `replicas` field to `1` in the `sdb-operator.yaml` file and applying the change to the cluster.
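
The selection criteria in step 2 can be sketched in Python as follows. This is a minimal illustration, not part of the Operator or of SingleStore tooling. It assumes you have copied the relevant output rows for each child aggregator into dictionaries with the hypothetical keys `database`, `state`, and `position`, that a position is a pair of integers written as `a:b` (as in `0:0`), and that positions compare as integer pairs. For simplicity it checks every listed database, which is stricter than checking reference databases only.

```python
from typing import Dict, List, Optional, Tuple

# One row per database on a node; the key names are assumptions for this sketch.
Row = Dict[str, str]


def parse_position(position: str) -> Tuple[int, int]:
    """Split a position such as '0:120' into a comparable pair of integers."""
    first, second = position.split(":")
    return int(first), int(second)


def is_eligible(rows: List[Row]) -> bool:
    """Reject a node with an unrecoverable database or any position of 0:0."""
    for row in rows:
        if row["state"].lower() == "unrecoverable":
            return False
        if parse_position(row["position"]) == (0, 0):
            return False
    return True


def pick_candidate(nodes: Dict[str, List[Row]], database: str = "cluster") -> Optional[str]:
    """Return the eligible node with the highest position for `database`."""
    best_node: Optional[str] = None
    best_position: Optional[Tuple[int, int]] = None
    for node, rows in nodes.items():
        if not is_eligible(rows):
            continue
        positions = [parse_position(r["position"]) for r in rows if r["database"] == database]
        if not positions:
            continue  # this node does not report the database at all
        if best_position is None or positions[0] > best_position:
            best_node, best_position = node, positions[0]
    return best_node


if __name__ == "__main__":
    # Hypothetical sample data for two child aggregators.
    sample = {
        "aggregator-0": [{"database": "cluster", "state": "replicating", "position": "0:120"}],
        "aggregator-1": [{"database": "cluster", "state": "replicating", "position": "0:300"}],
    }
    print(pick_candidate(sample))
```

If no node passes the checks, `pick_candidate` returns `None`, which signals that none of the child aggregators meets the criteria and the choice needs manual review.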

***

Modified at: June 11, 2026

Source: [/db/v9.1/reference/singlestore-operator-reference/recover-from-a-master-aggregator-failure/](https://docs.singlestore.com/db/v9.1/reference/singlestore-operator-reference/recover-from-a-master-aggregator-failure/)

