Recover from a Master Aggregator Failure

Prerequisites

Before proceeding, determine whether your cluster is running with Master Aggregator High Availability (MA HA) enabled, because the recovery steps differ significantly. MA HA provides enhanced reliability and automatic failover capabilities for mission-critical workloads. Refer to Multi-Datacenter Failover for more information about MA HA.

Change Master Aggregator Disk with MA HA Enabled

If your cluster is running with MA HA enabled, the recovery process is simplified. You only need to change the storage class:

  1. Update the cluster custom resource with the new storage class in masterAggregatorSpec.storageClass.

  2. Update the Master Aggregator StatefulSet's PersistentVolumeClaim (PVC) template with the desired storage class.

  3. Delete the PVC for the master-0 pod.

  4. Delete the master-0 pod itself.

  5. Wait for the StatefulSet controller to recreate the master-0 pod and its disk with the new storage class. The promoted child aggregator continues to serve as the active Master Aggregator during the operation.
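
The steps above can be sketched as a small Python helper that only builds the pieces to run, without executing anything. This is an illustration under stated assumptions, not the Operator's own tooling: the function names are hypothetical, the patch assumes that masterAggregatorSpec sits directly under the custom resource's spec, and the PVC and pod names must be taken from your own cluster. The PVC deletion uses --wait=false because Kubernetes keeps a PVC in the Terminating state while a pod still uses it, so waiting on it before the pod is deleted would block.

```python
import json
from typing import List


def storage_class_patch(storage_class: str) -> str:
    """Build a JSON merge patch for step 1.

    Assumption: masterAggregatorSpec is a direct child of the custom
    resource's spec. Check this against your cluster definition.
    """
    patch = {"spec": {"masterAggregatorSpec": {"storageClass": storage_class}}}
    return json.dumps(patch)


def recreate_disk_commands(namespace: str, pvc_name: str, pod_name: str) -> List[List[str]]:
    """Return the kubectl invocations for steps 3 and 4, in order.

    pvc_name and pod_name are placeholders supplied by the caller;
    this sketch does not guess the naming scheme used in your cluster.
    """
    return [
        # Step 3: request PVC deletion without blocking on the
        # protection that holds the PVC while the pod still mounts it.
        ["kubectl", "delete", "pvc", pvc_name, "-n", namespace, "--wait=false"],
        # Step 4: delete the pod so the PVC deletion can complete and
        # the StatefulSet controller can recreate both.
        ["kubectl", "delete", "pod", pod_name, "-n", namespace],
    ]
```

Each returned list can be passed to subprocess.run(command, check=True). The patch string is suitable for kubectl patch with --type merge and -p, applied to your cluster's custom resource.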

Note

SingleStore recommends enabling MA HA to support automatic storage class changes with minimal manual intervention.

Recover from MA Failure without MA HA

If MA HA is not enabled, perform the following steps to recover from a Master Aggregator failure.

  1. To avoid disruptive actions while you fix the cluster, turn off the Operator by setting the replicas field to 0 in the sdb-operator.yaml file and applying the change to the cluster.

  2. Determine which of the child aggregators is furthest ahead in data replication using the following SQL command.

    SHOW DATABASES EXTENDED;

    From the output, review the position column for the cluster database on every node and select the child aggregator where:

    • The position is highest

    • No reference database is in an unrecoverable state

    • No database has a position of 0:0

  3. Set the identified child aggregator as the Master Aggregator by running the following SQL command on that child aggregator.

    AGGREGATOR SET AS MASTER;

  4. Ensure that the node on master-0 is emptied out. Depending on what occurred when the disk became corrupted, this may require re-emptying the master-0 volume.

    • If you attempted to start master-0 with an empty volume while the Operator was running, the Operator bootstrapped it as a Master Aggregator. This results in a new single-node cluster that must be emptied out.

    • If you did not start master-0 while the Operator was running, then only its storage must be emptied out. When master-0 is started as a new empty node, it is not bootstrapped as the Master Aggregator because the Operator is not running.

  5. Remove the former Master Aggregator and clear its metadata by running the following SQL command on the temporary Master Aggregator.

    REMOVE AGGREGATOR '...-master-0';

  6. Optional: If you need to change the Master Aggregator storage class after promoting a child aggregator to master, refer to Change Master Aggregator Disk with MA HA Enabled.

  7. Re-add the emptied master-0 node. This adds the empty node to the cluster as a child aggregator.

    ADD AGGREGATOR '...-master-0';

  8. Promote this child aggregator so that the node in the master StatefulSet becomes the new Master Aggregator.

    PROMOTE AGGREGATOR '...-master-0' TO MASTER;

  9. Turn on the Operator by setting the replicas field to 1 in the sdb-operator.yaml file and applying the change to the cluster.
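
The selection rule in step 2 can be sketched in Python as follows. This is a minimal illustration, not an official tool. It assumes you have already collected the SHOW DATABASES EXTENDED output from each child aggregator into dictionaries with the hypothetical keys database, position, and state. It also assumes that a position such as 0:8832 can be compared as a pair of integers and that an unrecoverable database reports the state string "unrecoverable". As a simplification, it checks every database listed on the node rather than reference databases only. Verify all of these against the actual output from your cluster.

```python
from typing import Dict, List, Optional, Tuple

# One row of collected output, keyed by hypothetical column names.
Row = Dict[str, str]


def parse_position(position: str) -> Tuple[int, int]:
    """Split a position such as '0:8832' into a comparable pair of integers."""
    first, second = position.split(":")
    return int(first), int(second)


def pick_candidate(rows_by_node: Dict[str, List[Row]]) -> Optional[str]:
    """Return the child aggregator to promote, or None if none qualifies."""
    best_node: Optional[str] = None
    best_position: Optional[Tuple[int, int]] = None
    for node, rows in rows_by_node.items():
        positions = {row["database"]: parse_position(row["position"]) for row in rows}
        # The comparison is made on the cluster database, so it must be present.
        if "cluster" not in positions:
            continue
        # Skip nodes that report any database in an unrecoverable state.
        if any(row["state"].lower() == "unrecoverable" for row in rows):
            continue
        # Skip nodes where any database is still at position 0:0.
        if any(pos == (0, 0) for pos in positions.values()):
            continue
        # Keep the node whose cluster database position is highest so far.
        if best_position is None or positions["cluster"] > best_position:
            best_node, best_position = node, positions["cluster"]
    return best_node
```

A return value of None means that no child aggregator satisfies all three conditions, in which case the output should be reviewed manually before promoting any node.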
