Recover from a Master Aggregator Failure

  1. To keep the Operator from taking potentially disruptive actions while you fix the cluster, turn it off by setting the replicas field to 0 in the sdb-operator.yaml file and applying the change to the cluster.

  2. Run the following SQL command on each child aggregator to determine which one is furthest ahead in data replication.

    SHOW DATABASES EXTENDED;

    From the output, review the position column for the cluster database on every node and select the child aggregator where:

    • The position is highest

    • No reference database is in the unrecoverable state

    • No database has a position of 0:0

    These checks matter: the next step will fail unless the child aggregator you select meets all of them.

  3. On the child aggregator you identified, run the following SQL command to make it the temporary Master Aggregator.

    AGGREGATOR SET AS MASTER;

  4. Make sure that the master-0 node has been emptied out.

    Depending on what happened after the disk was corrupted, you may need to empty the master-0 volume again.

    • If you attempted to start master-0 with an empty volume while the Operator was on, the Operator bootstrapped it as a Master Aggregator. The result is a new single-node cluster that must be emptied out.

    • If you never restarted master-0 while the Operator was on, you only need to empty its storage. Because the Operator is not running, master-0 will not be bootstrapped as the Master Aggregator when it starts as a new, empty node.

  5. On the temporary Master Aggregator, run the following SQL command to remove the former Master Aggregator and clear its metadata. Run the SQL commands in steps 6 and 7 on the temporary Master Aggregator as well.

    REMOVE AGGREGATOR '...-master-0';

  6. Re-add the emptied master-0 node to the cluster as a child aggregator.

    ADD AGGREGATOR '...-master-0';

  7. Promote the re-added child aggregator so that this node in the master StatefulSet becomes the new Master Aggregator.

    PROMOTE AGGREGATOR '...-master-0' TO MASTER;

  8. Turn the Operator back on by setting the replicas field to 1 in the sdb-operator.yaml file and applying the change to the cluster.
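
The child aggregator selection rule from step 2 can be sketched in Python as a small function over the SHOW DATABASES EXTENDED output. This is a minimal sketch, not a SingleStore tool, and it rests on several assumptions: you have already collected each child aggregator's rows into dictionaries keyed by column names assumed here to be Database, State, and Position; positions have the form X:Y and compare as integer pairs; and every database listed on a child aggregator must pass the state and 0:0 checks. The function names and the node and database names in the example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]  # one row of SHOW DATABASES EXTENDED (assumed column names)


def parse_position(pos: str) -> Tuple[int, int]:
    """Turn a position such as '0:1234' into a comparable integer pair (assumed format)."""
    major, minor = pos.split(":")
    return int(major), int(minor)


def is_eligible(rows: List[Row]) -> bool:
    """Apply the step 2 checks to the rows collected from one child aggregator."""
    for row in rows:
        if row["State"].lower() == "unrecoverable":
            return False
        if parse_position(row["Position"]) == (0, 0):
            return False
    # The comparison in pick_new_master needs the cluster database's position.
    return any(row["Database"] == "cluster" for row in rows)


def pick_new_master(nodes: Dict[str, List[Row]]) -> Optional[str]:
    """Return the eligible child aggregator whose cluster position is highest."""
    best_node: Optional[str] = None
    best_pos: Optional[Tuple[int, int]] = None
    for node, rows in nodes.items():
        if not is_eligible(rows):
            continue
        cluster_row = next(r for r in rows if r["Database"] == "cluster")
        pos = parse_position(cluster_row["Position"])
        if best_pos is None or pos > best_pos:
            best_node, best_pos = node, pos
    return best_node


if __name__ == "__main__":
    # Made-up output for two hypothetical child aggregators.
    example = {
        "child-agg-0": [
            {"Database": "cluster", "State": "online", "Position": "0:5000"},
            {"Database": "ref_db", "State": "replicating", "Position": "0:300"},
        ],
        "child-agg-1": [
            {"Database": "cluster", "State": "online", "Position": "0:7000"},
            {"Database": "ref_db", "State": "unrecoverable", "Position": "0:300"},
        ],
    }
    print(pick_new_master(example))  # child-agg-0
```

In the example, child-agg-1 has the higher cluster position but is skipped because one of its databases is unrecoverable, which mirrors why step 3 would fail on it. A None result means no child aggregator passes all the checks.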

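Steps 1 and 8 change a single value in sdb-operator.yaml. If you script that change instead of editing the file by hand, a minimal Python sketch might look like the following. The helper names set_operator_replicas and update_manifest_file are hypothetical, and the sketch assumes the manifest contains exactly one replicas: line (the Operator's); it raises an error rather than guess when that assumption does not hold. After the file is updated, apply it to the cluster, for example with kubectl apply -f sdb-operator.yaml.

```python
import re
from pathlib import Path

# Matches a "replicas: <n>" line and keeps its indentation. Assumes (hypothetically)
# that sdb-operator.yaml has exactly one such line: the Operator Deployment's.
_REPLICAS_LINE = re.compile(r"^([ \t]*replicas:[ \t]*)\d+[ \t]*$", re.MULTILINE)


def set_operator_replicas(manifest: str, replicas: int) -> str:
    """Return the manifest text with its single replicas value replaced."""
    updated, count = _REPLICAS_LINE.subn(
        lambda m: f"{m.group(1)}{replicas}", manifest
    )
    if count != 1:
        raise ValueError(f"expected exactly one replicas field, found {count}")
    return updated


def update_manifest_file(path: str, replicas: int) -> None:
    """Rewrite the manifest file in place; apply it to the cluster afterwards."""
    file = Path(path)
    file.write_text(set_operator_replicas(file.read_text(), replicas))


# Step 1: update_manifest_file("sdb-operator.yaml", 0)
# Step 8: update_manifest_file("sdb-operator.yaml", 1)
```

Refusing to edit a manifest with zero or several replicas: lines is deliberate: silently changing the wrong field could leave the Operator running during the repair.
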
Last modified: August 31, 2022
