Leaf Node Recovery Failed Scenario

Scenario: One of the leaves cannot replicate data, its status is RECOVERY_FAILED, and its replica partitions are in an unrecoverable state.

If the node cannot recover, the most common cause is memory configuration, although other causes are possible. To find the exact reason, inspect the tracelog (memsql.log) for that node and look for the error logged at the time the database became unrecoverable.
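As a rough sketch of that log check, the shell function below prints the most recent lines in a log file that mention an error. The function name recent_errors and the example path are illustrative assumptions; the location of the tracelogs directory depends on how the node was installed.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of a log file that mention "error".
# Usage: recent_errors <path-to-memsql.log> [max-lines]
recent_errors() {
    log_file="$1"
    max_lines="${2:-50}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -i: match case-insensitively, since severity labels can vary in case.
    grep -i 'error' "$log_file" | tail -n "$max_lines"
}

# Example (the path is an assumption; substitute the node's tracelogs directory):
# recent_errors /path/to/tracelogs/memsql.log 100
```

Narrow the output to the time window in which the database became unrecoverable before drawing conclusions from it.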

If the node does not have enough memory to replay data back into memory, increase maximum_memory and/or maximum_table_memory to allow recovery to complete. For other configuration or bug-related issues, contact Support.
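One possible way to raise these limits on a single node is with sdb-admin update-config, sketched below. The wrapper function set_memory_limit, the DRY_RUN switch, and the example values are assumptions made for illustration; both variables are expressed in megabytes, and suitable values depend on the RAM available on the host.

```shell
#!/bin/sh
# Hypothetical wrapper around `sdb-admin update-config`.
# Usage: set_memory_limit <memsql-id> <maximum_memory|maximum_table_memory> <value-in-MB>
# Set DRY_RUN=1 to print the command instead of running it.
set_memory_limit() {
    node_id="$1"
    key="$2"
    value_mb="$3"
    case "$key" in
        maximum_memory|maximum_table_memory) ;;
        *) echo "unsupported key: $key" >&2; return 1 ;;
    esac
    case "$value_mb" in
        ''|*[!0-9]*) echo "value must be a whole number of MB" >&2; return 1 ;;
    esac
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} sdb-admin update-config --memsql-id "$node_id" --key "$key" --value "$value_mb"
}

# Example (node ID and value are placeholders, not recommendations):
# DRY_RUN=1 set_memory_limit <memsql-id> maximum_memory 65536
```

Running the function with DRY_RUN=1 first makes it easy to review the exact command before it changes the node's configuration.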

Troubleshooting Steps:

Try to restart the leaf that is still in the RECOVERY_FAILED state.

To investigate the possible causes you can also check the following:

  • Cluster report

  • Output of SHOW CLUSTER STATUS

  • Output of sysctl -a from the host that has the leaf node with the error.
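The items in the list above could be gathered into one directory with a small script such as the following sketch. The function name collect_diagnostics and the output file names are assumptions, as is the use of the singlestore client with its default connection settings; SHOW CLUSTER STATUS is assumed to be run against the Master Aggregator, and sysctl -a should be run on the host of the failing leaf.

```shell
#!/bin/sh
# Hypothetical helper: save troubleshooting output into a directory.
# Usage: collect_diagnostics <output-dir>
# Each tool is skipped when it is not installed on the current host.
collect_diagnostics() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1

    # Kernel settings from this host; some keys may be unreadable without root.
    if command -v sysctl >/dev/null 2>&1; then
        sysctl -a > "$out_dir/sysctl.txt" 2>/dev/null || true
    fi

    # Cluster status; assumes default client connection settings.
    if command -v singlestore >/dev/null 2>&1; then
        singlestore -e "SHOW CLUSTER STATUS" > "$out_dir/cluster_status.txt" \
            || echo "SHOW CLUSTER STATUS failed" >&2
    fi

    # Cluster report; written to the tool's default location.
    if command -v sdb-report >/dev/null 2>&1; then
        sdb-report collect || echo "sdb-report collect failed" >&2
    fi
    return 0
}

# Example:
# collect_diagnostics ./recovery_failed_diagnostics
```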

Other Recommendations:

vm.max_map_count should be 100000000 on all nodes.

Ensure open files ulimit is set to >= 1000000 on all nodes.

If using NUMA nodes, the total size of the nodes should be less than the physical memory available on the server.
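The first two recommendations above can be checked on each host with a short script along these lines. The helper names at_least and check_setting are hypothetical, and treating the stated vm.max_map_count value as a minimum is an assumption of this sketch. Note that ulimit -n reports the limit for the current shell, which may differ from the limit applied to the database process.

```shell
#!/bin/sh
# at_least <actual> <required>: succeeds when actual >= required.
# "unlimited" (as ulimit may report) always passes.
at_least() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ] 2>/dev/null
}

# check_setting <name> <actual> <required>: print a one-line verdict.
check_setting() {
    if at_least "$2" "$3"; then
        echo "OK $1=$2"
    else
        echo "LOW $1=$2 (recommended >= $3)"
    fi
}

# Thresholds are the values given in the recommendations above.
if command -v sysctl >/dev/null 2>&1; then
    check_setting vm.max_map_count "$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)" 100000000
fi
check_setting open_files "$(ulimit -n)" 1000000
```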

After adjusting the memory settings or the system variables, restart the leaf that is still in the RECOVERY_FAILED state. You can do this by running sdb-admin restart-node and then selecting the appropriate leaf.
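For a non-interactive restart, the node can be named explicitly instead of being selected from the prompt. The sketch below assumes the leaf's ID has already been looked up (for example with sdb-admin list-nodes); the wrapper function restart_leaf and the DRY_RUN switch are illustrative additions.

```shell
#!/bin/sh
# Hypothetical wrapper around `sdb-admin restart-node`.
# Usage: restart_leaf <memsql-id>
# Set DRY_RUN=1 to print the command instead of running it.
restart_leaf() {
    node_id="$1"
    if [ -z "$node_id" ]; then
        echo "usage: restart_leaf <memsql-id>" >&2
        return 1
    fi
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} sdb-admin restart-node --memsql-id "$node_id"
}

# Example:
# sdb-admin list-nodes        # find the ID of the leaf in RECOVERY_FAILED
# restart_leaf <memsql-id>
```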

Last modified: July 20, 2022
