Leaf Node Recovery Failed Scenario
Scenario: One of the leaves cannot replicate data, its status is RECOVERY_, and the replica partitions are in an unrecoverable state.
If the node cannot recover, the most common cause is memory configuration, although other causes are possible.
If the node does not have enough memory to replay its data back into memory, increase maximum_ and/or maximum_ to allow recovery to complete.
Troubleshooting Steps:
Try to restart the leaf that is still in the RECOVERY_ state.
To investigate the possible causes, you can also check the following:
- Cluster report
- Output of SHOW CLUSTER STATUS
- Output of sysctl -a from the host that has the leaf node with the error.
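Because memory configuration is the most common cause, it can help to narrow the saved sysctl -a output down to the kernel's virtual memory settings before reading it in full. The following is a minimal sketch only: the helper name vm_settings and the file name sysctl-leaf.txt are hypothetical and chosen for illustration.

```shell
#!/usr/bin/env bash

# vm_settings FILE
# Hypothetical helper: print only the vm.* lines from a file that holds
# saved `sysctl -a` output, sorted by key so two hosts are easy to compare.
vm_settings() {
  grep '^vm\.' "$1" | sort
}

# Example use on the host that has the failing leaf (not executed here):
#   sysctl -a > sysctl-leaf.txt 2>/dev/null
#   vm_settings sysctl-leaf.txt
```

Running the same two commands on a healthy leaf and comparing the results with diff is one way to spot a setting that differs only on the affected host.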
Other Recommendations:
vm. should be 100000000 on all nodes.
Ensure the open files ulimit is set to >= 1000000 on all nodes.
If using NUMA nodes, the total size of the nodes should be less than the physical memory available on the server.
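The open files recommendation above can be checked on each node with a small script. This is a sketch under stated assumptions: check_min is a hypothetical helper name, and it assumes the limit that matters is the one reported by ulimit -n for the user that runs the database processes.

```shell
#!/usr/bin/env bash

# check_min NAME ACTUAL MINIMUM
# Hypothetical helper: report whether ACTUAL meets MINIMUM.
# Returns 0 when ACTUAL >= MINIMUM or ACTUAL is "unlimited", 1 otherwise.
check_min() {
  local name="$1" actual="$2" minimum="$3"
  if [ "$actual" = "unlimited" ]; then
    echo "$name: ok (unlimited)"
    return 0
  fi
  if [ "$actual" -ge "$minimum" ]; then
    echo "$name: ok ($actual >= $minimum)"
    return 0
  fi
  echo "$name: too low ($actual < $minimum)"
  return 1
}

# Example use on a node, as the user that runs the database (not executed here):
#   check_min "open files" "$(ulimit -n)" 1000000
```

The helper takes the measured value as an argument rather than reading it itself, so the same function can be reused for other numeric limits gathered from each host.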
After making any adjustments to the memory settings or the variables, try to restart the leaf that is still in the RECOVERY_ state. Run sdb-admin restart-node and then select the appropriate leaf.
Last modified: July 20, 2022