Startup Sequence and Process in a Cluster

Recommended shut down order (if the cluster is up and running):

  1. First shut down the master aggregator.

  2. Next, shut down the child aggregators and then the leaf nodes (a command sketch follows this list).
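
As a rough sketch of that order using the cluster management tool; the node IDs below are hypothetical placeholders, and the exact memsql-admin stop-node flags may vary by Tools version:

    # Find each node's MemSQL ID and role
    memsql-admin list-nodes

    # Stop the master aggregator first, then each child aggregator, then each leaf node
    memsql-admin stop-node --memsql-id <master-aggregator-id>
    memsql-admin stop-node --memsql-id <child-aggregator-id>
    memsql-admin stop-node --memsql-id <leaf-node-id>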

Recommended startup/restart order:

  1. First start up the leaf nodes, then the child aggregators.

  2. Wait until the partitions of all databases on the leaf nodes are fully recovered.

  3. Start up the master aggregator (a command sketch follows this list).
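
A matching startup sketch, under the same assumptions about memsql-admin flags and with hypothetical node IDs:

    # Start every leaf node, then every child aggregator
    memsql-admin start-node --memsql-id <leaf-node-id>
    memsql-admin start-node --memsql-id <child-aggregator-id>

    # After the leaf partitions have recovered, start the master aggregator
    memsql-admin start-node --memsql-id <master-aggregator-id>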

When cluster shutdown and startup take place in the recommended order, no metadata changes occur, partitions remain in the same places, and no failover is triggered. The cluster returns to a fully online state.

The startup sequence proceeds as follows:

  1. Leaf nodes start up and their state becomes "RECOVERING". This can be verified by running SHOW CLUSTER STATUS (see the example queries after this list).

  2. Leaf nodes recover keys and rowstore data into memory. This is the longest part of the process. This can be seen in logs with messages like "replaying snapshot" and "replaying log". As keys are recovered, a progress percentage is included in log messages.

  3. Once all data and keys are recovered into memory, the master aggregator attaches the leaf nodes.

  4. Next, the master aggregator attaches the partitions and auto-rebalances them, promoting (also called "repointing") partitions as needed.

  5. As partitions are attached, their state becomes "PENDING". This can be verified by running SHOW CLUSTER STATUS.

  6. Next, the partitions show the "TRANSITIONING" state.

  7. Finally, the partitions finish attaching and the leaf nodes show the "ONLINE" state.

  8. After the leaf nodes have finished recovering, the aggregators will start up.

  9. Aggregators also recover data and keys into memory, but only for internal databases and reference tables, both of which are replicated to all nodes in the cluster from the master aggregator. This is much faster than leaf node recovery.

  10. Child aggregators register their IDs with the master aggregator.

  11. Aggregators are synced, including syncing auto_increment values.

  12. Aggregators show the state as "ONLINE".

  13. Once the leaf nodes and then the aggregators are online, the cluster becomes available and responsive to queries.
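
As a minimal sketch of observing these state transitions, the following statements can be run from the master aggregator while the cluster starts up; the exact output columns depend on your engine version:

    -- Partition and node states ("RECOVERING", "PENDING", "TRANSITIONING", "ONLINE"),
    -- along with replay positions
    SHOW CLUSTER STATUS;

    -- Node-level states for leaves and aggregators
    SHOW LEAVES;
    SHOW AGGREGATORS;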

This process can be tracked in the node list using the memsql-admin list-nodes command, in each node's memsql.log tracelog messages, and in the state and replay position of partitions reported by the SHOW CLUSTER STATUS command.
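
A sketch of the same monitoring from the command line; the tracelog path shown is a typical default and is an assumption for your installation:

    # Node IDs, roles, and states as the cluster comes up
    memsql-admin list-nodes

    # Follow a node's tracelog to watch "replaying snapshot" / "replaying log" progress
    tail -f /var/lib/memsql/<node-directory>/tracelogs/memsql.log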

Important

No user interaction is necessary to bring a cluster online, other than starting the nodes. It is neither necessary nor recommended to manually rebalance partitions while a node is offline or recovering.

If recovery fails for any reason, the node logs the failure, shows the "RECOVERY_FAILED" state in the node list, and may restart. If you see a node in the "RECOVERY_FAILED" state, a node that repeatedly restarts, or a node that takes much longer than normal to recover, contact support.

If you need to restart nodes, SingleStore recommends taking a snapshot of all user databases beforehand to make recovery faster. Taking a snapshot truncates the transaction logs into the most recent snapshot, so there are fewer log entries to replay.
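
A minimal sketch of that recommendation; db_name is a placeholder for each of your user databases, and the statement is run on the master aggregator before shutting down:

    -- Truncate the transaction log into a fresh snapshot to shorten replay on restart
    SNAPSHOT DATABASE db_name;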

How Auto Rebalance Works on Restart of a Leaf Node

  • Once a node comes online from the offline/attaching state, the engine waits two minutes (the default value of the attach_rebalance_delay_seconds engine variable) before the rebalance operation is initiated. This ensures the node is stable before the partitions are balanced across the nodes (see the sketch after this list).

  • Within this timeframe, the stability of the cluster is checked; if any node fails or transitions between online and offline states, the two-minute counter is reset.
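
A sketch of inspecting the delay, plus an optional adjustment; whether attach_rebalance_delay_seconds can be changed with SET GLOBAL in your version is an assumption, so verify against your release's documentation first:

    -- Check the current delay (defaults to 120 seconds)
    SHOW VARIABLES LIKE 'attach_rebalance_delay_seconds';

    -- If the variable is settable in your version, extend the delay to five minutes
    SET GLOBAL attach_rebalance_delay_seconds = 300;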

Last modified: March 8, 2024
