Startup Sequence and Process in a Cluster
Recommended shutdown order (if the cluster is up and running):
- First, shut down the master aggregator.
- Next, shut down the child aggregators and then the leaf nodes.
Recommended startup/restart order:
- First, start up the leaf nodes, then the child aggregators.
- Wait until the partitions of all databases on the leaves are fully recovered.
- Start up the master aggregator.
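The two orderings above can be captured in a small planning helper, which is useful when scripting a restart. This is a minimal sketch, not a SingleStore tool: the `Node` type, the role labels, and the `shutdown_plan`/`startup_plan` functions are hypothetical, and actually stopping or starting a node is left to your own tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical role labels for this sketch; they are not SingleStore identifiers.
MASTER = "master_aggregator"
CHILD = "child_aggregator"
LEAF = "leaf"

Step = Tuple[str, str]  # (action, target), e.g. ("stop", "ma-1")


@dataclass(frozen=True)
class Node:
    host: str
    role: str


def _hosts(nodes: List[Node], role: str) -> List[str]:
    # Keep the caller's ordering within each role.
    return [n.host for n in nodes if n.role == role]


def shutdown_plan(nodes: List[Node]) -> List[Step]:
    # Master aggregator first, then child aggregators, then leaf nodes.
    order = _hosts(nodes, MASTER) + _hosts(nodes, CHILD) + _hosts(nodes, LEAF)
    return [("stop", host) for host in order]


def startup_plan(nodes: List[Node]) -> List[Step]:
    # Leaf nodes first, then child aggregators.
    steps = [("start", h) for h in _hosts(nodes, LEAF) + _hosts(nodes, CHILD)]
    # Block until every database partition on the leaves has recovered.
    steps.append(("wait", "leaf partitions fully recovered"))
    # Master aggregator last.
    steps += [("start", h) for h in _hosts(nodes, MASTER)]
    return steps


if __name__ == "__main__":
    cluster = [Node("ma-1", MASTER), Node("leaf-1", LEAF),
               Node("ca-1", CHILD), Node("leaf-2", LEAF)]
    for step in startup_plan(cluster):
        print(step)
```

The explicit `("wait", ...)` step keeps the "wait for leaf recovery" requirement visible in the plan instead of hiding it inside a start command.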
When the cluster is shut down and started up in the recommended order, no metadata changes occur: partitions stay where they are and no failover takes place.
The startup sequence is as follows:
- Leaf nodes start up and their state becomes "RECOVERING". You can verify this by running SHOW CLUSTER STATUS.
- Leaf nodes recover keys and rowstore data into memory. This is the longest part of the process. In the logs, it appears as messages such as "replaying snapshot" and "replaying log"; as keys are recovered, the log messages include a progress percentage.
- Once all data and keys are recovered into memory, the master aggregator attaches the leaf nodes.
- Next, the master aggregator attaches the partitions and runs an automatic rebalance to promote (also called "repoint") partitions.
- As partitions are attached, their state becomes "PENDING". You can verify this by running SHOW CLUSTER STATUS.
- Next, the partitions show the state "TRANSITIONING".
- Finally, the partitions are attached and the leaf nodes show the state "ONLINE".
- After the leaf nodes have finished recovering, the aggregators start up.
- Aggregators also recover data and keys into memory, but only for internal databases and reference tables, both of which are replicated from the master aggregator to all nodes in the cluster. This is much faster than leaf recovery.
- Child aggregators register their ID with the master aggregator.
- Aggregators are synced, including their auto_increment values.
- Aggregators show the state "ONLINE".
- Once the leaves and then the aggregators are online, the cluster becomes available and responds to queries.
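When watching SHOW CLUSTER STATUS during startup, it can help to confirm that each leaf and each partition only moves forward through the states named in the sequence above. The sketch below assumes you have already collected the successive state values for one leaf or one partition as strings (how you collect them is not shown). The two orderings are taken from the description above, not from a SingleStore specification, and `moves_forward` is a hypothetical helper.

```python
from typing import Iterable, Sequence

# Orderings taken from the startup description above (an assumption, not a spec).
LEAF_STATES = ("RECOVERING", "ONLINE")
PARTITION_STATES = ("PENDING", "TRANSITIONING")


def moves_forward(observed: Iterable[str], order: Sequence[str]) -> bool:
    """Return True if `observed` never returns to an earlier state in `order`.

    Repeated values (the same state seen on consecutive polls) and skipped
    states (a transition that happened between polls) are both allowed.
    An unknown state name makes the check fail.
    """
    last = -1
    for state in observed:
        if state not in order:
            return False
        position = order.index(state)
        if position < last:
            return False
        last = position
    return True
```

A backward step, such as a partition going from "TRANSITIONING" back to "PENDING", is the kind of anomaly worth cross-checking against the node logs.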
You can track the startup process in the nodes list returned by the memsql-admin list-nodes command, in the individual node logs, and with the SHOW CLUSTER STATUS command.
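A script that restarts a cluster usually needs to block until every node reports ONLINE. Below is a minimal polling sketch. `fetch_states` is a hypothetical callable you would supply that returns a mapping of node name to state; wiring it to memsql-admin list-nodes or SHOW CLUSTER STATUS output is not shown and depends on your environment. The timeout and interval defaults are arbitrary choices for illustration.

```python
import time
from typing import Callable, Dict

StateMap = Dict[str, str]  # node name -> state, e.g. {"leaf-1": "RECOVERING"}


def wait_until_online(
    fetch_states: Callable[[], StateMap],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> StateMap:
    """Poll `fetch_states` until every node reports ONLINE.

    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting. Raises TimeoutError if some node is still not ONLINE
    when the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        pending = sorted(name for name, state in states.items() if state != "ONLINE")
        # An empty result is treated as "not ready yet" rather than success.
        if states and not pending:
            return states
        if clock() >= deadline:
            raise TimeoutError(f"not ONLINE before timeout: {pending}")
        sleep(interval_s)
```

Since leaf recovery is the longest phase, the timeout should be sized to the amount of data the leaves must load into memory.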
Important
No user interaction is necessary to bring a cluster online, except to start the nodes.
If recovery fails for any reason, the node records the failed recovery in its logs and shows the state "RECOVERY_…".
If you need to restart nodes, SingleStore recommends taking a snapshot of all user databases beforehand to make recovery faster: a recent snapshot shortens the log that must be replayed during recovery.
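A minimal sketch for preparing those snapshots: it builds one SNAPSHOT DATABASE statement per user database. The set of internal databases to skip is an assumption for illustration; check which databases your cluster treats as internal, and run the statements with whatever SQL client you normally use. `snapshot_statements` and `quote_identifier` are hypothetical helpers.

```python
from typing import Iterable, List

# Assumed internal databases to skip; verify against your own cluster.
INTERNAL_DATABASES = frozenset({"information_schema", "memsql", "cluster"})


def quote_identifier(name: str) -> str:
    # MySQL-style backtick quoting; embedded backticks are doubled.
    return "`" + name.replace("`", "``") + "`"


def snapshot_statements(databases: Iterable[str]) -> List[str]:
    """Return one SNAPSHOT DATABASE statement per user database, sorted by name."""
    return [
        f"SNAPSHOT DATABASE {quote_identifier(db)};"
        for db in sorted(databases)
        if db.lower() not in INTERNAL_DATABASES
    ]
```

For example, `snapshot_statements(["sales", "memsql", "app"])` yields a statement for `app` and one for `sales`, skipping `memsql`.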
How Auto Rebalance Works on Restart of a Leaf Node
- Once a node comes online from the offline/attaching state, the engine waits for two minutes (the default value of the attach_rebalance_delay_seconds engine variable) before initiating the rebalance operation. This ensures that the node is stable before partitions are balanced across the nodes.
- Within this window, the engine checks the stability of the cluster; if any node fails or transitions between the online and offline states, the two-minute timer is reset.
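The delay-and-reset behavior described above can be modeled as a small timer. This is a hedged sketch of the described behavior, not SingleStore's implementation: the class and method names are hypothetical, times are plain seconds passed in by the caller, and the 120-second default mirrors the two-minute default of attach_rebalance_delay_seconds described above.

```python
from typing import Optional


class RebalanceDelayTimer:
    """Models the wait between a node coming online and auto rebalance."""

    def __init__(self, delay_s: float = 120.0) -> None:
        self.delay_s = delay_s  # two minutes by default
        self._window_start: Optional[float] = None  # None: no rebalance pending

    def node_came_online(self, now: float) -> None:
        # A node left the offline/attaching state: start (or restart) the countdown.
        self._window_start = now

    def cluster_changed(self, now: float) -> None:
        # Some node failed or moved between online and offline during the
        # window: the full delay starts over from this moment.
        if self._window_start is not None:
            self._window_start = now

    def rebalance_due(self, now: float) -> bool:
        # True once the cluster has been stable for the whole delay.
        return (self._window_start is not None
                and now - self._window_start >= self.delay_s)

    def rebalance_started(self) -> None:
        # Clear the pending countdown once rebalancing begins.
        self._window_start = None
```

Usage:

```python
timer = RebalanceDelayTimer()
timer.node_came_online(now=0)
timer.cluster_changed(now=90)        # another node flapped; countdown restarts
print(timer.rebalance_due(now=150))  # False: only 60 s of stability
print(timer.rebalance_due(now=210))  # True: 120 s since the last change
```

Resetting to the full delay, rather than pausing and resuming, matches the text: any instability means the cluster must again stay stable for the whole window.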
Last modified: March 8, 2024