# Clustering

## How does SingleStore shard tables?

Every distributed table (except reference tables, which are replicated in whole on each “leaf” node) has a `SHARD KEY` that specifies which columns of a row to hash to determine what *partition* a row should reside in. When rows are inserted into a sharded table, they are hashed by the table’s shard key and sent to the leaf carrying the corresponding partition. This technique is commonly referred to as hash-based partitioning. You can choose how to shard each table by specifying its `SHARD KEY` as part of the `CREATE TABLE` statement. See [Sharding](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/sharding.md) for more details.

## What are aggregator and leaf nodes?

SingleStore stores and computes data on *leaf (A node that stores a subset of a cluster’s data. A leaf node functions as a storage and compute node. To optimize performance, SingleStore automatically distributes data across leaf nodes into partitions. Each leaf node contains several partitions.)* nodes. You can linearly scale both storage and computational power by adding more leaf nodes. Clients query an *aggregator (A node that routes queries to the leaf nodes, aggregates intermediate results, and sends the results back to a client. There are two types of aggregators: master and child.)* node, which in turn queries one or more leaf nodes to collect the rows required to execute the query. Multiple aggregators nodes perform the same functions with respect to executing Data Manipulation Language (DML) queries and allow clients to load-balance queries across the aggregators. Leaf nodes should not be queried directly except for maintenance purposes in exceptional situations.

## What is a “Master Aggregator”?

The Master Aggregator is a specialized aggregator responsible for cluster monitoring and failover.

## What happens if the Master Aggregator crashes?

If the [Master Aggregator](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/cluster-components.md) becomes unresponsive, clients can continue to execute DML queries (e.g. `INSERT` and `SELECT`) against the other aggregators, but DDL and clustering operations can not be performed until the master aggregator is revived or another aggregator is “promoted” to be the Master Aggregator.

## How do I add nodes to SingleStore?

Using the new SingleStore management tools, you can add new nodes and assign them roles in your cluster. To see how to do that in a deployment, visit our [SingleStore Tools reference](https://docs.singlestore.com/db/v9.1/reference/singlestore-tools-reference.md).

## How many aggregator and leaf nodes do I need?

SingleStore stores data in leaf nodes, so you need enough leaf nodes to store all your data in memory. If you are replicating data (redundancy level 2), you need twice as many leaf nodes.

The recommended number of aggregators depends on your use case. If, for instance, your cluster is being used for more than one type of workload (for example, it is the backend for a web application and also being queried by analysts), it is probably best to have multiple aggregators, or pools of aggregators, for these separate workloads. Aside from distribution of workload, the most significant factor to consider is network bandwidth. As a rule of thumb, clusters with 50 nodes or fewer should have about a 5:1 leaf to aggregator ratio. Clusters with more than 50 nodes can have closer to a 10:1 leaf to aggregator ratio. Note that you can also add nodes to a cluster to tune performance after it is up and running.

The appropriate ratio of aggregators to leaves also depends on the type of workload running. Transactional workloads that run many small queries or queries that involve only a single partition require more aggregators, since those queries interact with one aggregator and one leaf. Analytical workloads, especially those involving distributed joins, require fewer aggregators because almost all the work is performed on the leaves.

## Can I query the leaf nodes individually?

Yes, but this is not recommended. It only should be done in troubleshooting scenarios.

## After I ran a single host install, why are there 2 SingleStore nodes (or 4 SingleStore processes) running?

This is the “single host cluster” setup. Your machine is running both a master aggregator process and a leaf process, and hence “2 SingleStore Nodes” is shown (each SingleStore instance or *node* contains two processes). Make sure to send your queries to the aggregator.

***

Modified at: July 26, 2024

Source: [/db/v9.1/introduction/faqs/clustering/](https://docs.singlestore.com/db/v9.1/introduction/faqs/clustering/)

(An index of the documentation is available at /llms.txt)
