SingleStore DB

Shard Keys

Every distributed table has exactly one shard key, or shard index. This functions like a normal table index, and can contain any number of columns. When you run CREATE TABLE to create a table, you can specify a shard key for the table. If you don’t specify a shard key:

  • If the table has a primary key, then the shard key will be whatever the primary key is.

  • If the table has neither a shard key nor a primary key, then the engine will perform keyless sharding for that table, which means the engine randomly distributes rows of the table across database partitions.

A table’s shard key determines in which partition a given row in the table is stored. When you run an INSERT query, the aggregator computes the hash value of the values in the column or columns that make up the shard key, does a modulo operation to get a partition number, and directs the INSERT query to the appropriate partition on a leaf machine.


DML queries that fully match the shard key can be routed directly to a single partition on a single leaf server. Group-by queries where the set of keys are guaranteed to not overlap between partitions can be executed in parallel on the leaves, with the results streamed back without any additional processing on the aggregator.

Any two rows with the same shard key value are guaranteed to be on the same partition.

SingleStore requires that the set of columns in the PRIMARY or UNIQUE index needs to be a superset of the shard key columns, meaning that the primary / unique key contains all columns in the shard key. Rows which might have a primary key conflict must map to the same partition in order for uniqueness to be enforced.

Choosing an appropriate shard key is important for minimizing data skew.