SingleStore DB

Shard Keys

Every distributed table has exactly one shard key, or shard index. This functions like a normal table index, and can contain any number of columns. When you run CREATE TABLE to create a table, you can specify a shard key for the table. If you don’t specify a shard key, the default shard key is used.

A table’s shard key determines in which partition a given row in the table is stored. When you run an INSERT query, the aggregator computes the hash value of the values in the column or columns that make up the shard key, does a modulo operation to get a partition number, and directs the INSERT query to the appropriate partition on a leaf machine.

16083b575eb475.png

DML queries that fully match the shard key can be routed directly to a single partition on a single leaf server. Group-by queries where the set of keys are guaranteed to not overlap between partitions can be executed in parallel on the leaves, with the results streamed back without any additional processing on the aggregator.

Any two rows with the same shard key value are guaranteed to be on the same partition.

SingleStore DB requires that any PRIMARY or UNIQUE index must be identical to or a superset of the shard key. Rows which might have a primary key conflict must map to the same partition in order for uniqueness to be enforced.

Choosing an appropriate shard key is important for minimizing data skew.