Every distributed table has exactly one shard key.
When you run CREATE TABLE (this command creates a columnstore table, which is the default table type for SingleStoreDB) or CREATE ROWSTORE TABLE (this command creates a rowstore table) to create a table, you can specify a shard key for the table.
If you do not specify a shard key and the table has a primary key, the primary key is used as the shard key.
If the table has neither a shard key nor a primary key, then the engine will perform keyless sharding for that table, which means the engine randomly distributes rows of the table across database partitions.
A table’s shard key determines in which partition a given row in the table is stored.
When you run an INSERT query, the aggregator node computes a hash function of the values in the column or columns that make up the shard key, which produces the partition number where the row should be stored.
The aggregator then directs the INSERT operation to the appropriate leaf node machine and partition.
DML queries that fully match the shard key can be routed directly to a single partition on a single leaf node server.
Any two rows with the same shard key value are guaranteed to be on the same partition.
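The routing described above can be sketched in a few lines of Python. This is only an illustrative model: the partition count, the helper names partition_for and route_insert, and the use of CRC32 as the hash are all assumptions made for the example, since the engine's actual hash function is not described here.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_for(shard_key_values, num_partitions=NUM_PARTITIONS):
    """Map a tuple of shard key column values to a partition number.

    CRC32 is a stand-in hash (an assumption); it is not the engine's
    real hash function.
    """
    encoded = repr(tuple(shard_key_values)).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


def route_insert(partitions, row, shard_key_columns):
    """Store a row (a dict) in the partition chosen by its shard key."""
    key = tuple(row[col] for col in shard_key_columns)
    target = partition_for(key, len(partitions))
    partitions[target].append(row)
    return target


# Two rows with the same shard key value (user_id = 42).
partitions = [[] for _ in range(NUM_PARTITIONS)]
first = route_insert(partitions, {"click_id": 1, "user_id": 42}, ["user_id"])
second = route_insert(partitions, {"click_id": 2, "user_id": 42}, ["user_id"])
```

Because the partition number depends only on the shard key values, both rows land in the same partition, and a query that supplies the full shard key can be sent to that one partition.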
SingleStore requires that the set of columns in any PRIMARY KEY or UNIQUE index be a superset of the shard key columns, meaning that the primary or unique key contains all columns in the shard key.
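As a rough illustration of the superset rule, the following Python sketch checks whether a proposed primary or unique key covers every shard key column. The function names are hypothetical and the check is a simplified model, not the validation the engine itself performs.

```python
def missing_shard_columns(shard_key, unique_key):
    """Return the shard key columns absent from a primary/unique key."""
    covered = set(unique_key)
    return [col for col in shard_key if col not in covered]


def satisfies_superset_rule(shard_key, unique_key):
    """True when the key contains every column of the shard key."""
    return not missing_shard_columns(shard_key, unique_key)


# Shard key (user_id) with primary key (click_id, user_id): allowed.
ok = satisfies_superset_rule(["user_id"], ["click_id", "user_id"])

# Shard key (user_id) with primary key (click_id): user_id is missing.
missing = missing_shard_columns(["user_id"], ["click_id"])
```

When the rule holds, any two rows that collide on the unique key also share their shard key values, so they are stored on the same partition.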
Choosing an appropriate shard key is important for minimizing data skew.
If no shard key is specified, the primary key is used as the shard key.
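If the table has neither a primary key nor a shard key, it is keyless sharded. You can also request keyless sharding explicitly by specifying SHARD KEY() with nothing between the parentheses, as follows.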
CREATE TABLE t1(a INT, b INT);
CREATE TABLE t1(a INT, b INT, SHARD KEY());
In most cases, keyless sharding will result in uniform distribution of rows across partitions.
An exception is an INSERT … SELECT statement involving sharded tables, in which case data is inserted locally into the same partition as the source in order to avoid data redistribution.
If you create a table with a primary key and no explicit shard key, the primary key is used as the shard key by default.
CREATE TABLE clicks (
    click_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id INT,
    page_id INT,
    ts TIMESTAMP
);
The explicit syntax for a shard key is a SHARD KEY clause that lists the shard key columns in parentheses, as in SHARD KEY (user_id) in the following example.
CREATE TABLE clicks (
    click_id BIGINT AUTO_INCREMENT,
    user_id INT,
    page_id INT,
    ts TIMESTAMP,
    SHARD KEY (user_id),
    PRIMARY KEY (click_id, user_id)
);
In this example, any two clicks by the same user will be guaranteed to be on the same partition.
Query execution can take advantage of this for COUNT(DISTINCT user_id) queries, because any two equal user_id values will never be on different partitions.
Note that even though click_id will be unique, we still have to include user_id in the primary key.
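To see why keeping equal user_id values on one partition helps a distinct count, consider the small Python model below. It is a sketch under stated assumptions: the rows are made up, four partitions are assumed, and a simple modulo stands in for the engine's hash function. It compares the sum of per-partition distinct counts when the table is sharded on user_id versus on click_id.

```python
NUM_PARTITIONS = 4  # hypothetical partition count

# (click_id, user_id) rows; the values are made up for illustration.
CLICKS = [(1, 7), (2, 7), (3, 13), (4, 21), (5, 13), (6, 7), (7, 99), (8, 21)]


def summed_local_distinct_users(rows, shard_column_index):
    """Sum per-partition distinct user_id counts without merging partitions.

    Modulo stands in for the engine's hash function (an assumption).
    """
    local_sets = [set() for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partition = row[shard_column_index] % NUM_PARTITIONS
        local_sets[partition].add(row[1])  # row[1] is user_id
    return sum(len(users) for users in local_sets)


true_distinct = len({user_id for _, user_id in CLICKS})
by_user_id = summed_local_distinct_users(CLICKS, 1)   # sharded on user_id
by_click_id = summed_local_distinct_users(CLICKS, 0)  # sharded on click_id
```

In this model, sharding on user_id makes the summed local counts equal the true distinct count (4), so each partition can count independently. Sharding on click_id spreads the same user across partitions, and the summed local counts (6) overcount, which means the distinct values would have to be combined across partitions to get the right answer.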
See the Optimizing Table Data Structures guide for information on how to choose a shard key.
Data skew occurs when data is unevenly distributed across partitions.
It's important to choose a shard key that minimizes data skew in order to get the fastest query performance.
Refer to Detecting and Resolving Data Skew for more information.
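As a minimal sketch of what skew looks like in numbers, the Python below counts rows per partition for two sets of shard key values and reports a simple ratio. The ratio (largest partition divided by the mean partition size) is a hypothetical metric chosen for this example, not a definition used by SingleStore, and modulo again stands in for the engine's hash function.

```python
def rows_per_partition(shard_key_values, num_partitions):
    """Count rows per partition; modulo is a stand-in hash (an assumption)."""
    counts = [0] * num_partitions
    for value in shard_key_values:
        counts[value % num_partitions] += 1
    return counts


def skew_ratio(counts):
    """Largest partition divided by the mean partition size (1.0 = even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# 1000 rows with unique shard key values spread evenly over 4 partitions.
even_counts = rows_per_partition(range(1000), 4)

# 1000 rows where 900 share a single shard key value.
skewed_counts = rows_per_partition([0] * 900 + list(range(100)), 4)
```

With unique values each partition holds 250 rows and the ratio is 1.0. When 900 rows share one value, the counts become 925, 25, 25, and 25, giving a ratio of 3.7, so one partition does most of the work for any query that scans the table.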
Training: SingleStoreDB Sharding and Shard Keys
Last modified: November 7, 2023