Sharding
On this page
Every distributed table has exactly one shard key.CREATE TABLE
(this command creates a columnstore table which is the default table type for SingleStore) or CREATE ROWSTORE TABLE
(this command creates a rowstore table) to create a table, you can specify a shard key for the table.
-
If the table has a primary key, then the shard key will be whatever the primary key is.
-
If the table has neither a shard key nor a primary key, then the engine will perform keyless sharding for that table, which means the engine randomly distributes rows of the table across database partitions.
A table’s shard key determines in which partition a given row in the table is stored.INSERT
query, the aggregator node computes a hash function of the values in the column or columns that make up the shard key, which produces the partition number where the row should be stored.INSERT
operation to the appropriate leaf node machine and partition.
DML queries that fully match the shard key can be routed directly to a single partition on a single leaf node server.
Any two rows with the same shard key value are guaranteed to be on the same partition.
SingleStore requires that the set of columns in the PRIMARY
or UNIQUE
index needs to be a superset of the shard key columns, meaning that the primary / unique key contains all columns in the shard key.
Choosing an appropriate shard key is important for minimizing data skew.
Types of Shard Keys
Default Shard Key
If no shard key is specified, the primary key is used as the shard key.shard key()
with nothing between the parenthesis as follows.
CREATE TABLE t1(a INT, b INT);CREATE TABLE t1(a INT, b INT, SHARD KEY());
In most cases, keyless sharding will result in uniform distribution of rows across partitions.INSERT … SELECT
statement involving sharded tables, in which case, data is inserted locally into the same partition as the source in order to avoid data redistribution.
Primary Key as the Shard Key
If you create a table with a primary key and no explicit shard key, the PK will be used as the shard key by default.
CREATE TABLE clicks (click_id BIGINT AUTO_INCREMENT PRIMARY KEY,user_id INT,page_id INT,ts TIMESTAMP);
Non-Unique Shard Key
The explicit syntax for a shard key is SHARD KEY (column_
, where column_
CREATE TABLE clicks (click_id BIGINT AUTO_INCREMENT,user_id INT,page_id INT,ts TIMESTAMP,SHARD KEY (user_id),PRIMARY KEY (click_id, user_id));
In this example, any two clicks by the same user will be guaranteed to be on the same partition.COUNT(DISTINCT user_
queries, which knows that any two equal user_
values will never be on different partitions.
Note that even though click_
will be unique, we still have to include user_
in the primary key.
Choosing a Shard Key
See the Optimizing Table Data Structures guide for information on how to choose a shard key.
Data Skew
Data skew occurs when data is unevenly distributed across partitions.
It's important to choose a shard key that minimizes data skew in order to get the fastest query performance.
Refer to Detecting and Resolving Data Skew for more information.
Related Topics
-
Training: SingleStore Sharding and Shard Keys
Last modified: November 7, 2023