# Sharding

Every distributed table has exactly one shard key. A shard key can contain any number of columns. When you run `CREATE TABLE`  (this command creates a columnstore table which is the default table type for SingleStore) or `CREATE ROWSTORE TABLE` (this command creates a rowstore table) to create a table, you can specify a shard key for the table. If you choose not to specify a shard key:

* If the table has a primary key, then the shard key will be whatever the primary key is.
* If the table has neither a shard key nor a primary key, then the engine will perform keyless sharding (A type of database partitioning that divides a database into smaller more easily manageable parts.) for that table, which means the engine randomly distributes rows of the table across database partitions (A partition contains a subset (a shard) of a database’s data. Each partition holds a vertical slice of data, distributed as per a hashing algorithm on the primary key or randomly for keyless sharded databases.).

A table’s shard key determines in which partition a given row in the table is stored. When you run an`INSERT` query, the aggregator node computes a hash function of the values in the column or columns that make up the shard key, which produces the partition number where the row should be stored. The aggregator node then directs the `INSERT` operation to the appropriate leaf node machine and partition.

![SingleStore Sharding](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/bltd3059278c94e2610/6a2c426956043c9e630ac176/sharding-cgpt60.png)

[DML](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/distributed-dml-query-execution.md) queries that fully match the shard key can be routed directly to a single partition on a single leaf node server. Group-by queries where the set of keys are guaranteed to not overlap between partitions can be executed in parallel on the leaf node, with the results streamed back without any additional processing on the aggregator node.

Any two rows with the same shard key value are guaranteed to be on the same partition.

SingleStore requires that the set of columns in the `PRIMARY` or `UNIQUE` index needs to be a superset of the shard key columns, meaning that the primary / unique key contains all columns in the shard key. Rows that might have a primary key conflict thus map to the same partition so uniqueness can be enforced locally.

Choosing an appropriate shard key is important for minimizing [data skew](https://docs.singlestore.com/#section-idm4501210432494432654843416521.md).

## Types of Shard Keys

## Default Shard Key

If no shard key is specified, the primary key is used as the shard key. If no primary key or shard key is specified for a given table, the shard key defaults to being keylessly sharded. To accomplish this, you can either specify no keys, or use `shard key()` with nothing between the parenthesis as follows.

```sql
CREATE TABLE t1(a INT, b INT);
CREATE TABLE t1(a INT, b INT, SHARD KEY());

```

In most cases, keyless sharding will result in uniform distribution of rows across partitions. The exception to this is `INSERT … SELECT` statement involving sharded tables, in which case, data is inserted locally into the same partition as the source in order to avoid data redistribution.

## Primary Key as the Shard Key

If you create a table with a primary key and no explicit shard key, the PK will be used as the shard key by default.

```sql
CREATE TABLE clicks (
    click_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id INT,
    page_id INT,
    ts TIMESTAMP
);

```

## Non-Unique Shard Key

The explicit syntax for a shard key is `SHARD KEY (column_list)`, where column\_list is zero or more columns separated by commas. For example:

```sql
CREATE TABLE clicks (
    click_id BIGINT AUTO_INCREMENT,
    user_id INT,
    page_id INT,
    ts TIMESTAMP,
    SHARD KEY (user_id),
    PRIMARY KEY (click_id, user_id)
);

```

In this example, any two clicks by the same user will be guaranteed to be on the same partition. You can take advantage of this property in query execution for efficient `COUNT(DISTINCT user_id)` queries, which knows that any two equal `user_id` values will never be on different partitions.

Note that even though `click_id` will be unique, we still have to include `user_id` in the primary key.

## Choosing a Shard Key

See the [Optimizing Table Data Structures](https://docs.singlestore.com/db/v9.1/create-a-database/optimizing-table-data-structures.md) guide for information on how to choose a shard key.

## Data Skew

Data skew occurs when data is unevenly distributed across partitions.

It's important to choose a shard key that minimizes data skew in order to get the fastest query performance.

Refer to [Detecting and Resolving Data Skew](https://docs.singlestore.com/db/v9.1/create-a-database/detecting-and-resolving-data-skew.md) for more information.

## Related Topics

* Training: [SingleStore Sharding and Shard Keys](https://training.singlestore.com/learn/course/internal/view/elearning/680/sharding-and-shard-keys)
* [Shard Keys](https://docs.singlestore.com/db/v9.1/create-a-database/optimizing-table-data-structures/shard-keys.md)
* [Understanding Shard Key Selection](https://docs.singlestore.com/db/v9.1/create-a-database/understanding-shard-key-selection.md)

***

Modified at: September 16, 2025

Source: [/db/v9.1/introduction/distributed-architecture/sharding/](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/sharding/)

(An index of the documentation is available at /llms.txt)
