SingleStore DB

Types of Shard Keys
Default Shard Key

If no shard key is specified, the primary key is used as the shard key. If no primary key or shard key is specified for a given table, the shard key defaults to being keylessly sharded. To accomplish this, you can either specify no keys, or use shard key() as follows. This also applies to columnstore tables that don’t specify a shard key.

CREATE TABLE t1(a INT, b INT);
CREATE TABLE t1(a INT, b INT, SHARD KEY());

In most cases, keyless sharding will result in uniform distribution of rows across partitions. Cases involving INSERT … SELECT statements are the exception to this.

Primary Key as the Shard Key

If you create a table with a primary key and no explicit shard key, the PK will be used as the shard key by default.

    CREATE TABLE clicks (
        click_id BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id INT,
        page_id INT,
        ts TIMESTAMP
    );
Non-Unique Shard Key

The syntax for a non-unique shard key is SHARD KEY (col1, col2, ...). For example:

    CREATE TABLE clicks (
        click_id BIGINT AUTO_INCREMENT,
        user_id INT,
        page_id INT,
        ts TIMESTAMP,
        SHARD KEY (user_id),
        PRIMARY KEY (click_id, user_id)
    );

In this example, any two clicks by the same user will be guaranteed to be on the same partition. You can take advantage of this property in query execution for efficient COUNT(DISTINCT user_id) queries, which knows that any two equal (non-distinct) user_id values will never be on different partitions.

Note that even though click_id will be unique, we still have to include user_id in the primary key.