# Porting Tables to SingleStore

## Shard keys

An important decision to make when porting your schema is setting up the appropriate shard keys, which determine how rows are partitioned (see [Sharding](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/sharding.md) for more details).

You can choose how to shard a table by adding a `SHARD KEY`. This functions like a normal table index, and can contain any number of columns. For example, the following table is hash partitioned based on the values of `a` and `b` - all rows with the same values for both `a` and `b` will be on the same partition:

```sql
CREATE TABLE t1(a INT, b INT, c INT, SHARD KEY(a, b));

```

If your table has a `PRIMARY` key, SingleStore will by default choose that key as the shard key. For example, both of the following table definitions will shard the table by `a`:

```sql
CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));

CREATE TABLE t1(a INT, b INT, SHARD KEY(a), PRIMARY KEY(a));

```

If your table does not have a primary key, then SingleStore will by default shard rows uniformly across the partitions. You can also choose to shard your table this way by entering a blank shard key. For example, both of the following will shard the table this way:

```sql
CREATE TABLE t1(a INT, b INT);

CREATE TABLE t1(a INT, b INT, SHARD KEY());

```

If you aren’t sure yet what your workload will look like, these defaults will allow you to continue developing your application and do not impose any functional restrictions. Once you understand how your workload performs in SingleStore better, you can recreate your table with a different chosen shard key.

## Choosing a Shard Key

There are two major factors to take into account when defining a shard key:

* Using a column or set of columns unique enough to minimize skew.
* Sharding on columns which you expect to filter or join on often. This allows the optimizer to minimize network traffic during the execution of the query (see [Distributed DML Query Execution](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/distributed-dml-query-execution.md) for more details).

Let’s say we have the following table:

```sql
CREATE TABLE events(event_id INT, user_id INT, time TIMESTAMP);

```

If we aren’t sure yet what our workload will look like, just using the first create table statement without specifying a shard key will shard rows uniformly. This will allow you to get started. It will ensure that there is no skew, but may not yield optimal query performance.

Let’s say we have this query which we would like to be extremely performant:

```sql
SELECT COUNT(DISTINCT first.user_id)
FROM events AS first JOIN events AS second
WHERE first.user_id = second.user_id
    AND TIMESTAMPDIFF(MINUTE, first.time, second.time) < 20;

```

In this case, we’ll want to modify our table schema to shard on `user_id`. When we run our query, we’ll be able to take advantage of locality when performing the join and when computing the distinct aggregate value: The advantage with the join is that it will happen on the leaves without having to repartition or move data between leaves.

```sql
CREATE TABLE events(event_id INT, user_id INT, time TIMESTAMP, SHARD KEY(user_id));

```

***

Modified at: July 29, 2024

Source: [/db/v9.1/developer-resources/migrate-existing-applications/porting-tables-to-singlestore/](https://docs.singlestore.com/db/v9.1/developer-resources/migrate-existing-applications/porting-tables-to-singlestore/)

(An index of the documentation is available at /llms.txt)
