Designing Your Tables
At this point, you should be using SingleStore SQL Editor or Studio, or some other MySQL client to work with your cloud database. Before checking out how fast SingleStore can bring in your data, let’s make sure your tables are designed optimally for your application workload.
By default within SingleStore Managed Service, database tables are created using our Universal Storage format (i.e., disk-based columnstore). There are a few important things to consider when designing tables to house your data:
If you’re familiar with relational databases, you may not need too much guidance on data types.
In addition to data types traditionally supported in relational databases, SingleStore also supports JSON and geospatial data types. Read about DataTypes in SingleStore.
This key determines how data is distributed across the database cluster, and is critical to ensure that your data isn’t skewed. Skewed data can lead to longer query times. Data contained within unique values of your shard key will reside in individual partitions of the database.
- Users of commercial Postgres offerings like Citus will find this familiar to, but not exactly the same as, distribution keys.
- Users of MySQL in AWS RDS or GCP Cloud SQL will also be familiar with the concept of sharding.
- We offer free training on sharding if you’d like to learn more!
So how do I pick a shard key best for my application workload?
- If you have a primary key, make sure the shard key is a subset of it (because cardinality matters!).
- If your application queries include frequent joins or filters on a specific set of columns, make sure the shard key is a subset of those.
- Concurrency is very important with application workloads, so make sure your shard key allows your queries to be single partition, as explained below.
In this example, we use
user_id as our shard key, which works nicely given its high cardinality as a part of this dataset. All records with the same
user_id will be maintained together, which will improve query response time.
CREATE TABLE clicks ( click_id BIGINT AUTO_INCREMENT, user_id INT, page_id INT, ts TIMESTAMP, SHARD KEY (user_id), KEY (click_id, user_id USING CLUSTERED COLUMNSTORE) );
In addition to identifying your shard key, it’s important to tell SingleStore how you would like to sort your data within each data segment. This helps SingleStore enable segment elimination, which ensures a minimal amount of data needs to be read for each query. This also helps SingleStore presort data for your queries.
So how do I pick a columnstore key best for my application workload?
- If you have common filter columns, make sure those are in the columnstore key.
- If you’re inserting in order by some column, it’s best to put that column first in the columnstore key.
- Lower cardinality columns should be first in the columnstore key.
In this example, we use
price as our sort key, so items are sorted in order of that column when queried.
CREATE TABLE products ( ProductId INT, Color VARCHAR(10), Price INT, Qty INT, KEY (`Price`) USING CLUSTERED COLUMNSTORE, SHARD KEY (`ProductId`) );
If you have small, infrequently changing table(s) that are required for joins, consider making them reference tables.
- Reference tables are a convenient way to recreate dimension tables that you may use in MySQL, MariaDB, or PostgreSQL.
- Reference tables are replicated to each leaf in the cluster ensuring data does not need to go across the network between partitions to join data.
- Reference table commands need to be run from the Client Endpoint listed in Portal.