Designing Your Tables
At this point, you should be using the SingleStore SQL Editor or Studio, or another MySQL-compatible client, to work with your cloud database. Before checking out how fast SingleStore can bring in your data, let's make sure your tables are designed optimally for your application workload.
By default within SingleStore Managed Service, database tables are created using our Universal Storage format (i.e., disk-based columnstore). There are a few important things to consider when designing tables to house your data:
If you’re familiar with relational databases, you may not need too much guidance on data types.
In addition to the data types traditionally supported in relational databases, SingleStore also supports JSON and geospatial data types. Read about Data Types in SingleStore.
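As a quick sketch, a table can mix traditional types with JSON and geospatial columns. The table and column names below are illustrative, not part of the dataset used later in this guide:

```sql
-- Illustrative table combining traditional, JSON, and geospatial types
CREATE TABLE events (
  event_id BIGINT,
  payload JSON,              -- semi-structured event attributes
  location GEOGRAPHYPOINT,   -- a longitude/latitude point
  created_at DATETIME
);
```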
This key determines how data is distributed across the database cluster, and choosing it well is critical to avoid skew. Skewed data can lead to longer query times. All rows sharing the same shard key value reside in the same partition of the database. Well-distributed data ensures scalability and parallelism when fulfilling diverse requests concurrently.
- Snowflake users may be familiar with micro-partitioning and clustering to distribute data. If you’ve perhaps found these difficult to maintain on ever-changing data, you are in the right place.
- Redshift users will be familiar with distribution styles and distribution keys, which serve a similar purpose in spreading data across a cluster.
- BigQuery users will be familiar with the concept of sharding from “partitioned tables”.
- Synapse users will be familiar with the concept of sharding, though SingleStore Managed Service gives you a bit more flexibility here on data distribution.
So how do I pick a shard key best for my application workload?
- If you have a primary key, make sure the shard key is a subset of it (uniqueness is enforced within a partition, so any unique key must contain the shard key).
- If your BI queries include frequent joins or filters on a specific set of columns, make sure the shard key is a subset of those (in both tables, for joins).
- For highly concurrent workloads, make sure your shard key allows your queries to be single partition (i.e., join or filter columns within shard key).
In this example, we use user_id as our shard key, which works nicely given its high cardinality within this dataset. All records with the same user_id will be stored together, which improves query response time.
```sql
CREATE TABLE clicks (
  click_id BIGINT AUTO_INCREMENT,
  user_id INT,
  page_id INT,
  ts TIMESTAMP,
  SHARD KEY (user_id),
  KEY (click_id, user_id) USING CLUSTERED COLUMNSTORE
);
```
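With this shard key in place, a query that filters on user_id can be routed to a single partition, which is what keeps highly concurrent workloads fast. The query below is a sketch against the clicks table, with an illustrative user_id value:

```sql
-- An equality filter on the shard key lets this query run on one partition
SELECT page_id, ts
FROM clicks
WHERE user_id = 42
ORDER BY ts DESC
LIMIT 10;
```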
In addition to identifying your shard key, it's important to tell SingleStore how you would like to sort your data within each data segment. This enables segment elimination, which minimizes the amount of data that needs to be read for each query, and it also lets SingleStore presort data for your queries.
If you’re familiar with BigQuery, you may note that this is similar to the concept of clustering.
So how do I pick a columnstore key best for my BI workload?
- BI dashboards often require lots of filtering. If you have common filter columns, make sure those are in the columnstore key.
- If you’re inserting in order by some column, it’s best to put that column first in the columnstore key.
- Lower cardinality columns should be first in the columnstore key.
In this example, we use Price as our sort key, so items are stored in order of that column when queried.
```sql
CREATE TABLE products (
  ProductId INT,
  Color VARCHAR(10),
  Price INT,
  Qty INT,
  KEY (`Price`) USING CLUSTERED COLUMNSTORE,
  SHARD KEY (`ProductId`)
);
```
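A range filter on the sort key shows segment elimination at work: segments whose stored Price minimum/maximum fall entirely outside the filter range can be skipped without being read. The price range below is illustrative:

```sql
-- Only segments whose Price range overlaps [10, 20] need to be scanned
SELECT ProductId, Color, Qty
FROM products
WHERE Price BETWEEN 10 AND 20;
```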
If you have small, infrequently changing table(s) that are required for joins, consider making them reference tables.
- Reference tables are a convenient way to recreate dimension tables that you may use in MySQL, MariaDB, or PostgreSQL.
- Reference tables are replicated to every leaf in the cluster, ensuring that joins against them do not need to move data across the network between partitions.
- Reference table commands need to be run from the Client Endpoint listed in Portal.
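As a sketch, a small dimension table can be declared with CREATE REFERENCE TABLE and then joined against a sharded fact table without any cross-partition data movement. The table and column names here are illustrative:

```sql
-- Replicated to every leaf, so joins against it stay local to each partition
CREATE REFERENCE TABLE product_categories (
  category_id INT PRIMARY KEY,
  category_name VARCHAR(50)
);
```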