Creating a Columnstore Table min read


Use of columnstore format is enabled by adding a sort key to a table, via the CLUSTERED COLUMNSTORE designation or SORT KEY designation. Both designations are equivalent. If a sort key is defined on a table, the table itself is physically stored as a columnstore.

Currently, SingleStore DB supports one columnstore sort key per table; however, if you do not want to define a key (e.g. if records are already inserted in the right order or you do not care about the sort order), you can specify an empty key using KEY() USING CLUSTERED COLUMNSTORE or SORT KEY().

Here is an example of a query that creates a columnstore table:

CREATE TABLE products (
     ProductId INT,
     Color VARCHAR(10),
     Price INT,
     Qty INT,
     KEY (Price) USING CLUSTERED COLUMNSTORE,
     SHARD KEY (ProductId)
);

Here is a CREATE TABLE statement, using the SORT KEY() syntax, that creates an equivalent columnstore table:

CREATE TABLE products (
     ProductId INT,
     Color VARCHAR(10),
     Price INT,
     Qty INT,
     SORT KEY (Price),
     SHARD KEY (ProductId)
);

As with any other SingleStore DB table, we define a SHARD KEY to explicitly control the data distribution. We define it on ProductId since sharding on a high cardinality identifier column generally allows for a more even distribution and prevents skew. It is also possible to randomly distribute data by either omitting the shard key, or defining an empty shard key SHARD KEY(). You can have at most one shard key that is defined on zero or more columns.

Your columnstore table definition can contain metadata-only unenforced unique keys, single-column hash keys (which may be UNIQUE), and a FULLTEXT key. You cannot define more than one unique key.

For more information on creating columnstore tables that use hash keys, see USING HASH Behavior.

Besides the keys discussed so far in this section, you cannot add other keys to a columnstore table.

If a sort key is defined on a table, the rows of that table will be stored in a highly compressed columnar format on disk. Storage is optimized for efficient scan in the key order. Use of an empty key in the form KEY() USING CLUSTERED COLUMNSTORE or SORT KEY() is permitted and results in less expensive background maintenance of the columnstore table, without guaranteeing any particular order.

We describe the details of the format below. While SingleStore DB can execute any query on a columnstore table that it can execute on a rowstore, some queries are more suitable for columnstore tables than others. Some queries that can benefit from using columnstore include:

  • Queries that scan several columns out of a table with many columns. Columnstore table will only materialize columns that it actually needs to perform the query.
  • Queries that scan a lot of data in sort key order. Scans in the columnstore key order are more CPU and memory efficient.
  • Joins between two columnstore tables on the sort key. Such a join can be executed as a merge join, resulting in a good performance and low memory overhead.
  • Queries with high selectivity filters on the sort key. By leveraging a technique called segment elimination, SingleStore DB can open only those row segments that actually contain relevant rows, significantly improving the performance of queries with high selectivity.

Columnstore tables are also not constrained by the amount of available memory, unlike rowstore tables.