Creating a Columnstore Table Using Standard CREATE TABLE Syntax min read


The columnstore as default feature allows you to create a columnstore table using standard CREATE TABLE syntax.

The Benefit of Using the Columnstore for Transaction Processing

Although rowstores are well-suited for transaction processing, it can be costly to store large datasets in rowstores, as they store all data in RAM. Because columnstores are disk-based, it is more economical to store data in columnstores as opposed to rowstores.

The columnstore continues to evolve to support transactional workloads that would have traditionally used the rowstore. At the same time, the columnstore provides the fastest query processing performance for analytical workloads, making it a good choice for a mix of both transactional and analytical requirements.

Enabling Columnstore as Default

The default_table_type engine variable specifies whether CREATE TABLE creates a rowstore or columnstore table, by default.

Info

In SingleStore DB, default_table_type is set to rowstore by default.

In SingleStore Managed Service, default_table_type is set to columnstore for newly created 7.3 clusters only. For existing clusters, the setting remains at rowstore. You cannot change the value of default_table_type in SingleStore Managed Service.

In SingleStore DB, to change default_table_type to columnstore, set the variable:

SET GLOBAL default_table_type = 'columnstore';

You can create a rowstore table with CREATE ROWSTORE TABLE ..., regardless of the setting of default_table_type.

Conversion of Standard CREATE TABLE Syntax to Columnstore CREATE TABLE Syntax

If the default_table_type is columnstore, the query engine internally converts standard CREATE TABLE syntax to equivalent columnstore-specific CREATE TABLE syntax, as follows:

KEY() USING CLUSTERED COLUMNSTORE

The KEY() USING CLUSTERED COLUMNSTORE clause is added, if you have not specified the clause explicitly. For example,

CREATE TABLE t(a INT, b INT);

is equivalent to:

CREATE TABLE t(a INT, b INT, KEY() USING CLUSTERED COLUMNSTORE);

The KEY() USING CLUSTERED COLUMNSTORE clause is not added if you create a rowstore table with CREATE ROWSTORE TABLE ...

KEY(...)

Single-Column Keys

KEY(<column name>) is equivalent to KEY(<column name>) USING HASH. For example,

CREATE TABLE t(a INT, b INT, KEY(a));

is equivalent to:

CREATE TABLE t(a INT, b INT, KEY() USING CLUSTERED COLUMNSTORE, KEY(a) USING HASH);

Multi-Column Keys

KEY(<column 1 name>,<column 2 name>, ... <column n name>) is equivalent to KEY(<column 1 name>) USING HASH, KEY(<column 2 name>) USING HASH, ... KEY(<column n name>) USING HASH. For example,

CREATE TABLE t(a INT, b INT, c INT, KEY(a,b));

is equivalent to:

CREATE TABLE t(a INT, b INT, KEY() USING CLUSTERED COLUMNSTORE, KEY(a) USING HASH, KEY(b) USING HASH);

A query against t with an equality filter on a, an equality filter on b, or equality filters on both a and b could benefit from KEY(a) USING HASH and KEY(b) USING HASH. Depending on the cardinality, the performance of the query may be worse than the performance of the same query, where t is a rowstore table and KEY(a,b) is defined on that table.

UNIQUE KEY

UNIQUE KEY(<column name>) is created as UNIQUE KEY(<column name>) USING HASH. For example,

CREATE TABLE t(a INT, b INT, SHARD(a), UNIQUE KEY(a));

is equivalent to:

CREATE TABLE t(a INT, b INT, KEY() USING CLUSTERED COLUMNSTORE, SHARD(a), UNIQUE KEY(a) USING HASH);

Use of multi-column unique keys is not supported in a columnstore table.

PRIMARY KEY

PRIMARY KEY(<column name>) is created as SHARD(<column name>), UNIQUE KEY(<column name>) USING HASH.

For example, CREATE TABLE t(a INT PRIMARY KEY, b INT); or CREATE TABLE t(a INT, b INT, PRIMARY KEY(a)); is equivalent to:

CREATE TABLE t(a INT NOT NULL, b INT, KEY() USING CLUSTERED COLUMNSTORE, SHARD(a), UNIQUE KEY `PRIMARY` (a) USING HASH);

Considerations When Using Hash Keys and Sort Keys

Warning

The previous section explained that in columnstore as default mode, a standard CREATE TABLE statement using KEY(<k>) stores <k> as a hash key in the resulting columnstore table. Hash keys support fast equality lookups, but not fast range filters. If you want to use a range filter on a columnstore, you need to use a sort key <sk> by explicitly specifying KEY(<sk>) USING CLUSTERED COLUMNSTORE in your CREATE TABLE statement.

Viewing the Storage Structure of a Table

Info

After creating a columnstore table using standard CREATE TABLE syntax, run SHOW CREATE TABLE if you wish to see the table’s full, explicit definition of keys and sharding.

Converting an Existing Rowstore Table to a Columnstore Table

You can convert an existing rowstore table to a columnstore table by creating a new columnstore table, copying the data from the rowstore table to the columnstore table, dropping the rowstore table, and renaming the columnstore table to the rowstore table. For example, suppose you have an existing rowstore table t with the columns a INT and b INT. You can convert this table to a columnstore table as follows, assuming default_table_type is set to columnstore:

CREATE TABLE t2(a INT, b INT) AS SELECT * FROM t;
DROP TABLE t;
ALTER TABLE t2 RENAME TO t;

Example Database Schema that is Created with Columnstore as Default

Suppose you want to create a database schema track_visits that tracks page visits on a website. For each page (URL) that is visited, you store the host name and path of the URL. For example, for the URL myexample.com/a/b/, the host name is myexample.com and the path is a/b.

For each page that is visited, you store the date/time of the visit and the user who visited the page. You also store the account that each user belongs to.

The schema contains the tables accounts, users, webpages and webpage_visits. The definitions for these tables are provided in the standard CREATE TABLE statements section.

Use Case for Using Default as Columnstore to Create the track_visits Schema

Analytical Processing

Columnstores (as compared to rowstores) are optimized for analytical processing. With this schema, you can more quickly answer analytical questions such as:

  • How many visits occurred within a particular time period?
  • On average, how many visits are there per user?
  • On average, how many users are there per account?
  • How many unique webpages were clicked?

Transactional Processing

You can utilize the Universal Storage capabilities to perform transaction processing operations on the tables in the schema. For example:

  • To find all of the webpage visits containing a particular hostname, you can run a query such as SELECT * FROM webpages WHERE url_host = 'myexample.com';. This query allows you take advantage of the hash index on url_host, which allows the query to run faster than if the hash index was not used.

  • To find all of the webpages visited on a particular date and time, you can run a query such as SELECT * FROM webpage_visits v JOIN webpages w ON v.page_id = w.id WHERE v.when_clicked = '2020-01-01 16:35:00'; This query does a highly selective join, as there are only a few records in webpage_visits that match the equality filter on when_clicked.

Standard CREATE TABLE Syntax for Creating the Tables in the track_visits Schema

Following are the CREATE TABLE statements for the tables in the track_visits schema. The statements use standard CREATE TABLE syntax. The statements are run in columnstore as default mode.

CREATE DATABASE track_visits;
USE track_visits;

CREATE REFERENCE TABLE accounts(
	id INT AUTO_INCREMENT,
	name TEXT,
	PRIMARY KEY(id),
	UNIQUE KEY(name)
	);

CREATE TABLE users (
	id BIGINT AUTO_INCREMENT,
	user_name TEXT,
	account_id BIGINT,
	PRIMARY KEY (id)
  );

CREATE TABLE webpages(
	id BIGINT AUTO_INCREMENT,
	url_host TEXT,
	url_path TEXT,
	PRIMARY KEY (id),
	KEY(url_host)
	);

CREATE TABLE webpage_visits(
	id BIGINT AUTO_INCREMENT,
	user_id BIGINT,
	page_id INT,
	when_clicked DATETIME,
	PRIMARY KEY (id),
	KEY (user_id, page_id),
	KEY (when_clicked)
  );

Storage Structure of Tables in the track_visits Schema

The following definitions show the structure of the tables in the track_visits schema after they are stored as a result of running the standard CREATE TABLE statements.

Info

INT, and BIGINT, which are used in the standard CREATE TABLE statements, are stored as INT(11) and BIGINT(20), respectively.

In some cases, TEXT is stored as BLOB (they are equivalent).

This behavior is independent of the columnstore as default feature.

accounts

CREATE REFERENCE TABLE accounts(
	id INT NOT NULL AUTO_INCREMENT,
	name TEXT,
	UNIQUE KEY `PRIMARY` (id) USING HASH,
	UNIQUE KEY(name) USING HASH,
	KEY() USING CLUSTERED COLUMNSTORE
	);

users

CREATE TABLE users (
	id BIGINT(20) NOT NULL AUTO_INCREMENT,
	user_name TEXT,
	account_id BIGINT,
	SHARD(id),
	UNIQUE KEY `PRIMARY` (id) USING HASH,
	KEY() USING CLUSTERED COLUMNSTORE
  );

webpages

CREATE TABLE webpages(
	id BIGINT NOT NULL AUTO_INCREMENT,
	url_host BLOB,
	url_path BLOB,
	SHARD(id),
	UNIQUE KEY `PRIMARY` (id) USING HASH,
	KEY(url_host) USING HASH,
	KEY() USING CLUSTERED COLUMNSTORE
	);

webpage_visits

CREATE TABLE webpage_visits(
	id BIGINT(20) NOT NULL AUTO_INCREMENT,
	user_id BIGINT(20),
	page_id INT(11),
	when_clicked DATETIME,
	SHARD(id),
	UNIQUE KEY `PRIMARY` (id) USING HASH,
	KEY(user_id) USING HASH,
	KEY(page_id) USING HASH,
	KEY (when_clicked) USING HASH,
	KEY() USING CLUSTERED COLUMNSTORE
  );

Querying webpage_visits Using a Range Filter

If you intend to query webpage_visits using a range filter on when_clicked, you could have made when_clicked the sort key. You could have done this using the following standard CREATE TABLE syntax.

CREATE TABLE webpage_visits(
	id BIGINT AUTO_INCREMENT,
	user_id BIGINT,
	page_id INT,
	when_clicked DATETIME,
	PRIMARY KEY (id),
	KEY (user_id, page_id),
	KEY (when_clicked) USING CLUSTERED COLUMNSTORE
  );

The query engine would internally convert the standard CREATE TABLE syntax to equivalent columnstore-specific CREATE TABLE syntax, as follows:

CREATE TABLE webpage_visits(
	id BIGINT AUTO_INCREMENT,
	user_id BIGINT,
	page_id INT,
	when_clicked DATETIME,
	UNIQUE KEY `PRIMARY` (id) USING HASH,
	KEY (user_id) USING HASH,
	KEY (page_id) USING HASH,
	KEY (when_clicked) USING CLUSTERED COLUMNSTORE
  );