Creating a Columnstore Table
On this page
The default table type in SingleStore is columnstore.default_
engine variable to rowstore.
Currently, SingleStore supports one sort key per table; however, if you do not want to define a key (e.SORT KEY()
.
Note
KEY() USING CLUSTERED COLUMNSTORE
is a legacy syntax that is equivalent to SORT KEY()
.SORT KEY()
.
Here is an example of a query that creates a columnstore table:
CREATE TABLE products (ProductId INT,Color VARCHAR(10),Price INT,Qty INT,SORT KEY (Price),SHARD KEY (ProductId));
As with any other SingleStore table, we define a SHARD KEY
to explicitly control the data distribution.ProductId
since sharding on a high cardinality identifier column generally allows for a more even distribution and prevents skew.SHARD KEY()
.
Your columnstore table definition can contain metadata-only unenforced unique keys, single-column and multi-column hash keys (which may be UNIQUE or PRIMARY), and a FULLTEXT key.
For more information on creating columnstore tables that use hash keys, see USING HASH behavior.
Besides the keys discussed so far in this section, you cannot add other keys to a columnstore table.
If a sort key is defined on a table, the rows of that table will be stored in a highly compressed columnar format on disk.SORT KEY()
is permitted and results in less expensive background maintenance of the columnstore table, without guaranteeing any particular order.
We describe the details of the format below.
-
Queries that scan several columns out of a table with many columns.
Columnstore table will only materialize columns that it actually needs to perform the query. -
Queries that scan a lot of data in sort key order.
Scans in the columnstore key order are more CPU and memory efficient. -
Joins between two columnstore tables on the sort key.
Such a join can be executed as a merge join, resulting in a good performance and low memory overhead. -
Queries with high selectivity filters on the sort key.
By leveraging a technique called segment elimination
, SingleStore can open only those row segments that actually contain relevant rows, significantly improving the performance of queries with high selectivity.
Columnstore tables are also not constrained by the amount of available memory, unlike rowstore tables.
Creating a Columnstore Table Using Standard CREATE TABLE Syntax
The columnstore as default feature allows you to create a columnstore table using standard CREATE TABLE
syntax.
The Benefit of Using the Columnstore for Transaction Processing
Although rowstores are well-suited for transaction processing, it can be costly to store large datasets in rowstores, as they store all data in RAM.
The columnstore continues to evolve to support transactional workloads that would have traditionally used the rowstore.
Enabling Columnstore as Default
The default_
engine variable specifies whether CREATE TABLE
creates a rowstore or columnstore table, by default.
Note
default_
is set to columnstore
by default.
In SingleStore, to change default_
to columnstore
, set the variable:
SET GLOBAL default_table_type = 'columnstore';
You can create a rowstore table with CREATE ROWSTORE TABLE .
, regardless of the setting of default_
.
Conversion of Standard CREATE TABLE
Syntax to Columnstore CREATE TABLE
Syntax
When the default_
columnstore
, the query engine internally converts standard CREATE TABLE
syntax to equivalent columnstore-specific CREATE TABLE
syntax, as follows:
SORT KEY()
The SORT KEY()
clause is added, if you have not specified the clause explicitly.
CREATE TABLE t(a INT, b INT);
is equivalent to:
CREATE TABLE t(a INT, b INT, SORT KEY());
KEY(. . . )
Single-Column Keys
KEY(<column name>)
is equivalent to KEY(<column name>) USING HASH
.
CREATE TABLE t(a INT, b INT, KEY(a));
is equivalent to:
CREATE TABLE t(a INT, b INT, SORT KEY(), KEY(a) USING HASH);
Multi-Column Keys
KEY(<column 1 name>,<column 2 name>, .
is equivalent to KEY(<column 1 name>,<column 2 name>,.
.
CREATE TABLE t(a INT, b INT, c INT, KEY(a,b));
is equivalent to:
CREATE TABLE t(a INT, b INT, SORT KEY(), KEY(a,b) USING HASH);
Using multi-column unique keys in a columnstore table is supported starting in engine version 7.
A query against t
with an equality filter on a
, an equality filter on b
, or equality filters on both a
and b
could benefit from KEY(a,b) USING HASH
.
Unlike other database products, SingleStore utilizes per-segment inverted indexes and cross-segment indexes to support multi-column keys while minimizing storage costs.
Each secondary index contains a collection of multiple cross-segment indexes, each of which is a hash table covering one or more segments.
These data structures work together to accomplish these fast seeks as follows:
-
The inverted index is built to map values of an index column to a posting list that stores row offsets within the segment with the value searched.
The posting list logs which segment a specific value is located and uses row offsets to determine the value is located. -
The cross-segment indexes map values of each indexed column with the ID of the segment where the searched value is located.
The cross-segment indexes also map to the starting location of the corresponding postings list in the inverted index for each segment. Lastly, for a multi-column index like the example above ( KEY(a,b) USING HASH)
), another collection of cross-segment indexes are built on the row of indexed columns to map from the hash of each tuple (value_
anda value_
) to the starting locations of the corresponding per-segment postings lists forb value_
anda value_
.b
UNIQUE KEY
UNIQUE KEY(<column name>)
is created as UNIQUE KEY(<column name>) USING HASH
.
CREATE TABLE t(a INT, b INT, SHARD(a), UNIQUE KEY(a));
is equivalent to:
CREATE TABLE t(a INT, b INT, SORT KEY(), SHARD(a), UNIQUE KEY(a) USING HASH);
Note
The use of multiple unique keys is not supported in columnstore tables.
PRIMARY KEY
A primary key can be defined for a columnstore table using the PRIMARY KEY(column_
clause or the PRIMARY KEY
keyword in the CREATE TABLE
statement.
CREATE TABLE t(a INT PRIMARY KEY, b INT);
CREATE TABLE t(a INT, b INT, PRIMARY KEY(a));
The index behavior is equivalent to defining a key as UNIQUE KEY `PRIMARY` (a) USING HASH
.
Note: A primary key must contain all of the column(s) of the shard key.
Considerations When Using Hash Keys and Sort Keys
Warning
The previous section explained that in columnstore as default mode, a standard CREATE TABLE
statement using KEY(<k>)
stores <k>
as a hash key in the resulting columnstore table.<sk>
by explicitly specifying SORT KEY(<sk>)
in your CREATE TABLE
statement.
Viewing the Storage Structure of a Table
Note
After creating a columnstore table using standard CREATE TABLE
syntax, run SHOW CREATE TABLE if you wish to see the table’s full, explicit definition of keys and sharding.
Converting an Existing Rowstore Table to a Columnstore Table
You can convert an existing rowstore table to a columnstore table by creating a new columnstore table, copying the data from the rowstore table to the columnstore table, dropping the rowstore table, and renaming the columnstore table to the rowstore table.t
with the columns a INT
and b INT
.default_
is set to columnstore
:
CREATE TABLE t2(a INT, b INT) AS SELECT * FROM t;DROP TABLE t;ALTER TABLE t2 RENAME TO t;
Example Database Schema that is Created with Columnstore as Default
Suppose you want to create a database schema track_
that tracks page visits on a website.myexample.
, the host name is myexample.
and the path is a/b
.
For each page that is visited, you store the date/time of the visit and the user who visited the page.
The schema contains the tables accounts
, users
, webpages
and webpage_
.
Use Case for Using Default as Columnstore to Create the track_ visits
Schema
Analytical Processing
Columnstores (as compared to rowstores) are optimized for analytical processing.
-
How many visits occurred within a particular time period?
-
On average, how many visits are there per user?
-
On average, how many users are there per account?
-
How many unique webpages were clicked?
Transactional Processing
You can utilize the Universal Storage capabilities to perform transaction processing operations on the tables in the schema.
-
To find all of the webpage visits containing a particular hostname, you can run a query such as
SELECT * FROM webpages WHERE url_
.host = 'myexample. com'; This query allows you take advantage of the hash index on url_
, which allows the query to run faster than if the hash index was not used.host -
To find all of the webpages visited on a particular date and time, you can run a query such as
SELECT * FROM webpage_
This query does a highly selective join, as there are only a few records invisits v JOIN webpages w ON v. page_ id = w. id WHERE v. when_ clicked = '2020-01-01 16:35:00'; webpage_
that match the equality filter onvisits when_
.clicked
Standard CREATE TABLE
Syntax for Creating the Tables in the track_ visits
Schema
Following are the CREATE TABLE
statements for the tables in the track_
schema.CREATE TABLE
syntax.
CREATE DATABASE track_visits;USE track_visits;CREATE REFERENCE TABLE accounts(id INT AUTO_INCREMENT,name TEXT,PRIMARY KEY(id),UNIQUE KEY(name));CREATE TABLE users (id BIGINT AUTO_INCREMENT,user_name TEXT,account_id BIGINT,PRIMARY KEY (id));CREATE TABLE webpages(id BIGINT AUTO_INCREMENT,url_host TEXT,url_path TEXT,PRIMARY KEY (id),KEY(url_host));CREATE TABLE webpage_visits(id BIGINT AUTO_INCREMENT,user_id BIGINT,page_id INT,when_clicked DATETIME,PRIMARY KEY (id),KEY (user_id, page_id),KEY (when_clicked));
Storage Structure of Tables in the track_ visits
Schema
The following definitions show the structure of the tables in the track_
schema after they are stored as a result of running the standard CREATE TABLE statements.
Note
INT
, and BIGINT
, which are used in the standard CREATE TABLE
statements, are stored as INT(11)
and BIGINT(20)
, respectively.
In some cases, TEXT
is stored as BLOB
(they are equivalent).
This behavior is independent of the columnstore as default feature.
accounts
CREATE REFERENCE TABLE accounts(id INT NOT NULL AUTO_INCREMENT,name TEXT,UNIQUE KEY `PRIMARY` (id) USING HASH,UNIQUE KEY(name) USING HASH,SORT KEY());
users
CREATE TABLE users (id BIGINT(20) NOT NULL AUTO_INCREMENT,user_name TEXT,account_id BIGINT,SHARD(id),UNIQUE KEY `PRIMARY` (id) USING HASH,SORT KEY());
webpages
CREATE TABLE webpages(id BIGINT NOT NULL AUTO_INCREMENT,url_host BLOB,url_path BLOB,SHARD(id),UNIQUE KEY `PRIMARY` (id) USING HASH,KEY(url_host) USING HASH,SORT KEY());
webpage_ visits
CREATE TABLE webpage_visits(id BIGINT(20) NOT NULL AUTO_INCREMENT,user_id BIGINT(20),page_id INT(11),when_clicked DATETIME,SHARD(id),UNIQUE KEY `PRIMARY` (id) USING HASH,KEY(user_id, page_id) USING HASH,KEY (when_clicked) USING HASH,SORT KEY());
Querying webpage_ visits
Using a Range Filter
If you intend to query webpage_
using a range filter on when_
, you could have made when_
the sort key.CREATE TABLE
syntax.
CREATE TABLE webpage_visits(id BIGINT AUTO_INCREMENT,user_id BIGINT,page_id INT,when_clicked DATETIME,PRIMARY KEY (id),KEY (user_id, page_id),SORT KEY (when_clicked));
The query engine would internally convert the standard CREATE TABLE
syntax to equivalent columnstore-specific CREATE TABLE
syntax, as follows:
CREATE TABLE webpage_visits(id BIGINT AUTO_INCREMENT,user_id BIGINT,page_id INT,when_clicked DATETIME,UNIQUE KEY `PRIMARY` (id) USING HASH,KEY (user_id) USING HASH,KEY (page_id) USING HASH,SORT KEY (when_clicked));
Last modified: November 11, 2024