Understanding Sort Key Selection
On this page
Sort Key
The sort key is an index that groups rows of columnstore tables into logical segments, where each segment contains data for many rows.
To specify a sort key, use SORT KEY
.SORT KEY(<index_
) or descending (SORT KEY(<index_
).SORT KEY()
.
A table scan using a sort key scans the table in the order of the sort key.
The following example creates a table, t1
, with a sort key on col1
in descending order.DESC
) matches the sort order of the SORT KEY
, and the table is scanned using the SORT KEY
.ASC
) does not match the sort order of the SORT KEY
, and the SORT KEY
is not used in the table scan.
CREATE TABLE t1(col1 int, SORT KEY(col1 DESC));EXPLAIN SELECT * FROM t1 ORDER BY col1 DESC;
+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.col1 DESC] partitions:all alias:remote_0 |
| Project [t1.col1] |
| OrderedColumnStoreScan test1.t1, KEY col1 (col1 DESC) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
EXPLAIN SELECT * FROM t1 ORDER BY col1;
+-----------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.col1] partitions:all alias:remote_0 |
| Project [t1.col1] |
| Sort [t1.col1] |
| ColumnStoreScan test1.t1, KEY col1 (col1 DESC) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+-----------------------------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)
The following example creates a table with an empty sort key.
CREATE TABLE t1(col1 int, SORT KEY( ));EXPLAIN SELECT * FROM t1;
+---------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 |
| Project [t1.col1] |
| ColumnStoreScan test8.t1, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+---------------------------------------------------------------------------------------------------------+
Note
KEY() USING CLUSTERED COLUMNSTORE
is a legacy syntax that is equivalent to SORT KEY()
.SORT KEY()
.
SingleStore recommends defining a shard key to explicitly control the data distribution.id
.
CREATE TABLE people (id INT AUTO_INCREMENT,user VARCHAR(24),first VARCHAR(24),last VARCHAR(24),SORT KEY (user),SHARD KEY (id));
Setting the shard key and the sort key on the same column improves data compression.
Refer to Managing Columnstore Segments for additional information about columnstore segments.
Segment Elimination
The single most important consideration for columnstore tables is setting a sort key.
For queries that filter on the sort key column of the columnstore index, segment elimination is typically very efficient as row segments within each row segment group will not have overlapping segments.
The segment size (number of rows per segment) also impacts query execution with respect to the sort key.
Indexing further cuts down on the number of rows scanned when a select query with a WHERE
clause is executed.WHERE
clause, the engine can quickly determine the location of the matching rows position to seek without having to look at all the data.
The figures below show three segments of the people
table.
If a query is searching for a person whose first name starts with an "a", then only the first segment is scanned.user
values are "jm - zz".
Ordered Scans
Ordered scan over a sort key is faster than sorting a table.
Multi-Column Sort Keys
Sort keys can be defined on multiple columns, which can improve performance in certain cases.
Storage
The following examples show how a table is stored with a single-column sort key followed by how the table is stored with a two-column sort key.
The data in a partition is broken into segments based on the sort key order and the segment size.
Create a table named INTEGERS
with a single-column sort key.
CREATE TABLE INTEGERS (A int,B int,SORT KEY (A));
The data in the INTEGERS
table will be sorted on column A
, and then broken into segments, so that segments contain contiguous values of column A
.
The figure below shows how the data is stored with the single-column sort key.
For a multi-column sort key, the data in a partition is also broken into segments based on the sort key order and the segment size.
The command below creates a table named INTEGERS
with a multi-column sort key on columns A
and B
.
CREATE TABLE INTEGERS (A int,B int,SORT KEY (A,B));
The data in this table will be sorted on column A
, and secondarily on column B
, and then broken into segments so each segment contains a contiguous segment of data in that (multi-column) sort order.
The figure below shows how the data is stored with the multi-column sort key.A
and B
, and the segment size is 10, the first three columnstore segments would look as follows.A
, and then secondarily on the values in column B
.
Example
Creating multi-column sort keys on a table can improve query performance.SORT KEY (date, region)
).
Questions to Ask When Choosing Sort Keys
-
Is the data always filtered by some column (e.
g. insert timestamp or event type)? Ensure that the common columns for all queries are in the sort key to improve segment elimination. -
Is the data generally inserted in order by some column (e.
g. insert timestamp)? It’s best to put that column first in the sort key to minimize the amount of work required by the background columnstore segment merger. -
Does one column in your key have a higher cardinality than the other? It’s best to put the lowest cardinality columns first to increase the likelihood that segment elimination will be able to affect later columns.
Related Topics
Last modified: January 28, 2025