Understanding Sort Key Selection

Sort Key

The sort key is an index that groups rows of columnstore tables into logical segments, each containing data for many rows. It can be set on one or more columns of a columnstore table to create a sort order of values within those columns. Each group can contain a maximum of 1 million rows in a compressed format on disk. These groups are called segment files or blobs, and the terms are used interchangeably. (Columnstore blobs have no connection to the BLOB data type; see BLOB Types for information on the BLOB data type.) The segment files contain metadata that holds the minimum and maximum values for each group.

To set a sort key, use the SORT KEY() syntax. To create an unsorted columnstore table, specify an empty key using SORT KEY(). A columnstore sort key cannot be altered once the table has been created. See the following example:

CREATE TABLE t1(col1 int, SORT KEY( ));
EXPLAIN SELECT * FROM t1;
+---------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                 |
+---------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0                                                                    |
| Project [t1.col1]                                                                                       |
| ColumnStoreScan test8.t1, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+---------------------------------------------------------------------------------------------------------+

The sort key order can be specified as ascending (SORT KEY(index_column_name)) or descending (SORT KEY(index_column_name DESC)). SingleStore does not support scanning a sort key in the reverse of its sort order. See the following example:

CREATE TABLE t1(col1 int, SORT KEY(col1 DESC));
EXPLAIN SELECT * FROM t1 ORDER BY col1 DESC;
+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                          |
+------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.col1 DESC] partitions:all alias:remote_0                                                   |
| Project [t1.col1]                                                                                                |
| OrderedColumnStoreScan test1.t1, KEY col1 (col1 DESC) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

EXPLAIN SELECT * FROM t1 ORDER BY col1;
+-----------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                   |
+-----------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.col1] partitions:all alias:remote_0                                                 |
| Project [t1.col1]                                                                                         |
| Sort [t1.col1]                                                                                            |
| ColumnStoreScan test1.t1, KEY col1 (col1 DESC) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+-----------------------------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)

Note

KEY() USING CLUSTERED COLUMNSTORE is a legacy syntax that is equivalent to SORT KEY(). SingleStore recommends using SORT KEY().
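For reference, the two statements below create equivalent tables; the first uses the legacy syntax, the second the recommended SORT KEY() syntax (table and column names here are illustrative, not from the examples above):

-- Legacy syntax (still accepted, not recommended):
CREATE TABLE events_legacy (ts DATETIME, payload TEXT, KEY (ts) USING CLUSTERED COLUMNSTORE);

-- Equivalent recommended syntax:
CREATE TABLE events_sorted (ts DATETIME, payload TEXT, SORT KEY (ts));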

As with any other SingleStore table, we suggest defining a shard key to explicitly control data distribution within the cluster. In the following example, we define the shard key on id, since sharding on a high-cardinality identifier column generally allows for more even distribution and prevents skew.

CREATE TABLE people (
  id INT AUTO_INCREMENT,
  user VARCHAR(24),
  first VARCHAR(24),
  last VARCHAR(24),
  SORT KEY (user),
  SHARD KEY (id)
);

Furthermore, setting the shard key and the sort key on the same column improves data compression. In this case, both keys could be set on the id column.
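As a sketch of that variant, the people table could be declared with both keys on id (a hypothetical alternative to the definition above, not a replacement for it):

-- Sketch: sort key and shard key on the same column (id),
-- which can improve data compression.
CREATE TABLE people_by_id (
  id INT AUTO_INCREMENT,
  user VARCHAR(24),
  first VARCHAR(24),
  last VARCHAR(24),
  SORT KEY (id),
  SHARD KEY (id)
);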

Segment Elimination

The single most important consideration for columnstore tables is the choice of sort key. When filtering on the sort key, the amount of scanning the engine must perform is reduced. The minimum/maximum value metadata for each segment is used at query execution time to determine whether a segment can match a filter; if not, the segment is skipped entirely and no data within it is examined. This is called segment elimination because the segment is eliminated from the scan.

Indexing cuts down on the number of rows scanned when a select query with a where clause is executed. If the table has an index for the columns in question, the SingleStore engine can quickly determine the position to seek without having to look at all the data. This is much faster than reading every row sequentially.

The segment size also impacts query execution with respect to the sort key. A smaller segment size means that fewer rows are read from the segments that pass segment elimination.

Column segments typically contain on the order of tens of thousands of rows. Using the people table (the segment size is 3 rows for readability):

Segment #1 of 3 - user values aa - jl

Segment #2 of 3 - user values jm - rl

Segment #3 of 3 - user values rm - zz

If a query searches for a person whose user value starts with "a", only the first segment is scanned. The other two segments are eliminated from the scan since their user values fall within "jm - zz".
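One way to see why only the first segment is scanned is to inspect the per-segment minimum and maximum metadata. The sketch below assumes the information_schema view COLUMNAR_SEGMENTS and its column names, which may vary across SingleStore versions:

-- Query that benefits from segment elimination on the sort key (user):
SELECT * FROM people WHERE user LIKE 'a%';

-- Hedged sketch: inspect per-segment min/max metadata for the user column.
-- View and column names are assumptions and may differ by version.
SELECT segment_id, min_value, max_value
FROM information_schema.columnar_segments
WHERE table_name = 'people' AND column_name = 'user';

Only segments whose [min_value, max_value] range can contain a matching value are opened at execution time.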

Using Multiple Sort Keys on a Table

Using a table that only stores integers, this is how the data is stored when only one column is set as the sort key.

CREATE TABLE INTEGERS (
A int,
B int,
SORT KEY (A)
);

If only integers 1 - 10 are inserted into both columns, the segmentation would look like this if we review only values 1 - 3 for column A:

Graphic illustrating columnstore segments and the columns with values assigned to each segment.

The values in column A are segmented in order. Each distinct value of column A is stored in the same location on disk (unless the count of a distinct value exceeds 1 million rows, in which case the excess is stored in a different segment). The values in column B are stored in random order within the segments of column A.

If this same INTEGERS table was created with both columns as sort keys, here's the syntax:

CREATE TABLE INTEGERS (
A int,
B int,
SORT KEY (A,B)
);

Again, if only integers 1 - 10 are inserted into both columns, the segmentation would look like this if we review only values 1 - 3 for column A:

Graphic illustrating columnstore segments and the columns with values assigned to each segment.

The values in column A are segmented in order. Each distinct value of column A is stored in the same location on disk (unless the count of a distinct value exceeds 1 million rows, in which case the excess is stored in a different segment). The values in column B are stored in order within the segments of column A.

Creating a multi-column sort key can decrease the amount of scanning needed when querying a table. However, you must understand your data. If one of the sort key columns has low cardinality, performance can decrease, since both columns A and B will need to be scanned.
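To compare the two cases, consider queries like the following against the INTEGERS table defined above (a sketch; actual elimination depends on the data distribution):

-- Filtering on A, the leading sort key column, eliminates segments well:
SELECT * FROM INTEGERS WHERE A = 2;

-- Filtering on B alone relies on B being ordered within each run of A.
-- If A is high-cardinality, B's range within most segments stays wide,
-- so most segments may still need to be scanned.
SELECT * FROM INTEGERS WHERE B = 2;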

A shard key can also be set on columnstore tables to further improve performance.

Questions to Ask When Choosing Sort Keys

  • Is the data always filtered by some column (e.g. insert timestamp or event type)? Ensure that the common columns for all queries are in the sort key to improve segment elimination.

  • Is the data generally inserted in order by some column (e.g. insert timestamp)? It’s best to put that column first in the sort key to minimize the amount of work required by the background columnstore segment merger.

  • Does one column in your key have a higher cardinality than the other? It’s best to put the lowest cardinality columns first to increase the likelihood that segment elimination will be able to affect later columns.
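Putting these guidelines together, a table definition might look like the following sketch (all names here are illustrative, not from the original). Data arrives roughly ordered by event_time, so it leads the sort key; event_type is a commonly filtered, low-cardinality column, so it follows:

-- Sketch, assuming a time-ordered insert pattern and frequent
-- filters on event_time and event_type:
CREATE TABLE events (
  id BIGINT AUTO_INCREMENT,
  event_time DATETIME,
  event_type VARCHAR(32),
  payload JSON,
  SORT KEY (event_time, event_type),
  SHARD KEY (id)
);

Note that the guidelines can conflict; here insertion order is prioritized for the leading column, and the high-cardinality id column is used for sharding, as recommended earlier.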

Managing Columnstore Segments

A columnstore table will have the best performance if the rows in the table are in global sorted order across all the row segments. In reality, maintaining such an order is not feasible in the presence of continuous writes.

SingleStore uses an advanced algorithm to keep the order as close to sorted as possible while data is being ingested or updated. This process, called the background merger, runs constantly in the background as long as the order of the row segments can be improved.

Background merger is optimistic, in that if at any point it tries to move around data which is also being changed by a concurrent UPDATE or DELETE query, it will discard all the work it has done so far and start over. It works on a small chunk of data at a time, so it is always a relatively small amount of work that is being discarded. However, in the presence of a very heavy update workload, it can be a significant slowdown compared to a pessimistic merger, which locks the row segments it is currently processing. A user can manually trigger a pessimistic merger by running an OPTIMIZE TABLE command. We will explain below how to decide whether such a command is necessary, and how to run it.

SingleStore uses a concept of a sorted row segment group to describe a set of row segments that are sorted together. Row segments form a sorted row segment group if and only if there is an order on the row segments such that for each row segment the smallest row in it is no smaller than the largest row in any row segment before it. Here and below when we say that one row is smaller than another row, we mean that the values of columns of the SORT KEY of that row are smaller than those of the other row.

If the data had a perfect global order, it would consist of a single sorted row segment group. If the data is in a completely random order, it is likely to comprise as many sorted row segment groups as there are row segments. The goal of the background merger is to reorganize the rows among row segments in such a way that the number of sorted row segment groups is as small as possible.

To inspect the current state of the sorted row segment groups of a particular table, run the SHOW COLUMNAR MERGE STATUS FOR <table_name> command:

SHOW COLUMNAR MERGE STATUS FOR groups;
+----------------------------------------------------------------------------+
| Merger           | State | Plan                    | Progress | Partition  |
+----------------------------------------------------------------------------+
| (Current groups) | NULL  | 741,16,1                | NULL     | 0          |
| (Current groups) | NULL  | 782,20                  | NULL     | 1          |
| (Current groups) | NULL  | 701,40,5                | NULL     | 2          |
| (Current groups) | NULL  | 326,207,123,37,21,19,17 | NULL     | 3          |
+----------------------------------------------------------------------------+

Let’s look closely at the first row of the result. According to it, the slice of the table that is stored on partition 0 has three sorted row segment groups: one consists of 741 row segments, one of 16 row segments, and one of a single row segment, for a total of 758 row segments. Consider the impact of such a split into sorted row segment groups on a very simple query like:

SELECT * FROM groups WHERE user_group = 15;

The very first sorted row segment group will have at most one row segment that contains rows with user_group equal to 15, unless user_group = 15 is on the boundary of two row segments, or if there is a large data skew and several row segments consist only of rows with user_group = 15. Similarly, at most one row segment in the second sorted row segment group contains relevant rows, and the only segment of the third sorted row segment group might also contain relevant rows. This way, only three row segments out of the total of 758 will be opened and materialized. While the query in this example is very simple, similar reasoning works for significantly more complex queries.

Now take a look at the sorted row segment groups on partition 3. It is significantly less optimized than the remaining three, and a SELECT query like the one shown above will result in materializing eight row segments. If the background merger is enabled, and no workload is running concurrently, this partition would get optimized within several seconds. However, in the presence of a heavy workload, the optimistic background merger might fall behind. In this case, it might be reasonable to manually trigger a pessimistic merger by calling:

OPTIMIZE TABLE groups;

If we run SHOW COLUMNAR MERGE STATUS while OPTIMIZE TABLE is being executed, we might see the manual merger in action:

SHOW COLUMNAR MERGE STATUS FOR groups;
+--------------------------------------------------------------------------------+
| Merger           | State   | Plan                    | Progress | Partition    |
+--------------------------------------------------------------------------------+
| (Current groups) | NULL    | 741,16,1                | NULL     | 0            |
| (Current groups) | NULL    | 782,20                  | NULL     | 1            |
| (Current groups) | NULL    | 701,40,5                | NULL     | 2            |
| (Current groups) | NULL    | 326,207,123,37,21,19,17 | NULL     | 3            |
| Manual Merger    | Working | 326+207+123+37+21+19+17 | 53.12%   | 3            |
+--------------------------------------------------------------------------------+

The new row indicates that a manual merger is running on partition 3 and that, at this time, it has completed 53.12% of the work.

When the merger is done, the table is in better shape:

SHOW COLUMNAR MERGE STATUS FOR groups;
+------------------------------------------------------------+
| Merger           | State | Plan     | Progress | Partition |
+------------------------------------------------------------+
| (Current groups) | NULL  | 741,16,1 | NULL     | 0         |
| (Current groups) | NULL  | 782,20   | NULL     | 1         |
| (Current groups) | NULL  | 701,40,5 | NULL     | 2         |
| (Current groups) | NULL  | 730,20   | NULL     | 3         |
+------------------------------------------------------------+

Note that at no point were any of the partitions merged into a single sorted row segment group in this example. The reason is both optimistic and pessimistic mergers use an advanced algorithm that is optimized to do small amortized chunks of work in the presence of concurrent writes and maintain data in a few sorted row segment groups, rather than to attempt to merge all the data into a single sorted row segment group. In cases when it is acceptable to sacrifice some time on data ingestion to achieve even higher SELECT performance, it is possible to run a manual command that merges data on each partition into a single sorted row segment group:

OPTIMIZE TABLE groups FULL;
SHOW COLUMNAR MERGE STATUS FOR groups;
+---------------------------------------------------------+
| Merger           | State | Plan | Progress | Partition  |
+---------------------------------------------------------+
| (Current groups) | NULL  | 758  | NULL     | 0          |
| (Current groups) | NULL  | 802  | NULL     | 1          |
| (Current groups) | NULL  | 746  | NULL     | 2          |
| (Current groups) | NULL  | 750  | NULL     | 3          |
+---------------------------------------------------------+

At this time, any highly selective SELECT will materialize only one row segment per partition.

Important

Unlike OPTIMIZE TABLE <name>, which takes time proportional to the size of recently loaded data, OPTIMIZE TABLE <name> FULL always takes time on the order of the size of the entire table, unless the data in the table is already sorted.

When a small number of rows is inserted into a columnstore table, an in-memory, rowstore-backed segment is used to store the rows. As this rowstore-backed segment fills, the background flusher periodically flushes its rows to disk. A rowstore-backed segment can be flushed to disk manually by running OPTIMIZE TABLE <table_name> FLUSH.

OPTIMIZE TABLE t FLUSH;

Last modified: March 8, 2024
