Managing Columnstore Segments

A columnstore table will have the best performance if the rows in the table are in global sorted order across all the row segments. In reality, maintaining such an order is not feasible in the presence of continuous writes.

SingleStore uses an advanced algorithm that allows it to maintain the order as close to sorted as possible, while data is being ingested or updated. Such a process is called a background merger and is constantly running in the background if the order of the row segments can be improved.

Background merger is optimistic, in that if at any point it tries to move around data which is also being changed by a concurrent UPDATE or DELETE query, it will discard all the work it has done so far and start over. It works on a small chunk of data at a time, so it is always a relatively small amount of work that is being discarded. However, in the presence of a very heavy update workload, it can be a significant slowdown compared to a pessimistic merger, which locks the row segments it is currently processing. A user can manually trigger a pessimistic merger by running an OPTIMIZE TABLE command. We will explain below how to decide whether such a command is necessary, and how to run it.

SingleStore uses a concept of a sorted row segment group to describe a set of row segments that are sorted together. Row segments form a sorted row segment group if and only if there is an order on the row segments such that for each row segment the smallest row in it is no smaller than the largest row in any row segment before it. Here and below when we say that one row is smaller than another row, we mean that the values of columns of the SORT KEY of that row are smaller than those of the other row.

Note

KEY() USING CLUSTERED COLUMNSTORE is a legacy syntax that is equivalent to SORT KEY(). SingleStore recommends using SORT KEY().

If the data had a perfect global order, it would consist of a single sorted row segment group. If the data is in a completely random order, it is likely to comprise as many sorted row segment groups as there are row segments. The goal of the background merger is to reorganize the rows among row segments in such a way that the number of sorted row segment groups is as small as possible.

To inspect the current state of the sorted row segment groups of a particular table, run the SHOW COLUMNAR MERGE STATUS FOR <table_name> command:

SHOW COLUMNAR MERGE STATUS FOR groups;
+----------------------------------------------------------------------------+
| Merger           | State | Plan                    | Progress | Partition  |
+----------------------------------------------------------------------------+
| (Current groups) | NULL  | 741,16,1                | NULL     |   0        |
| (Current groups) | NULL  | 782,20                  | NULL     |   1        |
| (Current groups) | NULL  | 701,40,5                | NULL     |   2        |
| (Current groups) | NULL  | 326,207,123,37,21,19,17 | NULL     |   3        |
+----------------------------------------------------------------------------+

Let’s look closely at the first row of the result. According to it, the slice of the table that is stored on partition 0 has three sorted row segment groups, one consists of 741 row segments, one consists of 16 row segments, and one consists of a single row segment - a total of 758 row segments. Consider the impact of such a split into sorted row segment groups on a very simple query like

SELECT * FROM groups WHERE user_group = 15;

The very first sorted row segment group will have at most one row segment that contains rows with user_group equal to 15, unless user_group = 15 is on the boundary of two row segments, or if there is a large data skew and several row segments consist only of rows with user_group = 15. Similarly, at most one row segment in the second sorted row segment group contains relevant rows, and the only segment of the third sorted row segment group might also contain relevant rows. This way, only three row segments out of the total of 758 will be opened and materialized. While the query in this example is very simple, similar reasoning works for significantly more complex queries.

Now take a look at the sorted row segment groups on partition 3. It is significantly less optimized than the remaining three, and a SELECTquery like the one shown above will result in materializing eight row segments. If the background merger is enabled, and no workload is running concurrently, within several seconds this partition would get optimized. However, in the presence of a heavy workload, the optimistic background merger might fall behind. In this case, it might be reasonable to manually trigger a pessimistic merger by calling:

OPTIMIZE TABLE groups

If we run SHOW COLUMNAR MERGE STATUS while OPTIMIZE TABLE is being executed, we might see the manual merger in action:

SHOW COLUMNAR MERGE STATUS FOR groups;
+--------------------------------------------------------------------------------+
| Merger           | State    | Plan                    | Progress | Partition   |
+--------------------------------------------------------------------------------+
| (Current groups) | NULL     | 741,16,1                | NULL     |         0   |
| (Current groups) | NULL     | 782,20                  | NULL     |         1   |
| (Current groups) | NULL     | 701,40,5                | NULL     |         2   |
| (Current groups) | NULL     | 326,207,123,37,21,19,17 | NULL     |         3   |
| Manual Merger    | Working  | 326+207+123+37+21+19+17 | 53.12%   |         3   |
+--------------------------------------------------------------------------------+

What this new row indicates is that there is a manual merger running on partition 3 and that at this time it has done 53.12% of the work.

When the merger is done, the table now is in a better shape:

SHOW COLUMNAR MERGE STATUS FOR groups;
+------------------------------------------------------------+
| Merger           | State | Plan     | Progress | Partition |
+------------------------------------------------------------+
| (Current groups) | NULL  | 741,16,1 | NULL     |         0 |
| (Current groups) | NULL  | 782,20   | NULL     |         1 |
| (Current groups) | NULL  | 701,40,5 | NULL     |         2 |
| (Current groups) | NULL  | 730,20   | NULL     |         3 |
+------------------------------------------------------------+

Note that at no point were any of the partitions merged into a single sorted row segment group in this example. The reason is both optimistic and pessimistic mergers use an advanced algorithm that is optimized to do small amortized chunks of work in the presence of concurrent writes and maintain data in a few sorted row segment groups, rather than to attempt to merge all the data into a single sorted row segment group. In cases when it is acceptable to sacrifice some time on data ingestion to achieve even higher SELECT performance, it is possible to run a manual command that merges data on each partition into a single sorted row segment group:

OPTIMIZE TABLE groups FULL;
SHOW COLUMNAR MERGE STATUS FOR groups;
+---------------------------------------------------------+
| Merger           | State | Plan | Progress | Partition  |
+---------------------------------------------------------+
| (Current groups) | NULL  | 758  | NULL     |         0  |
| (Current groups) | NULL  | 802  | NULL     |         1  |
| (Current groups) | NULL  | 746  | NULL     |         2  |
| (Current groups) | NULL  | 750  | NULL     |         3  |
+---------------------------------------------------------+

At this time, any highly selective SELECT will materialize only one row segment per partition.

Important

Unlike OPTIMIZE TABLE <name>, which takes the amount of time proportional to the size of recently loaded data, OPTIMIZE TABLE <name> FULL always takes the amount of time in the order of magnitude of the size of the entire table, unless data in that table is already sorted.

When inserting a small number of rows into the columnstore table, an in-memory rowstore-backed segment is used to store the rows. As this rowstore-backed segment fills, the background flusher periodically will flush these rows to disk. A rowstore-backed segment can be flushed to disk manually by running OPTIMIZE TABLE <table_name> FLUSH.

OPTIMIZE TABLE t FLUSH;

MV_COLUMNSTORE_MERGE_STATUS shows the details about the merger processes.

Unneeded columnstore segments stored in memory are eliminated in a process called garbage collection. A buffer manager maintains unused memory blocks for future use until no query or operation is using the unused memory anymore.

The garbage collection process automatically runs in the background, so it's normally not necessary to execute the TRIGGER GC [FLUSH] command which manually activates the garbage collector. However, if memory-related errors or node failures are being generated, executing TRIGGER GC [FLUSH] might be appropriate. See TRIGGER GC for more information.

Last modified: August 1, 2024

Was this article helpful?