Managing Columnstore Segments
On this page
A columnstore table will have the best performance if the rows in the table are in global sorted order across all the row segments.
Background Merger
SingleStore uses an advanced algorithm that allows it to maintain the order of columnstore tables as close to sorted as possible, while data is being ingested or updated.background merger
and runs constantly in the background.
The background merger
runs in an optimistic manner.background merger
works on a small chunk of data at a time, so if a conflict with an UPDATE
or DELETE
occurs, a relatively small amount of work is discarded.OPTIMIZE TABLE
command.OPTIMIZE TABLE
command is necessary, and how to run it.
Sorted Row Segment Group
SingleStore uses the concept of a sorted row segment group
to describe a set of row segments that are sorted together on the columnstore key columns.SORT KEY
of that row are smaller than those of the other row.
Within a sorted row segment group there are no row segments with overlapping value ranges for the sort key column(s).
A table may have multiple sorted row segment groups.
Note
KEY() USING CLUSTERED COLUMNSTORE
is a legacy syntax that is equivalent to SORT KEY()
.SORT KEY()
.
To inspect the current state of the sorted row segment groups of a table, named sales
, run the SHOW COLUMNAR MERGE STATUS FOR <table_
command:
SHOW COLUMNAR MERGE STATUS FOR sales;
+----------------------------------------------------------------------------+
| Merger | State | Plan | Progress | Partition |
+----------------------------------------------------------------------------+
| (Current sales) | NULL | 741,16,1 | NULL | 0 |
| (Current sales) | NULL | 782,20 | NULL | 1 |
| (Current sales) | NULL | 701,40,5 | NULL | 2 |
| (Current sales) | NULL | 326,207,123,37,21,19,17 | NULL | 3 |
+----------------------------------------------------------------------------+
Observe the first row of the result.
SELECT * FROM sales WHERE store_id = 15;
This query filters on the SORT KEY
which is store_
in this example.store_
equal to 15
, unless rows with store_
are on the boundary of two row segments, or if there is a large data skew and several row segments consist only of rows with store_
.
Optimize Sorted Row Segment Groups
Now take a look at the sorted row segment groups on partition 3, which is significantly less optimized than the other partitions.SELECT
query similar to the one above will materialize eight row segments.
OPTIMIZE TABLE sales
By running SHOW COLUMNAR MERGE STATUS
while OPTIMIZE TABLE
is running, you may see the manual merger in action:
SHOW COLUMNAR MERGE STATUS FOR sales;
+--------------------------------------------------------------------------------+
| Merger | State | Plan | Progress | Partition |
+--------------------------------------------------------------------------------+
| (Current sales) | NULL | 741,16,1 | NULL | 0 |
| (Current sales) | NULL | 782,20 | NULL | 1 |
| (Current sales) | NULL | 701,40,5 | NULL | 2 |
| (Current sales) | NULL | 326,207,123,37,21,19,17 | NULL | 3 |
| Manual Merger | Working | 326+207+123+37+21+19+17 | 53.12% | 3 |
+--------------------------------------------------------------------------------+
The last row indicates that there is a manual merger running on partition 3 and that at this time 53.
of the work is complete.
When the merger is done, the table has fewer sorted row segment groups in partition 3:
SHOW COLUMNAR MERGE STATUS FOR sales;
+------------------------------------------------------------+
| Merger | State | Plan | Progress | Partition |
+------------------------------------------------------------+
| (Current sales) | NULL | 741,16,1 | NULL | 0 |
| (Current sales) | NULL | 782,20 | NULL | 1 |
| (Current sales) | NULL | 701,40,5 | NULL | 2 |
| (Current sales) | NULL | 730,20 | NULL | 3 |
+------------------------------------------------------------+
During the optimization process, no partitions were merged into a single sorted row segment group as can be seen in the output shown above.
In cases when it is acceptable to sacrifice some time on data ingestion to achieve even higher SELECT
performance, it is possible to run a manual command that merges data on each partition into a single sorted row segment group.OPTIMIZE TABLE <table_
will merge the data on each partition into a single sorted row segment group, as shown below.
OPTIMIZE TABLE sales FULL;SHOW COLUMNAR MERGE STATUS FOR groups;
+---------------------------------------------------------+
| Merger | State | Plan | Progress | Partition |
+---------------------------------------------------------+
| (Current sales) | NULL | 758 | NULL | 0 |
| (Current sales) | NULL | 802 | NULL | 1 |
| (Current sales) | NULL | 746 | NULL | 2 |
| (Current sales) | NULL | 750 | NULL | 3 |
+---------------------------------------------------------+
When the table is in this state, any highly selective SELECT
query will materialize one row segment per partition.
Important
Unlike OPTIMIZE TABLE <table_
, which takes the amount of time proportional to the size of recently loaded data, OPTIMIZE TABLE <table_
takes an amount of time proportional to the order of magnitude of the size of the entire table, unless data in that table is already sorted.
Inserts, Deletes, and Garbage Collection
When inserting a small number of rows into the columnstore table, an in-memory rowstore-backed segment is used to store the rows.background flusher
periodically flushes these rows to disk.OPTIMIZE TABLE <table_
.
OPTIMIZE TABLE t FLUSH;
Deletes in columnstores are marked as deleted but otherwise left in place (refer to Writing Columnstore Data).
MV_
shows the details about the merger processes.
Unneeded columnstore segments stored in memory are eliminated in a process called garbage collection.
The garbage collection process automatically runs in the background, so it's normally not necessary to execute the TRIGGER GC [FLUSH]
command which manually activates the garbage collector.TRIGGER GC [FLUSH]
might be appropriate.
Last modified: January 27, 2025