Writing Columnstore Data
Unlike other columnstore implementations, SingleStore supports very fast, small-batch writes (such as single row inserts and updates) directly into columnstore tables.
In some cases, writes (inserts and updates) are implemented by storing newly written rows in a rowstore-oriented skiplist before flushing them to the column-oriented format.
In other cases, the rowstore-skiplist is bypassed and writes are made directly to the column-oriented format on disk.
The following explains the behavior of different operations that can be performed against a columnstore.
-
Insert - Inserts into a columnstore will either go into the rowstore-backed segment or a new columnstore-backed row segment.
In the latter case, the inserts are written directly to disk. If the engine variable
columnstore_
has a low value,disk_ insert_ threshold INSERT.
. . SELECT , INSERT .
. . ON DUPLICATE KEY UPDATE , INSERT .
and. . IGNORE INSERT
write directly to disk, instead of writing first to rowstore-backed memory.Otherwise, INSERT
first writes to rowstore-backed memory. -
Delete - Deleting a row in a columnstore causes the row to be marked as deleted in the segment metadata, leaving the data in place within the row segment but not included in the query results.
Disk space will not immediately free up because the data remains on the segment. The deleted data is still stored on the segment unless it is merged with another segment. Then the file is rewritten to update the segment metadata. Segments that contain only deleted rows are removed, and the optimization process covered below will compact segments that require optimization. -
Update - An update in a columnstore is internally performed as a delete followed by an insert within a transaction.
If the
columnstore_
engine variable has a low value,disk_ insert_ threshold UPDATE
writes directly to disk, instead of writing first to rowstore-backed memory.Otherwise, UPDATE
first writes to rowstore-backed memory. -
Replace
If the
columnstore_
engine variable has a low value,disk_ insert_ threshold REPLACE
writes directly to disk, instead of writing first to rowstore-backed memory.Otherwise, REPLACE
first writes to rowstore-backed memory. -
Optimization - SingleStore has optimization routines that run automatically and can also be started manually.
These routines attempt to automatically merge row segments together in order to improve query efficiency. DML statements can be performed while columnstore optimizations take place. For more information, see the OPTIMIZE TABLE topic.
SingleStore recommends using the default setting for the columnstore_
variable, which does straight to disk automatically when the batch is large enough.columnstore_
variable to zero.
Last modified: March 8, 2024