Writing Columnstore Data

Unlike other columnstore implementations, SingleStore supports very fast, small-batch writes (such as single row inserts and updates) directly into columnstore tables.

In some cases, writes (inserts and updates) are implemented by storing newly written rows in a rowstore-oriented skiplist before flushing them to the column-oriented format. Rows are visible to reads as soon as they are committed to the rowstore-oriented skiplist.

In other cases, the rowstore-skiplist is bypassed and writes are made directly to the column-oriented format on disk.

The following explains the behavior of different operations that can be performed against a columnstore.

  1. Insert - Inserts into a columnstore will either go into the rowstore-backed segment or a new columnstore-backed row segment. In the latter case, the inserts are written directly to disk.

    If the engine variable columnstore_disk_insert_threshold has a low value, INSERT...SELECT , INSERT ... ON DUPLICATE KEY UPDATE, INSERT ... IGNORE and INSERT write directly to disk, instead of writing first to rowstore-backed memory. Otherwise, INSERT first writes to rowstore-backed memory.

  2. Delete - Deleting a row in a columnstore causes the row to be marked as deleted in the segment metadata, leaving the data in place within the row segment but not included in the query results. Disk space will not immediately free up because the data remains on the segment. The deleted data is still stored on the segment unless it is merged with another segment. Then the file is rewritten to update the segment metadata. Segments that contain only deleted rows are removed, and the optimization process covered below will compact segments that require optimization.

  3. Update - An update in a columnstore is internally performed as a delete followed by an insert within a transaction.

    If the columnstore_disk_insert_threshold engine variable has a low value, UPDATE writes directly to disk, instead of writing first to rowstore-backed memory. Otherwise, UPDATE first writes to rowstore-backed memory.

  4. Replace

    If the columnstore_disk_insert_threshold engine variable has a low value,REPLACE writes directly to disk, instead of writing first to rowstore-backed memory. Otherwise, REPLACE first writes to rowstore-backed memory.

  5. Optimization - SingleStore has optimization routines that run automatically and can also be started manually. These routines attempt to automatically merge row segments together in order to improve query efficiency. DML statements can be performed while columnstore optimizations take place. For more information, see the OPTIMIZE TABLE topic.

It is recommended to use the default setting for the columnstore_disk_insert_threshold variable, which does straight to disk automatically when the batch is large enough. Lowering the value will increase the creation of small files. This causes the inodes in the file system to fill up rapidly and will require more work for the merger process when merging all these small files. It is not recommended to set the columnstore_disk_insert_threshold variable to zero.

Last modified: October 28, 2022

Was this article helpful?