Writing Columnstore Data
Unlike other columnstore implementations, SingleStoreDB supports very fast, small-batch writes (such as single row inserts and updates) directly into columnstore tables.
In some cases, writes (inserts and updates) are implemented by storing newly written rows in a rowstore-oriented skiplist before flushing them to the column-oriented format.
In other cases, the rowstore-skiplist is bypassed and writes are made directly to the column-oriented format on disk.
The following explains the behavior of different operations that can be performed against a columnstore.
Insert - Inserts into a columnstore will either go into the rowstore-backed segment or a new columnstore-backed row segment.
In the latter case, the inserts are written directly to disk.
If the engine variable
columnstore_has a low value,
disk_ insert_ threshold
. . SELECT
INSERTwrite directly to disk, instead of writing first to rowstore-backed memory.
INSERTfirst writes to rowstore-backed memory.
Delete - Deleting a row in a columnstore causes the row to be marked as deleted in the segment metadata, leaving the data in place within the row segment but not included in the query results.
Disk space will not immediately free up because the data remains on the segment. The deleted data is still stored on the segment unless it is merged with another segment. Then the file is rewritten to update the segment metadata. Segments that contain only deleted rows are removed, and the optimization process covered below will compact segments that require optimization.
Update - An update in a columnstore is internally performed as a delete followed by an insert within a transaction.
REPLACEupdates rows in a rowstore-oriented skiplist before flushing them to the column-oriented format.
Optimization - SingleStoreDB has optimization routines that run automatically and can also be started manually.
These routines attempt to automatically merge row segments together in order to improve query efficiency. DML statements can be performed while columnstore optimizations take place. For more information, see the OPTIMIZE TABLE topic.
It is recommended to use the default setting for the
columnstore_ variable, which does straight to disk automatically when the batch is large enough.
columnstore_ variable to zero.
Last modified: October 28, 2022