Columnstore Seekablity
On this page
In SingleStoreDB versions before 8.
When using the seekable JSON encoding for columnstore tables, data is stored as follows:
-
Top-level JSON properties are stored column-wise and use the same encodings as other data types (i.
e. , numbers, strings, and dates). -
Nested properties (inside sub-documents) are stored in sequential column format, using Singlestore’s standard encoding.
-
Arrays are pivoted and stored in column format.
For example, a numeric array that is a property of an enclosing document would be stored as a logical column, with multiple values in the column belonging to the same row. The array length information is kept to enable SingleStoreDB to identify what values belong to what row. -
Columns that do not regularly appear in the documents (e.
g. , less than 1 in 10,000) are stored in a catch-all area.
A (segment_
) pair functions as a row ID for a row in a partition of a columnstore table.
Seekability has been improved for the following string encodings:
-
LZ4
-
Run_
length encoding (RLE)
Using JSON Row-Level Decoding
Row-Level Decoding is enabled via the use_
engine variable.use_
is set to ON (enabled).use_
(and therefore the Row-Level Decoding feature) is not available.
SELECT @@use_seekable_json;
+---------------------+
| @@use_seekable_json |
+---------------------+
| 1 |
+---------------------+
Applying Columnstore Seekability to Existing Data
The easiest way, which almost always works and keeps the table data online while being done, is to change the setting to use_
, then run OPTIMIZE TABLE FULL
.
If there is no consistent improvement in seek performance; and the table can be taken offline to be reorganized; this procedure is guaranteed to rewrite all the segments in the new, seekable format.
To apply this behavior to existing tables, do the following:
-
Enable or check if the the engine variable is enabled for your database:
SET GLOBAL use_
seekable_ json = ON; -
Create a new table to copy the existing data to:
CREATE TABLE <new_
table> LIKE <old_ table>; -
Insert the existing data from the old table to the new table:
INSERT INTO <new_
table> LIKE <old_ table>; -
Run the optimize table statement with the full argument:
OPTIMIZE TABLE <new_
table> FULL; -
Rename the old table with a temporary name:
ALTER TABLE <old_
table> RENAME TO <tmp_ old_ table>; -
Rename the new table to the original (i.
e. , old table name): ALTER TABLE <new_
;table> TO <old_ table> -
Drop the temporary table:
DROP TABLE <tmp_
old_ table>;
If the table needs to remain online but: the guarantee that every segment is reorganized is needed, update every row in the table gradually, batch by batch.
To avoid touching the other data, add a new dummy column to the table and update it.
Related Topics
Last modified: February 6, 2023