Columnstore Seekability using JSON
On this page
In SingleStore versions before 8.
When using the seekable JSON encoding for columnstore tables, data is stored as follows:
-
Top-level JSON properties are stored column-wise and use the same encodings as other data types (i.
e. , numbers, strings, and dates). -
Nested properties (inside sub-documents) are stored in sequential column format, using Singlestore’s standard encoding.
-
Arrays are pivoted and stored in column format.
For example, a numeric array that is a property of an enclosing document would be stored as a logical column, with multiple values in the column belonging to the same row. The array length information is kept to enable SingleStore to identify what values belong to what row. -
Columns that do not regularly appear in the documents (e.
g. , less than 1 in 10,000) are stored in a catch-all area.
A (segment_
) pair functions as a row ID for a row in a partition of a columnstore table.
Seekability has been improved for the following string encodings:
-
LZ4 (SeekableLZ4)
-
Run_
length encoding (RLE)
The option SeekableLZ4
on JSON columns improves overall resource consumption for string data types for universal storage for LZ4 and run-length encoded (RLE) data.
column_name JSON COLLATE utf8_bin NOT NULL option 'SeekableLZ4'
Using JSON Row-Level Decoding
Row-Level Decoding is enabled via the use_
engine variable.use_
is set to ON (enabled).use_
(and therefore the Row-Level Decoding feature) is not available.
SELECT @@use_seekable_json;
+---------------------+
| @@use_seekable_json |
+---------------------+
| 1 |
+---------------------+
Applying Columnstore Seekability to Existing Data
The easiest way, which almost always works and keeps the table data online while being done, is to change the setting to use_
, then run OPTIMIZE TABLE FULL
.
If there is no consistent improvement in seek performance; and the table can be taken offline to be reorganized; this procedure is guaranteed to rewrite all the segments in the new, seekable format.
To apply this behavior to existing tables, do the following:
-
Enable or check if the the engine variable is enabled for your database:
SET GLOBAL use_
seekable_ json = ON; -
Create a new table to copy the existing data to:
CREATE TABLE <new_
table> LIKE <old_ table>; -
Insert the existing data from the old table to the new table:
INSERT INTO <new_
table> LIKE <old_ table>; -
Run the optimize table statement with the full argument:
OPTIMIZE TABLE <new_
table> FULL; -
Rename the old table with a temporary name:
ALTER TABLE <old_
table> RENAME TO <tmp_ old_ table>; -
Rename the new table to the original (i.
e. , old table name): ALTER TABLE <new_
;table> TO <old_ table> -
Drop the temporary table:
DROP TABLE <tmp_
old_ table>;
If the table needs to remain online but the guarantee that every segment is reorganized is needed, update every row in the table gradually, batch by batch.
To avoid touching the other data, add a new dummy column to the table and update it.
Related Topics
Last modified: October 9, 2024