Columnstore Seekability using JSON

In SingleStore versions before 8.0, seeking into JSON data columns meant searching the entire segment which could be up to one million rows. Beginning in version 8.0, sub-segment seeking is available and enabled by default. Instead of searching a large chunk of data, a subset of rows with the same features will be grouped together. This results in much better performance.

When using the seekable JSON encoding for columnstore tables, data is stored as follows:

  • Top-level JSON properties are stored column-wise and use the same encodings as other data types (i.e., numbers, strings, and dates).

  • Nested properties (inside sub-documents) are stored in sequential column format, using Singlestore’s standard encoding.

  • Arrays are pivoted and stored in column format. For example, a numeric array that is a property of an enclosing document would be stored as a logical column, with multiple values in the column belonging to the same row. The array length information is kept to enable SingleStore to identify what values belong to what row.

  • Columns that do not regularly appear in the documents (e.g., less than 1 in 10,000) are stored in a catch-all area.

A (segment_id, row_number) pair functions as a row ID for a row in a partition of a columnstore table. Given this pair, you can directly seek into the specified segment to find the data for row_number. It is not necessary to scan the whole segment to find the data for the row.

Seekability has been improved for the following string encodings:

  • LZ4 (SeekableLZ4)

  • Run_length encoding (RLE)

The option SeekableLZ4 on JSON columns improves overall resource consumption for string data types for universal storage for LZ4 and run-length encoded (RLE) data. The option can be applied in a table against a column as follows:

column_name JSON COLLATE utf8_bin NOT NULL option 'SeekableLZ4'

Using JSON Row-Level Decoding

Row-Level Decoding is enabled via the use_seekable_json engine variable. By default in version 8.0, use_seekable_json is set to ON (enabled). In pre-8.0 versions the engine variable use_seekable_json (and therefore the Row-Level Decoding feature) is not available. To check if the engine variable is enabled:

SELECT @@use_seekable_json;
+---------------------+
| @@use_seekable_json |
+---------------------+
|                   1 |
+---------------------+

Applying Columnstore Seekability to Existing Data

The easiest way, which almost always works and keeps the table data online while being done, is to change the setting to use_seekable_json = ON, then run OPTIMIZE TABLE FULL.

If there is no consistent improvement in seek performance; and the table can be taken offline to be reorganized; this procedure is guaranteed to rewrite all the segments in the new, seekable format.

To apply this behavior to existing tables, do the following:

  1. Enable or check if the the engine variable is enabled for your database: SET GLOBAL use_seekable_json = ON;

  2. Create a new table to copy the existing data to: CREATE TABLE <new_table> LIKE <old_table>;

  3. Insert the existing data from the old table to the new table: INSERT INTO <new_table> LIKE <old_table>;

  4. Run the optimize table statement with the full argument: OPTIMIZE TABLE <new_table> FULL;

  5. Rename the old table with a temporary name: ALTER TABLE <old_table> RENAME TO <tmp_old_table>;

  6. Rename the new table to the original (i.e., old table name): ALTER TABLE <new_table> TO <old_table>;

  7. Drop the temporary table: DROP TABLE <tmp_old_table>;

If the table needs to remain online but the guarantee that every segment is reorganized is needed, update every row in the table gradually, batch by batch. The background merger will regorganize the segments.

To avoid touching the other data, add a new dummy column to the table and update it. The updates must be true updates; setting a value to itself will not be considered an update.

Last modified: October 9, 2024

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK