Columnstore Seekability using JSON
On this page
SingleStore exposes a Javascript Object Notation (JSON) column type that implements the JSON standard.
The primary encoding for JSON
columns in columnstore tables is SeekableJson
.SeekableJson
encoding are very efficient.
JSON
columns may also be encoded with the seekable string encodings SeekableLZ4
and SeekableStringRunLength
.SeekableLZ4
and SeekableStringRunLength
only for columns on which JSON values will be extracted as a whole.
The non-seekable, columnarized format, JSONParquet
, is provided for backwards compatibility.
The following table summarizes the properties of the encodings.
Encoding |
Seekable |
Columnarized |
Supports Efficient Search and Extraction |
Default |
---|---|---|---|---|
|
Yes |
Yes |
Yes |
Yes |
|
Yes |
No |
No |
No |
|
No |
Yes |
Yes, but less efficient than |
No |
Choose an Encoding
SingleStore recommends using SeekableJson
for any JSON
columns on which analytics, search, or extraction will be done.JSON_
, the ::
operators, or JSON_
on the JSON
column, use the SeekableJson
encoding for good performance.SeekableJson
is the default encoding.
SingleStore recommends using SeekableLZ4
(or SeekableStringRunLength
) if you are concerned about memory usage and will not be searching on or extracting from the JSON
column.SeekableLZ4
(or SeekableStringRunLength
) encoding may reduce memory usage but will hurt performance if you are extracting keys and values from the JSON.
Check which Encoding is Used
The information_
view stores information about each segment of a columnstore table.encoding
column of this view to find out which encoding is used.
The following query returns the encoding and sizes of the segments for column <column_
; in table <table_
.
SELECT column_name, segment_id, encoding, uncompressed_size, compressed_sizeFROM information_schema.COLUMNAR_SEGMENTSWHERE table_name = ‘<table_name>’ and column_name = '<column_name>';
SeekableJson
In the SeekableJson
encoding, a schema is inferred from JSON keys and the data is split into columns by keypath (columnarized) and stored in an encoded Parquet-like format.JSON
columns read only the parts of a JSON object that are relevant to a query and therefore have excellent performance.
The SeekableJson
encoding format is inspired by Apache Parquet but does not produce a Parquet file.
SeekableJson
is a seekable encoding.
When using the SeekableJson
encoding for JSON columns in columnstore tables, the data is stored as follows:
-
Top-level JSON properties are stored columnwise and use the same encodings as other data types (i.
e. , numbers, strings, and dates). -
Nested properties (inside sub-documents) are stored in sequential column format, using SingleStore's standard encoding.
-
Arrays are pivoted and stored in column format.
For example, a numeric array that is a property of an enclosing JSON document would be stored as a logical column, with multiple values in the column belonging to the same row. The array length information is kept to enable SingleStore to identify what values belong to what row.
The engine variable use_
controls if the default encoding for JSON
columns is SeekableJson
or JSONParquet
.use_
is set to ON
, the default encoding for JSON
columns is SeekableJson
.
Check if use_
is enabled:
SELECT @@use_seekable_json;
+---------------------+
| @@use_seekable_json |
+---------------------+
| 1 |
+---------------------+
Limit Number of Inferred Keypaths
The engine variable json_
limits the number of JSON key paths inferred within a segment and can be used to tune schema inference.
SeekableLZ4 and SeekableStringRunLength
JSON
columns may also be encoded with the seekable string encodings SeekableLZ4
and SeekableStringRunLength
.
These encodings store the JSON as a string.
SingleStore recommends using SeekableLZ4
and SeekableStringRunLength
only for columns on which JSON values are extracted as a whole.
Encoding String Data in Columnstore Tables provides a general description of encoding schemes and encoding scheme selection for string (including JSON) columns.
To specify a string encoding for a JSON
column, use a collation specification as follows:
CREATE TABLE json_table (json_lz4_col JSON COLLATE utf8_bin OPTION 'SeekableLZ4');CREATE TABLE json_table (json_rle_col JSON COLLATE utf8_bin OPTION 'SeekableStringRunLength');
JSON Parquet
Like SeekableJson
, the JSONParquet
format is columnarized and inspired by the Apache Parquet format but does not produce a Parquet file.
SingleStore recommends that you use SeekableJson
instead of JSONParquet
, as SeekableJson
is more performant.
JSONParquet
is a precursor to SeekableJson
and is not a seekable format, meaning that entire segments must be extracted and decoded for them to be processed.
Apply SeekableJson Encoding to Existing Data
To change the encoding of a JSON
column from ParquetJSON
to SeekableJson
encoding, use the following options.
Online Approach
To keep the table online when use_
is not set to ON
, you can upgrade existing JSON
columns to use SeekableJson
by setting use_
, and then running OPTIMIZE TABLE FULL
.SeekableJson
encoding.JSONParquet
encoding.
Offline Approach
If there is not a consistent improvement in seek performance with the Online Approach, query the information_
view using the following query to see if the JSONParquet
encoding is still used for many segments.
SELECT column_name, segment_id, encoding, uncompressed_size, compressed_sizeFROM information_schema.COLUMNAR_SEGMENTSWHERE table_name = '<table_name>' and column_name = '<column_name>';
If many segments are still using JSONParquet
and it is acceptable to take the table offline; use the following procedure to rewrite all the segments in the new seekable format.
-
Enable or check if the
use_
engine variable is enabled for your database:seekable_ json SET GLOBAL use_
seekable_ json = ON; -
Create a new table to copy the existing data to:
CREATE TABLE <new_
table> LIKE <old_ table>; -
Insert the existing data from the old table to the new table:
INSERT INTO <new_
table> SELECT * FROM <old_ table>; -
Run the optimize table statement with the full argument:
OPTIMIZE TABLE <new_
table> FULL; -
Rename the old table with a temporary name:
ALTER TABLE <old_
table> RENAME TO <tmp_ old_ table>; -
Rename the new table to the original (i.
e. , old table name): ALTER TABLE <new_
;table> TO <old_ table> -
Drop the temporary table:
DROP TABLE <tmp_
old_ table>;
If the table needs to remain online but you need to ensure every segment is reorganized, update every row in the table gradually.
To avoid modifying data in the table, add a new dummy column to the table and update it.
Columnstore JSON Internals
This section provides details on the SeekableJson
encoding.
A Parquet schema has a JSON-like syntax and a nested structure.
-
Required: exactly one occurrence
-
Optional: 0 or 1 occurrence
-
Repeated: 0 or more occurrences
The infer schema process works as follows:
-
The schema loops through the list of JSON objects.
-
The present keypaths are merged into a schema tree object.
-
If there is a type conflict, the node in the schema tree is marked as un-inferable.
-
If any node in the schema tree contains more children than indicated in the
json_
engine variable, the node is marked as un-inferable.document_ max_ children -
If a node has a greater number of children than indicated in the engine variable
json_
and the average occurrence of all children related to the parent is less than 1/ as indicated in the engine variabledocument_ sparse_ children_ check_ threshold json_
, the node will be marked as un-inferable.document_ sparse_ children_ check_ ratio
-
-
Once the schema tree object is constructed, analyze the schema tree and prune the tree until the number of key paths (distinct root to leaf paths) is less than the setting for the engine variable
json_
.document_ max_ leaves Also, prune any node where the average occurrence of all the children in relation to the number of rows is less than 1/ as indicated in the engine variable json_
.document_ absolute_ sparse_ key_ check_ ratio
The examples will use the engine variables settings and the JSON object table shown below:
-
json_
document_ max_ children = 4 -
json_
document_ sparse_ children_ threshold = 3 -
json_
document_ sparse_ children_ ratio = 2
JSON Objects |
---|
{“a1”: 1, “a2”: {“d1”: 1}, “a3”: {“c1”: 1}, “a4”: {“b1”: 1}} |
{“a1”: 2, “a2”: 1, “a4”: {“b2”: 1}} |
{“a1”: 3, “a2”: 1, “a4”: {“b3”: 1}} |
{“a1”: 4, “a2”: 1, ”a3”: {“c2”: 1}, “a4”: {“b4”: 1}} |
{“a1”: 5,, “a3”: {“c3”: 1} “a4”: {“b5”: 1}} |
The first pass at merging the keypaths will yield:

The second pass at merging the keypaths contains a type mismatch on a2.

The results of the third and fourth passes at merging yield:

In the final merge, the number of children for a4 exceeds the limit (4) set for json_
so it is marked as un-inferable.a3
exceeds the limit (3) set for json_
.
After the merging process, the schema tree with leaves are the inferred columns (also may be referred to as leaves, leaf columns, leaf paths, or key paths).
The keypaths will be a1, a2, a3, a4.
Encoding Nested Columns
Parquet uses the Dremel encoding
for nested columns with definition and repetition levels.
-
Definition levels specify how many optional fields in the path for the column are defined.
-
Repetition levels specify the repeated value of the field in the path.
-
SingleStore stores the leaf columns for the JSON schema along with their respective repetition and definition levels.
Each of these internal columns will correspond to the value, definition level and repetition level columns which are encoded in SingleStore encodings (SeekableString, etc).
The example tables will have these values for the internal columns:
a1 |
||
---|---|---|
Value |
Definition Level |
Repetition Level |
1 |
1 |
0 |
1 |
1 |
0 |
1 |
1 |
0 |
1 |
1 |
0 |
1 |
1 |
0 |
a2 |
||
---|---|---|
Value |
Definition Level |
Repetition Level |
‘{“d1”: 1}’ |
1 |
0 |
1 |
1 |
0 |
1 |
1 |
0 |
1 |
1 |
0 |
- |
0 |
0 |
a3 |
||
---|---|---|
Value |
Definition Level |
Repetition Level |
‘{“c1”: 1}’ |
1 |
0 |
- |
0 |
0 |
- |
0 |
0 |
‘{“c2”: 1}’ |
1 |
0 |
‘{“c3”: 1}’ |
1 |
0 |
a4 |
||
---|---|---|
Value |
Definition Level |
Repetition Level |
‘{“b1”: 1}’ |
1 |
0 |
‘{“b2”: 1}’ |
1 |
0 |
‘{“b3”: 1}’ |
1 |
0 |
‘{“b4”: 1}’ |
1 |
0 |
‘{“b5”: 1}’ |
1 |
0 |
Related Topics
Last modified: September 11, 2025