Using JSON

About JSON in SingleStore

SingleStore exposes a Javascript Object Notation (JSON) column type that implements the JSON standard.

You can define columns in SingleStore tables using the JSON Type. Analytics on these JSON columns is very efficient as SingleStore automatically columnarizes JSON data. A schema is inferred from JSON keys and the data is split into columns by key path and stored in an encoded Parquet-like format.

The JSON data is stored as if you had created a schema with separate columns for every field. As a result, queries on JSON columns read only the parts of a JSON object that are relevant to a query and therefore have excellent performance.

SingleStore provides a set of JSON functions for extracting, searching, analyzing, and modifying JSON data, including:

JSON_EXTRACT_<type>, to extract values out of JSON documents at specified keypaths.
Shorthand syntax, :: operators, to extract values out of JSON documents. The :: operators are convenient aliases for the JSON_EXTRACT_<type> functions and follow the same rules.
- JSON shorthand syntax is a path that uses :: as a separator.
JSON_MATCH_ANY, to check for the existence of values in a JSON document or array based on a path and a filter.
JSON_TO_ARRAY, to convert a JSON array to a SingleStore array.
- JSON_TO_ARRAY can be used in combination with TABLE to operate on values of a JSON array as SQL rows. This functionality is similar to UNNEST in other database systems.

JSON columns can be searched using SingleStore's Java Lucene-compatible full-text search. In addition, JSON columns can be indexed using computed columns; refer to Indexing Data in JSON Columns for more information.

SingleStore has a native BSON data type, plus SingleStore Kai, a MongoDB®-compatible API. The BSON data is stored in columns just like JSON data, and can be indexed with the Multi-Value Hash Index (BSON) and computed columns.

An alternative to using the JSON type is to map JSON fields to individual columns and use SQL queries to access the JSON data. Refer to Load JSON Files with LOAD DATA for information about how to map JSON fields to SingleStore columns during the loading process.

JSON Data Encodings

The default encoding for JSON columns is SeekableJson. As described above, this encoding is very efficient as it columnarizes the JSON. A schema is inferred from JSON keys and the data is split into columns by keypath and stored in an encoded Parquet-like format. Refer to Columnstore Seekability using JSON for more information.

JSON columns may also be encoded with the string encodings SeekableLZ4 and SeekableStringRunLength. These encodings compress data, but do not support efficient searches or extraction. SingleStore recommends using SeekableLZ4 and SeekableStringRunLength only for columns on which JSON values will be extracted as a whole.

Note: The information on this page is intended for use with JSON columns created with SeekableJson encoding or its precursor JSONParquet. Using functions described on this page on JSON columns with a string encoding will result in poor performance.

Examples

The following table is used in the examples. Note the printings array is intended to indicate the number of copies of the book printed in each printing; printings data is in millions of books and is not accurate.

SQL

CREATE TABLE books_json (id INT, books JSON);

INSERT INTO books_json VALUES
(1, '{
       "title": "Onyx Storm",
       "author": "Rebecca Yarros",
       "details": {
         "publisher": "Entangled:Red Tower Books",
         "numpages": 544,
         "publication date": "January 21, 2025",
         "printings": [2,1.3],
         "series": "The Empyrean"
       }
     }'
),
(2, '{
       "title": "The Maid",
       "author": "Nita Prose",
       "details": {
         "publisher": "Ballantine Books",
         "numpages": 385,
         "publication date": "January 4, 2022",
         "printings": [0.5,0.75,1.2]
       }
     }'
),
(3, '{
       "title": "The Last Letter",
       "author": "Rebecca Yarros",
       "details": {
         "publisher": "Entangled:Amara",
         "numpages": 432,
         "publication date": "February 26, 2019",
         "printings": [0.25,0.5,0.5]
       }
     }'
);

Example 1: Extract Values from JSON Using the `::` Operators

The ::, ::$, and ::% operators can be used to extract fields, strings, and SQL doubles from JSON documents. In the example below:

books::title extracts the title field,
books::$title extracts the title field as a SQL string,
books::details::%numpages extracts the numpages field as a double, and
books::details::printings::%`0` extracts the 0th element of the printings array.

Backticks (`) are required around numeric keys, as shown with `0` above

Refer to Using the ::$ and ::% Operators for details.

SQL

SELECT id,
       books::title AS title,
       books::$title AS title_string,
       books::details::%numpages AS numpages,
       books::details::printings::%`0` AS first_printing
FROM books_json
ORDER BY id;

+------+-------------------+-----------------+----------+----------------+
| id   | title             | title_string    | numpages | first_printing |
+------+-------------------+-----------------+----------+----------------+
|    1 | "Onyx Storm"      | Onyx Storm      |      544 |              2 |
|    2 | "The Maid"        | The Maid        |      385 |            0.5 |
|    3 | "The Last Letter" | The Last Letter |      432 |           0.25 |
+------+-------------------+-----------------+----------+----------------+

Example 2: Extract Values from JSON using `JSON_EXTRACT_`

The JSON_EXTRACT_ functions can be used to extract values from a JSON document in addition to the path expression syntax shown in Example 1. The JSON_EXTRACT_ functions can be used when you want to use variables or expressions in the keypath. The keypath in JSON_EXTRACT_ functions is a comma-separated list of object keys or zero-indexed array positions.

Below is a query similar to the query in Example 1, expressed using JSON_EXTRACT_ functions.

SQL

SELECT id,
      JSON_EXTRACT_JSON(books,'title') AS title,
      JSON_EXTRACT_STRING(books,'title') AS title_string,
      JSON_EXTRACT_BIGINT(books,'details','numpages') AS numpages,
      JSON_EXTRACT_DOUBLE(books,'details','printings',1-1) AS first_printing
FROM books_json
ORDER BY id;

+------+-------------------+-----------------+----------+----------------+
| id   | title             | title_string    | numpages | first_printing |
+------+-------------------+-----------------+----------+----------------+
|    1 | "Onyx Storm"      | Onyx Storm      |      544 |              2 |
|    2 | "The Maid"        | The Maid        |      385 |            0.5 |
|    3 | "The Last Letter" | The Last Letter |      432 |           0.25 |
+------+-------------------+-----------------+----------+----------------+

Note that the expression 1-1 is used to extract the value in position 0 in the printings array. Such expressions are supported in JSON_EXTRACT_ functions, but not when using the :: operators.

A JSON_EXTRACT_ on a nested key will only scan the column for that key, not the entire JSON document. For example, JSON_EXTRACT_JSON(jsondata,'details','title') AS title, will only scan the column for the title key.

Example 3: Find Existence of Values and Paths using JSON_MATCH_ANY

The JSON_MATCH_ANY and JSON_MATCH_ANY_EXISTS functions can be used to find the existence of values matching predicates and paths in JSON documents.

The JSON_MATCH_ANY function returns true if a value exists in the JSON at the filter path for which the filter predicate evaluates to true. The JSON_MATCH_ANY_EXISTS function returns true if there is a value (possibly null) in the JSON at the filter path.

The following is an example of using JSON_MATCH_ANY to find books that are part of The Empyrean series.

SQL

SELECT id, books::$title AS title, books::details::$series AS series
FROM books_json
WHERE JSON_MATCH_ANY(books::?details.series, MATCH_PARAM_STRING_STRICT() = "The Empyrean");

+------+------------+--------------+
| id   | title      | series       |
+------+------------+--------------+
|    1 | Onyx Storm | The Empyrean |
+------+------------+--------------+

Example 4: Aggregate Elements in a JSON Array Value using REDUCE

The REDUCE function can be used to aggregate elements in a JSON array value. SingleStore recommends using the REDUCE function when aggregating elements within a JSON array in a row.

Refer to Example 5 for an example of aggregating JSON array values across multiple rows which takes advantage of JSON Array Performance Enhancements.

The REDUCE function applies an expression to each element of an array and returns a single value. In the following example, REDUCE is used to sum the elements of the printings array for each row. The result is the total number of copies of each book that have been printed.

SQL

SELECT books::$title,
  REDUCE(
     0 :> double,
     JSON_TO_ARRAY(books_json.books::details::printings),
     REDUCE_ACC() + REDUCE_VALUE()
   ) AS total_printed_by_book
FROM books_json;

+-----------------+-----------------------+
| books::$title   | total_printed_by_book |
+-----------------+-----------------------+
| The Maid        |                  2.45 |
| The Last Letter |                  1.25 |
| Onyx Storm      |                   3.3 |
+-----------------+-----------------------+

In this example:

The JSON_TO_ARRAY function converts the array in the JSON field jsondata::books::details::printings to a SQL array.
The REDUCE function takes as input
1. An initial value: 0:>double.
2. A SQL array: the result of JSON_TO_ARRAY.
3. An accumulator expression: (REDUCE_ACC() + REDUCE_VALUE()), which specifies that the array elements should be summed.

Example 5: Use the `::` Operators in UPDATE Statements

You can use the :: operators for JSON keypaths in UPDATE queries. The :: operators are simply convenient aliases for the JSON_SET_<type> functions and follow the same rules.

The following query updates the number of pages in the book, Onyx Storm.

SQL

UPDATE books_json
SET books::details::%numpages = 545
WHERE books::$title = "Onyx Storm";

To add a printing for the book Onyx Storm, use JSON_ARRAY_PUSH_<type> as follows.

SQL

UPDATE books_json
SET books::details::printings = JSON_ARRAY_PUSH_DOUBLE(books::details::printings, 1.5)
WHERE books::$title = "Onyx Storm";

To add a series to the book The Maid.

SQL

UPDATE books_json
SET books::details::$series = 'Molly the Maid'
WHERE books::$title = "The Maid";

View the resulting updates.

SQL

SELECT JSON_PRETTY(books)
FROM books_json
WHERE books::$title = "The Maid";

+---------------------------------------------+
| {
  "author": "Nita Prose",
  "details": {
    "numpages": 385,
    "printings": [
      0.5,
      0.75,
      1.2
    ],
    "publication date": "January 4, 2022",
    "publisher": "Ballantine Books",
    "series": "Molly the Maid"
  },
  "title": "The Maid"
}                       |
+---------------------------------------------+

Example 6: Convert an Array (List) of JSON Objects to a Table

Use the JSON_TO_ARRAY function and the TABLE built-in function to convert a list of JSON objects to a table.

Create a table with a column to hold a JSON value and insert data into that table.

SQL

CREATE TABLE json_list_example (json_list JSON);

INSERT INTO json_list_example values(
'[
      {
       "title": "Onyx Storm",
       "author": "Rebecca Yarros",
 	 "numpages": 544
},
     {
      "title": "The Last Letter",
      "author": "Rebecca Yarros",
      "numpages": 432
     }
]');

In the following query, the syntax JOIN TABLE(JSON_TO_ARRAY(json_list)) converts the objects in the json_list column to a table, that is, each object in the JSON list is turned into a row in the table.

The JSON_TO_ARRAY function converts the JSON array to a SingleStore ARRAY.
The TABLE function converts a SingleStore ARRAY to a column named table_col that contains one row for each array entry.
The JOIN clause is required when using the TABLE function on an existing SingleStore table.

SQL

SELECT json_list_as_table.table_col AS books_col
FROM json_list_example JOIN TABLE(JSON_TO_ARRAY(json_list)) AS json_list_as_table;

+----------------------------------------------------------------------+
| books_col                                                            |
+----------------------------------------------------------------------+
| {"author":"Rebecca Yarros","numpages":544,"title":"Onyx Storm"}      |
| {"author":"Rebecca Yarros","numpages":432,"title":"The Last Letter"} |
+----------------------------------------------------------------------+

The JSON_AGG function can be used to combine the rows in the table into a single row and re-create the original JSON array.

SQL

WITH books_table AS (
	SELECT json_list_as_table.table_col AS books_col
            FROM json_list_example JOIN TABLE(JSON_TO_ARRAY(json_list)) AS json_list_as_table
)
SELECT JSON_AGG(books_col) FROM books_table;

+----------------------------------------------------------------------------------------------------------------------------------------+
| JSON_AGG(books_col)                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------+
| [{"author":"Rebecca Yarros","numpages":544,"title":"Onyx Storm"},{"author":"Rebecca Yarros","numpages":432,"title":"The Last Letter"}] |
+----------------------------------------------------------------------------------------------------------------------------------------+

Example 7: Aggregate Elements in JSON Arrays Across Rows using TABLE and JSON_TO_ARRAY

Use the TABLE and JSON_TO_ARRAY functions to aggregate values from JSON arrays across rows. Aggregations over results of the JOIN TABLE(JSON_TO_ARRAY()) operation are optimized to reduce the amount of data read. Queries will retrieve only the parts of the JSON column relevant to the query instead of fetching the entire JSON object from disk. Refer to JSON Array Performance for details.

The following example uses the TABLE and JSON_TO_ARRAY functions to sum the elements in the printings array across all rows for each author.

SQL

SELECT books::$author AS author, SUM(JSON_EXTRACT_DOUBLE(i.table_col)) AS total_printed_by_author
FROM books_json
JOIN TABLE(
JSON_TO_ARRAY(books_json.books::details::printings)) AS i
GROUP BY author;

+----------------+-------------------------+
| author         | total_printed_by_author |
+----------------+-------------------------+
| Rebecca Yarros |                    4.55 |
| Nita Prose     |                    2.45 |
+----------------+-------------------------+

In this example:

The JSON_TO_ARRAY function converts the array in the JSON field books_json.books::details::printings to a SQL array.
The TABLE function converts this array into a table.
The JOIN clause FROM books_json JOIN TABLE(JSON_TO_ARRAY(books_json.books::details::printings)) AS i creates a table that contains:
1. For each row in the books_json table:
  1. One row for each element in the printings array in that row.
  2. This row contains the values from the columns in the books_json table and a column, table_col, which contains the array value.
Finally, the SELECT and GROUP BY aggregate over the table created in step 3.

In this example, which aggregates over multiple JSON documents, SingleStore only reads the author and printings values to compute the result. Additional JSON values, such as title, publisher, and numpages, will not be read making the query execution very efficient.

Note: The JSON_EXTRACT_DOUBLE() function in the SELECT clause is necessary to use the JSON_TO_ARRAY optimization. If that function is removed, the query will execute, but will not use the optimization.

Managing Collections of Metadata

JSON is useful for managing a collection of diverse data, represented as name-value pairs, that might otherwise be cumbersome to refactor into a formalized key-value table, or that might be stored in a table that is sparsely populated. For example, suppose an organization had an asset management application using SingleStore to track all the information about its physical assets. The asset data is diverse—what’s relevant for a desk differs from what’s important for a server machine or a company car. All assets might have common attributes, such as asset tag ID, asset type, asset name, and description. Each type of asset might have unique attributes, such as size and weight dimensions, hostname and IP address, or gas mileage.

Instead of creating a highly granular table to manage all data as key-value pairs, this organization could simply create a SingleStore table using a JSON column to efficiently manage the unique attribute data. With this design:

Each asset gets a row in the table.
Attributes that are common to all assets have their own column in the table. These columns allow you to query on common features and quickly narrow down the final result set as much as possible (for example, filtering by asset type).
The various, remaining attributes associated with each asset are stored in a JSON column (which might be named something like property_bag). For example, the JSON column for an office desk asset could include JSON data such as size, weight, and number of drawers. The JSON data for a server machine could include rack location, number of cores, and MAC address.

DDL: Defining Tables with JSON Columns

Any SingleStore table can contain one or more columns of data type JSON. A JSON column can optionally be suffixed with NOT NULL.

Comparing JSON and LONGTEXT Columns

A JSON column is analogous to a LONGTEXT column in the following ways:

JSON columns can store arbitrarily large JSON values in a normalized text representation.
JSON columns have the same storage requirement, as if the JSON value were stored in a text column.

The primary difference is that JSON data is stored in a normalized format, which makes many operations faster than if the data were stored manually in a text column. The following is an example of non-normalized data, which is valid JSON but is relatively difficult to parse:

JSON

'{ "b\u000a": 1,"a": 2 ,"a":3 } '

Normalized data, on the other hand, is easier to parse because duplicate keys are merged, the data is sorted by keys, and extraneous whitespace is removed.

JSON

'{"a":3,"b\n":1}'

SingleStore recommends storing JSON data in JSON columns and not in LONGTEXT columns. JSON columns validate the JSON values and provide Unicode Support. If storage space and memory use is a concern, and search and extraction are not required on the column, use a string encoding on the column. Refer to Columnstore Seekability using JSON for more information.

Defining JSON Columns

Defining a JSON column in a SingleStore table is as simple as specifying the JSON data type in the CREATE TABLE command:

SQL

CREATE TABLE assets (
   tag_id BIGINT PRIMARY KEY,
   name TEXT NOT NULL,
   description TEXT,
   properties JSON NOT NULL);

JSON Collation

The collation for a JSON column can be set explicitly using a COLLATE clause. When the collation is not set explicitly, the collation is controlled by the json_collation engine variable.

Refer to Sync Variables Lists, Character Set and Collation Override, Unicode Support, and Character Encodings and Collation for more information.

Indexing Data in JSON Columns

JSON columns are not indexed directly - they are indexed using computed columns. For the fastest performance, you should not use JSON built-ins or :: notation in your filters. Instead, create a computed column that includes the JSON column in the computation, and then use the computed column for the index. In this way, the index gets updated only when the relevant JSON data is updated in a row.

SQL

CREATE TABLE assets (
   tag_id BIGINT PRIMARY KEY,
   name TEXT NOT NULL,
   description TEXT,
   properties JSON NOT NULL,
   weight AS properties::%weight PERSISTED DOUBLE,
   license_plate AS properties::$license_plate PERSISTED LONGTEXT,
   KEY(license_plate), KEY(weight));

JSON computed columns that are indexed will be utilized by the optimizer more efficiently. Queries that use indexed computed columns as filters or sorts, will perform faster by avoiding expression evaluation, and by seeking into or searching the indexes rather than scanning tables. In the following examples, we will use the :: notation in the filtering and sorting clauses to illustrate how the optimizer matches the computed columns.

SQL

EXPLAIN SELECT * FROM assets WHERE properties::$license_plate = "VGB116";

+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                          |
+------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0                                                                             |
| Project [assets.tag_id, assets.name, assets.description, assets.properties, assets.weight, assets.license_plate] |
| ColumnStoreFilter [assets.license_plate = 'VGB116' index]                                                       |
| ColumnStoreScan test1.assets, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore      |
+------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

SQL

EXPLAIN SELECT * FROM assets ORDER BY properties::%weight;

+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                          |
+------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.weight] partitions:all alias:remote_0                                                      |
| Project [assets.tag_id, assets.name, assets.description, assets.properties, assets.weight, assets.license_plate] |
| Sort [assets.weight]                                                                                             |
| ColumnStoreScan test1.assets, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore      |
+------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

DML: Accessing Data in JSON Columns

This section describes how to insert and update data in a SingleStore table with one or more JSON columns.

Inserting Data into a JSON Column

When inserting a row in a table, specifying JSON data is straightforward. For example: Given a table t defined as CREATE TABLE test_table(col_a TEXT, col_b JSON);, you can insert a row into test_table as follows:

SQL

INSERT INTO test_table(col_a,col_b) VALUES ('hello','{"x":"goodbye","y":"goodnight"}');

Columnstore Tables Having JSON Columns with Null Values or Empty Arrays

By default, SingleStore preserves columnstore JSON NULL values and empty arrays. To change this behavior and disable this setting, set the preserve_original_colstore_json global variable to OFF. This flag preserves the original columnstore JSON for any new data that is loaded; it does not update the existing data. By default, this variable is set to AUTO (same as ON).

Behavior When `preserve_original_colstore_json` is Set to `OFF`

When you store a JSON column in a columnstore table (and preserve_original_colstore_json is set to OFF), then null values and empty arrays in the JSON object are handled as follows when the object is written to the columnstore:

Name/value pairs with the value NULL are normally removed from the JSON object.
Name/value pairs containing empty arrays are normally removed from the JSON object.
If the JSON object has only the value NULL or [], the value is replaced with NULL.

Example commands you can use to store data are INSERT, UPDATE, and LOAD DATA.

An example INSERT scenario follows. Consider a table that is defined as:

SQL

CREATE TABLE json_empty_values_table(a INT, b JSON, SORT KEY (a));

Insert five rows into the table:

SQL

INSERT INTO json_empty_values_table VALUES (1, '{"v":null}');
INSERT INTO json_empty_values_table VALUES (2, '{"w":[]}');
INSERT INTO json_empty_values_table VALUES (3, '{"x":"foo","y":null,"z":[]}');
INSERT INTO json_empty_values_table VALUES (4, 'null');
INSERT INTO json_empty_values_table VALUES (5, '[]');

Manually flush the inserted data to the columnstore:

SQL

OPTIMIZE TABLE json_empty_values_table FLUSH;

Query the table:

SQL

SELECT * FROM json_empty_values_table ORDER BY a;

+------+-----------------------------+
| a    | b                           |
+------+-----------------------------+
|    1 | {"v":null}                  |
|    2 | {"w":[]}                    |
|    3 | {"x":"foo","y":null,"z":[]} |
|    4 | null                        |
|    5 | []                          |
+------+-----------------------------+

Accessing Fields in a JSON Object

To access a field of a JSON object stored in a column, use the name of the column suffixed with ::keyName. For example, if column data contains {"bits":[true,false]}, then the expression data::bits evaluates to the JSON value [true,false].

Converting a JSON String Value into a SQL String

The :: operator, when applied to a string column of a JSON object, returns the value of the string enclosed in quotes. Use the ::$keyname operator to return the value of the string without the enclosing quotes.

For example, consider a table TestJSON that contains a JSON column data.

SQL

INSERT INTO TestJSON VALUES ('{"first":"hello"}');

Retrieve the value of the data column using :: and ::$ operators:

SQL

SELECT data::first, data::$first FROM TestJSON;

+-------------+--------------+
| data::first | data::$first |
+-------------+--------------+
| "hello"     | hello        |
+-------------+--------------+

In this example, the data JSON column contains two string values.

SQL

INSERT INTO TestJSON VALUES ('{"first":"hello", "second":"world"}');

To retrieve the concatenated SQL string, use the ::$ operator:

SQL

SELECT CONCAT(data::$first, ' ', data::$second) FROM TestJSON;

+------------------------------------------+
| CONCAT(data::$first, ' ', data::$second) |
+------------------------------------------+
| hello world                              |
+------------------------------------------+

Converting a JSON Number or Boolean Value into a SQL DOUBLE

To transparently convert a JSON number or Boolean value into a SQL DOUBLE, use the name of the column suffixed with ::%keyname. For example, if column data contains {"valid":true,"value":3.14}, then data::%valid is 1 and data::%value is 3.14.

Using the ::$ and ::% Operators

If either the ::$ or ::% access operator is used on a JSON field that is not of the appropriate type, the JSON result is quietly converted to a string or a double based on the rules defined in JSON_EXTRACT_<type>.

The :: access operators are simply convenient aliases for the JSON_EXTRACT_<type> built-in functions, and they follow all the same rules. However, the :: operators do not work on the outputs of User Defined Functions (UDFs), such as SELECT udf_name(1)::key, and expressions. The JSON_EXTRACT_<type> functions must be used instead.

In addition, The :: operators require that numeric keys be specified with backticks (`).

The following SELECT statement returns the 2nd (zero-indexed) array element of the array {"a":[1,2,3,4]}. A syntax error will be returned if the backticks (`) are not included.

SQL

SELECT json, json::a::`2`
FROM (SELECT '{"a":[1,2,3,4]}' AS json) sub;

+-----------------+--------------+
| json            | json::a::`2` |
+-----------------+--------------+
| {"a":[1,2,3,4]} | 3            |
+-----------------+--------------+

In addition, backticks can be used for non-numeric key names and array indexes, which is useful if the key name or index contains a space.

The following query is valid and will return the same output as the query above.

SQL

SELECT json, json::`a`::`2`
FROM (SELECT '{"a":[1,2,3,4]}' AS json) sub;

+-----------------+----------------+
| json            | json::`a`::`2` |
+-----------------+----------------+
| {"a":[1,2,3,4]} | 3              |
+-----------------+----------------+

The following is the same logical query except the array index 2 has been replaced by the expression 1+1 and the JSON_EXTRACT_JSON function is used.

While this example uses a simple expression (1+1), more functions such as DAYOFWEEK and more complex expressions can be used.

Refer to JSON_EXTRACT_<type> for details.

SQL

SELECT json, JSON_EXTRACT_JSON(json, "a", 1+1)
FROM (SELECT '{"a":[1,2,3,4]}' AS json) sub;

+-----------------+--------------+
| json            | json::a::`2` |
+-----------------+--------------+
| {"a":[1,2,3,4]} | 3            |
+-----------------+--------------+

Accessing Nested JSON Objects

To access nested JSON objects, chain the colon-colon operator to form a keypath. For example, data::address::street means the street field of the address field of the data column.

Note

If one of the keys in the keypath is not present in the nested object, then the entire colon-colon expression yields SQL NULL.

Working with Nested Arrays in a JSON Column

Consider a table defined as:

SQL

CREATE TABLE json_tab (`id` INT(11) DEFAULT NULL,`jsondata` JSON COLLATE utf8_bin);

Insert values as follows:

SQL

INSERT INTO json_tab VALUES
( 8765 ,' {"city":"SFO","sports_teams":[{"sport_name":"football","teams":  [{"club_name":"Raiders"},{"club_name":"49ers"}]},
{"sport_name":"baseball","teams" : [{"club_name":"As"},{"club_name":"SF Giants"}]}]}') ;

INSERT INTO json_tab VALUES
( 9876,'{"city":"NY","sports_teams" : [{ "sport_name":"football","teams" : [{ "club_name":"Jets"},{"club_name":"Giants"}]},
{"sport_name":"baseball","teams" : [ {"club_name":"Mets"},{"club_name":"Yankees"}]},
{"sport_name":"basketball","teams" : [{"club_name":"Nets"},{"club_name":"Knicks"}]}]}');

Query the table:

SQL

WITH t AS(
SELECT id, jsondata::city city , table_col AS sports_clubs FROM json_tab JOIN TABLE(JSON_TO_ARRAY(jsondata::sports_teams))),

t1 AS(
SELECT t.id, t.city, t.sports_clubs::sport_name sport, table_col AS clubs FROM t JOIN TABLE(JSON_TO_ARRAY(t.sports_clubs::teams)))

SELECT t1.id, t1.city,t1.sport,t1.clubs::club_name club_name FROM t1;

+------+-------+--------------+-------------+
| id   | city  | sport        | club_name   |
+------+-------+--------------+-------------+
| 9876 | "NY"  | "football"   | "Jets"      |
| 9876 | "NY"  | "football"   | "Giants"    |
| 9876 | "NY"  | "baseball"   | "Mets"      |
| 9876 | "NY"  | "baseball"   | "Yankees"   |
| 9876 | "NY"  | "basketball" | "Nets"      |
| 9876 | "NY"  | "basketball" | "Knicks"    |
| 8765 | "SFO" | "football"   | "Raiders"   |
| 8765 | "SFO" | "football"   | "49ers"     |
| 8765 | "SFO" | "baseball"   | "As"        |
| 8765 | "SFO" | "baseball"   | "SF Giants" |
+------+-------+--------------+-------------+

You can also further filter the results by applying conditions. For example, to find the city with the "Yankees" club, run the following query:

SQL

WITH t AS
(SELECT id, jsondata::city city , table_col AS sports_clubs FROM json_tab JOIN TABLE(JSON_TO_ARRAY(jsondata::sports_teams))),

t1 AS
(SELECT t.id, t.city, t.sports_clubs::sport_name sport, table_col AS clubs FROM t JOIN TABLE(JSON_TO_ARRAY(t.sports_clubs::teams)))

SELECT t1.id, t1.city,t1.sport,t1.clubs::club_name club_name FROM t1 WHERE t1.clubs::$club_name = 'Yankees';

+------+------+------------+-----------+
| id   | city | sport      | club_name |
+------+------+------------+-----------+
| 9876 | "NY" | "baseball" | "Yankees" |
+------+------+------------+-----------+

Nested JSON Ingest

Working with a nested JSON ingest requires an additional escape as the value being inserted is interpreted as a SQL string first. Therefore, the backslash (\) symbol needs an escape in addition to the JSON escape (‘\’):

SQL

CREATE TABLE test_json(col_a json);

INSERT INTO test_json VALUES ('{"addParams": "{\\"Emp_Id\\":\\"1487\\",
  \\"Emp_LastName\\":\\"Stephens\\",\\"Emp_FirstName\\":\\"Mark\\",\\"Dept\\":\\"Support\\"}"}');

SQL

SELECT * FROM test_json;

+------------------------------------------------------------------------------------+
| col_a                                                                              |
+------------------------------------------------------------------------------------+
| {"addParams":"{\"Emp_Id\":\"1487\",                                                |
|   \"Emp_LastName\":\"Stephens\",\"Emp_FirstName\":\"Mark\",\"Dept\":\"Support\"}"} |
+------------------------------------------------------------------------------------+

Using Colon-Colon Notation in UPDATE Queries

You can use the colon-colon notation for JSON keypaths in UPDATE queries. For example, the following two UPDATE queries perform the same operation:

SQL

UPDATE users SET userdata::name::$first = 'Alex';
UPDATE users SET userdata = JSON_SET_STRING(userdata, 'name', 'first', 'Alex');

In fact, these access operators are simply convenient aliases for the JSON_SET_<type> built-in function (see JSON_SET_<type>), and they follow all the same rules.

Field Name Syntax for JSON Access and UPDATE Queries

In both the JSON access and update contexts, each field name must either:

follow the syntax for a valid SQL identifier, or
be escaped with backticks in the same manner as a SQL identifier

For example, the following two SELECT queries perform the same operation:

SQL

SELECT ticker_symbol FROM stocks WHERE statistics::%`P/E` > 1.5;
SELECT ticker_symbol FROM stocks WHERE JSON_EXTRACT_DOUBLE(statistics, 'P/E') > 1.5;

When using the JSON_EXTRACT_<type> form of the query (see JSON_EXTRACT_<type> ), there is no constraint on the contents of the keystring. A JSON object can contain any string as a key, even "", or a string with "embedded\u0000nulls". For more information on extracting JSON data, see JSON LOAD DATA.

JSON Performance

The following example will demonstrate how to improve JSON seekability by utilizing sub-segments.

Begin by turning off or disabling the variable:

SQL

SET GLOBAL use_seekable_json = OFF;

Create a database and change to the newly created database..

SQL

CREATE DATABASE orders;

USE orders;

Create a table with id as an integer and set as the primary key

SQL

CREATE TABLE orders(id INT PRIMARY KEY not null, order_doc json, sort KEY(id));

Create a stored procedure to generate documents until it has reached a specific number nearest to the power of two > = n.

SQL

DELIMITER //

CREATE or REPLACE PROCEDURE gen_orders(n INT) AS
DECLARE
  i INT;
  c INT;
  m INT;
BEGIN
  DELETE FROM orders;
  INSERT orders VALUES (1,
     '{ "id" : 1,
        "time" : "2022-11-17 23:03:54",
        "lineitems" : [{ "line_num" : 1,
                         "SKU" : 688968,
                         "description" : "Apple iPhone 14 Plus",
                         "qty" : 1},
                        { "line_num" : 2,
                         "SKU" : 6514052,
                         "description" : "Energizer - MAX AA Batteries (8 Pack), Double A Alkaline Batteries",
                         "qty" : 3},
                        { "line_num" : 3,
                         "SKU" : 6457697,
                         "description" : "Star Wars - Galactic Snackin Grogu",
                         "qty" : 1}] }'
        );

  i = 1;
  WHILE i < n LOOP
    SELECT MAX(id) INTO m FROM orders;
    INSERT INTO orders
    SELECT id + m, new_order_doc(order_doc, id + m)
    FROM orders;

    SELECT COUNT(*) INTO c FROM orders;
    i += c;
 END LOOP;
END
//
DELIMITER ;

Create a function that will change the input order_doc into a new one with random data, and a new id.

SQL

DELIMITER //
CREATE or REPLACE FUNCTION new_order_doc(order_doc json, new_id INT) RETURNS json
AS
DECLARE
  j json;
  lineitems json;
  l1 json;
  l2 json;
  l3 json;
BEGIN
  j = json_set_double(order_doc, "id", new_id);
  j = json_set_string(j, "time", now());
  lineitems = json_extract_json(j, "lineitems");
  l1 = json_extract_json(lineitems, 0);
  l2 = json_extract_json(lineitems, 1);
  l3 = json_extract_json(lineitems, 2);
  lineitems = json_array_push_json('[]', mutate_lineitem(l1));
  lineitems = json_array_push_json(lineitems, mutate_lineitem(l2));
  lineitems = json_array_push_json(lineitems, mutate_lineitem(l3));
  j = json_set_json(j, "lineitems", lineitems);
  RETURN j;
END
//
DELIMITER ;

Create a function to use the mutate_lineitem.

SQL

DELIMITER //
CREATE OR REPLACE FUNCTION mutate_lineitem(item json) RETURNS json
AS
DECLARE
  j json;
BEGIN
  j = json_set_double(item, "SKU", floor(rand()*10*1000*1000));
  j = json_set_string(j, "description", uuid());
  j = json_set_double(j, "qty", ceil(3*rand()));
  RETURN j;
END
//
DELIMITER ;

Generate eight million rows of test data.

SQL

CALL gen_orders(8*1024*1024);

Create a stored procedure that will measure the average lookup time.

SQL

DELIMITER //

CREATE OR REPLACE PROCEDURE get_avg_time(n INT) AS
DECLARE
  _id INT;
  _od json;
  _oid INT;
  m INT;
  st DATETIME(6);
  et DATETIME(6);
BEGIN
  SELECT MAX(id) INTO m FROM orders;
  st = now(6);
  FOR i in 1..n LOOP
    _oid = CEILING(m*RAND());

    SELECT id, order_doc
    INTO _id, _od
    FROM orders
    WHERE id = _oid;
  END LOOP;
  et = now(6);

  echo SELECT (timestampdiff(MICROSECOND, st, et)/1000000.0)/n AS avg_time;
END
//
DELIMITER ;

Before obtaining the average query time make sure the table has been optimized and all the data is in columnstore format.

SQL

OPTIMIZE TABLE orders FULL;

Obtain average query time.

SQL

CALL get_avg_time(100);

+------------+
| avg_time   |
+------------+
| 1.73598285 |
+------------+
1 row in set (2 min 53.61 sec)

To measure the new query time, enable to variable.

SQL

SET GLOBAL use_seekable_json = ON

Create a second table using the same format as the first table.

SQL

CREATE TABLE orders2 LIKE orders;

Use the same data from the first table and make sure to optimize the new table.

SQL

INSERT INTO orders2 SELECT * FROM orders;

SQL

OPTIMIZE TABLE orders2 FULL;

Update the stored procedure get_avg_time by changing orders to orders2 in both places.

SQL

DELIMITER //

CREATE OR REPLACE PROCEDURE get_avg_time(n INT) AS
DECLARE
  _id INT;
  _od json;
  _oid INT;
  m INT;
  st DATETIME(6);
  et DATETIME(6);
BEGIN
  SELECT MAX(id) INTO m FROM orders2;
  st = now(6);
  FOR i in 1..n LOOP
    _oid = CEILING(m*RAND());

    SELECT id, order_doc
    INTO _id, _od
    FROM orders2
    WHERE id = _oid;
  END LOOP;
  et = now(6);

  echo SELECT (timestampdiff(MICROSECOND, st, et)/1000000.0)/n AS avg_time;
END
//
DELIMITER ;

Run the get_avg_time stored procedure again with the changed table name.

SQL

CALL get_avg_time(100);

+------------+
| avg_time   |
+------------+
| 0.03216373 |
+------------+
1 row in set (3.47 sec)

JSON Array Performance

Aggregations over array objects using the JOIN TABLE(JSON_TO_ARRAY()) operation are optimized in SingleStore to run efficiently on large batches of rows in parallel. The optimization also reduces the amount of data read from the JSON column by retrieving only the necessary data instead of fetching the entire JSON object.

The JSON_TO_ARRAY optimization requires the column created by the TABLE function (table_col) be wrapped in a JSON_EXTRACT_<type>() function. A JSON_EXTRACT_<type>() function can be used to directly extract a value as shown in Example 7 in the Introduction, or the shorthand (::) syntax can be used when there are keys to extract, as is shown in the following examples.

The following example uses the TPC-H data. Load this TPC-H data into your cluster, and run the following queries to modify the data for the current use case:

SQL

ALTER TABLE orders ADD COLUMN lineitems_json JSON;

CREATE TABLE new_table (
 l_orderkey bigint(11) NOT NULL,
 lineitems JSON
);

INSERT INTO new_table
SELECT l_orderkey,
      JSON_AGG(
          JSON_BUILD_OBJECT(
              'l_partkey', l_partkey,
              'l_suppkey', l_suppkey,
              'l_linenumber', l_linenumber,
              'l_quantity', l_quantity,
              'l_extendedprice', l_extendedprice,
              'l_discount', l_discount,
              'l_tax', l_tax,
              'l_returnflag', l_returnflag,
              'l_linestatus', l_linestatus,
              'l_shipdate', l_shipdate,
              'l_commitdate', l_commitdate,
              'l_receiptdate', l_receiptdate,
              'l_shipinstruct', l_shipinstruct,
              'l_shipmode', l_shipmode,
              'l_comment', l_comment
          )
      ) as lineitems
FROM lineitem
GROUP BY l_orderkey;

UPDATE orders o
JOIN new_table t ON o.o_orderkey = t.l_orderkey
SET o.lineitems_json = t.lineitems;

Here are a few sample query structures that benefit from the optimization.

The following query calculates the total number of line items for each shipping mode by aggregating and grouping on the JSON array columns:

SQL

SELECT t.table_col::$l_shipmode, sum(t.table_col::%l_quantity) as quantity
FROM orders
JOIN TABLE(JSON_TO_ARRAY(lineitems_json)) t
GROUP BY t.table_col::$l_shipmode;

+--------------------------+-----------+
| t.table_col::$l_shipmode | quantity  |
+--------------------------+-----------+
| MAIL                     | 218541962 |
| FOB                      | 218469194 |
| RAIL                     | 218553230 |
| AIR                      | 218525124 |
| REG AIR                  | 218617001 |
| SHIP                     | 218576988 |
| TRUCK                    | 218454537 |
+--------------------------+-----------+

The following query calculates the total number of line items for each order priority:

SQL

SELECT o_orderpriority as priority, sum(t.table_col::$l_quantity) as quantity
FROM orders JOIN TABLE(JSON_TO_ARRAY(lineitems_json)) t
GROUP BY o_orderpriority;

+-----------------+-----------+
| priority        | quantity  |
+-----------------+-----------+
| 3-MEDIUM        | 305731841 |
| 5-LOW           | 305656541 |
| 2-HIGH          | 306112515 |
| 1-URGENT        | 306300507 |
| 4-NOT SPECIFIED | 305936632 |
+-----------------+-----------+

The following query performs an intricate aggregation operation over multiple fields:

SQL

SELECT
   t.table_col::$l_returnflag as r,
   t.table_col::$l_linestatus as s,
   sum(t.table_col::%l_quantity) as sum_qty,
   sum(t.table_col::%l_extendedprice) as sum_base_price,
   sum(t.table_col::%l_extendedprice * (1 - t.table_col::%l_discount)) as sum_disc_price,
   avg(t.table_col::%l_quantity) as avg_qty
FROM orders
JOIN TABLE(JSON_TO_ARRAY(lineitems_json)) t
GROUP by r, s

+---+---+-----------+--------------------+--------------------+--------------------+
| r | s | sum_qty   | sum_base_price     | sum_disc_price     | avg_qty            |
+---+---+-----------+--------------------+--------------------+--------------------+
| A | F | 377518399 |  566065727797.2633 | 537759104278.07465 | 25.500975103007097 |
| R | F | 377732830 |  566431054976.0034 |  538110922664.7574 |  25.50838478968014 |
| N | F |   9851614 | 14767438399.169971 | 14028805792.211407 | 25.522448302840946 |
| N | O | 764635193 | 1146548935600.9531 | 1089215873201.9531 | 25.498214140059787 |
+---+---+-----------+--------------------+--------------------+--------------------+

Limitation

The optimization is only applied to the first dimension of an array.

Implementation Considerations

This section describes some unique behaviors in SingleStore’s implementation of the JSON standard (RFC 4627).

Infinities and NaNs

SingleStore does not attempt to support entities such as the nan (not a number) entity, as in {"value":nan}. Although the JavaScript language supports nan and positive and negative infinities, neither the JSON standard nor SQL’s DOUBLE type provide any way to represent such non-finite values. If your application requires these special values, you might try using null, 0.0, or strings (such as "nan" and "inf") in their place.

Unicode Support

JSON strings can contain any of the 65536 Unicode characters in the Basic Multilingual Plane, either directly as themselves or escaped with JSON’s \uABCD notation. JSON normalization converts all such ASCII escape sequences into UTF-8 byte sequences, except for those characters that must remain \u-encoded to keep the string valid JSON.

Warning

Characters outside of the Basic Multilingual Plans in JSON strings are not supported with escaped notation and may result in incorrect results.

In SingleStore, a backslash (\) literal in a string must be escaped with a backslash. Therefore, pattern strings containing backslashes will have two backslash characters (\\).

Here is an example of how to use escaped notation for characters in the Basic Multilingual Plane:

SQL

SELECT '{"a":"\\u00F9"}' :> JSON;

+---------------------------+
| '{"a":"\\u00F9"}' :> JSON |
+---------------------------+
| {"a":"ù"}                 |
+---------------------------+

Like MySQL, SingleStore supports characters outside the Basic Multilingual Plane (characters whose codepoints are in the range 0x10000 to 0x10FFFF). This includes some uncommon Chinese characters and symbols such as emojis.

The json_collation engine variable must be set to utf8mb4_bin for these characters to work. Refer to Sync Variables Lists for more information.

Character Encodings and Collation

Collation provides a set of rules to use when comparing characters in a character set. By default, JSON columns use the collation specified in the json_collation engine variable (Sync Variables Lists). The collation of a JSON column affects the following:

Output of SELECT DISTINCT, ORDER BY, and other queries that compare entire values.
Identification of duplicate keys inside a single JSON object during normalization. For example, whether the string {"Schlüssel":1,"Schluessel":2} is normalized to {"Schluessel":2}.
Sorting of keys inside a single JSON object. For example, whether the string {"Schlüssel":1,"Schluff":2} is normalized to {"Schluff":2,"Schlüssel":1}.

The default collation generally provides the desired behavior. However, you can override the default at the table or column level or by using the json_collation engine variable.

Note

SingleStore recommends that you use the utf8mb4_bin collation for all JSON columns.

Table-level Override

In the following example, for the users table, both userdata and lastname use the table collation (which has been specified as utf8_unicode_ci).

SQL

CREATE TABLE users (
   uid INT AUTO_INCREMENT PRIMARY KEY,
   userdata JSON,
   lastname AS data::name::$last PERSISTED TEXT) COLLATE=utf8_unicode_ci;

Column-level Override

In the following example, for the orders table:

The data column uses utf8_unicode_ci.
The productdetails column uses utf8_bin.
Both the productname column and the comments column use utf8_general_ci, which is SingleStore’s default database collation

SQL

CREATE TABLE orders (
    oid INT AUTO_INCREMENT PRIMARY KEY,
    data JSON COLLATE utf8_unicode_ci,
    productname AS data::product::$name PERSISTED TEXT,
    productdetails AS data::product::$details PERSISTED TEXT COLLATE utf8_bin,
    comments VARCHAR(256));

Expression-level Override

The :> operator allows you to specify a collation for different expressions in a query.

SQL

:> text collate <colation_name>

The following example uses a binary collation (utf8_bin) for the first expression and a case-insensitive collation (utf8_general_ci) for the second expression.

SQL

SELECT *
FROM sets
WHERE sets.json_field::$x :> text collate utf8_bin = "string1"
AND sets.json_field::$y :> text collate utf8_general_ci = "string2";

Maximum JSON Value Size

Under the hood, JSON data is stored in LONGBLOB columns. While the DDL will allow you to specify a length of up to 4GB, there is an internal limit applied when assigning a value to a string or JSON field that caps the maximum size of a single value to max_allowed_packet. This is 100MB by default and can be set to up to 1GB.

Performance Impact on Parquet Encoding for JSON

When using seekable JSON/JSON Parquet encoding a major impact on performance is if a schema is dense or sparse.

A node in a tree is considered to be dense if it occurs in most JSON rows. A schema is said to be dense if most of the nodes in the entire schema are dense. Otherwise, the schema is considered to be sparse.

Consider the following JSON data table:

JSON Data (dense)
`{"a":1, "b":1}`
`{“a”:2, “c”:{“d”: 1}}`
`{“c”:{“d”:2, “e”:3}}`

The schema will infer the JSON rows as follows:

Using the image above the leaves of the tree become internal columns in the JSON encoding. This is an example of a dense schema.

a	b	c::d	c::e
1	1	NULL	NULL
2	NULL	1	NULL
NULL	NULL	2	3

Using the following JSON data table:

JSON Data (sparse)
`{“a”: 1}`
`{“b”: 1}`
`{“c”: 1}`
`{“d”: 1}`
`{“e”: 1}`

The JSON rows will be encoded as follows:

a	b	c	d	e
1	NULL	NULL	NULL	NULL
NULL	1	NULL	NULL	NULL
NULL	NULL	1	NULL	NULL
NULL	NULL	NULL	1	NULL
NULL	NULL	NULL	NULL	1

The preceding table represents a sparse schema caused by the NULLs not being part of the original JSON strings. This results in poor performance since the NULLs will need to be counted which increases the execution time and memory usage.

To prevent an overly sparse schema from being inferred, SingleStore uses a method where if the average of a key’s children is too low (<1%) in relation to the key itself, the key is stored as a string without inferring its children.

On this page

About JSON in SingleStore

JSON Data Encodings

Examples

Example 1: Extract Values from JSON Using the :: Operators

Example 2: Extract Values from JSON using JSON_EXTRACT_

Example 3: Find Existence of Values and Paths using JSON_MATCH_ANY

Example 4: Aggregate Elements in a JSON Array Value using REDUCE

Example 5: Use the :: Operators in UPDATE Statements

Example 6: Convert an Array (List) of JSON Objects to a Table

Example 7: Aggregate Elements in JSON Arrays Across Rows using TABLE and JSON_TO_ARRAY

Managing Collections of Metadata

DDL: Defining Tables with JSON Columns

Comparing JSON and LONGTEXT Columns

Defining JSON Columns

JSON Collation

Indexing Data in JSON Columns

DML: Accessing Data in JSON Columns

Inserting Data into a JSON Column

Columnstore Tables Having JSON Columns with Null Values or Empty Arrays

Behavior When preserve_original_colstore_json is Set to OFF

Accessing Fields in a JSON Object

Converting a JSON String Value into a SQL String

Converting a JSON Number or Boolean Value into a SQL DOUBLE

Using the ::$ and ::% Operators

Accessing Nested JSON Objects

Working with Nested Arrays in a JSON Column

Nested JSON Ingest

Using Colon-Colon Notation in UPDATE Queries

Field Name Syntax for JSON Access and UPDATE Queries

JSON Performance

JSON Array Performance

Limitation

Implementation Considerations

Infinities and NaNs

Unicode Support

Character Encodings and Collation

Table-level Override

Column-level Override

Expression-level Override

Maximum JSON Value Size

Performance Impact on Parquet Encoding for JSON

Related Topics

Was this article helpful?

On this page

Was this article helpful?

Example 1: Extract Values from JSON Using the `::` Operators

Example 2: Extract Values from JSON using `JSON_EXTRACT_`

Example 5: Use the `::` Operators in UPDATE Statements

Behavior When `preserve_original_colstore_json` is Set to `OFF`