Using JSON

About JSON in SingleStore

SingleStore exposes a JavaScript Object Notation (JSON) column type that implements the JSON standard.

You can define columns in SingleStore tables using the JSON Type. Analytics on these JSON columns is very efficient as SingleStore automatically columnarizes JSON data. A schema is inferred from JSON keys and the data is split into columns by key path and stored in an encoded Parquet-like format.

The JSON data is stored as if you had created a schema with separate columns for every field. As a result, queries on JSON columns read only the parts of a JSON object that are relevant to a query and therefore have excellent performance.

SingleStore provides a set of JSON functions for extracting, searching, analyzing, and modifying JSON data, including:

JSON_EXTRACT_<type>, to extract values out of JSON documents at specified keypaths.
Shorthand syntax, :: operators, to extract values out of JSON documents. The :: operators are convenient aliases for the JSON_EXTRACT_<type> functions and follow the same rules.
- JSON shorthand syntax is a path that uses :: as a separator.
JSON_MATCH_ANY, to check for the existence of values in a JSON document or array based on a path and a filter.
JSON_TO_ARRAY, to convert a JSON array to a SingleStore array.
- JSON_TO_ARRAY can be used in combination with TABLE to operate on values of a JSON array as SQL rows. This functionality is similar to UNNEST in other database systems.

JSON columns can be searched using SingleStore's full-text search. In addition, JSON columns can be indexed using computed columns; refer to Indexing Data in JSON Columns for more information.

SingleStore has a native BSON data type, plus SingleStore Kai, a MongoDB®-compatible API. The BSON data is stored in columns just like JSON data, and can be indexed with computed columns.

An alternative to using the JSON type is to map JSON fields to individual columns and use SQL queries to access the JSON data. Refer to Load JSON Files with LOAD DATA for information about how to map JSON fields to SingleStore columns during the loading process.

Examples

The following table is used in the examples. Note the printings array is intended to indicate the number of copies of the book printed in each printing; printings data is in millions of books and is not accurate.

SQL

CREATE TABLE books_json (id INT, books JSON);

INSERT INTO books_json VALUES
(1, '{
       "title": "Onyx Storm",
       "author": "Rebecca Yarros",
       "details": {
         "publisher": "Entangled:Red Tower Books",
         "numpages": 544,
         "publication date": "January 21, 2025",
         "printings": [2,1.3],
         "series": "The Empyrean"
       }
     }'
),
(2, '{
       "title": "The Maid",
       "author": "Nita Prose",
       "details": {
         "publisher": "Ballantine Books",
         "numpages": 385,
         "publication date": "January 4, 2022",
         "printings": [0.5,0.75,1.2]
       }
     }'
),
(3, '{
       "title": "The Last Letter",
       "author": "Rebecca Yarros",
       "details": {
         "publisher": "Entangled:Amara",
         "numpages": 432,
         "publication date": "February 26, 2019",
         "printings": [0.25,0.5,0.5]
       }
     }'
);

Example 1: Extract Values from JSON Using the `::` Operators

The ::, ::$, and ::% operators can be used to extract fields, strings, and SQL doubles from JSON documents. In the example below:

books::title extracts the title field,
books::$title extracts the title field as a SQL string,
books::details::%numpages extracts the numpages field as a double, and
books::details::printings::%`0` extracts the 0th element of the printings array.

Backticks (`) are required around numeric keys, as shown with `0` above.

Refer to Using the ::$ and ::% Operators for details.

SQL

SELECT id,
       books::title AS title,
       books::$title AS title_string,
       books::details::%numpages AS numpages,
       books::details::printings::%`0` AS first_printing
FROM books_json
ORDER BY id;

+------+-------------------+-----------------+----------+----------------+
| id   | title             | title_string    | numpages | first_printing |
+------+-------------------+-----------------+----------+----------------+
|    1 | "Onyx Storm"      | Onyx Storm      |      544 |              2 |
|    2 | "The Maid"        | The Maid        |      385 |            0.5 |
|    3 | "The Last Letter" | The Last Letter |      432 |           0.25 |
+------+-------------------+-----------------+----------+----------------+

Example 2: Extract Values from JSON using `JSON_EXTRACT_`

The JSON_EXTRACT_ functions can be used to extract values from a JSON document in addition to the path expression syntax shown in Example 1. The JSON_EXTRACT_ functions can be used when you want to use variables or expressions in the keypath. The keypath in JSON_EXTRACT_ functions is a comma-separated list of object keys or zero-indexed array positions.

Below is a query similar to the query in Example 1, expressed using JSON_EXTRACT_ functions.

SQL

SELECT id,
      JSON_EXTRACT_JSON(books,'title') AS title,
      JSON_EXTRACT_STRING(books,'title') AS title_string,
      JSON_EXTRACT_BIGINT(books,'details','numpages') AS numpages,
      JSON_EXTRACT_DOUBLE(books,'details','printings',1-1) AS first_printing
FROM books_json
ORDER BY id;

+------+-------------------+-----------------+----------+----------------+
| id   | title             | title_string    | numpages | first_printing |
+------+-------------------+-----------------+----------+----------------+
|    1 | "Onyx Storm"      | Onyx Storm      |      544 |              2 |
|    2 | "The Maid"        | The Maid        |      385 |            0.5 |
|    3 | "The Last Letter" | The Last Letter |      432 |           0.25 |
+------+-------------------+-----------------+----------+----------------+

Note that the expression 1-1 is used to extract the value in position 0 in the printings array. Such expressions are supported in JSON_EXTRACT_ functions, but not when using the :: operators.

A JSON_EXTRACT_ on a nested key will only scan the column for that key, not the entire JSON document. For example, JSON_EXTRACT_JSON(books,'details','publisher') AS publisher will only scan the column for the publisher key.

Example 3: Find Existence of Values and Paths using JSON_MATCH_ANY

The JSON_MATCH_ANY and JSON_MATCH_ANY_EXISTS functions can be used to find the existence of values matching predicates and paths in JSON documents.

The JSON_MATCH_ANY function returns true if a value exists in the JSON at the filter path for which the filter predicate evaluates to true. The JSON_MATCH_ANY_EXISTS function returns true if there is a value (possibly null) in the JSON at the filter path.

The following is an example of using JSON_MATCH_ANY to find books that are part of The Empyrean series.

SQL

SELECT id, books::$title AS title, books::details::$series AS series
FROM books_json
WHERE JSON_MATCH_ANY(books::?details.series, MATCH_PARAM_STRING_STRICT() = "The Empyrean");

+------+------------+--------------+
| id   | title      | series       |
+------+------------+--------------+
|    1 | Onyx Storm | The Empyrean |
+------+------------+--------------+
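
The JSON_MATCH_ANY_EXISTS function described above can be sketched against the same table. The query below is a minimal sketch that assumes the function accepts the JSON column followed by a comma-separated keypath, in the same style as the JSON_EXTRACT_ functions; check the function reference for the exact argument forms supported by your version.

```sql
-- Sketch: list books whose details object contains a series key at all,
-- whatever its value. The (json, key, key, ...) argument form is an assumption.
SELECT id, books::$title AS title
FROM books_json
WHERE JSON_MATCH_ANY_EXISTS(books, 'details', 'series');
```

With the sample data as originally inserted, only Onyx Storm has a series key, so it should be the only row that qualifies.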

Example 4: Aggregate Elements in a JSON Array Value using REDUCE

The REDUCE function can be used to aggregate elements in a JSON array value. SingleStore recommends using the REDUCE function when aggregating elements within a JSON array in a row.

Refer to Example 6 for an example of working with JSON array values across multiple rows, which takes advantage of JSON Array Performance Enhancements.

The REDUCE function applies an expression to each element of an array and returns a single value. In the following example, REDUCE is used to sum the elements of the printings array for each row. The result is the total number of copies of each book that have been printed.

SQL

SELECT books::$title,
  REDUCE(
     0 :> double,
     JSON_TO_ARRAY(books_json.books::details::printings),
     REDUCE_ACC() + REDUCE_VALUE()
   ) AS total_printed_by_book
FROM books_json;

+-----------------+-----------------------+
| books::$title   | total_printed_by_book |
+-----------------+-----------------------+
| The Maid        |                  2.45 |
| The Last Letter |                  1.25 |
| Onyx Storm      |                   3.3 |
+-----------------+-----------------------+

In this example:

The JSON_TO_ARRAY function converts the array in the JSON field books::details::printings to a SQL array.
The REDUCE function takes as input:
1. An initial value: 0:>double.
2. A SQL array: the result of JSON_TO_ARRAY.
3. An accumulator expression: (REDUCE_ACC() + REDUCE_VALUE()), which specifies that the array elements should be summed.
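
For comparison, the same totals can be approached by expanding each array into rows and aggregating with SUM. This is only a sketch: it assumes that TABLE(JSON_TO_ARRAY(...)) exposes each element in a column named table_col (as in the TABLE examples elsewhere in this document) and that each element can be cast to DOUBLE with the :> operator.

```sql
-- Sketch: expand each printings array into one row per element,
-- then sum the elements per book. Aliases b and p are illustrative.
SELECT b.books::$title AS title,
       SUM(p.table_col :> DOUBLE) AS total_printed_by_book
FROM books_json b
     JOIN TABLE(JSON_TO_ARRAY(b.books::details::printings)) AS p
GROUP BY b.books::$title;
```

REDUCE remains the approach recommended above for aggregating elements within a single row; the row-expanding form is mainly useful when the elements also need to be filtered, joined, or grouped across rows.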

Example 5: Use the `::` Operators in UPDATE Statements

You can use the :: operators for JSON keypaths in UPDATE queries. In this context, the :: operators are simply convenient aliases for the JSON_SET_<type> functions and follow the same rules.

The following query updates the number of pages in the book, Onyx Storm.

SQL

UPDATE books_json
SET books::details::%numpages = 545
WHERE books::$title = "Onyx Storm";

To add a printing for the book Onyx Storm, use JSON_ARRAY_PUSH_<type> as follows.

SQL

UPDATE books_json
SET books::details::printings = JSON_ARRAY_PUSH_DOUBLE(books::details::printings, 1.5)
WHERE books::$title = "Onyx Storm";

To add a series to the book The Maid, run the following.

SQL

UPDATE books_json
SET books::details::$series = 'Molly the Maid'
WHERE books::$title = "The Maid";

View the resulting updates.

SQL

SELECT JSON_PRETTY(books)
FROM books_json
WHERE books::$title = "The Maid";

+---------------------------------------------+
| {
  "author": "Nita Prose",
  "details": {
    "numpages": 385,
    "printings": [
      0.5,
      0.75,
      1.2
    ],
    "publication date": "January 4, 2022",
    "publisher": "Ballantine Books",
    "series": "Molly the Maid"
  },
  "title": "The Maid"
}                       |
+---------------------------------------------+

Example 6: Convert an Array (List) of JSON Objects to a Table

Use the JSON_TO_ARRAY function and the TABLE built-in function to convert a list of JSON objects to a table.

Create a table with a column to hold a JSON value and insert data into that table.

SQL

CREATE TABLE json_list_example (json_list JSON);

INSERT INTO json_list_example VALUES (
'[
     {
       "title": "Onyx Storm",
       "author": "Rebecca Yarros",
       "numpages": 544
     },
     {
       "title": "The Last Letter",
       "author": "Rebecca Yarros",
       "numpages": 432
     }
]');

In the following query, the syntax JOIN TABLE(JSON_TO_ARRAY(json_list)) converts the objects in the json_list column to a table, that is, each object in the JSON list is turned into a row in the table.

The JSON_TO_ARRAY function converts the JSON array to a SingleStore ARRAY.
The TABLE function converts a SingleStore ARRAY to a column named table_col that contains one row for each array entry.
The JOIN clause is required when using the TABLE function on an existing SingleStore table.

SQL

SELECT json_list_as_table.table_col AS books_col
FROM json_list_example JOIN TABLE(JSON_TO_ARRAY(json_list)) AS json_list_as_table;

+----------------------------------------------------------------------+
| books_col                                                            |
+----------------------------------------------------------------------+
| {"author":"Rebecca Yarros","numpages":544,"title":"Onyx Storm"}      |
| {"author":"Rebecca Yarros","numpages":432,"title":"The Last Letter"} |
+----------------------------------------------------------------------+

The JSON_AGG function can be used to combine the rows in the table into a single row and re-create the original JSON array.

SQL

WITH books_table AS (
    SELECT json_list_as_table.table_col AS books_col
    FROM json_list_example JOIN TABLE(JSON_TO_ARRAY(json_list)) AS json_list_as_table
)
SELECT JSON_AGG(books_col) FROM books_table;

+----------------------------------------------------------------------------------------------------------------------------------------+
| JSON_AGG(books_col)                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------+
| [{"author":"Rebecca Yarros","numpages":544,"title":"Onyx Storm"},{"author":"Rebecca Yarros","numpages":432,"title":"The Last Letter"}] |
+----------------------------------------------------------------------------------------------------------------------------------------+

Managing Collections of Metadata

JSON is useful for managing a collection of diverse data, represented as name-value pairs, that might otherwise be cumbersome to refactor into a formalized key-value table, or that might be stored in a table that is sparsely populated. For example, suppose an organization had an asset management application using SingleStore to track all the information about its physical assets. The asset data is diverse—what’s relevant for a desk differs from what’s important for a server machine or a company car. All assets might have common attributes, such as asset tag ID, asset type, asset name, and description. Each type of asset might have unique attributes, such as size and weight dimensions, hostname and IP address, or gas mileage.

Instead of creating a highly granular table to manage all data as key-value pairs, this organization could simply create a SingleStore table using a JSON column to efficiently manage the unique attribute data. With this design:

Each asset gets a row in the table.
Attributes that are common to all assets have their own column in the table. These columns allow you to query on common features and quickly narrow down the final result set as much as possible (for example, filtering by asset type).
The various, remaining attributes associated with each asset are stored in a JSON column (which might be named something like property_bag). For example, the JSON column for an office desk asset could include JSON data such as size, weight, and number of drawers. The JSON data for a server machine could include rack location, number of cores, and MAC address.
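
The design above can be sketched as follows. The table name asset_inventory, its columns, and the sample values are all hypothetical and serve only to illustrate the pattern of common columns plus a JSON property bag.

```sql
-- Hypothetical table illustrating the design above; all names are assumptions.
CREATE TABLE asset_inventory (
   tag_id BIGINT PRIMARY KEY,
   asset_type TEXT NOT NULL,
   name TEXT NOT NULL,
   description TEXT,
   property_bag JSON NOT NULL);

-- Each asset type stores its own attributes in the JSON column.
INSERT INTO asset_inventory VALUES
(1001, 'desk', 'Standing desk', 'Office desk',
 '{"weight": 42.5, "drawers": 2}'),
(1002, 'server', 'db-host-01', 'Database server',
 '{"rack": "R12", "cores": 32, "mac": "00:00:5E:00:53:01"}');

-- Narrow the result set on a common column first,
-- then filter on a type-specific JSON attribute.
SELECT tag_id, name, property_bag::$rack AS rack
FROM asset_inventory
WHERE asset_type = 'server'
  AND property_bag::%cores >= 16;
```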

DDL: Defining Tables with JSON Columns

Any SingleStore table can contain one or more columns of data type JSON. A JSON column can optionally be suffixed with NOT NULL.

Comparing JSON and LONGTEXT Columns

A JSON column is analogous to a LONGTEXT column in the following ways:

JSON columns can store arbitrarily large JSON values in a normalized text representation.
JSON columns have the same storage requirement, as if the JSON value were stored in a text column.

The primary difference is that JSON data is stored in a normalized format, which makes many operations faster than if the data were stored manually in a text column. The following is an example of non-normalized data, which is valid JSON but is relatively difficult to parse:

JSON

'{ "b\u000a": 1,"a": 2 ,"a":3 } '

Normalized data, on the other hand, is easier to parse because duplicate keys are merged, the data is sorted by keys, and extraneous whitespace is removed.

JSON

'{"a":3,"b\n":1}'
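
One way to observe normalization is to cast the non-normalized string to JSON. The sketch below reuses the :> JSON cast that appears elsewhere in this document; the backslash in \u000a is doubled because the value is first parsed as a SQL string literal.

```sql
-- Sketch: cast the non-normalized text to JSON to see its normalized form.
SELECT '{ "b\\u000a": 1,"a": 2 ,"a":3 } ' :> JSON AS normalized;
```

Based on the rules described above, the result is expected to match the normalized form shown: duplicate keys merged, keys sorted, and extraneous whitespace removed.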

SingleStore recommends storing JSON data in JSON columns and not in LONGTEXT columns. JSON columns validate the JSON values and provide Unicode Support. If storage space and memory use are a concern, and search and extraction are not required on the column, use a string encoding on the column. Refer to Columnstore Seekability using JSON for more information.

Defining JSON Columns

Defining a JSON column in a SingleStore table is as simple as specifying the JSON data type in the CREATE TABLE command:

SQL

CREATE TABLE assets (
   tag_id BIGINT PRIMARY KEY,
   name TEXT NOT NULL,
   description TEXT,
   properties JSON NOT NULL);

Indexing Data in JSON Columns

JSON columns are not indexed directly; they are indexed using computed columns. For the fastest performance, you should not use JSON built-ins or :: notation in your filters. Instead, create a computed column that includes the JSON column in the computation, and then use the computed column for the index. In this way, the index gets updated only when the relevant JSON data is updated in a row.

SQL

CREATE TABLE assets (
   tag_id BIGINT PRIMARY KEY,
   name TEXT NOT NULL,
   description TEXT,
   properties JSON NOT NULL,
   weight AS properties::%weight PERSISTED DOUBLE,
   license_plate AS properties::$license_plate PERSISTED LONGTEXT,
   KEY(license_plate), KEY(weight));

JSON computed columns that are indexed will be utilized by the optimizer more efficiently. Queries that use indexed computed columns as filters or sort keys perform faster because they avoid expression evaluation and seek into or search the indexes rather than scanning tables. The following examples use the :: notation in the filtering and sorting clauses to illustrate how the optimizer matches the computed columns.

SQL

EXPLAIN SELECT * FROM assets WHERE properties::$license_plate = "VGB116";

+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                          |
+------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0                                                                             |
| Project [assets.tag_id, assets.name, assets.description, assets.properties, assets.weight, assets.license_plate] |
| ColumnStoreFilter [assets.license_plate = 'VGB116' index]                                                       |
| ColumnStoreScan test1.assets, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore      |
+------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

SQL

EXPLAIN SELECT * FROM assets ORDER BY properties::%weight;

+------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                          |
+------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.weight] partitions:all alias:remote_0                                                      |
| Project [assets.tag_id, assets.name, assets.description, assets.properties, assets.weight, assets.license_plate] |
| Sort [assets.weight]                                                                                             |
| ColumnStoreScan test1.assets, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore      |
+------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

DML: Accessing Data in JSON Columns

This section describes how to insert and update data in a SingleStore table with one or more JSON columns.

Inserting Data into a JSON Column

When inserting a row in a table, specifying JSON data is straightforward. For example, given a table defined as CREATE TABLE test_table(col_a TEXT, col_b JSON);, you can insert a row into test_table as follows:

SQL

INSERT INTO test_table(col_a,col_b) VALUES ('hello','{"x":"goodbye","y":"goodnight"}');

Columnstore Tables Having JSON Columns with Null Values or Empty Arrays

By default, SingleStore preserves NULL values and empty arrays in columnstore JSON data; the preserve_original_colstore_json global variable defaults to AUTO (same as ON). To disable this behavior, set preserve_original_colstore_json to OFF. The setting applies only to data loaded after the change; existing data is not updated.
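
As a minimal sketch, the variable can be inspected and changed with the usual engine-variable statements. This assumes an account with permission to set global variables; how the change is propagated in your deployment may differ.

```sql
-- Inspect the current value.
SELECT @@preserve_original_colstore_json;

-- Disable preservation for data loaded from now on
-- (data already in the columnstore is not rewritten).
SET GLOBAL preserve_original_colstore_json = OFF;
```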

Behavior When `preserve_original_colstore_json` is Set to `OFF`

When you store a JSON column in a columnstore table (and preserve_original_colstore_json is set to OFF), then null values and empty arrays in the JSON object are handled as follows when the object is written to the columnstore:

Name/value pairs with the value NULL are normally removed from the JSON object.
Name/value pairs containing empty arrays are normally removed from the JSON object.
If the JSON object has only the value NULL or [], the value is replaced with NULL.

Example commands you can use to store data are INSERT, UPDATE, and LOAD DATA.

An example INSERT scenario follows. Consider a table that is defined as:

SQL

CREATE TABLE json_empty_values_table(a INT, b JSON, SORT KEY (a));

Insert five rows into the table:

SQL

INSERT INTO json_empty_values_table VALUES (1, '{"v":null}');
INSERT INTO json_empty_values_table VALUES (2, '{"w":[]}');
INSERT INTO json_empty_values_table VALUES (3, '{"x":"foo","y":null,"z":[]}');
INSERT INTO json_empty_values_table VALUES (4, 'null');
INSERT INTO json_empty_values_table VALUES (5, '[]');

Manually flush the inserted data to the columnstore:

SQL

OPTIMIZE TABLE json_empty_values_table FLUSH;

Query the table:

SQL

SELECT * FROM json_empty_values_table ORDER BY a;

+------+-----------------------------+
| a    | b                           |
+------+-----------------------------+
|    1 | {"v":null}                  |
|    2 | {"w":[]}                    |
|    3 | {"x":"foo","y":null,"z":[]} |
|    4 | null                        |
|    5 | []                          |
+------+-----------------------------+

Accessing Fields in a JSON Object

To access a field of a JSON object stored in a column, use the name of the column suffixed with ::keyName. For example, if column data contains {"bits":[true,false]}, then the expression data::bits evaluates to the JSON value [true,false].
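
The expression above can be tried without creating a table. This sketch builds a one-row derived table, using the :> JSON cast to give the literal the JSON type; the aliases data and sub are illustrative.

```sql
-- Sketch: access the bits field of an inline JSON value.
SELECT data::bits AS bits
FROM (SELECT '{"bits":[true,false]}' :> JSON AS data) sub;
```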

Converting a JSON String Value into a SQL String

The :: operator, when applied to a string field of a JSON object, returns the value of the string enclosed in quotes. Use the ::$keyname operator to return the value of the string without the enclosing quotes.

For example, consider a table TestJSON that contains a JSON column data.

SQL

INSERT INTO TestJSON VALUES ('{"first":"hello"}');

Retrieve the value of the data column using :: and ::$ operators:

SQL

SELECT data::first, data::$first FROM TestJSON;

+-------------+--------------+
| data::first | data::$first |
+-------------+--------------+
| "hello"     | hello        |
+-------------+--------------+

In the next example, the data JSON column contains two string values.

SQL

INSERT INTO TestJSON VALUES ('{"first":"hello", "second":"world"}');

To retrieve the concatenated SQL string, use the ::$ operator:

SQL

SELECT CONCAT(data::$first, ' ', data::$second) FROM TestJSON;

+------------------------------------------+
| CONCAT(data::$first, ' ', data::$second) |
+------------------------------------------+
| hello world                              |
+------------------------------------------+

Converting a JSON Number or Boolean Value into a SQL DOUBLE

To transparently convert a JSON number or Boolean value into a SQL DOUBLE, use the name of the column suffixed with ::%keyname. For example, if column data contains {"valid":true,"value":3.14}, then data::%valid is 1 and data::%value is 3.14.
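
A small sketch of the conversion described above, run against an inline JSON value rather than a table. The key value is wrapped in backticks purely as a precaution in case it is parsed as a keyword; the column aliases are illustrative.

```sql
-- Sketch: convert a JSON Boolean and a JSON number to SQL DOUBLE values.
SELECT data::%valid   AS valid_as_double,
       data::%`value` AS value_as_double
FROM (SELECT '{"valid":true,"value":3.14}' :> JSON AS data) sub;
```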

Using the ::$ and ::% Operators

If either the ::$ or ::% access operator is used on a JSON field that is not of the appropriate type, the JSON result is quietly converted to a string or a double based on the rules defined in JSON_EXTRACT_<type>.

The :: access operators are simply convenient aliases for the JSON_EXTRACT_<type> built-in functions, and they follow all the same rules. However, the :: operators do not work on expressions or on the outputs of User Defined Functions (UDFs), such as SELECT udf_name(1)::key. In those cases, use the JSON_EXTRACT_<type> functions instead.

In addition, the :: operators require that numeric keys be specified with backticks (`).

The following SELECT statement returns the element at index 2 (zero-indexed) of the array a in {"a":[1,2,3,4]}. A syntax error is returned if the backticks (`) are not included.

SQL

SELECT json, json::a::`2`
FROM (SELECT '{"a":[1,2,3,4]}' AS json) sub;

+-----------------+--------------+
| json            | json::a::`2` |
+-----------------+--------------+
| {"a":[1,2,3,4]} | 3            |
+-----------------+--------------+

In addition, backticks can be used for non-numeric key names and array indexes, which is useful if the key name or index contains a space.

The following query is valid and will return the same output as the query above.

SQL

SELECT json, json::`a`::`2`
FROM (SELECT '{"a":[1,2,3,4]}' AS json) sub;

+-----------------+----------------+
| json            | json::`a`::`2` |
+-----------------+----------------+
| {"a":[1,2,3,4]} | 3              |
+-----------------+----------------+

The following is the same logical query except the array index 2 has been replaced by the expression 1+1 and the JSON_EXTRACT_JSON function is used.

While this example uses a simple expression (1+1), more functions such as DAYOFWEEK and more complex expressions can be used.

Refer to JSON_EXTRACT_<type> for details.

SQL

SELECT json, JSON_EXTRACT_JSON(json, "a", 1+1)
FROM (SELECT '{"a":[1,2,3,4]}' AS json) sub;

+-----------------+-----------------------------------+
| json            | JSON_EXTRACT_JSON(json, "a", 1+1) |
+-----------------+-----------------------------------+
| {"a":[1,2,3,4]} | 3                                 |
+-----------------+-----------------------------------+

Accessing Nested JSON Objects

To access nested JSON objects, chain the colon-colon operator to form a keypath. For example, data::address::street means the street field of the address field of the data column.

Note

If one of the keys in the keypath is not present in the nested object, then the entire colon-colon expression yields SQL NULL.
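
The behavior in the note can be sketched with an inline value. Here the address object has a street key but no zip key, so according to the rule above the second expression yields SQL NULL. The sample data and aliases are illustrative.

```sql
-- Sketch: one keypath that exists and one that does not.
SELECT data::address::$street AS street,
       data::address::$zip    AS zip   -- zip is absent from the object
FROM (SELECT '{"address":{"street":"1 Main St"}}' :> JSON AS data) sub;
```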

Working with Nested Arrays in a JSON Column

Consider a table defined as:

SQL

CREATE TABLE json_tab (`id` INT(11) DEFAULT NULL,`jsondata` JSON COLLATE utf8_bin);

Insert values as follows:

SQL

INSERT INTO json_tab VALUES
( 8765 ,' {"city":"SFO","sports_teams":[{"sport_name":"football","teams":  [{"club_name":"Raiders"},{"club_name":"49ers"}]},
{"sport_name":"baseball","teams" : [{"club_name":"As"},{"club_name":"SF Giants"}]}]}') ;

INSERT INTO json_tab VALUES
( 9876,'{"city":"NY","sports_teams" : [{ "sport_name":"football","teams" : [{ "club_name":"Jets"},{"club_name":"Giants"}]},
{"sport_name":"baseball","teams" : [ {"club_name":"Mets"},{"club_name":"Yankees"}]},
{"sport_name":"basketball","teams" : [{"club_name":"Nets"},{"club_name":"Knicks"}]}]}');

Query the table:

SQL

WITH t AS (
  SELECT id, jsondata::city city, table_col AS sports_clubs
  FROM json_tab JOIN TABLE(JSON_TO_ARRAY(jsondata::sports_teams))
),
t1 AS (
  SELECT t.id, t.city, t.sports_clubs::sport_name sport, table_col AS clubs
  FROM t JOIN TABLE(JSON_TO_ARRAY(t.sports_clubs::teams))
)
SELECT t1.id, t1.city, t1.sport, t1.clubs::club_name club_name
FROM t1;

+------+-------+--------------+-------------+
| id   | city  | sport        | club_name   |
+------+-------+--------------+-------------+
| 9876 | "NY"  | "football"   | "Jets"      |
| 9876 | "NY"  | "football"   | "Giants"    |
| 9876 | "NY"  | "baseball"   | "Mets"      |
| 9876 | "NY"  | "baseball"   | "Yankees"   |
| 9876 | "NY"  | "basketball" | "Nets"      |
| 9876 | "NY"  | "basketball" | "Knicks"    |
| 8765 | "SFO" | "football"   | "Raiders"   |
| 8765 | "SFO" | "football"   | "49ers"     |
| 8765 | "SFO" | "baseball"   | "As"        |
| 8765 | "SFO" | "baseball"   | "SF Giants" |
+------+-------+--------------+-------------+

You can also further filter the results by applying conditions. For example, to find the city with the "Yankees" club, run the following query:

SQL

WITH t AS (
  SELECT id, jsondata::city city, table_col AS sports_clubs
  FROM json_tab JOIN TABLE(JSON_TO_ARRAY(jsondata::sports_teams))
),
t1 AS (
  SELECT t.id, t.city, t.sports_clubs::sport_name sport, table_col AS clubs
  FROM t JOIN TABLE(JSON_TO_ARRAY(t.sports_clubs::teams))
)
SELECT t1.id, t1.city, t1.sport, t1.clubs::club_name club_name
FROM t1
WHERE t1.clubs::$club_name = 'Yankees';

+------+------+------------+-----------+
| id   | city | sport      | club_name |
+------+------+------------+-----------+
| 9876 | "NY" | "baseball" | "Yankees" |
+------+------+------------+-----------+

Nested JSON Ingest

Ingesting nested JSON (a JSON document stored as a string value inside another JSON document) requires an additional level of escaping because the value being inserted is first interpreted as a SQL string. Therefore, each backslash (\) used for JSON escaping must itself be escaped with a second backslash (\\):

SQL

CREATE TABLE test_json(col_a json);

INSERT INTO test_json VALUES ('{"addParams": "{\\"Emp_Id\\":\\"1487\\",
  \\"Emp_LastName\\":\\"Stephens\\",\\"Emp_FirstName\\":\\"Mark\\",\\"Dept\\":\\"Support\\"}"}');

SQL

SELECT * FROM test_json;

+------------------------------------------------------------------------------------+
| col_a                                                                              |
+------------------------------------------------------------------------------------+
| {"addParams":"{\"Emp_Id\":\"1487\",                                                |
|   \"Emp_LastName\":\"Stephens\",\"Emp_FirstName\":\"Mark\",\"Dept\":\"Support\"}"} |
+------------------------------------------------------------------------------------+

Using Colon-Colon Notation in UPDATE Queries

You can use the colon-colon notation for JSON keypaths in UPDATE queries. For example, the following two UPDATE queries perform the same operation:

SQL

UPDATE users SET userdata::name::$first = 'Alex';
UPDATE users SET userdata = JSON_SET_STRING(userdata, 'name', 'first', 'Alex');

In fact, these access operators are simply convenient aliases for the JSON_SET_<type> built-in function (see JSON_SET_<type>), and they follow all the same rules.
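
The equivalence can be tried end to end with a small table. The table users_demo and its data are hypothetical and exist only for this illustration; the first UPDATE uses the colon-colon form and the second uses JSON_SET_STRING on a different key so that both effects are visible in the final SELECT.

```sql
-- Hypothetical table and data used only for this illustration.
CREATE TABLE users_demo (uid INT, userdata JSON);
INSERT INTO users_demo VALUES (1, '{"name":{"first":"Al","last":"Smith"}}');

-- Colon-colon form: set name.first.
UPDATE users_demo
SET userdata::name::$first = 'Alex'
WHERE uid = 1;

-- Function form: set name.last.
UPDATE users_demo
SET userdata = JSON_SET_STRING(userdata, 'name', 'last', 'Jones')
WHERE uid = 1;

SELECT userdata FROM users_demo;
```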

Field Name Syntax for JSON Access and UPDATE Queries

In both the JSON access and update contexts, each field name must either:

follow the syntax for a valid SQL identifier, or
be escaped with backticks in the same manner as a SQL identifier.

For example, the following two SELECT queries perform the same operation:

SQL

SELECT ticker_symbol FROM stocks WHERE statistics::%`P/E` > 1.5;
SELECT ticker_symbol FROM stocks WHERE JSON_EXTRACT_DOUBLE(statistics, 'P/E') > 1.5;

When using the JSON_EXTRACT_<type> form of the query (see JSON_EXTRACT_<type>), there is no constraint on the contents of the key string. A JSON object can contain any string as a key, even "", or a string with "embedded\u0000nulls". For more information on extracting JSON data, see JSON LOAD DATA.
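
The same two forms can be applied to a key that contains a space. The sketch below uses the publication date key from the books_json table defined in the Examples section; the column aliases are illustrative.

```sql
-- Sketch: a key containing a space, accessed with backticks in the
-- shorthand form and as a plain string in the function form.
SELECT id,
       books::details::$`publication date` AS pub_date_shorthand,
       JSON_EXTRACT_STRING(books, 'details', 'publication date') AS pub_date_function
FROM books_json
ORDER BY id;
```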

Implementation Considerations

This section describes some unique behaviors in SingleStore’s implementation of the JSON standard (RFC 4627).

Infinities and NaNs

SingleStore does not attempt to support entities such as the nan (not a number) entity, as in {"value":nan}. Although the JavaScript language supports nan and positive and negative infinities, neither the JSON standard nor SQL’s DOUBLE type provides any way to represent such non-finite values. If your application requires these special values, you might try using null, 0.0, or strings (such as "nan" and "inf") in their place.

Unicode Support

JSON strings can contain any of the 65536 Unicode characters in the Basic Multilingual Plane, either directly as themselves or escaped with JSON’s \uABCD notation. JSON normalization converts all such ASCII escape sequences into UTF-8 byte sequences, except for those characters that must remain \u-encoded to keep the string valid JSON.

Warning

Characters outside of the Basic Multilingual Plane in JSON strings are not supported with escaped notation and may produce incorrect results.

In SingleStore, a backslash (\) literal in a string must be escaped with a backslash. Therefore, pattern strings containing backslashes will have two backslash characters (\\).

Here is an example of how to use escaped notation for characters in the Basic Multilingual Plane:

SQL

SELECT '{"a":"\\u00F9"}' :> JSON;

+---------------------------+
| '{"a":"\\u00F9"}' :> JSON |
+---------------------------+
| {"a":"ù"}                 |
+---------------------------+

Like MySQL, SingleStore supports characters outside the Basic Multilingual Plane (characters whose codepoints are in the range 0x10000 to 0x10FFFF). This includes some uncommon Chinese characters and symbols such as emojis.

You must have the character_set_server engine variable set to utf8mb4 for these characters to work. Refer to List of Engine Variables for more information.
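The difference between the two character ranges can be seen in a short Python sketch (standard library only, not SingleStore). It decodes the same \u00F9 escape used in the query above and compares the UTF-8 size of a Basic Multilingual Plane character with that of a character above 0xFFFF, written here directly rather than as a JSON escape. The emoji U+1F600 is an arbitrary choice for illustration.

```python
import json

# A BMP character written with JSON's \uABCD escape decodes to the character itself.
bmp = json.loads('{"a":"\\u00F9"}')["a"]

# In UTF-8 that character occupies two bytes.
bmp_utf8 = bmp.encode("utf-8")

# A character outside the BMP (here U+1F600) has a code point above 0xFFFF
# and needs four bytes in UTF-8.
emoji = "\U0001F600"
emoji_utf8 = emoji.encode("utf-8")
```

The four-byte encoding is consistent with the requirement for a four-byte character set such as utf8mb4.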

Character Encodings and Collation

Collation provides a set of rules to use when comparing characters in a character set. By default, JSON columns use the utf8_bin collation, which is a case-sensitive collation that sorts by Unicode codepoint value. The collation of a JSON column affects the following:

Output of SELECT DISTINCT, ORDER BY, and other queries that compare entire values.
Identification of duplicate keys inside a single JSON object during normalization. For example, whether the string {"Schlüssel":1,"Schluessel":2} is normalized to {"Schluessel":2}.
Sorting of keys inside a single JSON object. For example, whether the string {"Schlüssel":1,"Schluff":2} is normalized to {"Schluff":2,"Schlüssel":1}.

The default collation generally provides the desired behavior. However, you can override the default at the table or column level.
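To make the two normalization examples concrete, the Python sketch below models a binary collation as a plain comparison of UTF-8 bytes. This is only an approximation for illustration, not SingleStore's collation code; the helper normalize_keys_binary is hypothetical, and the rule that a later duplicate key overwrites an earlier one is an assumption.

```python
def normalize_keys_binary(pairs):
    """Deduplicate and sort keys by their UTF-8 bytes (a stand-in for utf8_bin)."""
    merged = {}
    for key, value in pairs:
        merged[key] = value  # assumption: a later duplicate overwrites an earlier one
    return sorted(merged.items(), key=lambda kv: kv[0].encode("utf-8"))

# Under a byte-wise comparison the two spellings are different keys,
# so both survive normalization.
dup_result = normalize_keys_binary([("Schlüssel", 1), ("Schluessel", 2)])

# "u" (0x75) sorts before the first byte of "ü" (0xC3), so Schluff comes first.
sort_result = normalize_keys_binary([("Schlüssel", 1), ("Schluff", 2)])
```

A collation that treats ü and ue as equal would instead collapse the first pair into a single key, which is the case the duplicate-key example describes.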

Note

SingleStore recommends that you use the utf8_bin collation for all JSON columns.

Table-level Override

In the following example, for the users table, both userdata and lastname use the table collation (which has been specified as utf8_unicode_ci).

SQL

CREATE TABLE users (
   uid INT AUTO_INCREMENT PRIMARY KEY,
   userdata JSON,
   lastname AS userdata::name::$last PERSISTED TEXT) COLLATE=utf8_unicode_ci;

Column-level Override

In the following example, for the orders table:

The data column uses utf8_unicode_ci.
The productdetails column uses utf8_bin.
Both the productname column and the comments column use utf8_general_ci, which is SingleStore’s default database collation

SQL

CREATE TABLE orders (
    oid INT AUTO_INCREMENT PRIMARY KEY,
    data JSON COLLATE utf8_unicode_ci,
    productname AS data::product::$name PERSISTED TEXT,
    productdetails AS data::product::$details PERSISTED TEXT COLLATE utf8_bin,
    comments VARCHAR(256));

Expression-level Override

The :> operator allows you to specify a collation for different expressions in a query.

SQL

:> text collate <collation_name>

The following example uses a binary collation (utf8_bin) for the first expression and a case-insensitive collation (utf8_general_ci) for the second expression.

SQL

SELECT *
FROM sets
WHERE sets.json_field::$x :> text collate utf8_bin = "string1"
AND sets.json_field::$y :> text collate utf8_general_ci = "string2";

Maximum JSON Value Size

Under the hood, JSON data is stored in LONGBLOB columns. While the DDL will allow you to specify a length of up to 4GB, there is an internal limit applied when assigning a value to a string or JSON field that caps the maximum size of a single value to max_allowed_packet. This is 100MB by default and can be set to up to 1GB.
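A client can estimate whether a document will fit before sending it. The Python sketch below is a rough pre-check under stated assumptions: the 100MB default is taken to mean 100 * 1024 * 1024 bytes, the size is measured on the client's own serialization (the stored, normalized value may differ), and the helper fits_in_packet is hypothetical.

```python
import json

# Assumption: the 100MB default is interpreted as 100 * 1024 * 1024 bytes.
ASSUMED_MAX_ALLOWED_PACKET = 100 * 1024 * 1024

def fits_in_packet(document, limit=ASSUMED_MAX_ALLOWED_PACKET):
    """Return True if the serialized document's UTF-8 size is within the limit."""
    encoded = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return len(encoded) <= limit

small_ok = fits_in_packet({"a": "ù"})

# A tiny limit is used here only to show the failing case.
too_big = fits_in_packet({"blob": "x" * 64}, limit=32)
```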

Parquet Encoding for JSON

A Parquet schema has a JSON-like syntax and a nested structure. The schema consists of a group of fields called a message. Each field has three attributes: repetition, type, and name. The type attribute is either a group or a primitive (e.g., int, float, boolean, or string). The repetition attribute can only be one of the following:

Required: exactly one occurrence
Optional: 0 or 1 occurrence
Repeated: 0 or more occurrences

Schema inference works as follows:

The process loops through the list of JSON objects.
1. The keypaths present in each object are merged into a schema tree object.
2. If there is a type conflict, the node in the schema tree is marked as un-inferable.
3. If any node in the schema tree has more children than the value of the json_document_max_children engine variable, the node is marked as un-inferable.
4. If a node has more children than the value of the json_document_sparse_children_check_threshold engine variable, and the average occurrence of its children relative to the occurrences of the node itself is less than 1/N, where N is the value of the json_document_sparse_children_check_ratio engine variable, the node is marked as un-inferable.
Once the schema tree object is constructed, the tree is analyzed and pruned until the number of key paths (distinct root-to-leaf paths) is less than the value of the json_document_max_leaves engine variable. Any node whose children's average occurrence relative to the number of rows is less than 1/N, where N is the value of the json_document_absolute_sparse_key_check_ratio engine variable, is also pruned.

The example uses the engine variable settings and the JSON objects shown below:

json_document_max_children = 4
json_document_sparse_children_check_threshold = 3
json_document_sparse_children_check_ratio = 2

JSON Objects
{"a1": 1, "a2": {"d1": 1}, "a3": {"c1": 1}, "a4": {"b1": 1}}
{"a1": 2, "a2": 1, "a4": {"b2": 1}}
{"a1": 3, "a2": 1, "a4": {"b3": 1}}
{"a1": 4, "a2": 1, "a3": {"c2": 1}, "a4": {"b4": 1}}
{"a1": 5, "a3": {"c3": 1}, "a4": {"b5": 1}}

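The first pass at merging the keypaths yields a tree in which the root has the children a1, a2 (with child d1), a3 (with child c1), and a4 (with child b1).

The second pass at merging the keypaths contains a type mismatch on a2, which is an object in the first row and a number in the second. As a result, a2 is marked as un-inferable, and b2 is added as a child of a4.

The third and fourth passes at merging add b3 and b4 as children of a4 and c2 as a child of a3.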

In the final merge, the number of children for a4 (5) exceeds the limit (4) set for json_document_max_children, so a4 is marked as un-inferable. The number of children for a3 reaches the threshold (3) set for json_document_sparse_children_check_threshold. The average occurrence of the children of a3 (1) relative to the number of occurrences of a3 (3) is calculated as ⅓ < ½, so a3 is also un-inferable.

After the merging process, the leaves of the schema tree are the inferred columns (also referred to as leaves, leaf columns, leaf paths, or key paths).

The keypaths will be a1, a2, a3, a4.
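The merge and marking steps can be sketched in Python as follows. This is an illustrative model under stated assumptions, not SingleStore's implementation: the Node class and the merge, mark, and leaf_paths helpers are hypothetical, the sparse-children threshold is treated as inclusive (>=) so that the outcome described for a3 is reproduced, and the later pruning by json_document_max_leaves and json_document_absolute_sparse_key_check_ratio is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    count: int = 0          # rows in which this key path occurs
    kind: str = ""          # "object" or the scalar type name first seen
    inferable: bool = True
    children: dict = field(default_factory=dict)

def merge(node, value):
    """Merge one JSON value into the schema tree (steps 1 and 2)."""
    node.count += 1
    kind = "object" if isinstance(value, dict) else type(value).__name__
    if node.kind and node.kind != kind:
        node.inferable = False              # type conflict
    node.kind = node.kind or kind
    if isinstance(value, dict):
        for key, child_value in value.items():
            merge(node.children.setdefault(key, Node()), child_value)

def mark(node, max_children, sparse_threshold, sparse_ratio):
    """Apply the child-count and sparse-children checks (steps 3 and 4)."""
    n = len(node.children)
    if n > max_children:
        node.inferable = False
    elif n > 0 and n >= sparse_threshold:   # assumption: inclusive comparison
        avg = sum(c.count for c in node.children.values()) / n
        if avg / node.count < 1 / sparse_ratio:
            node.inferable = False
    for child in node.children.values():
        mark(child, max_children, sparse_threshold, sparse_ratio)

def leaf_paths(node, prefix=""):
    """Un-inferable nodes and childless nodes become leaf columns."""
    if not node.inferable or not node.children:
        return [prefix]
    paths = []
    for key, child in node.children.items():
        paths += leaf_paths(child, f"{prefix}::{key}" if prefix else key)
    return paths

rows = [
    {"a1": 1, "a2": {"d1": 1}, "a3": {"c1": 1}, "a4": {"b1": 1}},
    {"a1": 2, "a2": 1, "a4": {"b2": 1}},
    {"a1": 3, "a2": 1, "a4": {"b3": 1}},
    {"a1": 4, "a2": 1, "a3": {"c2": 1}, "a4": {"b4": 1}},
    {"a1": 5, "a3": {"c3": 1}, "a4": {"b5": 1}},
]

root = Node()
for row in rows:
    merge(root, row)
mark(root, max_children=4, sparse_threshold=3, sparse_ratio=2)

keypaths = leaf_paths(root)
uninferable = [k for k, c in root.children.items() if not c.inferable]
```

With these inputs, a2 is marked by the type conflict, a3 by the sparse-children check, and a4 by the child-count limit, so each of them is kept as a single leaf rather than being split further.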

Encoding Nested Columns

Parquet uses the Dremel encoding for nested columns with definition and repetition levels.

Definition levels specify how many optional fields in the path for the column are defined.
Repetition levels specify at which repeated field in the path the value is repeated.
SingleStore stores the leaf columns for the JSON schema along with their respective repetition and definition levels.

Each of these internal columns will correspond to the value, definition level and repetition level columns which are encoded in SingleStore encodings (SeekableString, etc).

The example tables will have these values for the internal columns:

a1
Value	Definition Level	Repetition Level
1	1	0
2	1	0
3	1	0
4	1	0
5	1	0

a2
Value	Definition Level	Repetition Level
'{"d1": 1}'	1	0
1	1	0
1	1	0
1	1	0
-	0	0

a3
Value	Definition Level	Repetition Level
'{"c1": 1}'	1	0
-	0	0
-	0	0
'{"c2": 1}'	1	0
'{"c3": 1}'	1	0

a4
Value	Definition Level	Repetition Level
'{"b1": 1}'	1	0
'{"b2": 1}'	1	0
'{"b3": 1}'	1	0
'{"b4": 1}'	1	0
'{"b5": 1}'	1	0
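The way these triples follow from the rows can be sketched in Python for the simple case shown here, where every leaf is an optional top-level key and no arrays are involved (so the repetition level is always 0). The helper column_triples is hypothetical, and the sketch does not model how SingleStore actually encodes the columns.

```python
import json

def column_triples(rows, key):
    """Return (value, definition_level, repetition_level) per row for a top-level key."""
    triples = []
    for row in rows:
        if key in row:
            value = row[key]
            # Un-inferable nested objects are kept in their JSON string form.
            if isinstance(value, dict):
                value = json.dumps(value)
            triples.append((value, 1, 0))
        else:
            # Optional key absent: no value, definition level 0.
            triples.append((None, 0, 0))
    return triples

rows = [
    {"a1": 1, "a2": {"d1": 1}, "a3": {"c1": 1}, "a4": {"b1": 1}},
    {"a1": 2, "a2": 1, "a4": {"b2": 1}},
    {"a1": 3, "a2": 1, "a4": {"b3": 1}},
    {"a1": 4, "a2": 1, "a3": {"c2": 1}, "a4": {"b4": 1}},
    {"a1": 5, "a3": {"c3": 1}, "a4": {"b5": 1}},
]

a2_column = column_triples(rows, "a2")
a3_column = column_triples(rows, "a3")
```

A definition level of 0 marks the rows where the key is missing, which is how an absent value is distinguished from a stored one without keeping a placeholder value.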

Performance Impact on Parquet Encoding for JSON

When using seekable JSON (JSON Parquet encoding), a major factor in performance is whether the schema is dense or sparse.

A node in a tree is considered to be dense if it occurs in most JSON rows. A schema is said to be dense if most of the nodes in the entire schema are dense. Otherwise, the schema is considered to be sparse.

Consider the following JSON data table:

JSON Data (dense)
`{"a":1, "b":1}`
`{"a":2, "c":{"d": 1}}`
`{"c":{"d":2, "e":3}}`

Schema inference produces a tree for these rows whose leaves are a, b, c::d, and c::e.

The leaves of the tree become internal columns in the JSON encoding. This is an example of a dense schema.

a	b	c::d	c::e
1	1	NULL	NULL
2	NULL	1	NULL
NULL	NULL	2	3

Now consider the following JSON data table:

JSON Data (sparse)
`{"a": 1}`
`{"b": 1}`
`{"c": 1}`
`{"d": 1}`
`{"e": 1}`

The JSON rows will be encoded as follows:

a	b	c	d	e
1	NULL	NULL	NULL	NULL
NULL	1	NULL	NULL	NULL
NULL	NULL	1	NULL	NULL
NULL	NULL	NULL	1	NULL
NULL	NULL	NULL	NULL	1

The preceding table represents a sparse schema: most of its cells are NULLs that were not part of the original JSON strings. This results in poor performance because the NULLs still need to be counted, which increases execution time and memory usage.

To prevent an overly sparse schema from being inferred, SingleStore checks the average occurrence of a key’s children in relation to the key itself. If that average is too low (<1%), the key is stored as a string without inferring its children.
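The difference between the two data sets can be quantified with a small Python sketch that flattens each row into leaf key paths and measures the share of NULL cells in the resulting table. The helpers flatten and null_fraction are hypothetical, and the NULL share is only an illustrative measure, not the metric SingleStore uses internally.

```python
def flatten(obj, prefix=""):
    """Map each leaf key path (joined with ::) to its value."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}::{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def null_fraction(rows):
    """Fraction of cells that are NULL when every leaf path becomes a column."""
    flat = [flatten(r) for r in rows]
    columns = sorted({path for f in flat for path in f})
    cells = len(flat) * len(columns)
    filled = sum(len(f) for f in flat)
    return (cells - filled) / cells

dense_rows = [{"a": 1, "b": 1}, {"a": 2, "c": {"d": 1}}, {"c": {"d": 2, "e": 3}}]
sparse_rows = [{"a": 1}, {"b": 1}, {"c": 1}, {"d": 1}, {"e": 1}]

dense_nulls = null_fraction(dense_rows)     # 6 NULLs out of 12 cells
sparse_nulls = null_fraction(sparse_rows)   # 20 NULLs out of 25 cells
```

In the sparse case four out of every five cells hold a NULL that never appeared in the source documents, which is the overhead the sparsity check is meant to avoid.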

See: JSON_AGG
Training: Working with JSON
