Important

The SingleStore 9.0 release candidate (RC) gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.

Multi-Value Hash Index (JSON)

Syntax

SQL

-- Adding the multi-value index to a new table --
CREATE TABLE <tablename>
  (<col> JSON NULL,
   MULTI VALUE INDEX(col) INDEX_OPTIONS='<options>';
-- Adding the multi-value index an to existing table column --
ALTER TABLE <tableName>
  ADD MULTI VALUE INDEX (col) INDEX_OPTIONS='<options>';

where <options> is a JSON object to specify the tokenizer and other metadata related to it.

'{"TOKENIZER":"MATCH_ANY", "PATH":[<path_array>]}'

where <path_array> is a comma-separated list of values that specify the path to the field that needs to be indexed. If an empty path array, such as, "PATH":[], is specified, then the top-level elements in the array are indexed (such as [1,2,3]).

Using the Multi-Value Hash Index

The JSON_MATCH_ANY() built-in function is used to compare values inside a JSON array and to extract values from a JSON object. You can match values that are valid JSON types (object, array, integer, float, string, boolean, or JSON null) within a JSON document.

Note

Currently, multi-value hash Index for JSON supports matching only via the MATCH_PARAM_JSON() expression.

You can also match scalar values (such as strings or doubles) inside the JSON object. However, ensure that string values are valid JSON strings wrapped in quotes. Use either of the following:

Using the TO_JSON('<str>') built-in function.
Manually specifying a quoted constant string literal, e.g., ... = '"address"'.

Note: Typecasting to JSON using :>JSON does not automatically wrap an unquoted string into a valid JSON string. (However, it works for numeric types, for example, 2.5:>JSON.)

Supported Query Shapes

JSON_MATCH_ANY(MATCH_PARAM_JSON() = <value>, <key_path>)
JSON_MATCH_ANY(MATCH_PARAM_JSON() <=> <value>, <key_path>)
JSON_MATCH_ANY(MATCH_PARAM_JSON() IN (<value1>, <value2>..), '<key_path>')

<value> can be a constant literal (such as a string, integer, float), a typecast operator (:>, !:>), a user-defined variable (UDV), a deterministic built-in expression (most built-in functions, except for non-deterministic ones like UUID() or RAND()), or a combination of these.

In general, the lookup value must remain constant during query execution. If the value changes per row, such as a column reference, then the index cannot be effectively used and the query falls back to a regular table scan.

The index is only used for queries that match the exact <key_path> defined during index creation. For example, if the index is created on addresses::city, the query benefits from the index only when filtering values specifically under this path. It is not applied to deeper paths like addresses::city::district or to sibling keys like addresses::zip. To apply the index to any other path, you must create a separate index.

To check whether the index was used for your query filter, run an EXPLAIN query - the phrase multi-value hash index in the predicate under ColumnStoreFilter indicates that the index was used..

SQL

EXPLAIN SELECT col FROM t
WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('type'), col, 'key1');

+--------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                      |
|+-------------------------------------------------------------------------------------------------------------+
| Project [t.col]                                                                                              |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('type'),t.col,'key1') multi-value hash index] |
| ColumnStoreScan db.t, SORT KEY __UNORDERED () table_type:sharded_columnstore                                 |
|+-------------------------------------------------------------------------------------------------------------+

Remarks

Multi-value hash index is case-sensitive for JSON types.
Multi-value hash index for JSON activates when matched using the following operators: =, <=>, and IN().

The following table lists the expressions that support Multi-Value Hash Index for each of these operators:

Operator	Supported Expression
`=`, `<=>`	Constant literals, for example strings, `NULL`, `DOUBLE`, `INT`, etc. Typecast operators `:>` and `!:>`, for example, `100001:>JSON`. User-defined variables (UDVs), for example: SQL SET @obj = '{"city": "New York"}':>JSON; SELECT a:>JSON FROM t WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()=@obj, a, 'city'); Procedural SQL variables (stored procedure arguments and local variables). Nested constant (deterministic) built-in expressions, for example: `JSON_BUILD_ARRAY(JSON_SET_JSON('{}', 'a', @json), '{"a":4}':>JSON)` A combination of all of the above.
`IN()`	Parameterized `IN` lists with: Literals of the same types Single argument built-in expressions of the same shape and literal types in the list, for example, `IN(HEX('a'), HEX('b'))`.

Operator

Supported Expression

=, <=>

Constant literals, for example strings, NULL, DOUBLE, INT, etc.
Typecast operators :> and !:>, for example, 100001:>JSON.

User-defined variables (UDVs), for example:

SQL

SET @obj = '{"city": "New York"}':>JSON;
SELECT a:>JSON FROM t
WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()=@obj, a, 'city');

Procedural SQL variables (stored procedure arguments and local variables).
Nested constant (deterministic) built-in expressions, for example:
```
JSON_BUILD_ARRAY(JSON_SET_JSON('{}', 'a', @json), '{"a":4}':>JSON)
```
A combination of all of the above.

IN()

Parameterized IN lists with:
- Literals of the same types
- Single argument built-in expressions of the same shape and literal types in the list, for example, IN(HEX('a'), HEX('b')).

Limitations

Only runtime constant expressions are supported for matching (i.e. UDV, builtins, typecasts, PSQL variables are supported; UDFs and column fields are not supported).
Using OR inside a JSON_MATCH_ANY predicate (for example, JSON_MATCH_ANY(MATCH_PARAM_JSON() = 12 OR MATCH_PARAM_JSON() = 1, a, 'a')) disables index usage. Instead, use either an IN list or multiple JSON_MATCH_ANY filters.
Only parameterized IN list is supported.
Indexing multiple key paths for the same JSON field requires separate indexes, for example, to index both a::k1 and a::k2, you must create two indexes.
This index is supported only for columnstore tables.
Multi-value index for JSON does not support the following:
- References to other unindexed fields on the right-hand side of an expression.
- Non-deterministic built-in functions, for example RAND().
- JSON_MATCH_ANY predicate with MATCH_ELEMENTS option.
- != and other comparison operators (excluding = and <=>).
- NOT IN lists.

Examples

Example 1- Searching `users` by `city`

Consider the following table definition:

CREATE TABLE users (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  created DATETIME(6) DEFAULT NOW(6),
  data JSON
);

User information is stored in the data column as JSON, structured as follows:

** row-1
  {
    "id": 1,
    "name": "Alice",
    "addresses": [
      { "street": "123 Main St", "city": "New York", "zip": "10001" },
      { "street": "456 Oak Ave", "city": "Los Angeles", "zip": "90001" }
     ]
  }
** row-2
  {
    "id": 2,
    "name": "Bob",
    "addresses": [
     { "street": "789 Pine Ln", "city": "Chicago", "zip": "60601" }
    ]
}
** row-3
  {
    "id": 3,
    "name": "Charlie",
    "addresses": [
     { "street": "101 Elm St", "city": "New York", "zip": "10002" },
     { "street": "222 Maple Dr", "city": "Houston", "zip": "77001" }
   ]
}

Adding a multi-value Hash Index to the path "addresses, city" can significantly improve lookup performance.

ALTER TABLE users ADD MULTI VALUE INDEX (data)
   INDEX_OPTIONS='{"TOKENIZER":"MATCH ANY","PATH":["addresses", "city"]}';

Find all users who live in Houston using the JSON_MATCH_ANY built-in function.

SQL

SELECT id, data::name FROM users
 WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()='"Houston"', data, 'addresses', 'city');

+----+------------+
| id | data::name |
+----+------------+
|  3 | "Charlie"  |
+----+------------+

To verify that the multi-value hash index was used, run the EXPLAIN query as:

SQL

EXPLAIN SELECT id, data::name FROM users
   WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()='"Houston"', data, 'addresses', 'city');

+-----------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment                                                              |
| Project [users.id, JSON_EXTRACT_JSON(users.data,'name') AS `data::name`]                                                    |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = '\"Houston\"',users.data,'addresses','city') multi-value hash index] |
| ColumnStoreScan db1.users, SORT KEY __UNORDERED () table_type:sharded_columnstore                                           |
+-----------------------------------------------------------------------------------------------------------------------------+

Example 2 - Multiple indexes

In this example, two separate multi-value hash indexes are used to optimize common lookup patterns on a JSON column for the following two fields:

An integer field eventId
A string array tags

These indexes help speed up queries that filter by eventID and check for specific tags without scanning the entire table.

Create a table.

SQL

CREATE TABLE events (
  id BIGINT PRIMARY KEY,
  created DATETIME(6),
  payload JSON
);

Add multi-value hash indexes via ALTER TABLE.

SQL

ALTER TABLE events
  ADD MULTI VALUE INDEX idx_event_id (payload)
  INDEX_OPTIONS = '{"TOKENIZER": "MATCH ANY", "PATH": ["eventId"]}';
ALTER TABLE events
  ADD MULTI VALUE INDEX idx_tags (payload)
  INDEX_OPTIONS = '{"TOKENIZER": "MATCH ANY", "PATH": ["tags"]}';

Insert data into the table.

SQL

INSERT INTO events VALUES
    (1, NOW(6), JSON_BUILD_OBJECT('eventId', 101, 'tags', JSON_BUILD_ARRAY('sports', 'outdoor', 'fun'))),
    (2, NOW(6), JSON_BUILD_OBJECT('eventId', 202, 'tags', JSON_BUILD_ARRAY('music', 'concert'))),
    (3, NOW(6), JSON_BUILD_OBJECT('eventId', 101, 'tags', JSON_BUILD_ARRAY('sports', 'indoor')));

Query using both the indexed fields:

SQL

SELECT *
 FROM events
 WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 101:>JSON,
   payload, 'eventId')
   AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),
   payload, 'tags');

+----+----------------------------+---------------------------------------------------+
|id  | created                    | payload                                           |
+----+----------------------------+---------------------------------------------------+
|  1 | 2025-04-29 15:36:52.797432 | {"eventId":101,"tags":["sports","outdoor","fun"]} |
|  3 | 2025-04-29 15:36:52.797432 | {"eventId":101,"tags":["sports","indoor"]}        |
+----+----------------------------+---------------------------------------------------+

Verify index usage from the query EXPLAIN:

SQL

EXPLAIN SELECT *
 FROM events
 WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 101:>JSON,
   payload, 'eventId')
   AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),
   payload, 'tags');

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment                                                                                                                                                                                        |
| Project [events.id, events.created, events.payload]                                                                                                                                                                                                   |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = (101:>JSON COLLATE utf8mb4_bin NULL),events.payload,'eventId') multi-value hash index AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),events.payload,'tags') multi-value hash index] |
| ColumnStoreScan db1.events, SORT KEY __UNORDERED () table_type:sharded_columnstore                                                                                                                                                                    |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Multi-Value Hash Index (JSON)

On this page

Syntax

Using the Multi-Value Hash Index

Remarks

Limitations

Examples

Example 1- Searching `users` by `city`

Example 2 - Multiple indexes

Was this article helpful?

On this page

Was this article helpful?

Multi-Value Hash Index (JSON)

On this page

Syntax

Using the Multi-Value Hash Index

Remarks

Limitations

Examples

Example 1- Searching users by city

Example 2 - Multiple indexes

Was this article helpful?

On this page

Was this article helpful?

Example 1- Searching `users` by `city`