Multi-Value Hash Index (JSON)
On this page
Multi-Value Hash index optimizes lookups within JSON objects and arrays.
For example, if an orders table has a JSON column details with the following data:
{"product_id":[{"id":1001,"type":"flour"},{"id":1002,"type":"yeast"}]}
An index created on the details column at the path ["product_
To index values at the top level in an array, specify an empty path array, such as, "PATH":[]
.[1,2,3]
, you can accelerate searches for values within the array.
The multi-value hash index stores pairs of values along with the list of row IDs where that value occurs.
This index is fundamentally a hash index, leveraging the same high-performance technology as the regular columnar hash indexes.
Syntax
-- Adding the multi-value index to a new table --CREATE TABLE <tablename>(<col> JSON NULL,MULTI VALUE INDEX(col) INDEX_OPTIONS='<options>';-- Adding the multi-value index an to existing table column --ALTER TABLE <tableName>ADD MULTI VALUE INDEX (col) INDEX_OPTIONS='<options>';
where <options>
is a JSON object to specify the tokenizer and other metadata related to it.
'{"TOKENIZER":"MATCH_ANY", "PATH":[<path_array>]}'
where <path_
is a comma-separated list of values that specify the path to the field that needs to be indexed."PATH":[]
, is specified, then the top-level elements in the array are indexed (such as [1,2,3]).
Using the Multi-Value Hash Index
The JSON_
built-in function is used to compare values inside a JSON array and to extract values from a JSON object.null
) within a JSON document.
Note
Currently, multi-value hash Index for JSON supports matching only via the MATCH_
expression.
You can also match scalar values (such as strings or doubles) inside the JSON object.
-
Using the
TO_
built-in function.JSON('<str>') -
Manually specifying a quoted constant string literal, e.
g. , .
.. . = '"address"'
Note: Typecasting to JSON using :>JSON
does not automatically wrap an unquoted string into a valid JSON string.2.
.
Supported Query Shapes
JSON_MATCH_ANY(MATCH_PARAM_JSON() = <value>, <key_path>)
JSON_MATCH_ANY(MATCH_PARAM_JSON() <=> <value>, <key_path>)
JSON_MATCH_ANY(MATCH_PARAM_JSON() IN (<value1>, <value2>..), '<key_path>')
<value>
can be a constant literal (such as a string, integer, float), a typecast operator (:>
, !:>
), a user-defined variable (UDV), a deterministic built-in expression (most built-in functions, except for non-deterministic ones like UUID()
or RAND()
), or a combination of these.
In general, the lookup value must remain constant during query execution.
The index is only used for queries that match the exact <key_
defined during index creation.
To check whether the index was used for your query filter, run an EXPLAIN
query - the phrase multi-value hash index in the predicate under ColumnStoreFilter indicates that the index was used.
EXPLAIN SELECT col FROM tWHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('type'), col, 'key1');
+--------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
|+-------------------------------------------------------------------------------------------------------------+
| Project [t.col] |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('type'),t.col,'key1') multi-value hash index] |
| ColumnStoreScan db.t, SORT KEY __UNORDERED () table_type:sharded_columnstore |
|+-------------------------------------------------------------------------------------------------------------+
Remarks
-
Multi-value hash index is case-sensitive for JSON types.
-
Multi-value hash index for JSON activates when matched using the following operators:
=
,<=>
, andIN()
. -
The following table lists the expressions that support Multi-Value Hash Index for each of these operators:
Operator
Supported Expression
=
,<=>
-
Constant literals, for example strings,
NULL
,DOUBLE
,INT
, etc. -
Typecast operators
:>
and!:>
, for example,100001:>JSON
. -
User-defined variables (UDVs), for example:
SET @obj = '{"city": "New York"}':>JSON;SELECT a:>JSON FROM tWHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()=@obj, a, 'city'); -
Procedural SQL variables (stored procedure arguments and local variables).
-
Nested constant (deterministic) built-in expressions, for example:
JSON_BUILD_ARRAY(JSON_SET_JSON('{}', 'a', @json), '{"a":4}':>JSON)
-
A combination of all of the above.
IN()
-
Parameterized
IN
lists with:-
Literals of the same types
-
Single argument built-in expressions of the same shape and literal types in the list, for example,
IN(HEX('a'), HEX('b'))
.
-
-
Limitations
-
Only runtime constant expressions are supported for matching (i.
e. UDV, builtins, typecasts, PSQL variables are supported; UDFs and column fields are not supported). -
Using
OR
inside aJSON_
predicate (for example,MATCH_ ANY JSON_
disables index usage.MATCH_ ANY(MATCH_ PARAM_ JSON() = 12 OR MATCH_ PARAM_ JSON() = 1, a, 'a')) Instead, use either an IN
list or multipleJSON_
filters.MATCH_ ANY -
Only parameterized
IN
list is supported. -
Indexing multiple key paths for the same JSON field requires separate indexes, for example, to index both a::k1 and a::k2, you must create two indexes.
-
This index is supported only for columnstore tables.
-
Multi-value index for JSON does not support the following:
-
References to other unindexed fields on the right-hand side of an expression.
-
Non-deterministic built-in functions, for example
RAND()
. -
JSON_
predicate withMATCH_ ANY MATCH_
option.ELEMENTS -
!=
and other comparison operators (excluding=
and<=>
). -
NOT IN
lists.
-
Examples
Example 1- Searching users
by city
Consider the following table definition:
CREATE TABLE users (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
created DATETIME(6) DEFAULT NOW(6),
data JSON
);
User information is stored in the data column as JSON, structured as follows:
** row-1
{
"id": 1,
"name": "Alice",
"addresses": [
{ "street": "123 Main St", "city": "New York", "zip": "10001" },
{ "street": "456 Oak Ave", "city": "Los Angeles", "zip": "90001" }
]
}
** row-2
{
"id": 2,
"name": "Bob",
"addresses": [
{ "street": "789 Pine Ln", "city": "Chicago", "zip": "60601" }
]
}
** row-3
{
"id": 3,
"name": "Charlie",
"addresses": [
{ "street": "101 Elm St", "city": "New York", "zip": "10002" },
{ "street": "222 Maple Dr", "city": "Houston", "zip": "77001" }
]
}
Adding a multi-value Hash Index to the path "addresses, city" can significantly improve lookup performance.
ALTER TABLE users ADD MULTI VALUE INDEX (data)
INDEX_OPTIONS='{"TOKENIZER":"MATCH ANY","PATH":["addresses", "city"]}';
Find all users
who live in Houston using the JSON_
built-in function.
SELECT id, data::name FROM usersWHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()='"Houston"', data, 'addresses', 'city');
+----+------------+
| id | data::name |
+----+------------+
| 3 | "Charlie" |
+----+------------+
To verify that the multi-value hash index was used, run the EXPLAIN
query as:
EXPLAIN SELECT id, data::name FROM usersWHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()='"Houston"', data, 'addresses', 'city');
+-----------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment |
| Project [users.id, JSON_EXTRACT_JSON(users.data,'name') AS `data::name`] |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = '\"Houston\"',users.data,'addresses','city') multi-value hash index] |
| ColumnStoreScan db1.users, SORT KEY __UNORDERED () table_type:sharded_columnstore |
+-----------------------------------------------------------------------------------------------------------------------------+
Example 2 - Multiple indexes
In this example, two separate multi-value hash indexes are used to optimize common lookup patterns on a JSON column for the following two fields:
-
An integer field eventId
-
A string array tags
These indexes help speed up queries that filter by eventID and check for specific tags without scanning the entire table.
-
Create a table.
CREATE TABLE events (id BIGINT PRIMARY KEY,created DATETIME(6),payload JSON); -
Add multi-value hash indexes via
ALTER TABLE
.ALTER TABLE eventsADD MULTI VALUE INDEX idx_event_id (payload)INDEX_OPTIONS = '{"TOKENIZER": "MATCH ANY", "PATH": ["eventId"]}';ALTER TABLE eventsADD MULTI VALUE INDEX idx_tags (payload)INDEX_OPTIONS = '{"TOKENIZER": "MATCH ANY", "PATH": ["tags"]}'; -
Insert data into the table.
INSERT INTO events VALUES(1, NOW(6), JSON_BUILD_OBJECT('eventId', 101, 'tags', JSON_BUILD_ARRAY('sports', 'outdoor', 'fun'))),(2, NOW(6), JSON_BUILD_OBJECT('eventId', 202, 'tags', JSON_BUILD_ARRAY('music', 'concert'))),(3, NOW(6), JSON_BUILD_OBJECT('eventId', 101, 'tags', JSON_BUILD_ARRAY('sports', 'indoor'))); -
Query using both the indexed fields:
SELECT *FROM eventsWHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 101:>JSON,payload, 'eventId')AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),payload, 'tags');+----+----------------------------+---------------------------------------------------+ |id | created | payload | +----+----------------------------+---------------------------------------------------+ | 1 | 2025-04-29 15:36:52.797432 | {"eventId":101,"tags":["sports","outdoor","fun"]} | | 3 | 2025-04-29 15:36:52.797432 | {"eventId":101,"tags":["sports","indoor"]} | +----+----------------------------+---------------------------------------------------+
-
Verify index usage from the query
EXPLAIN
:EXPLAIN SELECT *FROM eventsWHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 101:>JSON,payload, 'eventId')AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),payload, 'tags');+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Gather partitions:all alias:remote_0 parallelism_level:segment | | Project [events.id, events.created, events.payload] | | ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = (101:>JSON COLLATE utf8mb4_bin NULL),events.payload,'eventId') multi-value hash index AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),events.payload,'tags') multi-value hash index] | | ColumnStoreScan db1.events, SORT KEY __UNORDERED () table_type:sharded_columnstore | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Last modified: June 6, 2025