Multi-Value Hash Index (JSON)

Multi-Value Hash index optimizes lookups within JSON objects and arrays. It can efficiently find rows where a JSON array contains a specific value or where a specific key-path holds an exact value match.

For example, if an orders table has a JSON column details with the following data:

{"product_id":[{"id":1001,"type":"flour"},{"id":1002,"type":"yeast"}]}

An index created on the details column at the path ["product_id", "id"] enables high performing lookups for orders by product_id.

To index values at the top level in an array, specify an empty path array, such as, "PATH":[]. For example, if you have a column containing a top-level JSON array like [1,2,3], you can accelerate searches for values within the array.

The multi-value hash index stores pairs of values along with the list of row IDs where that value occurs. Since each value is stored only once, the index remains compact even when values are repeated across many rows.

This index is fundamentally a hash index, leveraging the same high-performance technology as the regular columnar hash indexes.

Syntax

-- Adding the multi-value index to a new table --
CREATE TABLE <tablename>   
(<col> JSON NULL,    
MULTI VALUE INDEX(col) INDEX_OPTIONS='<options>';
-- Adding the multi-value index an to existing table column --
ALTER TABLE <tableName>
ADD MULTI VALUE INDEX (col) INDEX_OPTIONS='<options>';

where <options> is a JSON object to specify the tokenizer and other metadata related to it.

'{"TOKENIZER":"MATCH_ANY", "PATH":[<path_array>]}'

where <path_array> is a comma-separated list of values that specify the path to the field that needs to be indexed. If an empty path array, such as, "PATH":[], is specified, then the top-level elements in the array are indexed (such as [1,2,3]).

Using the Multi-Value Hash Index

The JSON_MATCH_ANY() built-in function is used to compare values inside a JSON array and to extract values from a JSON object. You can match values that are valid JSON types (object, array, integer, float, string, boolean, or JSON null) within a JSON document.

Note

Currently, multi-value hash Index for JSON supports matching only via the MATCH_PARAM_JSON() expression.

You can also match scalar values (such as strings or doubles) inside the JSON object. However, ensure that string values are valid JSON strings wrapped in quotes. Use either of the following:

  • Using the TO_JSON('<str>') built-in function.

  • Manually specifying a quoted constant string literal, e.g., ... = '"address"'.

Note: Typecasting to JSON using :>JSON does not automatically wrap an unquoted string into a valid JSON string. (However, it works for numeric types, for example, 2.5:>JSON.)

Supported Query Shapes

JSON_MATCH_ANY(MATCH_PARAM_JSON() = <value>, <key_path>)
JSON_MATCH_ANY(MATCH_PARAM_JSON() <=> <value>, <key_path>)

JSON_MATCH_ANY(MATCH_PARAM_JSON() IN (<value1>, <value2>..), '<key_path>')

<value> can be a constant literal (such as a string, integer, float), a typecast operator (:>, !:>), a user-defined variable (UDV), a deterministic built-in expression (most built-in functions, except for non-deterministic ones like UUID() or RAND()), or a combination of these.

In general, the lookup value must remain constant during query execution. If the value changes per row, such as a column reference, then the index cannot be effectively used and the query falls back to a regular table scan.

The index is only used for queries that match the exact <key_path> defined during index creation. For example, if the index is created on addresses::city, the query benefits from the index only when filtering values specifically under this path. It is not applied to deeper paths like addresses::city::district or to sibling keys like addresses::zip. To apply the index to any other path, you must create a separate index.

To check whether the index was used for your query filter, run an EXPLAIN query - the phrase multi-value hash index in the predicate under ColumnStoreFilter indicates that the index was used..

EXPLAIN SELECT col FROM t
WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('type'), col, 'key1');
+--------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                      |
|+-------------------------------------------------------------------------------------------------------------+
| Project [t.col]                                                                                              |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('type'),t.col,'key1') multi-value hash index] |
| ColumnStoreScan db.t, SORT KEY __UNORDERED () table_type:sharded_columnstore                                 |
|+-------------------------------------------------------------------------------------------------------------+

Remarks

  • Multi-value hash index is case-sensitive for JSON types.

  • Multi-value hash index for JSON activates when matched using the following operators: =, <=>, and IN().

  • The following table lists the expressions that support Multi-Value Hash Index for each of these operators:

    Operator

    Supported Expression

    =, <=>

    • Constant literals, for example strings, NULL, DOUBLE, INT, etc.

    • Typecast operators :> and !:>, for example, 100001:>JSON.

    • User-defined variables (UDVs), for example:

      SET @obj = '{"city": "New York"}':>JSON;
      SELECT a:>JSON FROM t
      WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()=@obj, a, 'city');
    • Procedural SQL variables (stored procedure arguments and local variables).

    • Nested constant (deterministic) built-in expressions, for example:

      JSON_BUILD_ARRAY(JSON_SET_JSON('{}', 'a', @json), '{"a":4}':>JSON)
    • A combination of all of the above.

    IN()

    • Parameterized IN lists with:

      • Literals of the same types

      • Single argument built-in expressions of the same shape and literal types in the list, for example, IN(HEX('a'), HEX('b')).

Limitations

  • Only runtime constant expressions are supported for matching (i.e. UDV, builtins, typecasts, PSQL variables are supported; UDFs and column fields are not supported).

  • Using OR inside a JSON_MATCH_ANY predicate (for example, JSON_MATCH_ANY(MATCH_PARAM_JSON() = 12 OR MATCH_PARAM_JSON() = 1, a, 'a')) disables index usage. Instead, use either an IN list or multiple JSON_MATCH_ANY filters.

  • Only parameterized IN list is supported.

  • Indexing multiple key paths for the same JSON field requires separate indexes, for example, to index both a::k1 and a::k2, you must create two indexes.

  • This index is supported only for columnstore tables.

  • Multi-value index for JSON does not support the following:

    • References to other unindexed fields on the right-hand side of an expression.

    • Non-deterministic built-in functions, for example RAND().

    • JSON_MATCH_ANY predicate with MATCH_ELEMENTS option.

    • != and other comparison operators (excluding = and <=>).

    • NOT IN lists.

Examples

Example 1- Searching users by city

Consider the following table definition:

CREATE TABLE users (    
  id BIGINT PRIMARY KEY AUTO_INCREMENT,    
  created DATETIME(6) DEFAULT NOW(6),    
  data JSON
);

User information is stored in the data column as JSON, structured as follows:

** row-1  
  {    
    "id": 1,    
    "name": "Alice",    
    "addresses": [      
      { "street": "123 Main St", "city": "New York", "zip": "10001" },      
      { "street": "456 Oak Ave", "city": "Los Angeles", "zip": "90001" }    
     ]  
  }
** row-2  
  {    
    "id": 2,    
    "name": "Bob",    
    "addresses": [      
     { "street": "789 Pine Ln", "city": "Chicago", "zip": "60601" }    
    ]  
}

** row-3  
  {    
    "id": 3,    
    "name": "Charlie",    
    "addresses": [      
     { "street": "101 Elm St", "city": "New York", "zip": "10002" },      
     { "street": "222 Maple Dr", "city": "Houston", "zip": "77001" }    
   ]  
}

Adding a multi-value Hash Index to the path "addresses, city" can significantly improve lookup performance.

ALTER TABLE users ADD MULTI VALUE INDEX (data)    
   INDEX_OPTIONS='{"TOKENIZER":"MATCH ANY","PATH":["addresses", "city"]}';

Find all users who live in Houston using the JSON_MATCH_ANY built-in function.

SELECT id, data::name FROM users
WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()='"Houston"', data, 'addresses', 'city'); 
+----+------------+
| id | data::name |
+----+------------+
|  3 | "Charlie"  |
+----+------------+

To verify that the multi-value hash index was used, run the EXPLAIN query as:

EXPLAIN SELECT id, data::name FROM users
WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON()='"Houston"', data, 'addresses', 'city');
+-----------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment                                                              |
| Project [users.id, JSON_EXTRACT_JSON(users.data,'name') AS `data::name`]                                                    |
| ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = '\"Houston\"',users.data,'addresses','city') multi-value hash index] |
| ColumnStoreScan db1.users, SORT KEY __UNORDERED () table_type:sharded_columnstore                                           |
+-----------------------------------------------------------------------------------------------------------------------------+

Example 2 - Multiple indexes

In this example, two separate multi-value hash indexes are used to optimize common lookup patterns on a JSON column for the following two fields:

  • An integer field eventId

  • A string array tags

These indexes help speed up queries that filter by eventID and check for specific tags without scanning the entire table.

  1. Create a table.

    CREATE TABLE events (
    id BIGINT PRIMARY KEY,
    created DATETIME(6),
    payload JSON
    );
  2. Add multi-value hash indexes via ALTER TABLE.

    ALTER TABLE events
    ADD MULTI VALUE INDEX idx_event_id (payload)
    INDEX_OPTIONS = '{"TOKENIZER": "MATCH ANY", "PATH": ["eventId"]}';
    ALTER TABLE events
    ADD MULTI VALUE INDEX idx_tags (payload)
    INDEX_OPTIONS = '{"TOKENIZER": "MATCH ANY", "PATH": ["tags"]}';
  3. Insert data into the table.

    INSERT INTO events VALUES
    (1, NOW(6), JSON_BUILD_OBJECT('eventId', 101, 'tags', JSON_BUILD_ARRAY('sports', 'outdoor', 'fun'))),
    (2, NOW(6), JSON_BUILD_OBJECT('eventId', 202, 'tags', JSON_BUILD_ARRAY('music', 'concert'))),
    (3, NOW(6), JSON_BUILD_OBJECT('eventId', 101, 'tags', JSON_BUILD_ARRAY('sports', 'indoor')));
  4. Query using both the indexed fields:

    SELECT *
    FROM events
    WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 101:>JSON,
    payload, 'eventId')
    AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),
    payload, 'tags');
    +----+----------------------------+---------------------------------------------------+
    |id  | created                    | payload                                           |
    +----+----------------------------+---------------------------------------------------+
    |  1 | 2025-04-29 15:36:52.797432 | {"eventId":101,"tags":["sports","outdoor","fun"]} |
    |  3 | 2025-04-29 15:36:52.797432 | {"eventId":101,"tags":["sports","indoor"]}        |
    +----+----------------------------+---------------------------------------------------+
  5. Verify index usage from the query EXPLAIN:

    EXPLAIN SELECT *
    FROM events
    WHERE JSON_MATCH_ANY(MATCH_PARAM_JSON() = 101:>JSON,
    payload, 'eventId')
    AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),
    payload, 'tags');
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                                                                                                                                                                               |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Gather partitions:all alias:remote_0 parallelism_level:segment                                                                                                                                                                                        |
    | Project [events.id, events.created, events.payload]                                                                                                                                                                                                   |
    | ColumnStoreFilter [JSON_MATCH_ANY(MATCH_PARAM_JSON() = (101:>JSON COLLATE utf8mb4_bin NULL),events.payload,'eventId') multi-value hash index AND JSON_MATCH_ANY(MATCH_PARAM_JSON() = TO_JSON('sports'),events.payload,'tags') multi-value hash index] |
    | ColumnStoreScan db1.events, SORT KEY __UNORDERED () table_type:sharded_columnstore                                                                                                                                                                    |
    +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Last modified: June 6, 2025

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK