Multi-Value Hash Index (BSON)
Warning
SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.
On this page
Multi-value hash index (or "multi-value index") maps values to the row IDs containing the value.
Multi-value index stores pairs consisting of a value and a list of row IDs in which the value occurs.
SingleStore's multi-value index is fundamentally a hash index and is built on the same effective technology as the regular columnar hash indices.
BSON type includes extending BSON_
with equality filters inside to be optimized using multi-value index.
-
BSON_
MATCH_ ANY(MATCH_ PARAM_ BSON() = 'value', 'path') -
BSON_
MATCH_ ANY(MATCH_ PARAM_ BSON() IN ( 'value1', 'value2', 'value3'), 'path')
Syntax
For SQL Queries
Adding multi-value index to new tables:
CREATE TABLE <tablename>(<col1> BSON NULL,MULTI VALUE INDEX(col1) INDEX_OPTIONS='<options>';
Adding multi-value index to existing tables:
ALTER TABLE <tableName>ADD MULTI VALUE INDEX (col1) INDEX_OPTIONS='<options>';
where '<options>'
is a JSON object to specify the tokenizer and other metadata related to it.
'{"tokenizer":"MATCH_ANY", "path":[<path_array>]}'
where path_
is a comma-separated list of values that specify the path to the field that needs to be indexed."PATH":[]
, is specified, then top-level elements in the array are indexed (such as [1,2,3]).
For MongoDB® Queries
db.<collection>.createIndex({"<path_name>":"multi_value"})
where <path_
is the path to a property in the BSON document.
Remarks
-
Currently, multi-value index is supported only on BSON type columns.
-
Multi-value index is case-sensitive (for BSON types).
-
Multi-value index activates when matched using the following operators:
=
,<=>
, andIN()
.The following table lists the expressions that support multi-value index for each of these operators:
Operator
Supported Expression
=
,<=>
-
Constant literals, for example strings,
NULL
,DOUBLE
,INT
, etc. -
Typecast operators
:>
and!:>
, for example,100001:>BSON
-
User-defined variables (UDVs), for example:
SET @obj = '{"city": "New York"}':>BSON;SELECT a:>JSON FROM tWHERE BSON_MATCH_ANY(MATCH_PARAM_BSON()=@obj, a, 'city'); -
Procedural SQL variables (stored procedure arguments and local variables)
-
Nested constant (deterministic) built-in expressions, for example:
... BSON_BUILD_ARRAY(BSON_SET_BSON('{}', 'a', @bson), '{"a":4}':>JSON)
-
A combination of all of the above
IN()
-
Hexadecimal literal of type X'.
. . ' in non-parameterized IN
lists, for example:SELECT ... BSON_MATCH_ANY(MATCH_PARAM_BSON() IN ( X'D204000010', X'A186010010'), product_ids, 'product_type') -
Parameterized
IN
lists with:-
Literals of the same types
-
Single argument built-in expressions of the same shape and literal types in the list, for example,
IN(HEX('a'), HEX('b'))
-
-
Unsupported Features or Expressions
Multi-value index is not activated when used in the following:
-
Calls to user-defined functions (UDFs)
-
References to other unindexed (not a multi-value index) fields on the right-hand side of the expression
-
Non-deterministic built-in functions, for example
RAND()
-
Aggregate and window functions
-
MATCH_
expressions other than BSON type on the right-hand side, because it's evaluated at runtimePARAM_ <type> -
BSON_
predicate withMATCH_ ANY MATCH_
optionELEMENTS -
!=
and other comparison operators (excluding=
and<=>
) -
NOT IN
lists
Examples
Example 1 - Using SQL
Consider the following table named orders.
CREATE TABLE orders(id BIGINT PRIMARY KEY,created DATETIME(6),product_ids BSON,MULTI VALUE INDEX (product_ids) INDEX_OPTIONS='{"TOKENIZER":"MATCH_ANY","PATH":[]}');INSERT INTO orders VALUES(1, '2025-03-03 12:34:56.000001', '100001':>BSON),(2, '2025-03-03 12:34:56.000002', '100002':>BSON),(3, '2025-03-03 12:34:56.000003', '100003':>BSON),(4, '2025-03-03 12:34:56.000004', '100004':>BSON);
Optimize the table:
OPTIMIZE TABLE orders FULL;
Perform the lookup:
SELECT id, created, product_ids:>JSON AS product_idsFROM ordersWHERE BSON_MATCH_ANY(MATCH_PARAM_BSON() = 100001:>BSON, product_ids);
+----+----------------------------+-------------+
| id | created | product_ids |
+----+----------------------------+-------------+
| 1 | 2025-03-03 12:34:56.000001 | 100001 |
+----+----------------------------+-------------+
The product_
column is cast to JSON for clarity.
The query execution benefits from the multi-value index, note the ColumnStoreFilter .
in the output:
EXPLAIN SELECT id, created, product_ids:>JSON AS product_idsFROM ordersWHERE BSON_MATCH_ANY(MATCH_PARAM_BSON() = 100001:>BSON, product_ids);
+------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment |
| Project [orders.id, orders.created, (orders.product_ids:>JSON COLLATE utf8mb4_bin NULL) AS product_ids] |
| ColumnStoreFilter [BSON_MATCH_ANY(MATCH_PARAM_BSON() = (100001:>bson NULL),orders.product_ids) multi-value hash index] |
| ColumnStoreScan dbTest.orders, SORT KEY __UNORDERED () table_type:sharded_columnstore |
+------------------------------------------------------------------------------------------------------------------------+
Example 2 - Using Multi-Value Index at Scale
The following example shows how a large data set can benefit from multi-value index at scale.
Create a table bookings, and add multi-value index to the product_
CREATE TABLE bookings(id BIGINT PRIMARY KEY,created DATETIME(6),product_ids BSON,MULTI VALUE INDEX (product_ids) INDEX_OPTIONS='{"TOKENIZER":"MATCH_ANY","PATH":[]}');-- Add 1Million rows, each with 5 products sampled from 100000 product idsDELIMITER //DO DECLAREarr ARRAY(RECORD(id BIGINT, created DATETIME(6), product_ids BSON)) =CREATE_ARRAY(1000000);product_ids BSON;n BIGINT;BEGINFOR i IN 0..999999 LOOPproduct_ids = CONCAT('[',CONCAT_WS(',', (RAND()*100000):>INT,(RAND()*100000):>INT, (RAND()*100000):>INT,(RAND()*100000):>INT, (RAND()*100000):>INT),']');arr[i] = ROW(i, NOW(), product_ids);END LOOP;n = INSERT_ALL('bookings', arr);END //DELIMITER ;
Optimize the table:
OPTIMIZE TABLE bookings FULL;
Perform the lookup:
SELECT id, created, product_ids:>JSON AS product_idsFROM bookingsWHERE BSON_MATCH_ANY(MATCH_PARAM_BSON() = 94567:>BSON, product_ids);
+--------+----------------------------+---------------------------------+
| id | created | product_ids |
+--------+----------------------------+---------------------------------+
| 719886 | 2025-03-03 12:50:49.000000 | [77807,55647,59828,9374,94567] |
| 500322 | 2025-03-03 12:50:49.000000 | [94567,58653,77465,55399,16804] |
| 207632 | 2025-03-03 12:50:48.000000 | [63421,19400,16959,65091,94567] |
| 943351 | 2025-03-03 12:50:50.000000 | [4616,41546,31047,94567,68472] |
| 363185 | 2025-03-03 12:50:48.000000 | [50258,88567,99311,94567,2009] |
| 803463 | 2025-03-03 12:50:49.000000 | [22582,64484,15540,44440,94567] |
| 698599 | 2025-03-03 12:50:49.000000 | [84654,32124,94567,26840,90261] |
| 105530 | 2025-03-03 12:50:47.000000 | [57273,21771,31151,4171,94567] |
| 781873 | 2025-03-03 12:50:49.000000 | [2862,12649,71063,90341,94567] |
--- Output truncated ---
+--------+----------------------------+---------------------------------+
The product_
column is cast to JSON for clarity.
The query execution benefits from the multi-value index (the ColumnStoreFilter .
in the output indicates that the query used multi-value index):
EXPLAIN SELECT id, created, product_ids:>JSON AS product_idsFROM bookingsWHERE BSON_MATCH_ANY(MATCH_PARAM_BSON() = 94567:>BSON, product_ids);
+-------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 parallelism_level:segment |
| Project [bookings.id, bookings.created, (bookings.product_ids:>JSON COLLATE utf8mb4_bin NULL) AS product_ids] |
| ColumnStoreFilter [BSON_MATCH_ANY(MATCH_PARAM_BSON() = (94567:>bson NULL),bookings.product_ids) multi-value hash index] |
| ColumnStoreScan dbTest.bookings, SORT KEY __UNORDERED () table_type:sharded_columnstore |
+-------------------------------------------------------------------------------------------------------------------------+
Last modified: June 5, 2025