SingleStore DB

DOT_PRODUCT

Returns the scalar product, or dot product, of two vector input values.

Syntax
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
  • vector_expression: An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.

Remarks

DOT_PRODUCT() can be used to compute a cosine similarity metric of the two input vectors, if the input vectors are normalized to length 1.

A vector can be of any length, but all input vectors must be the same length and the blob lengths must be divisible by the packed vector element size (1, 2, 4 or 8 bytes, depending on the vector element).

If the result of DOT_PRODUCT() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

To compute the dot product against multiple vector inputs, use UDFs or cross-joins.

You can specify the datatype of the vector elements in which this operation is performed on the vector by adding a suffix to the function. Omitting the suffix from the function is equivalent to suffixing it with _F32. All operations are done using the specified datatype. The following table lists the suffixes and their datatype.

Suffix

Datatype

_I8

8-bit signed integer

_I16

16-bit signed integer

_I32

32-bit signed integer

_I64

64-bit signed integer

_F32

32-bit floating-point number (IEEE standard format)

_F64

64-bit floating-point number (IEEE standard format)

Examples

Example: SELECT Using DOT_PRODUCT() on Existing Rows

The following example executes DOT_PRODUCT() on a row containing two vectors. The HEX() built-in function is also used to return a readable form of the binary output.

Create a table with two BLOB-typed columns:

CREATE TABLE t (a BLOB, b BLOB);

Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:

INSERT INTO t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));

To demonstrate the contents of the table, use the HEX() built-in function to return a readable form of the binary data:

SELECT HEX(t.a), HEX(t.b) FROM t;
****
+--------------------------+--------------------------+
| HEX(t.a)                 | HEX(t.b)                 |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+

You can also use the JSON_ARRAY_UNPACK() function to return the table elements in JSON format:

SELECT JSON_ARRAY_UNPACK(t.a), JSON_ARRAY_UNPACK(t.b) FROM t;
****
+--------------------------------------+------------------------+
| JSON_ARRAY_UNPACK(t.a)               | JSON_ARRAY_UNPACK(t.b) |
+--------------------------------------+------------------------+
| [0.699999988,0.200000003,1.70000005] | [1,0.5,2]              |
+--------------------------------------+------------------------+

Query the table using the DOT_PRODUCT() function in a SELECT statement:

SELECT DOT_PRODUCT(t.a, t.b) FROM t;
****
+-----------------------+
| DOT_PRODUCT(t.a, t.b) |
+-----------------------+
|     4.200000084936619 |
+-----------------------+

Example: DOT_PRODUCT() with JSON_ARRAY_PACK()

The JSON_ARRAY_PACK() built-in function makes it easier to add properly-formatted vectors as input parameters to the DOT_PRODUCT() function. Consider the example below:

SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
****
+-------------------------------------------------------------------------------------+
| DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) |
+-------------------------------------------------------------------------------------+
|                                                                                5.25 |
+-------------------------------------------------------------------------------------+

Example: Floating Point Deviation in DOT_PRODUCT() Function

The DOT_PRODUCT function accepts FLOAT data type as an input and returns a DOUBLE. The output does not have exact DOUBLE precision because of the FLOAT input.

Note: It is advisable not to directly compare floating point values for equality.

For example, two vectors are provided as input parameters to the DOT_PRODUCT() function.

INSERT INTO t1 VALUES(JSON_ARRAY_PACK('[0.6,0.8]'), JSON_ARRAY_PACK('[0.6,0.8]'));
SELECT DOT_PRODUCT(c1, c2) FROM t1;
****
+---------------------+
| DOT_PRODUCT(c1, c2) |
+---------------------+
|  1.0000000596046448 |
+---------------------+

In this example, the output is not 1.0 as would be expected. The output 1.0 would be possible only if the input vector values are perfectly representable in floating point.

Example: Computing A Large Number of Dot Products Efficiently, When there are Repeating Inputs

Suppose x and y are lists of vectors, where x contains a large number of items. For each vector in x, you want to calculate the dot product for each vector in y, where the vector in x is the first dot product input and the vector in y is the second dot product input. To compute this efficiently, you can separate x and y into two tables perform a cross join. For example,

SELECT x.vector, DOT_PRODUCT(x.vector, y.vector)
FROM x,y;

When DOT_PRODUCT(x.vector, y.vector) is calculated using a vector in x.vector for each vector in y.vector, the vector in x.vector is loaded from memory to CPU only once.

Example: Typecasting the Datatype Using Suffixes

To perform this operation on vectors specified in a format other than the default 32-bit floating point number, specify a suffix with the DOT_PRODUCT() function. For example,

SELECT DOT_PRODUCT_I16(JSON_ARRAY_PACK_I16('[12,10]'),JSON_ARRAY_PACK_I16('[8,6]')) "Result";
****
+--------+
| Result |
+--------+
|    156 |
+--------+