DOT_PRODUCT

Returns the scalar product, or dot product, of two vector input values.

It is a scalar function.

Syntax

DOT_PRODUCT(vector_expression, vector_expression)

Arguments

  • vector_expression: An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.

Remarks

DOT_PRODUCT() can be used to compute a cosine similarity metric of the two input vectors, if the input vectors are normalized to length 1.

To execute this function, the host processor must support AVX2 instruction set extensions. If AVX2 is not supported, an error will occur during execution.

If the result of DOT_PRODUCT() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

To compute the dot product against multiple vector inputs, use UDFs or cross-joins.

Examples

Example: SELECT Using DOT_PRODUCT() on Existing Rows

The following example executes DOT_PRODUCT() on a row containing two vectors. The HEX() built-in function is also used to return a readable form of the binary output.

Create a table with two BLOB-typed columns:

CREATE TABLE dp_t (a BLOB, b BLOB);

Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:

INSERT INTO dp_t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));

To demonstrate the contents of the table, use the HEX() built-in function to return a readable form of the binary data:

SELECT HEX(dp_t.a), HEX(dp_t.b) FROM dp_t;
+--------------------------+--------------------------+
| HEX(dp_t.a)              | HEX(dp_t.b)              |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+

Query the table using the DOT_PRODUCT() function in a SELECT statement:

SELECT DOT_PRODUCT(dp_t.a, dp_t.b) FROM dp_t;
+-----------------------------+
| DOT_PRODUCT(dp_t.a, dp_t.b) |
+-----------------------------+
|           4.200000084936619 |
+-----------------------------+

Example: DOT_PRODUCT() with JSON_ARRAY_PACK()

The JSON_ARRAY_PACK() built-in function makes it easier to add properly-formatted vectors as input parameters to the DOT_PRODUCT() function. Consider the example below:

SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
+-------------------------------------------------------------------------------------+
| DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) |
+-------------------------------------------------------------------------------------+
|                                                                                5.25 |
+-------------------------------------------------------------------------------------+

Example: Floating Point Deviation in DOT_PRODUCT() Function

The DOT_PRODUCT function accepts FLOAT data type as an input and returns a DOUBLE. The output does not have exact DOUBLE precision because of the FLOAT input.

Note: It is advisable not to directly compare floating point values for equality.

For example, two vectors are provided as input parameters to the DOT_PRODUCT() function.

INSERT INTO dp_t1 VALUES(JSON_ARRAY_PACK('[0.6,0.8]'), JSON_ARRAY_PACK('[0.6,0.8]'));
SELECT DOT_PRODUCT(a, b) FROM dp_t1;
+---------------------+
| DOT_PRODUCT(a, b)   |
+---------------------+
|  1.0000000596046448 |
+---------------------+

In this example, the output is not 1.0 as would be expected. The output 1.0 would be possible only if the input vector values are perfectly representable in floating point.

Example: Computing A Large Number of Dot Products Efficiently, When there are Repeating Inputs

Suppose x and y are lists of vectors, where x contains a large number of items. For each vector in x, you want to calculate the dot product for each vector in y, where the vector in x is the first dot product input and the vector in y is the second dot product input. To compute this efficiently, you can separate x and y into two tables perform a cross join. For example,

SELECT x.vector, DOT_PRODUCT(x.vector, y.vector)
FROM x,y;

When DOT_PRODUCT(x.vector, y.vector) is calculated using a vector in x.vector for each vector in y.vector, the vector in x.vector is loaded from memory to CPU only once.

Last modified: June 6, 2023

Was this article helpful?