DOT_ PRODUCT
On this page
Returns the scalar product, or dot product, of two vector input values.
It is a scalar function.
Syntax
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.
Remarks
DOT_
can be used to compute a cosine similarity metric of the two input vectors, if the input vectors are normalized to length 1.
To execute this function, the host processor must support AVX2 instruction set extensions.
If the result of DOT_
is infinity, negative infinity, or not a number (NaN), NULL
will be returned instead.
To compute the dot product against multiple vector inputs, use UDFs or cross-joins.
Examples
Example: SELECT Using DOT_
The following example executes DOT_
on a row containing two vectors.HEX()
built-in function is also used to return a readable form of the binary output.
Create a table with two BLOB
-typed columns:
CREATE TABLE dp_t (a BLOB, b BLOB);
Using the JSON_
INSERT INTO dp_t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
To demonstrate the contents of the table, use the HEX()
built-in function to return a readable form of the binary data:
SELECT HEX(dp_t.a), HEX(dp_t.b) FROM dp_t;
+--------------------------+--------------------------+
| HEX(dp_t.a) | HEX(dp_t.b) |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+
Query the table using the DOT_
function in a SELECT
statement:
SELECT DOT_PRODUCT(dp_t.a, dp_t.b) FROM dp_t;
+-----------------------------+
| DOT_PRODUCT(dp_t.a, dp_t.b) |
+-----------------------------+
| 4.200000084936619 |
+-----------------------------+
Example: DOT_
The JSON_
built-in function makes it easier to add properly-formatted vectors as input parameters to the DOT_
function.
SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
+-------------------------------------------------------------------------------------+
| DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) |
+-------------------------------------------------------------------------------------+
| 5.25 |
+-------------------------------------------------------------------------------------+
Example: Floating Point Deviation in DOT_
The DOT_
function accepts FLOAT
data type as an input and returns a DOUBLE
.DOUBLE
precision because of the FLOAT
input.
Note: It is advisable not to directly compare floating point values for equality.
For example, two vectors are provided as input parameters to the DOT_
function.
INSERT INTO dp_t1 VALUES(JSON_ARRAY_PACK('[0.6,0.8]'), JSON_ARRAY_PACK('[0.6,0.8]'));SELECT DOT_PRODUCT(a, b) FROM dp_t1;
+---------------------+
| DOT_PRODUCT(a, b) |
+---------------------+
| 1.0000000596046448 |
+---------------------+
In this example, the output is not 1.
as would be expected.1.
would be possible only if the input vector values are perfectly representable in floating point.
Example: Computing A Large Number of Dot Products Efficiently, When there are Repeating Inputs
Suppose x
and y
are lists of vectors, where x
contains a large number of items.x
, you want to calculate the dot product for each vector in y
, where the vector in x
is the first dot product input and the vector in y
is the second dot product input.x
and y
into two tables perform a cross join.
SELECT x.vector, DOT_PRODUCT(x.vector, y.vector)FROM x,y;
When DOT_
is calculated using a vector in x.
for each vector in y.
, the vector in x.
is loaded from memory to CPU only once.
Related Topics
Last modified: June 6, 2023