DOT_ PRODUCT
On this page
Returns the scalar product, or dot product, of two vector input values.
It is a scalar function.
Syntax
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.
Remarks
DOT_
can be used to compute a cosine similarity metric of the two input vectors, if the input vectors are normalized to length 1.
A vector can be of any length, but the input blob length must be divisible by the packed vector element size (1, 2, 4 or 8 bytes, depending on the vector element).
If the result of DOT_
is infinity, negative infinity, or not a number (NaN), NULL
will be returned instead.
To compute the dot product against multiple vector inputs, use UDFs or cross-joins.
You can specify the datatype of the vector elements in which this operation is performed on the vector by adding a suffix to the function._
.
Suffix |
Datatype |
---|---|
|
8-bit signed integer |
|
16-bit signed integer |
|
32-bit signed integer |
|
64-bit signed integer |
|
32-bit floating-point number (IEEE standard format) |
|
64-bit floating-point number (IEEE standard format) |
Examples
Example: SELECT Using DOT_
The following example executes DOT_
on a row containing two vectors.HEX()
built-in function is also used to return a readable form of the binary output.
Create a table with two BLOB
-typed columns:
CREATE TABLE dp_t (a BLOB, b BLOB);
Using the JSON_
INSERT INTO dp_t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
To demonstrate the contents of the table, use the HEX()
built-in function to return a readable form of the binary data:
SELECT HEX(dp_t.a), HEX(dp_t.b) FROM dp_t;
+--------------------------+--------------------------+
| HEX(dp_t.a) | HEX(dp_t.b) |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+
You can also use the JSON_
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(dp_t.a), JSON_ARRAY_UNPACK(dp_t.b) FROM dp_t;
+--------------------------------------+---------------------------+
| JSON_ARRAY_UNPACK(dp_t.a) | JSON_ARRAY_UNPACK(dp_t.b) |
+--------------------------------------+---------------------------+
| [0.699999988,0.200000003,1.70000005] | [1,0.5,2] |
+--------------------------------------+---------------------------+
Query the table using the DOT_
function in a SELECT
statement:
SELECT DOT_PRODUCT(dp_t.a, dp_t.b) FROM dp_t;
+-----------------------------+
| DOT_PRODUCT(dp_t.a, dp_t.b) |
+-----------------------------+
| 4.200000084936619 |
+-----------------------------+
Example: DOT_
The JSON_
built-in function makes it easier to add properly-formatted vectors as input parameters to the DOT_
function.
SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
+-------------------------------------------------------------------------------------+
| DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) |
+-------------------------------------------------------------------------------------+
| 5.25 |
+-------------------------------------------------------------------------------------+
Example: Floating Point Deviation in DOT_
The DOT_
function accepts FLOAT
data type as an input and returns a DOUBLE
.DOUBLE
precision because of the FLOAT
input.
Note: It is advisable not to directly compare floating point values for equality.
For example, two vectors are provided as input parameters to the DOT_
function.
INSERT INTO dp_t1 VALUES(JSON_ARRAY_PACK('[0.6,0.8]'), JSON_ARRAY_PACK('[0.6,0.8]'));SELECT DOT_PRODUCT(a, b) FROM dp_t1;
+---------------------+
| DOT_PRODUCT(a, b) |
+---------------------+
| 1.0000000596046448 |
+---------------------+
In this example, the output is not 1.
as would be expected.1.
would be possible only if the input vector values are perfectly representable in floating point.
Example: Computing A Large Number of Dot Products Efficiently, When there are Repeating Inputs
Suppose x
and y
are lists of vectors, where x
contains a large number of items.x
, you want to calculate the dot product for each vector in y
, where the vector in x
is the first dot product input and the vector in y
is the second dot product input.x
and y
into two tables perform a cross join.
SELECT x.vector, DOT_PRODUCT(x.vector, y.vector)FROM x,y;
When DOT_
is calculated using a vector in x.
for each vector in y.
, the vector in x.
is loaded from memory to CPU only once.
Example: Typecasting the Datatype Using Suffixes
To perform this operation on vectors specified in a format other than the default 32-bit floating point number, specify a suffix with the DOT_
function.
SELECT DOT_PRODUCT_I16(JSON_ARRAY_PACK_I16('[12,10]'),JSON_ARRAY_PACK_I16('[8,6]')) "Result";
+--------+
| Result |
+--------+
| 156 |
+--------+
Related Topics
Last modified: June 6, 2023