DOT_PRODUCT
Returns the scalar product, or dot product, of two vector input values.
It is a scalar function.
Syntax
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
vector_expression
: An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.
Remarks
DOT_PRODUCT()
can be used to compute a cosine similarity metric of the two input vectors, if the input vectors are normalized to length 1.
A vector can be of any length, but the input blob length must be divisible by the packed vector element size (1, 2, 4 or 8 bytes, depending on the vector element).
If the result of DOT_PRODUCT()
is infinity, negative infinity, or not a number (NaN), NULL
will be returned instead.
To compute the dot product against multiple vector inputs, use UDFs or cross-joins.
You can specify the datatype of the vector elements in which this operation is performed on the vector by adding a suffix to the function. Omitting the suffix from the function is equivalent to suffixing it with _F32
. All operations are done using the specified datatype. The following table lists the suffixes and their datatype.
Suffix | Datatype |
---|---|
| 8-bit signed integer |
| 16-bit signed integer |
| 32-bit signed integer |
| 64-bit signed integer |
| 32-bit floating-point number (IEEE standard format) |
| 64-bit floating-point number (IEEE standard format) |
Examples
Example: SELECT Using DOT_PRODUCT() on Existing Rows
The following example executes DOT_PRODUCT()
on a row containing two vectors. The HEX()
built-in function is also used to return a readable form of the binary output.
Create a table with two BLOB
-typed columns:
CREATE TABLE dp_t (a BLOB, b BLOB);
Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:
INSERT INTO dp_t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
To demonstrate the contents of the table, use the HEX()
built-in function to return a readable form of the binary data:
SELECT HEX(dp_t.a), HEX(dp_t.b) FROM dp_t; **** +--------------------------+--------------------------+ | HEX(dp_t.a) | HEX(dp_t.b) | +--------------------------+--------------------------+ | 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 | +--------------------------+--------------------------+
You can also use the JSON_ARRAY_UNPACK()
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(dp_t.a), JSON_ARRAY_UNPACK(dp_t.b) FROM dp_t; **** +--------------------------------------+---------------------------+ | JSON_ARRAY_UNPACK(dp_t.a) | JSON_ARRAY_UNPACK(dp_t.b) | +--------------------------------------+---------------------------+ | [0.699999988,0.200000003,1.70000005] | [1,0.5,2] | +--------------------------------------+---------------------------+
Query the table using the DOT_PRODUCT()
function in a SELECT
statement:
SELECT DOT_PRODUCT(dp_t.a, dp_t.b) FROM dp_t; **** +-----------------------------+ | DOT_PRODUCT(dp_t.a, dp_t.b) | +-----------------------------+ | 4.200000084936619 | +-----------------------------+
Example: DOT_PRODUCT() with JSON_ARRAY_PACK()
The JSON_ARRAY_PACK()
built-in function makes it easier to add properly-formatted vectors as input parameters to the DOT_PRODUCT()
function. Consider the example below:
SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')); **** +-------------------------------------------------------------------------------------+ | DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) | +-------------------------------------------------------------------------------------+ | 5.25 | +-------------------------------------------------------------------------------------+
Example: Floating Point Deviation in DOT_PRODUCT() Function
The DOT_PRODUCT
function accepts FLOAT
data type as an input and returns a DOUBLE
. The output does not have exact DOUBLE
precision because of the FLOAT
input.
Note: It is advisable not to directly compare floating point values for equality.
For example, two vectors are provided as input parameters to the DOT_PRODUCT()
function.
INSERT INTO dp_t1 VALUES(JSON_ARRAY_PACK('[0.6,0.8]'), JSON_ARRAY_PACK('[0.6,0.8]')); SELECT DOT_PRODUCT(a, b) FROM dp_t1; **** +---------------------+ | DOT_PRODUCT(a, b) | +---------------------+ | 1.0000000596046448 | +---------------------+
In this example, the output is not 1.0
as would be expected. The output 1.0
would be possible only if the input vector values are perfectly representable in floating point.
Example: Computing A Large Number of Dot Products Efficiently, When there are Repeating Inputs
Suppose x
and y
are lists of vectors, where x
contains a large number of items. For each vector in x
, you want to calculate the dot product for each vector in y
, where the vector in x
is the first dot product input and the vector in y
is the second dot product input. To compute this efficiently, you can separate x
and y
into two tables perform a cross join. For example,
SELECT x.vector, DOT_PRODUCT(x.vector, y.vector) FROM x,y;
When DOT_PRODUCT(x.vector, y.vector)
is calculated using a vector in x.vector
for each vector in y.vector
, the vector in x.vector
is loaded from memory to CPU only once.
Example: Typecasting the Datatype Using Suffixes
To perform this operation on vectors specified in a format other than the default 32-bit floating point number, specify a suffix with the DOT_PRODUCT()
function. For example,
SELECT DOT_PRODUCT_I16(JSON_ARRAY_PACK_I16('[12,10]'),JSON_ARRAY_PACK_I16('[8,6]')) "Result"; **** +--------+ | Result | +--------+ | 156 | +--------+