DOT_ PRODUCT
On this page
The DOT_
function returns the scalar product, or dot product, of two vectors.DOT_
takes as input two vectors, and returns a numeric value.
A common use of DOT_
is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications.DOT_
is to find a set of vectors that most closely match a query vector.
If the input vectors to DOT_
are normalized to length 1, then the result of DOT_
is the cosine of the angle between the vectors.DOT_
produces what is known as the cosine similarity metric.
See Working with Vector Data for more information about using vectors in SingleStore.
Syntax
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression Vectors can be stored in SingleStore using the BLOB
data type (BLOB Types).
Remarks
-
If the result of
DOT_
is infinity, negative infinity, or not a number (NaN),PRODUCT NULL
will be returned instead. -
The default format for vector element storage and processing is 32-bit floating point number (
FLOAT
).The DOT_
function assumes the vector inputs are encoded as 32-bit floating point numbers and returns aPRODUCT DOUBLE
. -
See Using Other Numeric Element Types for information on using
DOT_
with vectors with element types other than 32-bit floating point numbers.PRODUCT
Using DOT_ PRODUCT with Vectors as BLOBs
The following example shows the use of DOT_
to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOB
s.
Create a table with a column of type BLOB
to store the vectors.vec
and type BLOB
, will store the vectors.BLOB
s, hence the column of type BLOB
named vec
.
Then insert data using the JSON_
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
To demonstrate the contents of the table, use the JSON_
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(vec) FROM vectors_b;
+---------------------------------------------------+
| JSON_ARRAY_UNPACK(vec) |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
You can also use the HEX()
built-in function to return a printable form of the binary data:
SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec) |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+
Query the table using a DOT_
in a SELECT
statement.
The SQL below sets up query vector (@query_
) and then calculates the EUCLIDEAN_
of the query vector and vectors in the vectors_
table.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(vec, @query_vec) AS scoreFROM vectors_bORDER BY score DESC;
+---------------------+
| score |
+---------------------+
| 0.9810000061988831 |
| 0.8993000388145447 |
+---------------------+
DOT_
The JSON_
function makes it easier to input properly-formatted vectors.JSON_
should be used when loading vectors into tables as is shown in the example below.JSON_
at the time they are loaded into a table so that the data stored in the BLOB
attribute in the table is in packed binary format.
JSON_
should not normally be used as an argument to the DOT_
function except when JSON_
is being used to build a constant vector value as is shown in the query below.
SELECT id, DOT_PRODUCT(JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS scoreFROM vectors_bORDER BY score DESC;
+-----|---------------------+
| id | score |
+-----|---------------------+
| 2 | 0.9810000061988831 |
| 1 | 0.8993000388145447 |
+-----|---------------------+
Finite Precision of Floating Point Arithmetic
When using JSON_
, vector elements are stored as floating point numbers.
It is not advisable to directly compare floating point values for equality.
Consider the example below which selects the DOT_
of @query_
with itself.DOT_
of a vector with itself is 1.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(@query_vec, @query_vec) AS DotProduct;
+---------------------+
| DotProduct |
+---------------------+
| 1.0005160570144653 |
+---------------------+
The output is not 1.
Using DOT_ PRODUCT in Filters, Joins, and Ordering
DOT_
can appear wherever a floating point expression can be used in a query.
Use the vectors_
table and the @query_
created above when running the following queries.
The following query uses DOT_
in the SELECT
clause, names it "score," and then filters on the score in the WHERE
clause.DOT_
, of that vector with respect to @query_
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(vec, @query_vec) AS scoreFROM vectors_bWHERE score > 0.7ORDER BY score DESC;
+---------------------+
| score |
+---------------------+
| 0.9810000061988831 |
| 0.8993000388145447 |
+---------------------+
DOT_
can even be used in cross products and joins – both arguments to it can be table fields or derived from table fields.
CREATE TABLE vectors_2b (id_2 int, vec_2 BLOB not null);INSERT INTO vectors_2b VALUES (5, JSON_ARRAY_PACK('[0.4, 0.49, 0.16, 0.555]'));INSERT INTO vectors_2b VALUES (6, JSON_ARRAY_PACK('[-0.01, -0.1, -0.2, 0.975]'));
SELECT v1.id, v2.id_2, DOT_PRODUCT(v1.vec, v2.vec_2) AS scoreFROM vectors_b v1, vectors_2b v2WHERE DOT_PRODUCT(v1.vec, v2.vec_2) > 0.7ORDER BY score DESC;
+------+------+--------------------+
| id | id_2 | score |
+------+------+--------------------+
| 2 | 5 | 0.8062000274658203 |
| 1 | 5 | 0.7720249891281128 |
+------+------+--------------------+
Using Other Numeric Element Types
The default format for vector element storage and processing is 32-bit floating point number (FLOAT
).
If you wish to use a data type other than 32-bit floating point, suffixes can be added to the DOT_
and JSON_
functions to specify other types for the vector elements._
.
Below is an example of using JSON_
and DOT_
with 16-bit signed integers.
CREATE TABLE vectors_b_i (id int, vec BLOB not null);INSERT INTO vectors_b_i VALUES (1, JSON_ARRAY_PACK_I16('[1, 3, 2, 5]'));INSERT INTO vectors_b_i VALUES(2, JSON_ARRAY_PACK_I16('[23, 4, 1, 8]'));
SET @query_vec = JSON_ARRAY_PACK_I16('[4, 5, 4, 2]');SELECTDOT_PRODUCT_I16(@query_vec, vec) as DotProductFROM vectors_b_iORDER BY DotProduct DESC;
+------|-------------+
| id | DotProduct |
+------|-------------+
| 2 | 132 |
| 1 | 37 |
+------|-------------+
The result is an integer as indicated by the _
suffix.
When using suffixed versions of DOT_
, the return type will be the type of the suffix.
Note
It is important that you use the DOT_
function that matches the encoding of the input vectors and that the encodings of the input vectors match.DOT_
should be used with vectors that have been encoded with JSON_
, and both input vectors must be encoded as I16
.
The table below lists the suffixes and associated data types.
Suffix |
Data Type |
---|---|
|
8-bit signed integer |
|
16-bit signed integer |
|
32-bit signed integer |
|
64-bit signed integer |
|
32-bit floating-point number (IEEE standard format) |
|
64-bit floating-point number (IEEE standard format) |
Formatting Binary Vector Data for BLOBs
When using the BLOB
data type for DOT_
operations, the vector data can be formatted using JSON_
.BLOB
s.BLOB
containing packed numbers in little-endian byte order.BLOB
s can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 or 8 bytes, depending on the vector element).
Related Topics
Last modified: March 13, 2024