DOT_ PRODUCT
On this page
The DOT_
function returns the scalar product, or dot product, of two vectors.DOT_
takes as input two vectors and returns a numeric value.
A common use of DOT_
is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications.DOT_
is to find a set of vectors that most closely match a query vector.
If the input vectors to DOT_
are normalized to length 1, then the result of DOT_
is the cosine of the angle between the vectors.DOT_
produces what is known as the cosine similarity metric.
See Working with Vector Data for more information about using vectors in SingleStore.
Syntax
DOT_PRODUCT(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression Vectors can be stored in SingleStore using the BLOB
type (BLOB Types).
Remarks
-
If the result is infinity, negative infinity, or not a number (NaN),
NULL
will be returned instead. -
The default format for vector element storage and processing is a 32-bit floating-point number (
F32
).The DOT_
function assumes the vector inputs are encoded as 32-bit floating-point numbers and returns aPRODUCT DOUBLE
. -
To execute this function, the host processor must support AVX2 instruction set extensions.
If AVX2 is not supported, an error will occur during execution.
Using DOT_ PRODUCT with Vectors as BLOBs
The following example shows the use of DOT_
to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOB
s.
Create a table with a column of type BLOB
to store the vectors.vec
and type BLOB
, will store the vectors.BLOB
s, hence the column of type BLOB
named vec
.
Then insert data using the JSON_
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
Use the HEX()
built-in function to return a printable form of the binary data:
SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec) |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+
Query the table using a DOT_
in a SELECT
statement.
The SQL below sets up query vector (@query_
) and then calculates the EUCLIDEAN_
of the query vector and the vectors in the vectors_
table.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(vec, @query_vec) AS scoreFROM vectors_bORDER BY score DESC;
+---------------------+
| score |
+---------------------+
| 0.9810000061988831 |
| 0.8993000388145447 |
+---------------------+
DOT_
The JSON_
function makes it easier to input properly-formatted vectors.JSON_
should be used when loading vectors into tables as is shown in the example below.JSON_
at the time they are loaded into a table so that the data stored in the BLOB
attribute in the table is in packed binary format.
JSON_
should not normally be used as an argument to the EUCLIDEAN_
function except when JSON_
is being used to build a constant vector value as is shown in the query below.
SELECT id, DOT_PRODUCT(JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS scoreFROM vectors_bORDER BY score DESC;
+-----|---------------------+
| id | score |
+-----|---------------------+
| 2 | 0.9810000061988831 |
| 1 | 0.8993000388145447 |
+-----|---------------------+
Finite Precision of Floating-Point Arithmetic
When using JSON_
, with the default F32
representation or the F64
suffix, vector elements are stored as floating-point numbers.
It is not advisable to directly compare floating-point values for equality.
Consider the example below which selects the DOT_
of @query_
with itself.DOT_
of a vector with itself is 1.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(@query_vec, @query_vec) AS DotProduct;
+---------------------+
| DotProduct |
+---------------------+
| 1.0005160570144653 |
+---------------------+
The output is not 1.
Using DOT_ PRODUCT in Filters, Joins, and Ordering
DOT_
can appear wherever a floating-point expression can be used in a query.
Use the vectors_
table and the @query_
created above when running the following queries.
The following query uses DOT_
in the SELECT
clause, names it "score," and then filters on the score in the WHERE
clause.DOT_
, of that vector with respect to @query_
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(vec, @query_vec) AS scoreFROM vectors_bWHERE score > 0.7ORDER BY score DESC;
+---------------------+
| score |
+---------------------+
| 0.9810000061988831 |
| 0.8993000388145447 |
+---------------------+
DOT_
can even be used in cross products and joins – both arguments to it can be table fields or derived from table fields.
CREATE TABLE vectors_2b (id_2 int, vec_2 BLOB not null);INSERT INTO vectors_2b VALUES (5, JSON_ARRAY_PACK('[0.4, 0.49, 0.16, 0.555]'));INSERT INTO vectors_2b VALUES (6, JSON_ARRAY_PACK('[-0.01, -0.1, -0.2, 0.975]'));
SELECT v1.id, v2.id_2, DOT_PRODUCT(v1.vec, v2.vec_2) AS scoreFROM vectors_b v1, vectors_2b v2WHERE DOT_PRODUCT(v1.vec, v2.vec_2) > 0.7ORDER BY score DESC;
+------+------+--------------------+
| id | id_2 | score |
+------+------+--------------------+
| 2 | 5 | 0.8062000274658203 |
| 1 | 5 | 0.7720249891281128 |
+------+------+--------------------+
Formatting Binary Vector Data for BLOBs
When using the BLOB
type for vector operations, vector data can be formatted using JSON_
.BLOB
s.BLOB
containing packed numbers in little-endian byte order.BLOB
s can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 , or 8 bytes, depending on the vector element).
Related Topics
Last modified: August 16, 2024