DOT_PRODUCT

The DOT_PRODUCT function returns the scalar product, or dot product, of two vectors. DOT_PRODUCT takes as input two vectors and returns a numeric value. 

A common use of DOT_PRODUCT is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications. A typical query using DOT_PRODUCT is to find a set of vectors that most closely match a query vector.

If the input vectors to DOT_PRODUCT are normalized to length 1, then the result of DOT_PRODUCT is the cosine of the angle between the vectors. In this case, DOT_PRODUCT produces what is known as the cosine similarity metric.

See Working with Vector Data for more information about using vectors in SingleStore.

Syntax

DOT_PRODUCT(vector_expression, vector_expression)

Arguments

  • vector_expression: An expression that evaluates to a vector. Vectors can be stored in SingleStore using the BLOB type (BLOB Types).

Remarks

  • If the result is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

  • The default format for vector element storage and processing is a 32-bit floating-point number (F32). The DOT_PRODUCT function assumes the vector inputs are encoded as 32-bit floating-point numbers and returns a DOUBLE.

  • To execute this function, the host processor must support AVX2 instruction set extensions. If AVX2 is not supported, an error will occur during execution.

Using DOT_PRODUCT with Vectors as BLOBs

The following example shows the use of DOT_PRODUCT() to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOBs.

Create a table with a column of type BLOB to store the vectors. The second column in this table, with column name vec and type BLOB, will store the vectors. This example demonstrates storing vector data using BLOBs, hence the column of type BLOB named vec.

Then insert data using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors.

CREATE TABLE vectors_b (id int, vec BLOB not null);
INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]')); 
INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));

Use the HEX() built-in function to return a printable form of the binary data:

SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec)                         |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+

Query the table using a DOT_PRODUCT() in a SELECT statement.

The SQL below sets up query vector (@query_vec) and then calculates the EUCLIDEAN_DISTANCE of the query vector and the vectors in the vectors_b table.

SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');
SELECT DOT_PRODUCT(vec, @query_vec) AS score
FROM vectors_b
ORDER BY score DESC;
+---------------------+
| score               |
+---------------------+
| 0.9810000061988831  |
| 0.8993000388145447  |
+---------------------+

DOT_PRODUCT() with JSON_ARRAY_PACK()

The JSON_ARRAY_PACK() function makes it easier to input properly-formatted vectors. JSON_ARRAY_PACK() should be used when loading vectors into tables as is shown in the example below. That is, vectors should be formatted with JSON_ARRAY_PACK() at the time they are loaded into a table so that the data stored in the BLOB attribute in the table is in packed binary format. SingleStore does not recommend storing vectors as JSON strings in tables, doing so will have a negative performance impact.

JSON_ARRAY_PACK() should not normally be used as an argument to the EUCLIDEAN_DISTANCE function except when JSON_ARRAY_PACK() is being used to build a constant vector value as is shown in the query below.

SELECT id, DOT_PRODUCT(
JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS score
FROM vectors_b
ORDER BY score DESC;
+-----|---------------------+
| id  | score               |
+-----|---------------------+
| 2   | 0.9810000061988831  |
| 1   | 0.8993000388145447  |
+-----|---------------------+

Finite Precision of Floating-Point Arithmetic

When using JSON_ARRAY_PACK, with the default F32 representation or the F64 suffix, vector elements are stored as floating-point numbers. As such, some vector element inputs are not perfectly representable in floating-point which may cause some inexact results.

It is not advisable to directly compare floating-point values for equality.

Consider the example below which selects the DOT_PRODUCT of @query_vec with itself. The result under exact arithmetic would be 1.0, as the DOT_PRODUCT of a vector with itself is 1.0.

SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');
SELECT DOT_PRODUCT(@query_vec, @query_vec) AS DotProduct;
+---------------------+
| DotProduct          |
+---------------------+
| 1.0005160570144653  |
+---------------------+

The output is not 1.0 due to the finite precision in floating-point arithmetic. The input vector values are not perfectly representable in floating-point, thus the output is close to, but not exactly, 1.0.

Using DOT_PRODUCT in Filters, Joins, and Ordering

DOT_PRODUCT can appear wherever a floating-point expression can be used in a query.

Use the vectors_b table and the @query_vec created above when running the following queries.

The following query uses DOT_PRODUCT in the SELECT clause, names it "score," and then filters on the score in the WHERE clause. Note that in contrast to the previous query, the result of this query includes both the vector and the score, or DOT_PRODUCT, of that vector with respect to @query_vec.

SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');
SELECT DOT_PRODUCT(vec, @query_vec) AS score
FROM vectors_b
WHERE score > 0.7
ORDER BY score DESC;
+---------------------+
|   score             |
+---------------------+
| 0.9810000061988831  |
|  0.8993000388145447 |
+---------------------+

DOT_PRODUCT can even be used in cross products and joins – both arguments to it can be table fields or derived from table fields. The query below creates a second table of vectors and joins with our original table.

CREATE TABLE vectors_2b (id_2 int, vec_2 BLOB not null);
INSERT INTO vectors_2b VALUES (5, JSON_ARRAY_PACK('[0.4, 0.49, 0.16, 0.555]')); 
INSERT INTO vectors_2b VALUES (6, JSON_ARRAY_PACK('[-0.01, -0.1, -0.2, 0.975]'));
SELECT v1.id, v2.id_2, DOT_PRODUCT(v1.vec, v2.vec_2) AS score
FROM vectors_b v1, vectors_2b v2
WHERE DOT_PRODUCT(v1.vec, v2.vec_2) > 0.7
ORDER BY score DESC;
+------+------+--------------------+
| id   | id_2 | score              |
+------+------+--------------------+
|    2 |    5 | 0.8062000274658203 |
|    1 |    5 | 0.7720249891281128 |
+------+------+--------------------+

Formatting Binary Vector Data for BLOBs

When using the BLOB type for vector operations, vector data can be formatted using JSON_ARRAY_PACK. If your vector data is already in a packed binary format, you can load that data into the BLOBs. The data must be encoded as a BLOB containing packed numbers in little-endian byte order. Vectors stored as BLOBs can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 , or 8 bytes, depending on the vector element).

Last modified: August 16, 2024

Was this article helpful?