DOT_ PRODUCT
On this page
The DOT_
function returns the scalar product, or dot product, of two vectors.DOT_
takes as input two vectors, and returns a numeric value.
A common use of DOT_
is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications.DOT_
is to find a set of vectors that most closely match a query vector.
If the input vectors to DOT_
are normalized to length 1, then the result of DOT_
is the cosine of the angle between the vectors.DOT_
produces what is known as the cosine similarity metric.
SingleStore supports a native vector data type and indexed approximate-nearest-neighbor (ANN) search that provide high-performance vector search and easier building of vector-based applications.
See Vector Type, Vector Indexing, and Working with Vector Data for more information about using vectors in SingleStore.
Syntax
vector_expression <*> vector_expressionDOT_PRODUCT(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression Vectors can be stored in SingleStore using the native VECTOR
data type (Vector Type) or theBLOB
data type (BLOB Types).SingleStore recommends using the VECTOR
data type when possible.
Remarks
-
If the result of
DOT_
is infinity, negative infinity, or not a number (NaN),PRODUCT NULL
will be returned instead. -
The default format for vector element storage and processing is 32-bit floating point number (
FLOAT
).The DOT_
function assumes the vector inputs are encoded as 32-bit floating point numbers and returns aPRODUCT DOUBLE
. -
See Using Other Numeric Element Types for information on using
DOT_
with vectors with element types other than 32-bit floating point numbers.PRODUCT
Using DOT_ PRODUCT with the VECTOR Data Type
The following example shows the use of DOT_
to calculate the similarity between a query vector and a set of vectors in a table.
Create a table with a column of type VECTOR
, insert data into the table, and then verify the contents of the table.
CREATE TABLE vectors (id int, vec VECTOR(4) not null);INSERT INTO vectors VALUES (1, '[0.45, 0.55, 0.495, 0.5]');INSERT INTO vectors VALUES (2, '[0.1, 0.8, 0.2, 0.555]');INSERT INTO vectors VALUES (3, '[-0.5, -0.03, -0.1, 0.86]');INSERT INTO vectors VALUES (4, '[0.5, 0.3, 0.807, 0.1]');
SELECT id, vecFROM vectorsORDER BY id;
+------+---------------------------------------------------+
| id | vec |
+------+---------------------------------------------------+
| 1 | [0.449999988,0.550000012,0.495000005,0.5] |
| 2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
| 3 | [-0.5,-0.0299999993,-0.100000001,0.860000014] |
| 4 | [0.5,0.300000012,0.806999981,0.100000001] |
+------+---------------------------------------------------+
You will notice that the results of the SELECT
do not exactly match the values in the INSERT
.VECTOR
data type are stored as floating point numbers and the values in the INSERT
statement are not perfectly representable in floating point.DOT_
.
After you've created a table and inserted data, set up a query vector.vectors
table and @query_
.
The SQL below finds the similarities between vectors and @query_
using using the DOT_
infix operator <*>
.ORDER BY
clause is included so the order of your results match the results shown below.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);SELECT vec, vec <*> @query_vec AS scoreFROM vectorsORDER BY score DESC;
+---------------------------------------------------|---------------------+
| vec | score |
+---------------------------------------------------|---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
| [0.5,0.300000012,0.806999981,0.100000001] | 0.7225800156593323 |
| [-0.5,-0.0299999993,-0.100000001,0.860000014] | 0.2625800371170044 |
+---------------------------------------------------|---------------------+
Using DOT_ PRODUCT (<*>) in Filters, Joins, and Ordering
DOT_
(<*>
)can appear wherever a floating point expression can be used in a query.
Use the vectors
table and the @query_
created above when running the following queries.
This query uses DOT_
(<*>
) as a filter in a WHERE
clause.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);SELECT vecFROM vectorsWHERE vec <*> @query_vec > 0.7;
+---------------------------------------------------+
| vec |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.5,0.300000012,0.806999981,0.100000001] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
The following query uses DOT_
(<*>
) in the SELECT
clause, names it "score," and then filters on the score in the WHERE
clause.DOT_
, of that vector with respect to @query_
.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);SELECT vec, vec <*> @query_vec AS scoreFROM vectorsWHERE score > 0.7ORDER BY score DESC;
+---------------------------------------------------|---------------------+
| vec | score |
+---------------------------------------------------|---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
| [0.5,0.300000012,0.806999981,0.100000001] | 0.7225800156593323 |
+---------------------------------------------------|---------------------+
You can find the vectors in the vectors
table that are the most similar to @query_
using ORDER BY … LIMIT
queries, as is shown in the query below.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);SELECT vec, vec <*> @query_vec AS scoreFROM vectorsORDER BY score DESCLIMIT 2;
+---------------------------------------------------|---------------------+
| vec | score |
+---------------------------------------------------|---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.9810000061988831 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.8993000388145447 |
+---------------------------------------------------|---------------------+
Finally, DOT_
(<*>
) can even be used in cross products and joins – both arguments to it can be table fields or derived from table fields.
CREATE TABLE vectors_2 (id_2 int, vec_2 VECTOR(4) not null);INSERT INTO vectors_2 VALUES (5, '[0.4, 0.49, 0.16, 0.555]');INSERT INTO vectors_2 VALUES (6, '[-0.01, -0.1, -0.2, 0.975]');
SELECT v1.id, v2.id_2, v1.vec <*> v2.vec_2 AS scoreFROM vectors v1, vectors_2 v2WHERE v1.vec <*> v2.vec_2 > 0.7ORDER BY score DESC;
+------+------+--------------------+
| id | id_2 | score |
+------+------+--------------------+
| 3 | 6 | 0.8665000200271606 |
| 1 | 5 | 0.8062000274658203 |
| 2 | 5 | 0.7720249891281128 |
+------+------+--------------------+
Finite Precision of Floating Point Arithmetic
Elements in the VECTOR
data type are stored as floating point numbers.
Note
It is not advisable to directly compare floating point values for equality.
Consider the example below which selects the DOT_
of @query_
with itself.DOT_
of a vector with itself is 1.
SET @query_vec = '[0.44, 0.554, 0.34, 0.62]':>VECTOR(4);SELECT @query_vec <*> @query_vec AS DotProduct;
+---------------------+
| DotProduct |
+---------------------+
| 1.0005160570144653 |
+---------------------+
The output is not 1.
Using DOT_ PRODUCT with Vectors as BLOBs
The following example shows the use of DOT_
to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOB
s.
Create a table with a column of type BLOB
to store the vectors.vec
and type BLOB
, will store the vectors.BLOB
s, hence the column of type BLOB
named vec
.
Then insert data using the JSON_
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
To demonstrate the contents of the table, use the JSON_
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(vec) FROM vectors_b;
+---------------------------------------------------+
| JSON_ARRAY_UNPACK(vec) |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
You can also use the HEX()
built-in function to return a printable form of the binary data:
SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec) |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+
Query the table using a DOT_
in a SELECT
statement.
The SQL below sets up query vector (@query_
) and then calculates the EUCLIDEAN_
of the query vector and vectors in the vectors_
table.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT DOT_PRODUCT(vec, @query_vec) AS scoreFROM vectors_bORDER BY score DESC;
+---------------------+
| score |
+---------------------+
| 0.9810000061988831 |
| 0.8993000388145447 |
+---------------------+
DOT_
The JSON_
function makes it easier to input properly-formatted vectors.JSON_
should be used when loading vectors into tables as is shown in the example below.JSON_
at the time they are loaded into a table so that the data stored in the BLOB
attribute in the table is in packed binary format.
JSON_
should not normally be used as an argument to the DOT_
function except when JSON_
is being used to build a constant vector value as is shown in the query below.
SELECT id, DOT_PRODUCT(JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS scoreFROM vectors_bORDER BY score DESC;
+-----|---------------------+
| id | score |
+-----|---------------------+
| 2 | 0.9810000061988831 |
| 1 | 0.8993000388145447 |
+-----|---------------------+
Note
It is not recommended to use the DOT_
infix operator (<*>
) with vectors stored as BLOB
s.BLOB
data, the infix operator (<*>
) will interpret vector elements as 32-bit floating point numbers.
Using Other Numeric Element Types
The default format for vector element storage and processing is 32-bit floating point number (FLOAT
).
The infix operator <*>
requires that both vector operands have the same number of elements and the same element type.
If you wish to use a data type other than 32-bit floating point, suffixes can be added to the DOT_
and JSON_
functions to specify other types for the vector elements._
.
Below is an example of using JSON_
and DOT_
with 16-bit signed integers.
CREATE TABLE vectors_b_i (id int, vec BLOB not null);INSERT INTO vectors_b_i VALUES (1, JSON_ARRAY_PACK_I16('[1, 3, 2, 5]'));INSERT INTO vectors_b_i VALUES(2, JSON_ARRAY_PACK_I16('[23, 4, 1, 8]'));
SET @query_vec = JSON_ARRAY_PACK_I16('[4, 5, 4, 2]');SELECTDOT_PRODUCT_I16(@query_vec, vec) as DotProductFROM vectors_b_iORDER BY DotProduct DESC;
+------|-------------+
| id | DotProduct |
+------|-------------+
| 2 | 132 |
| 1 | 37 |
+------|-------------+
The result is an integer as indicated by the _
suffix.
When using suffixed versions of DOT_
, the return type will be the type of the suffix.
Note
It is important that you use the DOT_
function that matches the encoding of the input vectors and that the encodings of the input vectors match.DOT_
should be used with vectors that have been encoded with JSON_
, and both input vectors must be encoded as I16
.
The table below lists the suffixes and associated data types.
Suffix |
Data Type |
---|---|
|
8-bit signed integer |
|
16-bit signed integer |
|
32-bit signed integer |
|
64-bit signed integer |
|
32-bit floating-point number (IEEE standard format) |
|
64-bit floating-point number (IEEE standard format) |
Formatting Binary Vector Data for BLOBs
When using the BLOB
data type for DOT_
operations, the vector data can be formatted using JSON_
.BLOB
s.BLOB
containing packed numbers in little-endian byte order.BLOB
s can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 or 8 bytes, depending on the vector element).
Related Topics
Last modified: March 13, 2024