EUCLIDEAN_ DISTANCE
On this page
The EUCLIDEAN_
function returns the euclidean distance between two vector values.EUCLIDEAN_
takes as input two vectors and returns a numeric value.
A common use of EUCLIDEAN_
is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications.EUCLIDEAN_
is to find a set of vectors that most closely match a query vector.
SingleStore supports a native vector data type and indexed approximate-nearest-neighbor (ANN) search that provides high-performance vector search and easier building of vector-based applications.
See Vector Type, Vector Indexing, and Working with Vector Data for more information about using vectors in SingleStore.
Syntax
vector_expression <-> vector_expression
EUCLIDEAN_DISTANCE(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression Vectors can be stored in SingleStore using the native VECTOR
type (Vector Type) or theBLOB
type (BLOB Types).SingleStore recommends using the VECTOR
type when possible. -
JSON strings are allowed as
vector_
s when the other argument is of typeexpression VECTOR
.
Remarks
-
If both arguments are of type
VECTOR
, those arguments must have the same element types and the same number of elements. -
If one argument is a
VECTOR
, the other argument (which may be a JSON string or aBLOB
) will be converted to the type of theVECTOR
argument.-
It will cause an error if the JSON string has a different number of elements than the
VECTOR
argument. -
It will cause an error if the length of the
BLOB
is such that theBLOB
cannot be converted to the type of theVECTOR
.Note that there is no type checking in this conversion, so ensure that the BLOB
s were encoded with the same type as theVECTOR
argument.
-
-
If both arguments are
BLOB
s, both arguments will be treated as vectors with 32-bit floating-point numbers.It will cause an error if the arguments are different lengths.
-
If the result is infinity, negative infinity, or not a number (NaN),
NULL
will be returned instead. -
See Using Suffixes for Other Element Types with BLOBs for information on using
EUCLIDEAN_
with vectors with element types other than 32-bit floating-point numbers.DISTANCE -
The default element type for vector storage and processing is 32-bit floating-point (
F32
).The EUCLIDEAN_
function assumes the vector inputs are encoded as 32-bit floating-point numbers and returns aDISTANCE DOUBLE
. -
EUCLIDEAN_
is computationally equivalent toDISTANCE(v1, v2) SQRT(DOT_
.PRODUCT(VECTOR_ SUB(v1, v2), VECTOR_ SUB(v1, v2))) However, the EUCLIDEAN_
function is more efficient than the latter.DISTANCE()
Using EUCLIDEAN_ DISTANCE with the VECTOR Data Type
The following example shows the use of EUCLIDEAN_
to calculate the similarity between a query vector and a set of vectors in a table.
Create a table with a column of type VECTOR
, insert data into the table, and then verify the contents of the table.
CREATE TABLE vectors (id int, vec VECTOR(4) not null);INSERT INTO vectors VALUES (1, '[0.45, 0.55, 0.495, 0.5]');INSERT INTO vectors VALUES (2, '[0.1, 0.8, 0.2, 0.555]');INSERT INTO vectors VALUES (3, '[-0.5, -0.03, -0.1, 0.86]');INSERT INTO vectors VALUES (4, '[0.5, 0.3, 0.807, 0.1]');
SELECT id, vecFROM vectorsORDER BY id;
+------+---------------------------------------------------+
| id | vec |
+------+---------------------------------------------------+
| 1 | [0.449999988,0.550000012,0.495000005,0.5] |
| 2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
| 3 | [-0.5,-0.0299999993,-0.100000001,0.860000014] |
| 4 | [0.5,0.300000012,0.806999981,0.100000001] |
+------+---------------------------------------------------+
Notice that the results of the SELECT
do not exactly match the values in the INSERT
.VECTOR
data type are stored as floating-point numbers and the values in the INSERT
statement are not perfectly representable in floating-point.
After you've created a table and inserted data, set up a query vector.vectors
table and @query_
.
This query finds the similarities between vectors and @query_
using the EUCLIDEAN_
infix operator <->
.ORDER BY
clause is included so the order of your results match the results shown below.
Lower values of EUCLIDEAN_
indicate higher similarity.EUCLIDEAN_
is always greater than 0
.
The @query_
variable is cast to a VECTOR
to ensure that @query_
is a valid VECTOR
and to improve performance.
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);SELECT vec <-> @query_vec AS scoreFROM vectorsORDER BY score ASC;
+---------------------+
| score |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
| 0.7460596366393402 |
| 1.2148481657187131 |
+---------------------+
Using EUCLIDEAN_ DISTANCE (<->) in Filters, Joins, and Ordering
EUCLIDEAN_
can appear wherever a floating-point expression can be used in a query, including in filters, ordering, joins, and cross products.
The DOT_DOT_
in various ways in SQL queries.EUCLIDEAN_
can be used similarly.ASC
when using EUCLIDEAN_
and DESC
with DOT_
.
Below is one example of using EUCLIDEAN_
(<->
) in a join query.
Use the vectors
table and the @query_
created above, in addition to the new table created below, when running the following query.
CREATE TABLE vectors_2 (id_2 int, vec_2 VECTOR(4) not null);INSERT INTO vectors_2 VALUES (5, '[0.4, 0.49, 0.16, 0.555]');INSERT INTO vectors_2 VALUES (6, '[-0.01, -0.1, -0.2, 0.975]');
Click the Playground icon to the right of the SQL listing to try this query.
SELECT v1.id, v2.id_2, v1.vec <-> v2.vec_2 AS scoreFROM vectors v1, vectors_2 v2WHERE v1.vec <-> v2.vec_2 < 0.7ORDER BY score ASC;
+------+------+---------------------+
| id | id_2 | score |
+------+------+---------------------+
| 1 | 5 | 0.34835327652980475 |
| 2 | 5 | 0.4332435843558954 |
| 3 | 6 | 0.5179044162849211 |
+------+------+---------------------+
Automatic Type Conversions
As described in Remarks, in some cases, SingleStore will do automatic type conversions between JSON strings, BLOB
s, and VECTOR
s.
Note
The examples below use the dot product infix operator <->
; all the functionality shown also works with the EUCLIDEAN_
function.
Example 1 - VECTOR(4, F32) <-> JSON
Use the vectors
table created above.VECTOR
that holds vectors of length 4 with element type of 32-bit floating-point (F32
).
The following SQL searches for vectors that are similar to the vector '[0.
.<->
is a vector of length 4 with element-type 32-bit floating-point number and will convert the JSON string (the right-hand argument) to that vector type.
SELECT vec, vec <-> '[0.44, 0.554, 0.34, 0.62]' AS scoreFROM vectorsORDER BY score ASC;
+---------------------------------------------------+---------------------+
| vec | score |
+---------------------------------------------------+---------------------+
| [0.449999988,0.550000012,0.495000005,0.5] | 0.19631861943383927 |
| [0.100000001,0.800000012,0.200000003,0.555000007] | 0.44714763700937277 |
| [0.5,0.300000012,0.806999981,0.100000001] | 0.7460596366393402 |
| [-0.5,-0.0299999993,-0.100000001,0.860000014] | 1.2148481657187131 |
+---------------------------------------------------+---------------------+
Note
In the above example, the JSON string is directly included in the SELECT
clause as a constant.SELECT
clause is fine for queries that are run once.VECTOR
and use that VECTOR
in the query as shown in Example 3 below.
Example 2: Vector(3,I16) <-> JSON STRING
Create a table of vectors of length 3 and element type 16-bit integer (I16
) and insert data into that table.
CREATE TABLE vectors_i16(id INT, vec VECTOR(3, I16));INSERT INTO vectors_i16 VALUES(1, '[1, 2, 3]');INSERT INTO vectors_i16 VALUES(2, '[4, 5, 6]');INSERT INTO vectors_i16 VALUES(3, '[1, 4, 8]');
The following SQL calculates the dot product between the @query_
and the vectors in the vectors_
table.<->
in the SELECT
clause is a vector of length 3 with element-type 16-bit integer and will convert the JSON string (the right-hand argument) to that vector type.
SELECT id, '[3, 2, 1]' <-> vectors_i16.vec AS scoreFROM vectors_i16ORDER BY score ASC;
+------+--------------------+
| id | score |
+------+--------------------+
| 1 | 2.8284271247461903 |
| 2 | 5.916079783099616 |
| 3 | 7.54983443527075 |
+------+--------------------+
Example 3 - VECTOR(4, F32) <-> BLOB
Create a table with a BLOB column type to store the vectors and use the JSON_
CREATE TABLE vectors_b (id int, vec BLOB not null);INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
The following SQL calculates the dot product between the @query_
and the vectors stored as BLOBs in the vectors_<->
in the SELECT
clause is a vector of length 4 with element type 32-bit floating-point number and will convert the BLOB
(the right-hand argument) to that vector type.
SET @query_vec = '[0.44, 0.554, 0.34, 0.62]':>VECTOR(4);SELECT id, @query_vec <*> vectors_b.vec AS scoreFROM vectors_bORDER BY score DESC;
+------+--------------------+
| id | score |
+------+--------------------+
| 2 | 0.9810000061988831 |
| 1 | 0.8993000388145447 |
+------+--------------------+
Important
When using vectors stored as BLOB
s it is important to ensure that the BLOB
s store vectors of the same length and element type as the query vector with which you are calculating the dot product.BLOB
matches the length expected for the VECTOR
based on the number of elements and element type of the VECTOR
; however, the system cannot check that the element types used in the BLOB
and the VECTOR
are the same.
Using EUCLIDEAN_ DISTANCE with Vectors as BLOBs
The following example shows the use of EUCLIDEAN_
to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOBs.
Create a table with a column of type BLOB
to store the vectors.vec
and type BLOB
, will store the vectors.BLOB
s, hence the column of type BLOB
named vec
.
Then insert data using the JSON_
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
To demonstrate the contents of the table, use the JSON_
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(vec) FROM vectors_b;
+---------------------------------------------------+
| JSON_ARRAY_UNPACK(vec) |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
You can also use the HEX()
built-in function to return a printable form of the binary data:
SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec) |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+
Query the table using the EUCLIDEAN_
function in a SELECT
statement.
The SQL below sets up query vector (@query_
) and then calculates the EUCLIDEAN_
of the query vector and the vectors in the vectors_
table.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT EUCLIDEAN_DISTANCE(vec, @query_vec) AS scoreFROM vectors_b;
+---------------------+
| score |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+
EUCLIDEAN_ DISTANCE() with JSON_ ARRAY_ PACK()
The JSON_
function makes it easier to input properly-formatted vectors.JSON_
should be used when loading vectors into tables as is shown in the example below.JSON_
at the time they are loaded into a table so that the data stored in the BLOB
attribute in the table is in packed binary format.
JSON_
should not normally be used as an argument to the EUCLIDEAN_
function except when JSON_
is being used to build a constant vector value as is shown in the query below.
SELECT EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS scoreFROM vectors_bORDER BY score ASC;
+---------------------+
| score |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+
Note
SingleStore does not recommend using the EUCLIDEAN_
infix operator (<->
) with vectors stored as BLOB
s.BLOB
data, the infix operator (<->
) will interpret vector elements as 32-bit floating-point numbers.
Using Suffixes for Other Element Types with BLOBs
The default element type for vector storage and processing is 32-bit floating point (F32
).
You can specify the datatype of the vector elements to be used in the operation by adding a suffix to the function._
.
When using a suffix, the return type will be the type of the suffix.
Note
The functions with suffixes do not work with the VECTOR
type.
The following table lists the suffixes and their data type.
Suffix |
Data Type |
---|---|
|
8-bit signed integer |
|
16-bit signed integer |
|
32-bit signed integer |
|
64-bit signed integer |
|
32-bit floating-point number (IEEE standard format) |
|
64-bit floating-point number (IEEE standard format) |
EUCLIDEAN_ DISTANCE on BLOBs with 16-bit Integers
Below is an example of using JSON_
and EUCLIDEAN_
with 16-bit signed integers.
CREATE TABLE vectors_b_i (id int, vec BLOB not null);INSERT INTO vectors_b_i VALUES (1, JSON_ARRAY_PACK_I16('[1, 3, 2, 5]'));INSERT INTO vectors_b_i VALUES(2, JSON_ARRAY_PACK_I16('[23, 4, 1, 8]'));
SET @qv = JSON_ARRAY_PACK_I16('[1, 2, 3, 4]');SELECT EUCLIDEAN_DISTANCE_I16(@qv, vec) as EuclidDistFROM vectors_b_i;
+--------------------+
| EuclidDist |
+--------------------+
| 22.538855339169288 |
| 1.7320508075688772 |
+--------------------+
The result is an integer as indicated by the _
suffix.
When using suffixed versions of EUCLIDEAN_
, the return type will be the type of the suffix.
Note
Be sure that the suffixes you use to pack the vector data match the suffixes you use to unpack the data and the suffixes you use on functions to process that data.
Formatting Binary Vector Data for BLOBs
When using the BLOB
type for vector operations, vector data can be formatted using JSON_
.BLOB
s.BLOB
containing packed numbers in little-endian byte order.BLOB
s can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 , or 8 bytes, depending on the vector element).
Related Topics
Last modified: August 14, 2024