EUCLIDEAN_ DISTANCE
On this page
The EUCLIDEAN_
function returns the euclidean distance between two vector values.EUCLIDEAN_
takes as input two vectors, and returns a numeric value.
A common use of EUCLIDEAN_
is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications.EUCLIDEAN_
is to find a set of vectors that most closely match a query vector.
See Working with Vector Data for more information about using vectors in SingleStore.
Syntax
EUCLIDEAN_DISTANCE(vector_expression, vector_expression)
Arguments
-
vector_
: An expression that evaluates to a vector.expression Vectors can be stored in SingleStore using the BLOB
data type (BLOB Types).
Remarks
-
If the result of
EUCLIDEAN_
is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.DISTANCE() -
The default format for vector element storage and processing is 32-bit floating point number (
FLOAT
).The EUCLIDEAN_
function assumes the vector inputs are encoded as 32-bit floating point numbers and returns aDISTANCE DOUBLE
. -
See Using Other Numeric Element Types for information on using
EUCLIDEAN_
with vectors with element types other than 32-bit floating point numbers.DISTANCE -
EUCLIDEAN_
is computationally equivalent toDISTANCE(v1, v2) SQRT(DOT_
.PRODUCT(VECTOR_ SUB(v1, v2), VECTOR_ SUB(v1, v2))) However, the EUCLIDEAN_
function is more efficient than the latter.DISTANCE()
Using EUCLIDEAN_ DISTANCE with Vectors as BLOBs
The following example shows the use of EUCLIDEAN_
to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOBs.
Create a table with a column of type BLOB
to store the vectors.vec
and type BLOB
, will store the vectors.BLOB
s, hence the column of type BLOB
named vec
.
Then insert data using the JSON_
built-in function to easily insert properly formatted vectors.
CREATE TABLE vectors_b (id int, vec BLOB not null);INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]'));INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));
To demonstrate the contents of the table, use the JSON_
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(vec) FROM vectors_b;
+---------------------------------------------------+
| JSON_ARRAY_UNPACK(vec) |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5] |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+
You can also use the HEX()
built-in function to return a printable form of the binary data:
SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec) |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+
Query the table using the EUCLIDEAN_
function in a SELECT
statement.
The SQL below sets up query vector (@query_
) and then calculates the EUCLIDEAN_
of the query vector and vectors in the vectors_
table.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT EUCLIDEAN_DISTANCE(vec, @query_vec) AS scoreFROM vectors_b;
+---------------------+
| score |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+
EUCLIDEAN_ DISTANCE() with JSON_ ARRAY_ PACK()
The JSON_
function makes it easier to input properly-formatted vectors.JSON_
should be used when loading vectors into tables as is shown in the example below.JSON_
at the time they are loaded into a table so that the data stored in the BLOB
attribute in the table is in packed binary format.
JSON_
should not normally be used as an argument to the DOT_
function except when JSON_
is being used to build a constant vector value as is shown in the query below.
SELECT EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS scoreFROM vectors_bORDER BY score ASC;
+---------------------+
| score |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+
Using EUCLIDEAN_ DISTANCE in Filters, Joins, and Ordering
EUCLIDEAN_
can appear wherever a floating point expression can be used in a query, including in filters, ordering, joins and cross products.
Use the vectors_
table and the @query_
created above, in addition to the new table created below, when running the following query.
The following query uses EUCLIDEAN_
in the SELECT
clause, names it "score," and then filters on the score in the WHERE
clause.EUCLIDEAN_
, of that vector with respect to @query_
.
SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');SELECT EUCLIDEAN_DISTANCE(vec, @query_vec) AS scoreFROM vectors_bWHERE score < 0.7ORDER BY score ASC;
+---------------------+
| score |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+
EUCLIDEAN_
can even be used in cross products and joins – both arguments to it can be table fields or derived from table fields.
CREATE TABLE vectors_2b (id_2 int, vec_2 BLOB not null);INSERT INTO vectors_2b VALUES (5, JSON_ARRAY_PACK('[0.4, 0.49, 0.16, 0.555]'));INSERT INTO vectors_2b VALUES (6, JSON_ARRAY_PACK('[-0.01, -0.1, -0.2, 0.975]'));
SELECT v1.id, v2.id_2, EUCLIDEAN_DISTANCE(v1.vec, v2.vec_2) AS scoreFROM vectors_b v1, vectors_2b v2WHERE EUCLIDEAN_DISTANCE (v1.vec, v2.vec_2)<0.7ORDER BY score ASC;
+------+------+---------------------+
| id | id_2 | score |
+------+------+---------------------+
| 2 | 5 | 0.34835327652980475 |
| 1 | 5 | 0.4332435843558954 |
+------+------+---------------------+
Using Other Numeric Element Types
The default format for vector element storage and processing is 32-bit floating point number (FLOAT
).EUCLIDEAN_
and JSON_
functions to specify other types for the vector elements._
.
Below is an example of using JSON_
and EUCLIDEAN_
with 16-bit signed integers.
CREATE TABLE vectors_b_i (id int, vec BLOB not null);INSERT INTO vectors_b_i VALUES (1, JSON_ARRAY_PACK_I16('[1, 3, 2, 5]'));INSERT INTO vectors_b_i VALUES(2, JSON_ARRAY_PACK_I16('[23, 4, 1, 8]'));
SET @qv = JSON_ARRAY_PACK_I16('[1, 2, 3, 4]');SELECT EUCLIDEAN_DISTANCE_I16(@qv, vec) as EuclidDistFROM vectors_b_i;
+--------------------+
| EuclidDist |
+--------------------+
| 5.0990195135927845 |
| 20.174241001832016 |
+--------------------+
The result is an integer as indicated by the _
suffix.
When using suffixed versions of EUCLIDEAN_
, the return type will be the type of the suffix.
Note
It is important that you use the EUCLIDEAN_
function that matches the encoding of the input vectors and that the encodings of the input vectors match.EUCLIDEAN_
should be used with vectors that have been encoded with JSON_
, and both input vectors must be encoded as I16
.
The table below lists the suffixes and associated data types.
Suffix |
Data Type |
---|---|
|
8-bit signed integer |
|
16-bit signed integer |
|
32-bit signed integer |
|
64-bit signed integer |
|
32-bit floating-point number (IEEE standard format) |
|
64-bit floating-point number (IEEE standard format) |
Formatting Binary Vector Data for BLOBs
When using the BLOB
data type for EUCLIDEAN_
operations, the vector data can be formatted using JSON_
.BLOB
s.BLOB
containing packed numbers in little-endian byte order.BLOB
s can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 or 8 bytes, depending on the vector element).
Related Topics
Last modified: March 13, 2024