EUCLIDEAN_DISTANCE

The EUCLIDEAN_DISTANCE function returns the euclidean distance between two vector values. EUCLIDEAN_DISTANCE takes as input two vectors, and returns a numeric value.

A common use of EUCLIDEAN_DISTANCE is to calculate the similarity between vectors (vector similarity), which is used in semantic text search, generative AI, searches of images and audio files, and other applications. A typical query using EUCLIDEAN_DISTANCE is to find a set of vectors that most closely match a query vector.

SingleStore supports a native vector data type and indexed approximate-nearest-neighbor (ANN) search that provide high-performance vector search and easier building of vector-based applications.

See Vector Type, Vector Indexing, and Working with Vector Data for more information about using vectors in SingleStore.

Syntax

vector_expression <-> vector_expression
EUCLIDEAN_DISTANCE(vector_expression, vector_expression)

Arguments

  • vector_expression: An expression that evaluates to a vector. Vectors can be stored in SingleStore using the native VECTOR data type (Vector Type) or the BLOB data type (BLOB Types). SingleStore recommends using the VECTOR data type when possible.

Remarks

  • If the result of EUCLIDEAN_DISTANCE() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

  • The default format for vector element storage and processing is 32-bit floating point number (FLOAT). The EUCLIDEAN_DISTANCE function assumes the vector inputs are encoded as 32-bit floating point numbers and returns a DOUBLE.

  • See Using Other Numeric Element Types for information on using EUCLIDEAN_DISTANCE with vectors with element types other than 32-bit floating point numbers.

  • EUCLIDEAN_DISTANCE(v1, v2) is computationally equivalent to SQRT(DOT_PRODUCT(VECTOR_SUB(v1, v2), VECTOR_SUB(v1, v2))). However, the EUCLIDEAN_DISTANCE() function is more efficient than the latter.

Using EUCLIDEAN_DISTANCE with the VECTOR Data Type

The following example shows the use of  EUCLIDEAN_DISTANCE() to calculate the similarity between a query vector and a set of vectors in a table.

Create a table with a column of type VECTOR, insert data into the table, and then verify the contents of the table.

CREATE TABLE vectors (id int, vec VECTOR(4) not null);
INSERT INTO vectors VALUES (1, '[0.45, 0.55, 0.495, 0.5]');
INSERT INTO vectors VALUES (2, '[0.1, 0.8, 0.2, 0.555]');
INSERT INTO vectors VALUES (3, '[-0.5, -0.03, -0.1, 0.86]');
INSERT INTO vectors VALUES (4, '[0.5, 0.3, 0.807, 0.1]');
SELECT id, vec
FROM vectors
ORDER BY id;
+------+---------------------------------------------------+
| id   | vec                                               |
+------+---------------------------------------------------+
|    1 | [0.449999988,0.550000012,0.495000005,0.5]         |
|    2 | [0.100000001,0.800000012,0.200000003,0.555000007] |
|    3 | [-0.5,-0.0299999993,-0.100000001,0.860000014]     |
|    4 | [0.5,0.300000012,0.806999981,0.100000001]         |
+------+---------------------------------------------------+

Notice that the results of the SELECT do not exactly match the values in the INSERT. This is because elements in the VECTOR data type are stored as floating point numbers and the values in the INSERT statement are not perfectly representable in floating point.

After you've created a table and inserted data, set up a query vector. The queries below will evaluate vector similarity between the vectors in the vectors table and @query_vec.

This query finds the similarities between vectors and @query_vec using using the EUCLIDEAN_DISTANCE infix operator <-> .The ORDER BY clause is included so the order of your results match the results shown below.

SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);
SELECT vec <-> @query_vec AS score
FROM vectors
ORDER BY score ASC;
+---------------------+
| score               |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
|  0.7460596366393402 |
|  1.2148481657187131 |
+---------------------+

Using EUCLIDEAN_DISTANCE (<->) in Filters, Joins, and Ordering

EUCLIDEAN_DISTANCE can appear wherever a floating point expression can be used in a query, including in filters, ordering, joins and cross products.

The DOT_PRODUCT page shows examples of using DOT_PRODUCT in various ways in SQL queries. EUCLIDEAN_DISTANCE can be used similarly. Remember that sort order is ASC when using EUCLIDEAN_DISTANCE and DESC with DOT_PRODUCT.

Below is one example of using EUCLIDEAN_DISTANCE (<->) in a join query. Both arguments can be table fields or derived from table fields.

Use the vectors table and the @query_vec created above, in addition to the new table created below, when running the following query.

CREATE TABLE vectors_2 (id_2 int, vec_2 VECTOR(4) not null);
INSERT INTO vectors_2 VALUES (5, '[0.4, 0.49, 0.16, 0.555]'); 
INSERT INTO vectors_2 VALUES (6, '[-0.01, -0.1, -0.2, 0.975]');
SELECT v1.id, v2.id_2, v1.vec <-> v2.vec_2 AS score
FROM vectors v1, vectors_2 v2
WHERE v1.vec <-> v2.vec_2 < 0.7
ORDER BY score ASC;
+------+------+---------------------+
| id   | id_2 | score               |
+------+------+---------------------+
|    1 |    5 | 0.34835327652980475 |
|    2 |    5 |  0.4332435843558954 |
|    3 |    6 |  0.5179044162849211 |
+------+------+---------------------+

Using EUCLIDEAN_DISTANCE with Vectors as BLOBs

The following example shows the use of  EUCLIDEAN_DISTANCE() to calculate the similarity between a query vector and a set of vectors in a table with the vectors stored as BLOBs.

Create a table with a column of type BLOB to store the vectors. The second column in this table, with column name vec and type BLOB, will store the vectors. This example demonstrates storing vector data using BLOBs, hence the column of type BLOB named vec.

Then insert data using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors.

CREATE TABLE vectors_b (id int, vec BLOB not null);
INSERT INTO vectors_b VALUES (1, JSON_ARRAY_PACK('[0.1, 0.8, 0.2, 0.555]')); 
INSERT INTO vectors_b VALUES (2, JSON_ARRAY_PACK('[0.45, 0.55, 0.495, 0.5]'));

To demonstrate the contents of the table, use the JSON_ARRAY_UNPACK() function to return the table elements in JSON format:

SELECT JSON_ARRAY_UNPACK(vec) FROM vectors_b;
+---------------------------------------------------+
| JSON_ARRAY_UNPACK(vec)                            |
+---------------------------------------------------+
| [0.449999988,0.550000012,0.495000005,0.5]         |
| [0.100000001,0.800000012,0.200000003,0.555000007] |
+---------------------------------------------------+

You can also use the HEX() built-in function to return a printable form of the binary data:

SELECT HEX(vec) FROM vectors_b;
+----------------------------------+
| HEX(vec)                         |
+----------------------------------+
| 6666E63ECDCC0C3FA470FD3E0000003F |
| CDCCCC3DCDCC4C3FCDCC4C3E7B140E3F |
+----------------------------------+

Query the table using the EUCLIDEAN_DISTANCE() function in a SELECT statement.

The SQL below sets up query vector (@query_vec) and then calculates the EUCLIDEAN_DISTANCE of the query vector and vectors in the vectors_b table.

SET @query_vec = JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]');
SELECT EUCLIDEAN_DISTANCE(vec, @query_vec) AS score 
FROM vectors_b;
+---------------------+
| score               |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+

EUCLIDEAN_DISTANCE() with JSON_ARRAY_PACK()

The JSON_ARRAY_PACK() function makes it easier to input properly-formatted vectors. JSON_ARRAY_PACK() should be used when loading vectors into tables as is shown in the example below. That is, vectors should be formatted with JSON_ARRAY_PACK() at the time they are loaded into a table so that the data stored in the BLOB attribute in the table is in packed binary format. SingleStore does not recommend storing vectors as JSON strings in tables, doing so will have a negative performance impact.

JSON_ARRAY_PACK() should not normally be used as an argument to the DOT_PRODUCT function except when JSON_ARRAY_PACK() is being used to build a constant vector value as is shown in the query below.

SELECT EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[0.44, 0.554, 0.34, 0.62]'), vec) AS score
FROM vectors_b
ORDER BY score ASC;
+---------------------+
| score               |
+---------------------+
| 0.19631861943383927 |
| 0.44714763700937277 |
+---------------------+

Note

SingleStore does not recommend using the EUCLIDEAN_DISTANCE infix operator (<->) with vectors stored as BLOBs. When used with BLOB data, the infix operator (<->) will interpret vector elements as 32-bit floating point numbers.

Using Other Numeric Element Types

The default format for vector element storage and processing is 32-bit floating point number (FLOAT). If you wish to use a different data type, suffixes can be added to the EUCLIDEAN_DISTANCE and JSON_ARRAY_PACK functions to specify other types for the vector elements. When suffixes are appended, operations are done using the specified datatype. Omitting the suffix from the function is equivalent to suffixing it with _F32. The suffixes and associated data types are provided in a table below.

Below is an example of using JSON_ARRAY_PACK and EUCLIDEAN_DISTANCE with 16-bit signed integers.

CREATE TABLE vectors_b_i (id int, vec BLOB not null);
INSERT INTO vectors_b_i VALUES (1, JSON_ARRAY_PACK_I16('[1, 3, 2, 5]')); 
INSERT INTO vectors_b_i VALUES(2, JSON_ARRAY_PACK_I16('[23, 4, 1, 8]'));
SET @qv = JSON_ARRAY_PACK_I16('[1, 2, 3, 4]');
SELECT EUCLIDEAN_DISTANCE_I16(@qv, vec) as EuclidDist
FROM vectors_b_i;
+--------------------+
| EuclidDist         |
+--------------------+
| 5.0990195135927845 |
| 20.174241001832016 |
+--------------------+

The result is an integer as indicated by the _I16 suffix.

When using suffixed versions of EUCLIDEAN_DISTANCE, the return type will be the type of the suffix.

Note

It is important that you use the EUCLIDEAN_DISTANCE function that matches the encoding of the input vectors and that the encodings of the input vectors match. For example, EUCLIDEAN_DISTANCE_I16 should be used with vectors that have been encoded with JSON_ARRAY_PACK_I16, and both input vectors must be encoded as I16.

The table below lists the suffixes and associated data types.

Suffix

Data Type

_I8

8-bit signed integer

_I16

16-bit signed integer

_I32

32-bit signed integer

_I64

64-bit signed integer

_F32

32-bit floating-point number (IEEE standard format)

_F64

64-bit floating-point number (IEEE standard format)

Formatting Binary Vector Data for BLOBs

When using the BLOB data type for EUCLIDEAN_DISTANCE operations, the vector data can be formatted using JSON_ARRAY_PACK. Alternatively, if your vector data is already in a packed binary format, you can load that data into the BLOBs. The data must be encoded as a BLOB containing packed numbers in little-endian byte order. Vectors stored as BLOBs can be of any length; however, the input blob length must be divisible by the size of the packed vector elements (1, 2, 4 or 8 bytes, depending on the vector element). Vector element types listed in the table above are supported.

Last modified: March 13, 2024

Was this article helpful?