SingleStore DB

EUCLIDEAN_DISTANCE

Returns the scalar Euclidean distance between two vector input values.

Syntax
EUCLIDEAN_DISTANCE(vector_expression, vector_expression)
Arguments
  • vector_expression: An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.

Remarks

A vector can be of any length, but both input vectors must be the same length and the blob lengths must be divisible by 4 bytes (or 8 bytes, depending on the datatype specified).

If the result of EUCLIDEAN_DISTANCE() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

EUCLIDEAN_DISTANCE(v1, v2) is computationally equivalent to SQRT(DOT_PRODUCT(VECTOR_SUB(v1, v2), VECTOR_SUB(v1, v2))). However, the EUCLIDEAN_DISTANCE() function is more efficient than the latter.

You can specify the datatype of the vector elements in which this operation is performed on the vector by adding a suffix to the function. Omitting the suffix from the function is equivalent to suffixing it with _F32. All operations are done using the specified datatype. The following table lists the suffixes and their datatype.

Suffix

Datatype

_I8

8-bit signed integer

_I16

16-bit signed integer

_I32

32-bit signed integer

_I64

64-bit signed integer

_F32

32-bit floating-point number (IEEE standard format)

_F64

64-bit floating-point number (IEEE standard format)

Examples

Example: SELECT Using EUCLIDEAN_DISTANCE() on Existing Rows

The following example executes EUCLIDEAN_DISTANCE() on two rows containing vectors. The HEX() built-in function is also used to return a readable form of the binary output.

Create a table with two BLOB-typed columns:

CREATE TABLE t (a BLOB, b BLOB);

Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:

INSERT INTO t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));

To demonstrate the contents of the table, use the HEX() built-in function to return a readable form of the binary data:

SELECT HEX(t.a), HEX(t.b) FROM t;
****
+--------------------------+--------------------------+
| HEX(t.a)                 | HEX(t.b)                 |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+

You can also use the JSON_ARRAY_UNPACK() function to return the table elements in JSON format:

SELECT JSON_ARRAY_UNPACK(t.a), JSON_ARRAY_UNPACK(t.b) FROM t;
****
+--------------------------------------+------------------------+
| JSON_ARRAY_UNPACK(t.a)               | JSON_ARRAY_UNPACK(t.b) |
+--------------------------------------+------------------------+
| [0.699999988,0.200000003,1.70000005] | [1,0.5,2]              |
+--------------------------------------+------------------------+

Query the table using the EUCLIDEAN_DISTANCE() function in a SELECT statement:

SELECT EUCLIDEAN_DISTANCE(t.a, t.b) FROM t;
****
+------------------------------+
| EUCLIDEAN_DISTANCE(t.a, t.b) |
+------------------------------+
|           0.5196152239171921 |
+------------------------------+

Example: EUCLIDEAN_DISTANCE() with JSON_ARRAY_PACK()

The following example uses JSON_ARRAY_PACK() as input parameters to the EUCLIDEAN_DISTANCE() built-in function:

SELECT EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'));
****
+--------------------------------------------------------------------------------------------+
| EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[0.7, 0.2, 1.7]')) |
+--------------------------------------------------------------------------------------------+
|                                                                         0.5196152239171921 |
+--------------------------------------------------------------------------------------------+

Example: Computing A Large Number of Euclidean Distances Efficiently, When there are Repeating Inputs

Suppose x and y are lists of vectors, where x contains a large number of items. For each vector in x, you want to calculate the euclidean distance for each vector in y, where the vector in x is the first euclidean distance input and the vector in y is the second euclidean distance input. To compute this efficiently, you can separate x and y into two tables perform a cross join. For example,

SELECT x.vector, EUCLIDEAN_DISTANCE(x.vector, y.vector)
FROM x,y;

When EUCLIDEAN_DISTANCE(x.vector, y.vector) is calculated using a vector in x.vector for each vector in y.vector, the vector in x.vector is loaded from memory to CPU only once.

Example: Typecasting the Datatype Using Suffixes

To perform this operation on vectors specified in a format other than the default 32-bit floating point number, specify a suffix with the EUCLIDEAN_DISTANCE() function. For example,

SELECT EUCLIDEAN_DISTANCE_I16(JSON_ARRAY_PACK_I16('[12,10]'),JSON_ARRAY_PACK_I16('[8,6]')) "Result";
****
+-------------------+
| Result            |
+-------------------+
| 5.656854249492381 |
+-------------------+