EUCLIDEAN_DISTANCE

Returns the scalar Euclidean distance between two vector input values. It is a scalar function.

Syntax

EUCLIDEAN_DISTANCE(vector_expression, vector_expression)

Arguments

  • vector_expression: An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.

Remarks

To execute this function, the host processor must support AVX2 instruction set extensions. If AVX2 is not supported, an error will occur during execution.

If the result of EUCLIDEAN_DISTANCE() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.

EUCLIDEAN_DISTANCE(v1, v2) is computationally equivalent to SQRT(DOT_PRODUCT(VECTOR_SUB(v1, v2), VECTOR_SUB(v1, v2))). However, the EUCLIDEAN_DISTANCE() function is more efficient than the latter.

Examples

Example: SELECT Using EUCLIDEAN_DISTANCE() on Existing Rows

The following example executes EUCLIDEAN_DISTANCE() on two rows containing vectors. The HEX() built-in function is also used to return a readable form of the binary output.

Create a table with two BLOB-typed columns:

CREATE TABLE euc_t (a BLOB, b BLOB);

Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:

INSERT INTO euc_t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));

To demonstrate the contents of the table, use the HEX() built-in function to return a readable form of the binary data:

SELECT HEX(a), HEX(b) FROM euc_t;
+--------------------------+--------------------------+
| HEX(a)                   | HEX(b)                   |
+--------------------------+--------------------------+
| 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 |
+--------------------------+--------------------------+

Query the table using the EUCLIDEAN_DISTANCE() function in a SELECT statement:

SELECT EUCLIDEAN_DISTANCE(a, b) FROM euc_t;
+------------------------------+
| EUCLIDEAN_DISTANCE(a, b)     |
+------------------------------+
|           0.5196152239171921 |
+------------------------------+

Example: EUCLIDEAN_DISTANCE() with JSON_ARRAY_PACK()

The following example uses JSON_ARRAY_PACK() as input parameters to the EUCLIDEAN_DISTANCE() built-in function:

SELECT EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'));
+--------------------------------------------------------------------------------------------+
| EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[0.7, 0.2, 1.7]')) |
+--------------------------------------------------------------------------------------------+
|                                                                         0.5196152239171921 |
+--------------------------------------------------------------------------------------------+

Example: Computing A Large Number of Euclidean Distances Efficiently, When there are Repeating Inputs

Suppose x and y are lists of vectors, where x contains a large number of items. For each vector in x, you want to calculate the euclidean distance for each vector in y, where the vector in x is the first euclidean distance input and the vector in y is the second euclidean distance input. To compute this efficiently, you can separate x and y into two tables perform a cross join. For example,

SELECT x.vector, EUCLIDEAN_DISTANCE(x.vector, y.vector)
FROM x,y;

When EUCLIDEAN_DISTANCE(x.vector, y.vector) is calculated using a vector in x.vector for each vector in y.vector, the vector in x.vector is loaded from memory to CPU only once.

Last modified: June 20, 2023

Was this article helpful?