EUCLIDEAN_DISTANCE
Returns the scalar Euclidean distance between two vector input values. It is a scalar function.
Syntax
EUCLIDEAN_DISTANCE(vector_expression, vector_expression)
Arguments
vector_expression
: An expression that evaluates to a vector. The vector must be encoded as a blob containing packed single-precision or double-precision floating-point numbers in little-endian byte order.
Remarks
A vector can be of any length, but the input blob length must be divisible by the packed vector element size (1, 2, 4 or 8 bytes, depending on the vector element).
If the result of EUCLIDEAN_DISTANCE()
is infinity, negative infinity, or not a number (NaN), NULL
will be returned instead.
EUCLIDEAN_DISTANCE(v1, v2)
is computationally equivalent to SQRT(DOT_PRODUCT(VECTOR_SUB(v1, v2), VECTOR_SUB(v1, v2)))
. However, the EUCLIDEAN_DISTANCE()
function is more efficient than the latter.
You can specify the datatype of the vector elements in which this operation is performed on the vector by adding a suffix to the function. Omitting the suffix from the function is equivalent to suffixing it with _F32
. All operations are done using the specified datatype. The following table lists the suffixes and their datatype.
Suffix | Datatype |
---|---|
| 8-bit signed integer |
| 16-bit signed integer |
| 32-bit signed integer |
| 64-bit signed integer |
| 32-bit floating-point number (IEEE standard format) |
| 64-bit floating-point number (IEEE standard format) |
Examples
Example: SELECT Using EUCLIDEAN_DISTANCE() on Existing Rows
The following example executes EUCLIDEAN_DISTANCE()
on two rows containing vectors. The HEX()
built-in function is also used to return a readable form of the binary output.
Create a table with two BLOB
-typed columns:
CREATE TABLE euc_t (a BLOB, b BLOB);
Using the JSON_ARRAY_PACK() built-in function to easily insert properly formatted vectors, insert a row with each vector in a different column:
INSERT INTO euc_t VALUES (JSON_ARRAY_PACK('[0.7, 0.2, 1.7]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'));
To demonstrate the contents of the table, use the HEX()
built-in function to return a readable form of the binary data:
SELECT HEX(a), HEX(b) FROM euc_t; **** +--------------------------+--------------------------+ | HEX(a) | HEX(b) | +--------------------------+--------------------------+ | 3333333FCDCC4C3E9A99D93F | 0000803F0000003F00000040 | +--------------------------+--------------------------+
You can also use the JSON_ARRAY_UNPACK()
function to return the table elements in JSON format:
SELECT JSON_ARRAY_UNPACK(a), JSON_ARRAY_UNPACK(b) FROM euc_t; **** +--------------------------------------+------------------------+ | JSON_ARRAY_UNPACK(a) | JSON_ARRAY_UNPACK(b) | +--------------------------------------+------------------------+ | [0.699999988,0.200000003,1.70000005] | [1,0.5,2] | +--------------------------------------+------------------------+
Query the table using the EUCLIDEAN_DISTANCE()
function in a SELECT
statement:
SELECT EUCLIDEAN_DISTANCE(a, b) FROM euc_t; **** +------------------------------+ | EUCLIDEAN_DISTANCE(a, b) | +------------------------------+ | 0.5196152239171921 | +------------------------------+
Example: EUCLIDEAN_DISTANCE() with JSON_ARRAY_PACK()
The following example uses JSON_ARRAY_PACK()
as input parameters to the EUCLIDEAN_DISTANCE()
built-in function:
SELECT EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[0.7, 0.2, 1.7]')); **** +--------------------------------------------------------------------------------------------+ | EUCLIDEAN_DISTANCE(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[0.7, 0.2, 1.7]')) | +--------------------------------------------------------------------------------------------+ | 0.5196152239171921 | +--------------------------------------------------------------------------------------------+
Example: Computing A Large Number of Euclidean Distances Efficiently, When there are Repeating Inputs
Suppose x
and y
are lists of vectors, where x
contains a large number of items. For each vector in x
, you want to calculate the euclidean distance for each vector in y
, where the vector in x
is the first euclidean distance input and the vector in y
is the second euclidean distance input. To compute this efficiently, you can separate x
and y
into two tables perform a cross join. For example,
SELECT x.vector, EUCLIDEAN_DISTANCE(x.vector, y.vector) FROM x,y;
When EUCLIDEAN_DISTANCE(x.vector, y.vector)
is calculated using a vector in x.vector
for each vector in y.vector
, the vector in x.vector
is loaded from memory to CPU only once.
Example: Typecasting the Datatype Using Suffixes
To perform this operation on vectors specified in a format other than the default 32-bit floating point number, specify a suffix with the EUCLIDEAN_DISTANCE()
function. For example,
SELECT EUCLIDEAN_DISTANCE_I16(JSON_ARRAY_PACK_I16('[12,10]'),JSON_ARRAY_PACK_I16('[8,6]')) "Result"; **** +-------------------+ | Result | +-------------------+ | 5.656854249492381 | +-------------------+